Transcript of Gray R. - Probability, Random Processes, And Ergodic Properties

5.2. MEASUREMENTS AND EVENTS 97

where the F(i) are disjoint, and hence since g_n ∈ M(σ(f)) that

g_n^{-1}(r_i) = F(i) ∈ σ(f).

Since σ(f) = f^{-1}(B(A)), there are disjoint sets Q(i) ∈ B(A) such that F(i) = f^{-1}(Q(i)). Define the function h_n : A → ℜ by

h_n(a) = ∑_{i=1}^M r_i 1_{Q(i)}(a)

and

h_n(f(ω)) = ∑_{i=1}^M r_i 1_{Q(i)}(f(ω)) = ∑_{i=1}^M r_i 1_{f^{-1}(Q(i))}(ω) = ∑_{i=1}^M r_i 1_{F(i)}(ω) = g_n(ω).

This proves the result for simple functions. By construction we have that g(ω) = lim_{n→∞} g_n(ω) = lim_{n→∞} h_n(f(ω)) where, in particular, the right-most limit exists for all ω ∈ Ω. Define the function h(a) = lim_{n→∞} h_n(a) where the limit exists and 0 otherwise. Then g(ω) = lim_{n→∞} h_n(f(ω)) = h(f(ω)), completing the proof. □
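The factorization just proved — a σ(f)-measurable g can be written as g = h(f) for some function h on the range of f — can be illustrated on a finite sample space. The following sketch is my own illustration, not from the text; all names are invented. It recovers h from the level sets of a discrete f:

```python
# Finite-sample-space sketch of the factorization g = h(f):
# if g is constant on every level set of f (i.e. g is sigma(f)-measurable),
# we can build h on the range of f with g(w) == h(f(w)) for all w.

def factor_through(f, g, omega):
    """Return h with g(w) == h(f(w)) for all w in omega, or raise
    if g is not sigma(f)-measurable (not constant on a level set of f)."""
    h = {}
    for w in omega:
        a = f(w)
        if a in h and h[a] != g(w):
            raise ValueError("g is not sigma(f)-measurable")
        h[a] = g(w)
    return lambda a: h[a]

omega = [0, 1, 2, 3, 4, 5]
f = lambda w: w % 2          # f induces the even/odd partition of omega
g = lambda w: 10 * (w % 2)   # g depends on w only through f(w)
h = factor_through(f, g, omega)
assert all(g(w) == h(f(w)) for w in omega)
```

The check in `factor_through` is exactly the measurability condition for discrete f: g must not distinguish points that f cannot distinguish.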

Thus far we have developed the properties of σ-fields induced by random variables and of classes of functions measurable with respect to σ-fields. The idea of a σ-field induced by a single random variable is easily generalized to random vectors and sequences. We wish, however, to consider the more general case of a σ-field induced by a possibly uncountable class of measurements. Then we will have associated with each class of measurements a natural σ-field and with each σ-field a natural class of measurements. Toward this end, given a class of measurements M, define σ(M) as the smallest σ-field with respect to which all of the measurements in M are measurable. Since any σ-field satisfying this condition must contain all of the σ(f) for f ∈ M, and hence must contain the σ-field generated by all of these sets, and since this latter collection is a σ-field,

σ(M) = σ( ⋃_{f∈M} σ(f) ).

The following lemma collects some simple relations among σ-fields induced by measurements and classes of measurements induced by σ-fields.

Lemma 5.2.2 Given a class of measurements M, then

M ⊂ M(σ(M)).

Given a collection of events G, then

G ⊂ σ(M(σ(G))).

If G is also a σ-field, then

G = σ(M(G)),

that is, G is the smallest σ-field with respect to which all G-measurable functions are measurable. If G is a σ-field and I(G) = {1_G : G ∈ G} is the collection of all indicator functions of events in G, then

G = σ(I(G)),

that is, the smallest σ-field induced by indicator functions of sets in G is the same as that induced by all functions measurable with respect to G.

98 CHAPTER 5. CONDITIONAL PROBABILITY AND EXPECTATION

Proof: If f ∈ M, then it is σ(f)-measurable and hence in M(σ(M)). If G ∈ G, then its indicator function 1_G is σ(G)-measurable and hence 1_G ∈ M(σ(G)). Since 1_G ∈ M(σ(G)), it must be measurable with respect to σ(M(σ(G))) and hence 1_G^{-1}(1) = G ∈ σ(M(σ(G))), proving the second statement. If G is a σ-field, then since all functions in M(G) are G-measurable and since σ(M(G)) is the smallest σ-field with respect to which all functions in M(G) are measurable, G must contain σ(M(G)). Since I(G) ⊂ M(G),

σ(I(G)) ⊂ σ(M(G)) = G.

If G ∈ G, then 1_G is in I(G) and hence must be measurable with respect to σ(I(G)), and hence 1_G^{-1}(1) = G ∈ σ(I(G)), and hence G ⊂ σ(I(G)), completing the proof. □
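On a finite space the lemma's last claim, G = σ(I(G)), can be checked directly: generating a σ-field from the sets of a σ-field G (the level sets of its indicator functions) returns G itself. A small sketch, with all names my own:

```python
# On a finite omega, the sigma-field generated by a family of sets is
# built by forming the atoms of the partition the sets induce and then
# taking all unions of atoms.
from itertools import combinations

def sigma_field_from_sets(omega, sets):
    """Smallest sigma-field on finite omega containing the given sets."""
    # atom of w = points indistinguishable from w by every generating set
    atoms = {}
    for w in omega:
        key = tuple(w in s for s in sets)
        atoms.setdefault(key, set()).add(w)
    atoms = list(atoms.values())
    field = set()
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            field.add(frozenset().union(*combo))
    return field

omega = {1, 2, 3, 4}
G = sigma_field_from_sets(omega, [{1, 2}])  # {}, {1,2}, {3,4}, omega
# regenerating from the sets of G (i.e. from sigma(I(G))) recovers G,
# as Lemma 5.2.2 asserts for sigma-fields
assert sigma_field_from_sets(omega, [set(s) for s in G]) == G
```

The atoms here play the role of the level sets of the indicator functions; for an infinite space this construction no longer suffices, which is why the lemma argues abstractly.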

We conclude this section with a reminder of the motivation for considering classes of events and measurements. We shall often consider such classes either because a particular class is important for a particular application or because we are simply given a particular class. In both cases we may wish to study both the given events (measurements) and the related class of measurements (events). For example, given a class of measurements M, σ(M) provides a σ-field of events whose occurrence or nonoccurrence is determinable by the output events of the functions in the class. In turn, M(σ(M)) is the possibly larger class of functions whose output events are determinable from the occurrence or nonoccurrence of events in σ(M) and hence by the output events of M. Thus knowing all output events of measurements in M is effectively equivalent to knowing all output events of measurements in the more structured and possibly larger class M(σ(M)), which is in turn equivalent to knowing the occurrence or nonoccurrence of events in the σ-field σ(M). Hence when a class M is specified, we can instead consider the more structured classes σ(M) or M(σ(M)). From the previous lemma, this is as far as we can go; that is,

σ(M(σ(M))) = σ(M). (5.1)

We have seen that a function g will be in M(σ(f)) if and only if it depends on the underlying sample points through the value of f. Since we did not restrict the structure of f, it could, for example, be a random variable, vector, or sequence; that is, the conclusion is true for countable collections of measurements as well as individual measurements. If instead we have a general class M of measurements, then it is still a useful intuition to think of M(σ(M)) as being the class of all functions that depend on the underlying points only through the values of the functions in M.

Exercises

1. Which of the following relations are true and which are false?

f² ∈ M(σ(f)), f ∈ M(σ(f²))

f + g ∈ M(σ(f, g)), f ∈ M(σ(f + g))

2. If f : A → Ω, g : Ω → Ω, and g(f) : A → Ω is defined by g(f)(x) = g(f(x)), then g(f) ∈ M(σ(f)).

3. Given a class M of measurements, is ⋃_{f∈M} σ(f) a σ-field?

4. Suppose that (A, B) is a measurable space and B is separable with a countable generating class {V_n; n = 1, 2, . . .}. Describe σ(1_{V_n}; n = 1, 2, . . .).


5.3 Restrictions of Measures

The first application of the classes of measurements or events considered in the previous section is the notion of the restriction of a probability measure to a sub-σ-field. This occasionally provides a shortcut to evaluating expectations of functions that are measurable with respect to sub-σ-fields and in comparing such functions. Given a probability space (Ω, B, m) and a sub-σ-field G of B, define the restriction of m to G, m_G, by

m_G(F) = m(F),  F ∈ G.

Thus (Ω, G, m_G) is a new probability space with a smaller event space. The following lemma shows that if f is a G-measurable real-valued random variable, then its expectation can be computed with respect to either m or m_G.

Lemma 5.3.1 Given a G-measurable real-valued measurement f ∈ L1(m), then also f ∈ L1(m_G) and

∫ f dm = ∫ f dm_G,

where m_G is the restriction of m to G.

Proof: If f is a simple function, the result is immediate from the definition of restriction. More generally, use Lemma 4.3.1(e) to infer that q_n(f) is a sequence of simple G-measurable functions converging to f and combine the simple function result with Corollary 4.4.1 applied to both measures. □
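A finite-space sketch of Lemma 5.3.1, with toy numbers of my own: since m_G simply copies the values of m on events in G, a G-measurable f integrates to the same number under either measure.

```python
# Restriction of a measure to a sub-sigma-field on a four-point space.
# The sub-sigma-field G is generated by the partition {a,b} | {c,d};
# its restriction m_G only records the masses of the two atoms.

omega = ["a", "b", "c", "d"]
m = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
m_G = {("a", "b"): 0.3, ("c", "d"): 0.7}  # m restricted to the atoms of G

# f is G-measurable: constant on each atom of G
f = {"a": 5.0, "b": 5.0, "c": -1.0, "d": -1.0}

int_m = sum(f[w] * m[w] for w in omega)              # integral against m
int_mG = sum(f[atom[0]] * p for atom, p in m_G.items())  # integral against m_G
assert abs(int_m - int_mG) < 1e-12
```

The point of the lemma is that this equality survives the passage from simple functions (as here) to general integrable f via monotone approximation.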

Corollary 5.3.1 Given G-measurable functions f, g ∈ L1(m), if

∫_F f dm ≤ ∫_F g dm,  all F ∈ G,

then f ≤ g m-a.e. If the preceding holds with equality, then f = g m-a.e.

Proof: From the previous lemma, the integral inequality holds with m replaced by m_G, and hence from Lemma 4.4.7 the conclusion holds m_G-a.e. Thus there is a set, say G, in G with m_G probability one and hence also m probability one for which the conclusion is true. □

The usefulness of the preceding corollary is that it allows us to compare G-measurable functions by considering only the restricted measures and the corresponding expectations.

5.4 Elementary Conditional Probability

Given a probability space (Ω, B, m), how is the probability measure m altered if we are told that some event or collection of events occurred? For example, how is it influenced if we are given the outputs of a measurement or collection of measurements? The notion of conditional probability provides a response to this question. In fact there are two notions of conditional probability: elementary and nonelementary.

Elementary conditional probabilities cover the case where we are given an event, say F, having nonzero probability: m(F) > 0. We would like to define a conditional probability measure m(G|F) for all events G ∈ B. Intuitively, being told an event F occurred will put zero probability on the collection of all points outside F, but it should not affect the relative probabilities of the various events inside F. In addition, the new probability measure must be renormalized so as to assign probability one to the new “certain event” F. This suggests the definition m(G|F) = km(G ∩ F),


where k is a normalization constant chosen to ensure that m(F|F) = km(F ∩ F) = km(F) = 1. Thus we define, for any F such that m(F) > 0, the conditional probability measure

m(G|F) = m(G ∩ F) / m(F),  all G ∈ B.

We shall often abbreviate the elementary conditional probability measure m(·|F) by m_F. Given a probability space (Ω, B, m) and an event F ∈ B with m(F) > 0, then we have a new probability space (F, B ∩ F, m_F), where B ∩ F = {G ∩ F : G ∈ B}. It is easy to see that we can relate expectations with respect to the conditional and unconditional measures by

E_{m_F}(f) = ∫ f dm_F = ( ∫_F f dm ) / m(F) = E_m(1_F f) / m(F),  (5.2)

where the existence of either side ensures that of the other. In particular, f ∈ L1(m_F) if and only if f 1_F ∈ L1(m). Note further that if G = F^c, then

E_m f = m(F) E_{m_F}(f) + m(G) E_{m_G}(f).  (5.3)
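Equations (5.2) and (5.3) are easy to verify numerically on a finite space; the following sketch uses toy numbers of my own choosing:

```python
# Check (5.2) and (5.3) on a four-point space: the conditional
# expectation given F via E_m(1_F f)/m(F), and the decomposition of
# E_m f over F and its complement G = F^c.

omega = [0, 1, 2, 3]
m = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}
f = {0: 2.0, 1: -1.0, 2: 4.0, 3: 0.5}
F = {0, 1}
G = {2, 3}                      # G = F^c

mF = sum(m[w] for w in F)       # m(F)
mG = sum(m[w] for w in G)       # m(G)
E_mF = sum(f[w] * m[w] for w in F) / mF   # (5.2): E_m(1_F f) / m(F)
E_mG = sum(f[w] * m[w] for w in G) / mG
E_m = sum(f[w] * m[w] for w in omega)

# (5.3): E_m f = m(F) E_{m_F}(f) + m(G) E_{m_G}(f)
assert abs(E_m - (mF * E_mF + mG * E_mG)) < 1e-12
```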

Instead of being told that a particular event occurred, we might be told that a random variable or measurement f is discrete and takes on a specific value, say a, with nonzero probability. Then the elementary definition immediately yields

m(G|f = a) = m(G ∩ {f = a}) / m({f = a}).

If, however, the measurement is not discrete and takes on a particular value with probability zero, then the preceding elementary definition does not work. One might attempt to replace the previous definition by some limiting form, but this does not yield a useful theory in general and it is clumsy. The standard alternative approach is to replace the preceding constructive definition by a descriptive definition, that is, to define conditional probabilities by the properties that they should possess. The mathematical problem is then to prove that a function possessing the desired properties exists.

In order to motivate the descriptive definition, we make several observations on the elementary case. First note that the previous conditional probability depends on the value a assumed by f, that is, it is a function of a. It will prove more useful and more general instead to consider it a function of ω that depends on ω only through f(ω), that is, to consider conditional probability as a function m(G|f)(ω) = m(G|{λ : f(λ) = f(ω)}), or, simply, m(G|f = f(ω)), the probability of an event G given that f assumes the value f(ω). Thus a conditional probability is a function of the points in the underlying sample space and hence is itself a random variable or measurement. Since it depends on ω only through f, from Lemma 5.2.1 the function m(G|f) is measurable with respect to σ(f), the σ-field induced by the given measurement. This leads to the first property:

For any fixed G, m(G|f) is σ(f)-measurable.  (5.4)

Next observe that in the elementary case we can compute the probability m(G) by averaging or integrating the conditional probability m(G|f) over all possible values of f; that is,

∫ m(G|f) dm = ∑_a m(G|f = a) m(f = a) = ∑_a m(G ∩ {f = a}) = m(G).  (5.5)
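Equation (5.5) can be checked on a small discrete example (my own construction): averaging the elementary conditional probabilities m(G|f = a) against the distribution of f recovers m(G).

```python
# Verify (5.5) for a discrete measurement f on a finite space:
# the average of m(G|f) over the values of f equals m(G).

omega = [0, 1, 2, 3, 4, 5]
m = {w: 1 / 6 for w in omega}        # uniform measure
f = {w: w % 3 for w in omega}        # discrete measurement with values 0,1,2
G = {0, 1, 2}                        # target event

def m_of(event):
    return sum(m[w] for w in event)

total = 0.0
for a in set(f.values()):
    level = {w for w in omega if f[w] == a}   # the event {f = a}
    cond = m_of(G & level) / m_of(level)      # m(G | f = a)
    total += cond * m_of(level)               # weight by m(f = a)
assert abs(total - m_of(G)) < 1e-12
```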


In fact we can and must say more about such averaging of conditional probabilities. Suppose that F is some event in σ(f) and hence its occurrence or nonoccurrence is determinable by observation of the value of f, that is, from Lemma 5.2.1, 1_F(ω) = h(f(ω)) for some function h. Thus if we are given the value of f(ω), being told that ω ∈ F should not add any knowledge and hence should not alter the conditional probability of an event G given f. To try to pin down this idea, assume that m(F) > 0 and let m_F denote the elementary conditional probability measure given F, that is, m_F(G) = m(G ∩ F)/m(F). Applying (5.5) to the conditional measure m_F yields the formula

∫ m_F(G|f) dm_F = m_F(G),

where m_F(G|f) is the conditional probability of G given the outcome of the random variable f and given that the event F occurred. But we have argued that this should be the same as m(G|f). Making this substitution, multiplying both sides of the equation by m(F), and using (5.2) we derive

∫_F m(G|f) dm = m(G ∩ F),  all F ∈ σ(f).  (5.6)

To make this plausibility argument rigorous observe that since f is assumed discrete we can write

f(ω) = ∑_{a∈A} a 1_{f^{-1}(a)}(ω)

and hence

1_F(ω) = h(f(ω)) = h( ∑_{a∈A} a 1_{f^{-1}(a)}(ω) )

and therefore

F = ⋃_{a : h(a)=1} f^{-1}(a).

We can then write

∫_F m(G|f) dm = ∫ 1_F m(G|f) dm = ∫ h(f) m(G|f) dm
= ∑_a h(a) [ m(G ∩ {f = a}) / m(f = a) ] m(f = a) = ∑_a h(a) m(G ∩ f^{-1}(a))
= m( G ∩ ⋃_{a : h(a)=1} f^{-1}(a) ) = m(G ∩ F).

Although (5.6) was derived for F with m(F) > 0, the equation is trivially satisfied if m(F) = 0 since both sides are then 0. Hence the equation is valid for all F in σ(f) as stated. Eq. (5.6) states that not only must we be able to average m(G|f) to get m(G), a similar relation must hold when we cut down the average to any σ(f)-measurable event.
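The computation above can be mirrored numerically (a toy setup of my own): for an event F in σ(f), that is, a union of level sets of f, integrating m(G|f) over F reproduces m(G ∩ F).

```python
# Verify (5.6) for a discrete f on a finite space: for F a union of
# level sets of f, the integral of m(G|f) over F equals m(G ∩ F).

omega = list(range(8))
m = {w: 1 / 8 for w in omega}
f = {w: w % 4 for w in omega}             # discrete measurement
G = {0, 1, 2, 3}
F = {w for w in omega if f[w] in (1, 3)}  # F in sigma(f): {f=1} U {f=3}

def m_of(event):
    return sum(m[w] for w in event)

lhs = 0.0
for a in (1, 3):                          # F decomposes into these level sets
    level = {w for w in omega if f[w] == a}
    # m(G | f = a) times m(f = a), summed only over level sets inside F
    lhs += (m_of(G & level) / m_of(level)) * m_of(level)
assert abs(lhs - m_of(G & F)) < 1e-12
```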

Equations (5.4) and (5.6) provide the properties needed for a rigorous descriptive definition of the conditional probability of an event G given a random variable or measurement f. In fact, this conditional probability is defined as any function m(G|f)(ω) satisfying these two equations. At this point, however, little effort is saved by confining interest to a single conditioning measurement, and hence we will develop the theory for more general classes of measurements. In order to do this, however, we require a basic result from measure theory: the Radon-Nikodym theorem. The next two sections develop this result.