
Lecture 1

Basic probability refresher

1.1 Characterizations of random variables

    Let (Ω,F , P ) be a probability space where Ω is a general set, F is a σ-algebra and P is a probability measure on Ω. A random variable (r.v.) X is a (scalar) measurable function X : (Ω,F) → (R,B) where B is a Borel σ-algebra. We will also write X(ω) to stress the fact that it is a function of ω ∈ Ω.

The cumulative distribution function (c.d.f.) of a random variable X is the function F : R → [0, 1] defined by

    F (x) = P (X ≤ x) = P (ω : X(ω) ≤ x).

    F is monotone nondecreasing, right-continuous and such that F (−∞) = 0 and F (∞) = 1. We also refer to F as the probability law (distribution) of X.

We distinguish two types of random variables: discrete variables and continuous variables.

A discrete variable X takes values in a finite or countable set. A Poisson random variable X¹ is an example of a discrete variable with a countable value set: for λ > 0 the distribution of X satisfies

P_λ(X = k) = (λ^k / k!) e^{−λ}, k = 0, 1, 2, ...

¹ We will see in the sequel the importance of this law and how it is linked to the Poisson point process.



We denote X ∼ P(λ) and say that X is distributed according to the Poisson distribution with parameter λ.

[Figure: the c.d.f. of the Poisson distribution, plotted for x ∈ [−1, 6]; the values rise in steps from 0 to 1.]

    The c.d.f. of a discrete random variable is a step function.
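As a quick illustration (not part of the original notes; the function names are ours), the Poisson probabilities and the step-function c.d.f. can be computed directly from the formula above:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = lam^k / k! * e^(-lam) for X ~ P(lam)."""
    return lam**k / math.factorial(k) * math.exp(-lam)

def poisson_cdf(x: float, lam: float) -> float:
    """F(x) = P(X <= x): a step function, constant between integers."""
    if x < 0:
        return 0.0
    return sum(poisson_pmf(k, lam) for k in range(int(math.floor(x)) + 1))
```

For example, with λ = 1, F(2) = e^{−1}(1 + 1 + 1/2) ≈ 0.92, and F is constant on [0, 1), [1, 2), etc., jumping at the integers.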

    Continuous variable. X is a continuous variable if its distribution admits a density with respect to the Lebesgue measure on R. In this case the c.d.f. F of X is differentiable almost everywhere on R and its derivative

    f(x) = F ′(x)

is called the probability density of X. Note that f(x) ≥ 0 for all x ∈ R and

∫_{−∞}^{∞} f(x) dx = 1.

    Example 1.1

    a) Normal distribution N(µ, σ2) with density

f(x) = (1/(√(2π) σ)) exp(−(x − µ)²/(2σ²)), x ∈ R,

where µ ∈ R and σ > 0. If µ = 0 and σ² = 1, the distribution N(0, 1) is referred to as the standard normal distribution.

    b) Uniform distribution U [0, θ] with density

f(x) = (1/θ) I{x ∈ [0, θ]}, x ∈ R,

where θ > 0 and I{·} stands for the indicator function: for a set A,

I{x ∈ A} = 1 if x ∈ A, and 0 otherwise.


    c) Exponential distribution E(λ) with density

f(x) = λe^{−λx} for x ≥ 0, and f(x) = 0 for x < 0,

where λ > 0. The c.d.f. of E(λ) is given by

F(x) = 1 − e^{−λx} for x ≥ 0, and F(x) = 0 for x < 0.
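The normalization ∫ f = 1 and the relation between density and c.d.f. can be checked numerically. A minimal sketch using a simple midpoint rule (the helper names are ours):

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2)."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def exp_pdf(x: float, lam: float) -> float:
    """Density of E(lam): lam * e^(-lam*x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h
```

For instance, integrating exp_pdf with λ = 2 over [0, 5] reproduces F(5) = 1 − e^{−10}, and the standard normal density integrates to ≈ 1 over [−8, 8] (the tails beyond are negligible).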

Discrete distributions are entirely determined by the probabilities {P(X = k)}_k; continuous distributions are determined by their density f(·). However, some scalar functionals of the distribution are useful to characterize the behavior of the corresponding random variables. Examples of such functionals are the moments and the quantiles.

    1.1.1 Moments of random variables

    Mean (expectation) of a random variable X:

µ = E(X) = ∫_{−∞}^{∞} x dF(x) =
  Σ_i i P(X = i)   in the discrete case,
  ∫ x f(x) dx      in the continuous case.

The moment of order k (k = 1, 2, ...):

µ_k = E(X^k) = ∫_{−∞}^{∞} x^k dF(x),

and the central moment of order k:

µ′_k = E((X − µ)^k) = ∫_{−∞}^{∞} (x − µ)^k dF(x).

A special case is the variance σ² (= µ′₂, the central moment of order 2):

σ² = Var(X) = E((X − E(X))²) = E(X²) − (E(X))².

The square root of the variance is called the standard deviation (s.d. or st.d.) of X: σ = √Var(X).

The absolute moment µ̄_k of order k:

µ̄_k = E(|X|^k),

and the central absolute moment of order k:

µ̄′_k = E(|X − µ|^k).

    Clearly, these definitions assume the existence of the respective integrals, and not all distributions possess moments.
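The identity Var(X) = E(X²) − (E(X))² can be verified numerically on the Poisson law, whose mean and variance both equal λ. A sketch (helper names are ours; the countable support is truncated, which is an approximation):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ P(lam)."""
    return lam**k / math.factorial(k) * math.exp(-lam)

def moments_discrete(pmf, support):
    """Mean and variance of a discrete r.v. from its pmf over a (truncated) support.

    Uses the identity Var(X) = E(X^2) - (E(X))^2."""
    mu = sum(k * pmf(k) for k in support)
    ex2 = sum(k * k * pmf(k) for k in support)
    return mu, ex2 - mu * mu
```

For X ∼ P(3), truncating the support at k = 60 leaves a negligible tail, and both returned values are ≈ 3.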

    Example 1.2


    Let X be a random variable with probability density

f(x) = c / (1 + |x| log²|x|), x ∈ R,

where the constant c > 0 is such that ∫ f = 1. Then E(|X|^a) = ∞ for all a > 0.

    The mean is used to characterize the location (position) of a random variable. The variance characterizes the scale (dispersion) of the distribution.

    The normal distribution N(µ, σ2) with mean µ and variance σ2:

[Figure: normal densities N(µ, σ²) plotted over x ∈ [−10, 10]: a "large" σ gives large dispersion (a low, spread-out curve), a "small" σ gives little dispersion (a tall, narrow curve).]

Let F be the c.d.f. of a random variable X with mean µ and variance σ². By an affine transformation we obtain the variable X₀ = (X − µ)/σ, such that E(X₀) = 0 and E(X₀²) = 1 (the standardized variable). If F₀ is the c.d.f. of X₀, then F(x) = F₀((x − µ)/σ). In the continuous case, the density of X satisfies

f(x) = (1/σ) f₀((x − µ)/σ),

where f₀ is the density of X₀.

Note that it is not necessary to assume that the mean and the variance exist in order to define the standardized distribution F₀ and the representation F(x) = F₀((x − µ)/σ). Typically, this is done to underline that F depends on a location parameter µ and a scale parameter σ. E.g., for the family of Cauchy densities parameterized by µ and σ,

f(x) = 1 / (πσ(1 + [(x − µ)/σ]²)),

the standardized density is f₀(x) = 1/(π(1 + x²)). Meanwhile, the expectation and the variance do not exist for the Cauchy distribution.
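The location-scale relation f(x) = (1/σ) f₀((x − µ)/σ) can be illustrated on the standardized Cauchy density (a sketch; the helper names are ours):

```python
import math

def cauchy_pdf0(x: float) -> float:
    """Standardized Cauchy density f0(x) = 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))

def location_scale_pdf(f0, mu: float, sigma: float):
    """Return the density f(x) = (1/sigma) * f0((x - mu) / sigma)."""
    return lambda x: f0((x - mu) / sigma) / sigma
```

With µ = 2 and σ = 3, the resulting function agrees with the direct formula 1/(πσ(1 + [(x − µ)/σ]²)): it equals 1/(3π) at x = µ and 1/(6π) at x = µ + σ.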

An interesting problem of calculus is related to the notion of moments µ_k: let F be a c.d.f. such that all its moments are finite. Given the sequence {µ_k}, k = 1, 2, ..., of moments of F, is it possible to recover F? The general answer to this question is negative. Nevertheless, there exist particular cases where the recovery is possible, namely, under the hypothesis that

lim sup_{k→∞} µ̄_k^{1/k} / k < ∞.


    1.1.2 Probability quantiles

    Let X be a random variable with continuous and strictly increasing c.d.f. F . The quantile of order p, 0 < p < 1, of the distribution F is the solution qp of the equation

    F (qp) = p.

Observe that if F is strictly increasing and continuous, the solution exists and is unique; thus the quantile q_p is well defined. If F has "flat zones" or is not continuous, we can modify the definition, for instance, as follows:

Definition 1.1 Let F be a c.d.f. The quantile q_p of order p of F is the value

q_p = inf{q : F(q) ≥ p}.

The median M of the c.d.f. F is the quantile of order 1/2,

M = q_{1/2}.

    Note that if F is continuous F (M) = 1/2.

    The quartiles are the quantiles q1/4 and q3/4 of order 1/4 and 3/4.

    The l% percentiles of F are the quantiles qp of order p = l/100, 0 < l < 100.

We note that the median characterizes the location of the probability distribution, while the difference q_{3/4} − q_{1/4} (referred to as the interquartile interval) can be interpreted as a characteristic of scale. These quantities are analogues of the mean µ and the standard deviation σ. However, unlike the mean and the standard deviation, the median and the interquartile interval are well defined for all probability distributions.

    1.1.3 Other characterizations

The mode. For a discrete distribution F, we call the mode of F the value k* such that

P(X = k*) = max_k P(X = k).

In the continuous case, the mode x* is defined as a maximum of the density f:

f(x*) = max_x f(x).

A density f is said to be unimodal if x* is the unique local maximum of f (one can also speak of bimodal or multimodal densities). This characteristic is rather imprecise: even when the density has a unique global maximum, we call it multimodal if it has other local maxima. The mode is a location characteristic which can be of interest in the case of a unimodal density.

[Figure: a right-skewed density over x ∈ [0, 20] with the mode, the median, and the mean marked, in that order from left to right.]

The mode, the median, and the mean of a distribution
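For a right-skewed law such as the exponential E(λ), the three location characteristics are ordered mode < median < mean. A quick numerical check (the mode and median follow from the density and c.d.f. given earlier; the mean E(X) = 1/λ is a standard fact not derived above):

```python
import math

lam = 1.0
mode = 0.0                   # the density lam * e^(-lam*x) is maximal at x = 0
median = math.log(2) / lam   # solves F(q) = 1 - e^(-lam*q) = 1/2
mean = 1.0 / lam             # standard fact for the exponential law

assert mode < median < mean  # right-skewed: mode < median < mean
```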

    Skewness and kurtosis

Definition 1.2 The distribution of X (the c.d.f. F) is said to be symmetric with respect to zero (or "simply" symmetric) if F(x) = 1 − F(−x) for all x ∈ R (f(x) = f(−x) in the continuous case).

    Definition 1.3 The distribution of X (the c.d.f. F ) is called symmetric with respect to µ ∈ R if

    F (x+ µ) = 1− F (µ− x)

    (f(x+ µ) = f(µ− x) in the continuous case).

In other words, the c.d.f. F(· + µ) is symmetric (with respect to zero).

Exercise 1.1

a) Show that if F is symmetric with respect to µ and E(|X|) < ∞, then E(X) = µ.


Provide an example of an asymmetric density with α = 0.

Observe the role of σ in the definition of α: suppose, for instance, that the density f₀(x) of X satisfies ∫ x f₀(x) dx = 0 and ∫ x² f₀(x) dx = 1, with α₀ = µ′₃,₀ = ∫ x³ f₀(x) dx. For σ > 0 and µ ∈ R, the function

f(x) = (1/σ) f₀((x − µ)/σ)

is the density of the random variable σX + µ, and thus Var(σX + µ) = σ²