Families of Distributions - Washington University in St. Louis

  • 508-B (Statistics Camp, Wash U, Summer 2016)

    Families of Distributions

    Author: Andrés Hincapié and Linyi Cao

    This Version: July 21, 2016

  • Families of Distributions 3

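    Suppose our data $X$ follows a distribution $X \sim P$.

    We could assume, for instance, that $P \equiv N(\mu, \sigma^2)$.

    Hence the pdf is

    $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\}$$

    The distribution is indexed by a fixed number of parameters ($\mu$ and $\sigma^2$).

    Such a distribution is called parametric.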

    We may relax our functional assumptions to

    a. $E[X] = \mu$

    b. the distribution is symmetric around $\mu$

    c. the cdf of $X$ is smooth

    This more flexible class of distributions cannot be characterized by a fixed number of parameters.

    Such a distribution is called nonparametric.

    We will be dealing with parametric distributions.

    Families are parametric distributions $f(x; \theta)$ indexed by a set of parameters $\theta$ with a fixed number of elements.

    The notation $f(x; \theta)$ helps keep track of the vector of parameters $\theta$.

    Varying $\theta$ changes certain characteristics of the distribution while keeping the same functional form.

    1 Discrete Distributions

    A r.v. $X$ has a discrete distribution if its support is countable.

    1.1 Discrete Uniform (1, N)

    $P(X = x) = \frac{1}{N}, \quad x = 1, 2, \ldots, N$

    Mean: $E[X] = \frac{N+1}{2}$

    Variance: $Var[X] = \frac{(N+1)(N-1)}{12}$

    Note that the mean need not belong to the support of $X$ (e.g., $E[X] = 3.5$ for a die).

    Example: throwing a fair die, where each number has probability $p = 1/6$.
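As a quick check of these formulas, the sketch below sums the pmf directly for the die example. It uses exact rational arithmetic from Python's standard `fractions` module. The helper name `discrete_uniform_moments` is illustrative and does not come from the notes.

```python
from fractions import Fraction

def discrete_uniform_moments(n: int) -> tuple[Fraction, Fraction]:
    """Mean and variance of the discrete uniform on {1, ..., n}, by direct summation."""
    p = Fraction(1, n)                  # P(X = x) = 1/N for every x in the support
    support = range(1, n + 1)
    mean = sum(p * x for x in support)
    second_moment = sum(p * x * x for x in support)
    return mean, second_moment - mean ** 2

# A fair die is the discrete uniform with N = 6.
mean, var = discrete_uniform_moments(6)
print(mean, var)  # 7/2 35/12
```

The results match $(N+1)/2 = 7/2$ and $(N+1)(N-1)/12 = 35/12$.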

    1.2 Binomial (n, p)

    Based on the idea of a Bernoulli trial: an experiment with only two possible outcomes (success and failure), occurring with probabilities $p$ and $1 - p$.

    If $X$ is Bernoulli, $E[X] = p$ and $Var[X] = p(1-p)$.

    A binomial r.v. $Y$ counts the number of successes in a sequence of $n$ independent Bernoulli trials.

    Let $n = 3$. The sequence $A_1 = 0, A_2 = 0, A_3 = 1$ (which implies $Y = 1$) has probability $(1-p)(1-p)p$.

    The sequence $A_1 = 0, A_2 = 1, A_3 = 0$ (which also implies $Y = 1$) has probability $(1-p)p(1-p)$.

    Any particular sequence with $y$ successes has probability $p^y (1-p)^{n-y}$.

    In how many different sequences can $Y = y$ happen? Using our counting knowledge: how many unordered subsets of $y$ elements, without replacement, can we form from $n$ labels?

    $$\binom{n}{y}$$

    Therefore

    $$P(Y = y) = \binom{n}{y} p^y (1-p)^{n-y}, \quad y = 0, 1, \ldots, n$$

    Example: probability of obtaining at least one 6 in a sequence of 4 rolls of a fair die.

    $E[Y] = np$, $Var[Y] = np(1-p)$, $M_Y(t) = [pe^t + (1-p)]^n$
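Here is a minimal Python sketch of the die example. It computes the pmf with `math.comb` and then checks numerically that the pmf reproduces the stated mean and variance. The helper `binomial_pmf` is an illustrative name.

```python
from math import comb

def binomial_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y) for Y ~ Binomial(n, p): C(n, y) p^y (1 - p)^(n - y)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 4, 1 / 6  # four rolls of a fair die; "success" means rolling a 6
pmf = [binomial_pmf(y, n, p) for y in range(n + 1)]

# P(at least one 6) = 1 - P(Y = 0) = 1 - (5/6)^4
at_least_one_six = 1 - pmf[0]
print(round(at_least_one_six, 4))  # 0.5177

# Moments computed from the pmf should equal np and np(1 - p).
mean = sum(y * q for y, q in enumerate(pmf))
var = sum((y - mean) ** 2 * q for y, q in enumerate(pmf))
```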

    1.3 Poisson (λ)

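    Used, for instance, to model the number of occurrences in a given period of time, or waiting for occurrences.

    $$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$

    EXAMPLE: an operator receives on average 5 calls per minute. Let $X$ be the number of calls in a minute; $X$ is Poisson with $\lambda = 5$. What is the probability of receiving at least two calls in the next minute?

    It can be shown that

    $$P(X = x) = \frac{\lambda}{x} P(X = x - 1), \quad x = 1, 2, \ldots$$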
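The calls example works out to $P(X \ge 2) = 1 - e^{-5}(1 + 5) = 1 - 6e^{-5} \approx 0.9596$. The sketch below computes this value. It also builds the pmf with the recursion instead of factorials. The helper `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^{-lam} lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

lam = 5.0  # on average 5 calls per minute

# P(X >= 2) = 1 - P(X = 0) - P(X = 1)
p_at_least_two = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(round(p_at_least_two, 4))  # 0.9596

# Recursion P(X = x) = (lam / x) P(X = x - 1), starting from P(X = 0) = e^{-lam}
probs = [exp(-lam)]
for x in range(1, 30):
    probs.append(lam / x * probs[-1])
```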

    2 Continuous Distributions

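    A r.v. $X$ has a continuous distribution when its support is uncountable and probabilities are obtained by integrating a pdf.

    2.1 Uniform[a, b]

    $$f_X(x) = \begin{cases} \frac{1}{b-a} & \text{if } x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$$

    $E[X] = \frac{b+a}{2}$, $Var[X] = \frac{(b-a)^2}{12}$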
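Here is a numerical sanity check of the uniform moments. It assumes illustrative endpoints $a = 2$ and $b = 5$, for which the formulas give $E[X] = 3.5$ and $Var[X] = 9/12 = 0.75$. The midpoint Riemann sum is our choice of quadrature, not something from the notes.

```python
def uniform_moments(a: float, b: float, steps: int = 100_000) -> tuple[float, float]:
    """Approximate E[X] and Var[X] for X ~ Uniform[a, b] with a midpoint Riemann sum."""
    width = (b - a) / steps
    density = 1.0 / (b - a)  # constant pdf on [a, b]
    mids = [a + (i + 0.5) * width for i in range(steps)]
    mean = sum(x * density * width for x in mids)
    var = sum((x - mean) ** 2 * density * width for x in mids)
    return mean, var

mean, var = uniform_moments(2.0, 5.0)
print(round(mean, 4), round(var, 4))  # 3.5 0.75
```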

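    2.2 Gamma (α, β)

    First define the gamma function. For any $\alpha > 0$,

    $$\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t}\, dt$$

    It satisfies

    $$\Gamma(\alpha + 1) = \alpha\Gamma(\alpha), \quad \alpha > 0$$

    and, for any positive integer $n$,

    $$\Gamma(n) = (n-1)!$$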
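Python's `math.gamma` evaluates $\Gamma(\alpha)$ for real arguments, so both identities can be spot-checked. The test values of $\alpha$ below are arbitrary. The last line uses the known value $\Gamma(1/2) = \sqrt{\pi}$.

```python
from math import factorial, gamma, isclose, pi

# Gamma(n) = (n - 1)! for positive integers n
for n in range(1, 11):
    assert isclose(gamma(n), factorial(n - 1))

# Gamma(alpha + 1) = alpha * Gamma(alpha) for real alpha > 0 (sample values are arbitrary)
for alpha in (0.3, 1.7, 4.25):
    assert isclose(gamma(alpha + 1), alpha * gamma(alpha))

# A non-integer value: Gamma(1/2) = sqrt(pi)
print(isclose(gamma(0.5) ** 2, pi))  # True
```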

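    A random variable has a gamma $(\alpha, \beta)$ distribution if its pdf is of the form

    $$f_X(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha} x^{\alpha-1} e^{-x/\beta}, \quad 0 < x < \infty, \quad \alpha > 0, \quad \beta > 0$$

    $\alpha$ is known as the shape parameter and $\beta$ as the scale parameter (it affects the spread).

    $E[X] = \alpha\beta$

    $Var[X] = \alpha\beta^2$

    $M_X(t) = \left(\frac{1}{1-\beta t}\right)^\alpha, \quad t < 1/\beta$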

    Important special cases of the Gamma are:

    Exponential $(\beta)$ = Gamma $(\alpha = 1, \beta)$

    Chi-squared with $n$ degrees of freedom = Gamma $(\alpha = n/2, \beta = 2)$
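This sketch integrates the gamma density numerically with the midpoint rule. It uses an upper cutoff of 200, which we assume is far enough into the tail. For illustrative values $\alpha = 3$ and $\beta = 2$, it recovers total mass 1, $E[X] = \alpha\beta$ and $Var[X] = \alpha\beta^2$. The final loop checks the Exponential special case. `gamma_pdf` and `moments` are hypothetical helper names.

```python
from math import exp, gamma

def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma(alpha, beta) density in the shape/scale parameterization of the notes."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * exp(-x / beta) / (gamma(alpha) * beta**alpha)

def moments(pdf, upper: float, steps: int = 100_000) -> tuple[float, float, float]:
    """Total mass, mean, and variance of a density on (0, upper), via a midpoint sum."""
    h = upper / steps
    xs = [(i + 0.5) * h for i in range(steps)]
    ws = [pdf(x) * h for x in xs]  # approximate probability of each small cell
    mass = sum(ws)
    mean = sum(x * w for x, w in zip(xs, ws))
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, ws))
    return mass, mean, var

alpha, beta = 3.0, 2.0
mass, mean, var = moments(lambda x: gamma_pdf(x, alpha, beta), upper=200.0)
print(round(mass, 4), round(mean, 4), round(var, 4))  # 1.0 6.0 12.0

# Exponential(beta) = Gamma(alpha = 1, beta): the density reduces to e^{-x/beta} / beta.
for x in (0.1, 1.3, 7.0):
    assert abs(gamma_pdf(x, 1.0, beta) - exp(-x / beta) / beta) < 1e-12
```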

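    2.3 Normal (µ, σ²) (also called Gaussian)

    Very tractable analytically

    Familiar bell shape

    By the central limit theorem, standardized sample averages from a large variety of distributions (with finite variance) approach a Normal in large samples

    $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\}$$

    The pdf does not have a closed-form antiderivative, so normal probabilities are computed numerically or from tables.

    $\Phi(z)$ is the notation for the cdf of the standard normal, $N(0, 1)$.

    If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X-\mu}{\sigma} \sim N(0, 1)$.

    If $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$, how is $Y$ distributed?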
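The sketch below uses Python's standard `statistics.NormalDist`, which is parameterized by the standard deviation $\sigma$ rather than $\sigma^2$. It takes illustrative values $\mu = 1.5$ and $\sigma = 2$ and checks three things:

- the density formula;
- the standardization $P(X \le x) = \Phi((x-\mu)/\sigma)$;
- for $a > 0$, that the cdf of $Y = aX + b$ matches the cdf of $N(a\mu + b, a^2\sigma^2)$.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

mu, sigma = 1.5, 2.0  # illustrative parameter values

def normal_pdf(x: float) -> float:
    """N(mu, sigma^2) density written out from the formula in the notes."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

X = NormalDist(mu, sigma)  # NormalDist takes the standard deviation, not the variance
Z = NormalDist()           # standard normal N(0, 1); Z.cdf plays the role of Phi

# The hand-written density agrees with the library's.
assert abs(normal_pdf(0.7) - X.pdf(0.7)) < 1e-12

# Standardizing: P(X <= x) = Phi((x - mu) / sigma).
x = 3.0
print(round(X.cdf(x), 4), round(Z.cdf((x - mu) / sigma), 4))  # 0.7734 0.7734

# Y = aX + b with a > 0: P(Y <= y) = P(X <= (y - b) / a),
# which matches the cdf of N(a*mu + b, a^2 sigma^2).
a, b = 3.0, -1.0
Y = NormalDist(a * mu + b, a * sigma)
for y in (-5.0, 0.0, 2.0, 10.0):
    assert abs(Y.cdf(y) - X.cdf((y - b) / a)) < 1e-12
```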

    2.4 Lognormal

    $X$ is lognormal if its log is normally distributed, $\log X \sim N(\mu, \sigma^2)$.

    $E[X] = \exp\{\mu + \sigma^2/2\}$

    $Var[X] = \exp\{2(\mu + \sigma^2)\} - \exp\{2\mu + \sigma^2\}$

    Applications: when the variable of interest is positive and skewed to the right. For example, modeling income as lognormal (income cannot be negative) lets us exploit Normal tractability for log(income).
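Because $X = e^{Y}$ with $Y = \log X \sim N(\mu, \sigma^2)$, its moments are $E[X^k] = E[e^{kY}]$. This sketch computes them by integrating against the normal density with a midpoint sum over $\mu \pm 12\sigma$ (an assumed cutoff). It uses illustrative values $\mu = 0.2$ and $\sigma = 0.5$ and compares the results with the closed forms.

```python
from math import exp
from statistics import NormalDist

mu, sigma = 0.2, 0.5           # log X ~ N(mu, sigma^2); illustrative values
log_x = NormalDist(mu, sigma)  # distribution of Y = log X

def lognormal_moment(k: int, steps: int = 100_000, width: float = 12.0) -> float:
    """E[X^k] = E[exp(k Y)], by a midpoint sum against the N(mu, sigma^2) density."""
    lo = mu - width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += exp(k * y) * log_x.pdf(y)
    return total * h

mean = lognormal_moment(1)
var = lognormal_moment(2) - mean**2

print(mean, exp(mu + sigma**2 / 2))                              # numeric vs closed form
print(var, exp(2 * (mu + sigma**2)) - exp(2 * mu + sigma**2))    # numeric vs closed form
```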

    2.5 Beta (α, β)

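    Continuous family on $(0, 1)$

    $$f(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1, \quad \alpha, \beta > 0$$

    $B(\alpha, \beta)$ is the beta function,

    $$B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\, dx$$

    which satisfies

    $$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

    $E[X] = \frac{\alpha}{\alpha+\beta}$, $Var[X] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$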
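This sketch checks the Gamma-function identity for $B(\alpha, \beta)$ against the integral definition. It then recovers the mean and variance of the Beta density numerically with a midpoint sum on $(0, 1)$. The values $\alpha = 2.5$ and $\beta = 4$ and the helper names are illustrative.

```python
from math import gamma

def beta_fn(a: float, b: float) -> float:
    """B(a, b) via the identity B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_numeric(a: float, b: float, steps: int = 200_000) -> tuple[float, float, float]:
    """Integral definition of B(a, b), plus the mean and variance of the Beta(a, b)
    density, all computed with a midpoint sum on (0, 1)."""
    h = 1.0 / steps
    xs = [(i + 0.5) * h for i in range(steps)]
    kernel = [x ** (a - 1) * (1 - x) ** (b - 1) * h for x in xs]  # unnormalized density times h
    b_int = sum(kernel)
    mean = sum(x * k for x, k in zip(xs, kernel)) / b_int
    var = sum((x - mean) ** 2 * k for x, k in zip(xs, kernel)) / b_int
    return b_int, mean, var

a, b = 2.5, 4.0
b_int, mean, var = beta_numeric(a, b)
print(b_int, beta_fn(a, b))  # integral definition vs Gamma identity
```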

    3 Exponential Families

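    A family of pdfs or pmfs is called an exponential family if it can be written as

    $$f(x; \theta) = h(x)\, c(\theta) \exp\left\{\sum_{i=1}^{k} w_i(\theta)\, t_i(x)\right\}$$

    where $h(x) \ge 0$ and $t_1(x), \ldots, t_k(x)$ are real-valued functions of $x$ that do not depend on $\theta$, and $c(\theta) \ge 0$ and $w_1(\theta), \ldots, w_k(\theta)$ are real-valued functions of $\theta$ that do not depend on $x$.

    $\theta$ can be a vector or a scalar.

    Exponential families:

    Discrete: binomial (with $n$ fixed), Poisson

    Continuous: normal, gamma, and beta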

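    EXAMPLE: Binomial exponential family (with $n$ fixed)

    $$P(Y = y) = \binom{n}{y} p^y (1-p)^{n-y} = \binom{n}{y} (1-p)^n \exp\left\{y \log\frac{p}{1-p}\right\}$$

    so $h(y) = \binom{n}{y}$, $c(p) = (1-p)^n$, $w_1(p) = \log\frac{p}{1-p}$, and $t_1(y) = y$.

    EXAMPLE: Gamma exponential family

    $$f_X(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha} x^{\alpha-1} e^{-x/\beta}, \quad 0 < x < \infty, \quad \alpha > 0, \quad \beta > 0$$

    so $h(x) = 1$ for $x > 0$, $c(\alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^\alpha}$, $w_1 = \alpha - 1$, $t_1(x) = \log x$, $w_2 = -1/\beta$, and $t_2(x) = x$.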
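To see the binomial decomposition concretely, this sketch rebuilds the pmf from the pieces $h(y) = \binom{n}{y}$, $c(p) = (1-p)^n$, $w_1(p) = \log\frac{p}{1-p}$ and $t_1(y) = y$. It then compares the result with the direct formula. The values $n = 10$ and $p = 0.3$ and the function names are illustrative.

```python
from math import comb, exp, log

n, p = 10, 0.3

def pmf_direct(y: int) -> float:
    """Binomial pmf C(n, y) p^y (1 - p)^(n - y)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def pmf_exponential_family(y: int) -> float:
    """The same pmf written as h(y) c(p) exp{w1(p) t1(y)}."""
    h = comb(n, y)         # h(y) = C(n, y)
    c = (1 - p) ** n       # c(p) = (1 - p)^n
    w1 = log(p / (1 - p))  # w1(p) = log(p / (1 - p)), the log-odds
    t1 = y                 # t1(y) = y
    return h * c * exp(w1 * t1)

for y in range(n + 1):
    assert abs(pmf_direct(y) - pmf_exponential_family(y)) < 1e-12
```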

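    If $X$ belongs to an exponential family, then for each $j$

    $$E\left(\sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j}\, t_i(X)\right) = -\frac{\partial}{\partial \theta_j} \log c(\theta)$$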

    EXAMPLE: Binomial
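For the binomial with $n$ fixed, the pieces of the identity are:

- $w_1(p) = \log\frac{p}{1-p}$, so $\frac{dw_1}{dp} = \frac{1}{p(1-p)}$, and $t_1(x) = x$;
- $\log c(p) = n\log(1-p)$, so $-\frac{d}{dp}\log c(p) = \frac{n}{1-p}$.

The identity then reads $E[X]/(p(1-p)) = n/(1-p)$, that is, $E[X] = np$. The sketch below checks this by summing over the pmf, with illustrative values $n = 12$ and $p = 0.35$.

```python
from math import comb

n, p = 12, 0.35
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Left side: E[(dw1/dp) t1(X)] with dw1/dp = 1 / (p (1 - p)) and t1(x) = x
lhs = sum(x / (p * (1 - p)) * q for x, q in enumerate(pmf))
# Right side: -d/dp log c(p) with log c(p) = n log(1 - p)
rhs = n / (1 - p)
print(abs(lhs - rhs) < 1e-9)  # True

# Solving lhs = rhs for E[X] recovers the binomial mean np.
mean = sum(x * q for x, q in enumerate(pmf))
```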

    4 Location and Scale Families

    Location families shift the distribution horizontally; scale families stretch or contract it.

    First, notice that for any pdf $f(x)$ and arbitrary constants $\mu$ and $\sigma > 0$,

    $$g(x; \mu, \sigma) = \frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)$$

    is a pdf.

    Proof...

    4.1 Location Families

    Let $f(x)$ be any pdf. Then for any constant $\mu$, the family of pdfs $f(x - \mu)$ is called a location family with location parameter $\mu$.

    At any $x = \mu + a$, $f(x - \mu) = f(a)$: the whole density is shifted by $\mu$ without changing its shape.

    EXAMPLE: Normal with $\sigma = 1$

    4.2 Scale Families

    Let $f(x)$ be any pdf. Then for any $\sigma > 0$, the family of pdfs $(1/\sigma) f(x/\sigma)$, indexed by the parameter $\sigma$, is called a scale family with scale parameter $\sigma$.

    EXAMPLE: Normal with $\mu = 0$

    4.3 Scale-Location Families

    Let $f(x)$ be any pdf. Then for any constant $\mu$ and any constant $\sigma > 0$, the family of pdfs

    $$f(x; \mu, \sigma) = \frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)$$

    indexed by the parameters $\mu$ and $\sigma$ is called a scale-location family.

    Result: if $X$ follows a location-scale distribution such that

    $$f_X(x) = \frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right),$$

    then the r.v. $Z = \frac{X-\mu}{\sigma}$ follows a location-scale distribution with location parameter 0 and scale parameter 1:

    $$f_Z(z) = f(z)$$
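Here is a numerical sketch of both results. It uses the standard Laplace density $f(z) = \tfrac12 e^{-|z|}$ as an arbitrary choice of $f$, with illustrative values $\mu = 3$ and $\sigma = 2.5$. It shows that:

- $g$ integrates to one;
- its mean is $\mu$, because this particular $f$ is symmetric about 0;
- rescaling back gives $f_Z(z) = \sigma f_X(\mu + \sigma z) = f(z)$.

```python
from math import exp

def f(z: float) -> float:
    """Standard Laplace density, an arbitrary illustrative choice of f."""
    return 0.5 * exp(-abs(z))

def g(x: float, mu: float, sigma: float) -> float:
    """Location-scale member (1 / sigma) f((x - mu) / sigma)."""
    return f((x - mu) / sigma) / sigma

def integrate(fn, lo: float, hi: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(fn(lo + (i + 0.5) * h) for i in range(steps))

mu, sigma = 3.0, 2.5
lo, hi = mu - 40 * sigma, mu + 40 * sigma  # tails beyond this are negligible for this f
mass = integrate(lambda x: g(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * g(x, mu, sigma), lo, hi)
print(round(mass, 6), round(mean, 6))  # 1.0 3.0

# Z = (X - mu) / sigma has density sigma * f_X(mu + sigma * z), which equals f(z).
for z in (-2.0, -0.3, 0.0, 1.7):
    assert abs(sigma * g(mu + sigma * z, mu, sigma) - f(z)) < 1e-12
```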

    5 Expectations and Probabilities

    We can think of any probability as the expectation of an indicator function, $P(X \in A) = E[\mathbf{1}\{X \in A\}]$, as we do with the Bernoulli distribution, where

    $E[X] = Pr[X = 1]$

    There are some useful inequalities that relate expectations and probabilities.

    5.1 Markov's Inequality

    Let $X$ be a r.v. and let $g(x)$ be a nonnegative function. Then, for any $r > 0$,

    $$P(g(X) \ge r) \le \frac{E[g(X)]}{r}$$

    Proof...

    It gives an upper bound on the probability that a nonnegative function of a r.v. is greater than or equal to some positive constant.
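Here is a small exact check of Markov's Inequality. It assumes $X$ is a fair die roll and $g(x) = x$, so $E[g(X)] = 7/2$. The bound holds for every $r$, although it is loose: for $r = 6$ it gives $7/12$, against a true probability of $1/6$.

```python
from fractions import Fraction

support = range(1, 7)  # fair die; g(x) = x is nonnegative on the support
p = Fraction(1, 6)
expected_g = sum(p * x for x in support)  # E[g(X)] = 7/2

for r in range(1, 7):
    prob = sum(p for x in support if x >= r)  # exact P(g(X) >= r)
    bound = expected_g / r                    # Markov bound E[g(X)] / r
    assert prob <= bound
    print(f"r={r}: P={prob}, bound={bound}")
```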

    5.2 Chebyshev Inequality

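    Let $X$ be a r.v., $c$ any constant, and $d > 0$ any constant. Then

    $$Pr(|X - c| \ge d) \le \frac{E[(X - c)^2]}{d^2}$$

    EXAMPLE: with $c = E[X]$,

    $$Pr[(X - E[X])^2 \ge d^2] \le \frac{Var[X]}{d^2}$$

    It follows from Markov's Inequality by taking $g(X) = (X - c)^2$ and $r = d^2$, since $|X - c| \ge d$ exactly when $(X - c)^2 \ge d^2$.

    It gives a universal bound on the deviation $|X - \mu|$ in terms of $\sigma$. Taking $c = \mu$ and $d = t\sigma$ with $t > 0$,

    $$Pr(|X - \mu| \ge t\sigma) \le \frac{E[(X - \mu)^2]}{(t\sigma)^2} = \frac{Var[X]}{(t\sigma)^2} = \frac{1}{t^2}$$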
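To see how conservative the bound is, this sketch takes an illustrative $X \sim$ Binomial$(20, 0.5)$, so $\mu = 10$ and $\sigma = \sqrt{5}$. It computes the exact tail probability $P(|X - \mu| \ge t\sigma)$ and compares it with $1/t^2$.

```python
from math import comb, sqrt

n, p = 20, 0.5
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
mu = n * p                     # 10
sigma = sqrt(n * p * (1 - p))  # sqrt(5)

tails = {}
for t in (1.5, 2.0, 3.0):
    # exact P(|X - mu| >= t * sigma), summed over the support
    tails[t] = sum(q for x, q in enumerate(pmf) if abs(x - mu) >= t * sigma)
    assert tails[t] <= 1 / t**2  # Chebyshev bound
    print(f"t={t}: exact {tails[t]:.4f} <= bound {1 / t**2:.4f}")
```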

    5.3 Stein’s Lemma

    Let $X \sim N(\mu, \sigma^2)$ and let $g$ be a differentiable function satisfying $E|g'(X)| < \infty$. Then

    $$E[g(X)(X - \mu)] = \sigma^2 E[g'(X)]$$
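Stein's Lemma states that $E[g(X)(X - \mu)] = \sigma^2 E[g'(X)]$ for $X \sim N(\mu, \sigma^2)$ and a suitable differentiable $g$. The sketch below checks it numerically for the illustrative choice $g(x) = x^3$ with $\mu = 1$ and $\sigma = 2$. In that case both sides equal $3\sigma^2(\mu^2 + \sigma^2) = 60$. Expectations are computed by a midpoint sum against `statistics.NormalDist.pdf` over $\mu \pm 12\sigma$, an assumed cutoff.

```python
from statistics import NormalDist

mu, sigma = 1.0, 2.0
X = NormalDist(mu, sigma)  # takes the standard deviation sigma

def expect(fn, steps: int = 100_000, width: float = 12.0) -> float:
    """Approximate E[fn(X)] by a midpoint sum of fn(x) * pdf(x) over mu +/- width * sigma."""
    lo = mu - width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += fn(x) * X.pdf(x)
    return total * h

def g(x: float) -> float:
    return x**3

def g_prime(x: float) -> float:
    return 3 * x**2

lhs = expect(lambda x: g(x) * (x - mu))  # E[g(X)(X - mu)]
rhs = sigma**2 * expect(g_prime)         # sigma^2 E[g'(X)]
print(round(lhs, 6), round(rhs, 6))      # both approximately 60
```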

    5.4 Jensen’s Inequality

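    Recall that a function $h$ is concave (convex) if for all $x, y$ and $0 < \lambda < 1$

    $$h(\lambda x + (1 - \lambda) y) \ge (\le)\ \lambda h(x) + (1 - \lambda) h(y)$$

    Let $X$ be a r.v. and $h$ a concave (convex) function. Then

    $$E[h(X)] \le (\ge)\ h(E[X])$$

    This is related to preferences over risk: with a concave utility function $u$, $E[u(W)] \le u(E[W])$, so a risk-averse agent prefers the sure payoff $E[W]$ to the gamble $W$.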
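Here is a quick exact check on a fair die, using exact fractions where possible. The convex function $h(x) = x^2$ gives $E[X^2] = 91/6 \ge (7/2)^2 = 49/4$. The concave function $h(x) = \log x$ gives $E[\log X] \le \log E[X]$.

```python
from fractions import Fraction
from math import log

support = range(1, 7)  # fair die
p = Fraction(1, 6)
mean = sum(p * x for x in support)  # E[X] = 7/2

# Convex h(x) = x^2: E[h(X)] >= h(E[X])
e_square = sum(p * x * x for x in support)
print(e_square, ">=", mean**2)  # 91/6 >= 49/4

# Concave h(x) = log x: E[h(X)] <= h(E[X])
e_log = sum(float(p) * log(x) for x in support)
print(round(e_log, 4), "<=", round(log(float(mean)), 4))
```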
