
  • Statistics for Applications

    Chapter 10: Generalized Linear Models (GLMs)

    1/52

  • Linear model

    A linear model assumes

    $Y \mid X \sim \mathcal{N}(\mu(X), \sigma^2 I)$,

    and $\mathbb{E}(Y \mid X) = \mu(X) = X^\top \beta$.

    2/52

  • Components of a linear model

    The two components (that we are going to relax) are

    1. Random component: the response variable $Y \mid X$ is continuous and normally distributed with mean $\mu = \mu(X) = \mathbb{E}(Y \mid X)$.

    2. Link: between the random component and the covariates $X = (X^{(1)}, X^{(2)}, \ldots, X^{(p)})^\top$: $\mu(X) = X^\top \beta$.

    3/52

  • Generalization

    A generalized linear model (GLM) generalizes normal linear regression models in the following directions.

    1. Random component:

    $Y \sim$ some exponential family distribution.

    2. Link: between the random component and the covariates:

    $g(\mu(X)) = X^\top \beta,$

    where $g$ is called the link function and $\mu = \mathbb{E}(Y \mid X)$.

    4/52

  • Example 1: Disease Occurring Rate

    In the early stages of a disease epidemic, the rate at which new cases occur can often increase exponentially through time. Hence, if $\mu_i$ is the expected number of new cases on day $t_i$, a model of the form

    $\mu_i = \gamma \exp(\delta t_i)$

    seems appropriate.

    Such a model can be turned into GLM form by using a log link, so that

    $\log(\mu_i) = \log(\gamma) + \delta t_i = \beta_0 + \beta_1 t_i.$

    Since this is a count, the Poisson distribution (with expected value $\mu_i$) is probably a reasonable distribution to try.

    5/52
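    As an illustration (not part of the original slides), here is a minimal sketch of fitting this Poisson log-link model with statsmodels on simulated data; the day grid and the true parameter values are hypothetical.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      t = np.arange(1.0, 31.0)                      # days t_i (hypothetical grid)
      beta0, beta1 = 0.5, 0.15                      # assumed values of log(gamma) and delta
      y = rng.poisson(np.exp(beta0 + beta1 * t))    # simulated daily case counts

      X = sm.add_constant(t)                        # design matrix [1, t_i]
      fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()   # log link is the default
      print(fit.params)                             # estimates of (beta0, beta1)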

  • Example 2: Prey Capture Rate(1)

    The rate of capture of prey, $y_i$, by a hunting animal tends to increase with increasing density of prey, $x_i$, but eventually levels off when the predator is catching as much as it can cope with. A suitable model for this situation might be

    $\mu_i = \dfrac{\alpha x_i}{h + x_i},$

    where $\alpha$ represents the maximum capture rate, and $h$ represents the prey density at which the capture rate is half the maximum rate.

    6/52

  • Example 2: Prey Capture Rate (2)

    [Figure: capture rate $\mu$ (vertical axis, 0.0 to 0.6) plotted against prey density $x$ (horizontal axis, 0.0 to 1.0).]

    7/52

  • Example 2: Prey Capture Rate (3)

    Obviously this model is non-linear in its parameters, but, by using a reciprocal link, the right-hand side can be made linear in the parameters,

    $g(\mu_i) = \dfrac{1}{\mu_i} = \dfrac{1}{\alpha} + \dfrac{h}{\alpha} \cdot \dfrac{1}{x_i} = \beta_0 + \beta_1 \dfrac{1}{x_i}.$

    The standard deviation of capture rate might be approximately proportional to the mean rate, suggesting the use of a Gamma distribution for the response.

    8/52
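    A minimal illustrative sketch (not from the slides) of fitting this model with statsmodels, on simulated prey densities with hypothetical values of $\alpha$ and $h$; statsmodels' Gamma family uses the reciprocal link above by default.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(1)
      x = rng.uniform(0.05, 1.0, size=200)             # prey densities (simulated)
      alpha, h = 0.8, 0.3                              # hypothetical true parameters
      mu = alpha * x / (h + x)
      y = rng.gamma(shape=5.0, scale=mu / 5.0)         # Gamma responses with mean mu

      X = sm.add_constant(1.0 / x)                     # linear predictor beta0 + beta1 * (1/x)
      fit = sm.GLM(y, X, family=sm.families.Gamma()).fit()   # default Gamma link is 1/mu
      b0, b1 = fit.params
      print("alpha estimate:", 1.0 / b0, "h estimate:", b1 / b0)   # beta0 = 1/alpha, beta1 = h/alpha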

  • Example 3: Kyphosis Data

    The Kyphosis data consist of measurements on 81 children following corrective spinal surgery. The binary response variable, Kyphosis, indicates the presence or absence of a postoperative deformity. The three covariates are Age of the child in months, Number of vertebrae involved in the operation, and Start of the range of vertebrae involved.

    The response variable is binary, so there is no choice: $Y \mid X$ is Bernoulli with expected value $\mu(X) \in (0, 1)$.

    We cannot write $\mu(X) = X^\top \beta$,

    because the right-hand side ranges over all of $\mathbb{R}$.

    We need an invertible function $f$ such that $f(X^\top \beta) \in (0, 1)$.

    9/52
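    A sketch (not from the slides) of fitting such a Bernoulli GLM with statsmodels; the kyphosis data themselves are not reproduced in this transcript, so the covariate values and coefficients below are made-up stand-ins with the same structure.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(2)
      n = 81
      age = rng.uniform(1, 200, n)           # age in months (made-up values)
      number = rng.integers(2, 10, n)        # number of vertebrae involved (made-up)
      start = rng.integers(1, 18, n)         # start of the range of vertebrae (made-up)

      eta = -2.0 + 0.01 * age + 0.4 * number - 0.2 * start    # assumed linear predictor
      y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))         # Bernoulli responses

      X = sm.add_constant(np.column_stack([age, number, start]))
      fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()  # logit link is the default
      print(fit.params)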

  • GLM: motivation

    Clearly, the normal linear model is not appropriate for these examples.

    We need a more general regression framework to account for various types of response data:

    ⇒ exponential family distributions.

    We also need methods for model fitting and inference in this framework:

    ⇒ maximum likelihood estimation.

    10/52

  • Exponential Family

    A family of distributions $\{P_\theta : \theta \in \Theta\}$, $\Theta \subset \mathbb{R}^k$, is said to be a $k$-parameter exponential family on $\mathbb{R}^q$ if there exist real-valued functions

    $\eta_1, \eta_2, \ldots, \eta_k$ and $B$ of $\theta$,

    $T_1, T_2, \ldots, T_k$ and $h$ of $x \in \mathbb{R}^q$,

    such that the density function (pmf or pdf) of $P_\theta$ can be written as

    $p_\theta(x) = \exp\Big[\sum_{i=1}^{k} \eta_i(\theta) T_i(x) - B(\theta)\Big] h(x).$

    11/52

  • Normal distribution example

    Consider $X \sim \mathcal{N}(\mu, \sigma^2)$, $\theta = (\mu, \sigma^2)$. The density is

    $p_\theta(x) = \exp\Big(\dfrac{\mu}{\sigma^2} x - \dfrac{1}{2\sigma^2} x^2 - \dfrac{\mu^2}{2\sigma^2} - \log(\sigma\sqrt{2\pi})\Big),$

    which forms a two-parameter exponential family with

    $\eta_1 = \dfrac{\mu}{\sigma^2}, \quad \eta_2 = -\dfrac{1}{2\sigma^2}, \quad T_1(x) = x, \quad T_2(x) = x^2,$

    $B(\theta) = \dfrac{\mu^2}{2\sigma^2} + \log(\sigma\sqrt{2\pi}), \quad h(x) = 1.$

    When $\sigma^2$ is known, it becomes a one-parameter exponential family on $\mathbb{R}$:

    $\eta = \dfrac{\mu}{\sigma^2}, \quad T(x) = x, \quad B(\theta) = \dfrac{\mu^2}{2\sigma^2}, \quad h(x) = \dfrac{e^{-x^2/(2\sigma^2)}}{\sigma\sqrt{2\pi}}.$

    12/52

  • Examples of discrete distributions

    The following distributions form discrete exponential families of distributions with pmf

    Bernoulli($p$): $p^x (1-p)^{1-x}$, $x \in \{0, 1\}$;

    Poisson($\lambda$): $\dfrac{\lambda^x}{x!} e^{-\lambda}$, $x = 0, 1, \ldots$

    13/52

  • Examples of Continuous distributions

    The following distributions form continuous exponential families of distributions with pdf:

    Gamma($a$, $b$): $\dfrac{1}{\Gamma(a)\, b^a}\, x^{a-1} e^{-x/b}$;

    above, $a$ is the shape parameter and $b$ the scale parameter; reparametrizing with the mean parameter $\mu = ab$ gives

    $\Big(\dfrac{a}{\mu}\Big)^a \dfrac{1}{\Gamma(a)}\, x^{a-1} e^{-ax/\mu}.$

    Inverse Gamma($\alpha$, $\beta$): $\dfrac{\beta^\alpha}{\Gamma(\alpha)}\, x^{-\alpha-1} e^{-\beta/x}$.

    Inverse Gaussian($\mu$, $\sigma^2$): $\sqrt{\dfrac{\sigma^2}{2\pi x^3}}\, e^{-\frac{\sigma^2 (x-\mu)^2}{2\mu^2 x}}$.

    Others: Chi-square, Beta, Binomial, Negative binomial distributions.

    14/52

  • Components of GLM

    1. Random component:

    $Y \sim$ some exponential family distribution.

    2. Link: between the random component and the covariates:

    $g(\mu(X)) = X^\top \beta,$

    where $g$ is called the link function and $\mu(X) = \mathbb{E}(Y \mid X)$.

    15/52

  • One-parameter canonical exponential family

    Canonical exponential family for $k = 1$, $y \in \mathbb{R}$:

    $f_\theta(y) = \exp\Big(\dfrac{y\theta - b(\theta)}{\phi} + c(y, \phi)\Big)$

    for some known functions $b(\cdot)$ and $c(\cdot, \cdot)$.

    If $\phi$ is known, this is a one-parameter exponential family with $\theta$ being the canonical parameter.

    If $\phi$ is unknown, this may or may not be a two-parameter exponential family; $\phi$ is called the dispersion parameter.

    In this class, we always assume that $\phi$ is known.

    16/52

  • Normal distribution example

    Consider the following normal density function with known variance $\sigma^2$:

    $f_\mu(y) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$

    $= \exp\Big(\dfrac{y\mu - \mu^2/2}{\sigma^2} - \dfrac{1}{2}\Big(\dfrac{y^2}{\sigma^2} + \log(2\pi\sigma^2)\Big)\Big).$

    Therefore $\theta = \mu$, $\phi = \sigma^2$, $b(\theta) = \dfrac{\theta^2}{2}$, and

    $c(y, \phi) = -\dfrac{1}{2}\Big(\dfrac{y^2}{\phi} + \log(2\pi\phi)\Big).$

    17/52
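    As a quick numerical sanity check (a sketch, not from the slides; the helper name canonical_normal_pdf is hypothetical), the canonical form above can be compared against scipy's normal density:

      import numpy as np
      from scipy.stats import norm

      def canonical_normal_pdf(y, mu, sigma2):
          # exp((y*theta - b(theta))/phi + c(y, phi)) with theta = mu, phi = sigma2,
          # b(theta) = theta**2 / 2 and c(y, phi) = -(y**2/phi + log(2*pi*phi)) / 2
          theta, phi = mu, sigma2
          b = theta ** 2 / 2.0
          c = -0.5 * (y ** 2 / phi + np.log(2 * np.pi * phi))
          return np.exp((y * theta - b) / phi + c)

      y = np.linspace(-3.0, 3.0, 7)
      print(np.allclose(canonical_normal_pdf(y, mu=1.0, sigma2=2.0),
                        norm.pdf(y, loc=1.0, scale=np.sqrt(2.0))))   # True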

  • Other distributions

    Table 1: Exponential Family

                     Normal                                                      Poisson         Bernoulli
    Notation         $\mathcal{N}(\mu, \sigma^2)$                                $P(\mu)$        $B(p)$
    Range of $y$     $(-\infty, \infty)$                                         $[0, \infty)$   $\{0, 1\}$
    $\phi$           $\sigma^2$                                                  $1$             $1$
    $b(\theta)$      $\theta^2/2$                                                $e^\theta$      $\log(1 + e^\theta)$
    $c(y, \phi)$     $-\frac{1}{2}\big(\frac{y^2}{\phi} + \log(2\pi\phi)\big)$   $-\log y!$      $0$

    18/52

  • Likelihood

    Let $\ell(\theta) = \log f_\theta(Y)$ denote the log-likelihood function. The mean $\mathbb{E}(Y)$ and the variance $\operatorname{var}(Y)$ can be derived from the following identities.

    First identity:

    $\mathbb{E}\Big(\dfrac{\partial \ell}{\partial \theta}\Big) = 0.$

    Second identity:

    $\mathbb{E}\Big(\dfrac{\partial^2 \ell}{\partial \theta^2}\Big) + \mathbb{E}\Big[\Big(\dfrac{\partial \ell}{\partial \theta}\Big)^2\Big] = 0.$

    Both are obtained from the identity $\int f_\theta(y)\, dy \equiv 1$.

    19/52
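    Here is the short argument behind these identities, assuming differentiation under the integral sign is justified. Since $\dfrac{\partial \ell}{\partial \theta} = \dfrac{\partial f_\theta / \partial \theta}{f_\theta}$, differentiating $\int f_\theta(y)\, dy = 1$ once gives

    $0 = \int \dfrac{\partial f_\theta}{\partial \theta}(y)\, dy = \int \dfrac{\partial \ell}{\partial \theta}\, f_\theta(y)\, dy = \mathbb{E}\Big(\dfrac{\partial \ell}{\partial \theta}\Big),$

    and differentiating a second time, using $\dfrac{\partial^2 f_\theta}{\partial \theta^2} = \Big(\dfrac{\partial^2 \ell}{\partial \theta^2} + \big(\dfrac{\partial \ell}{\partial \theta}\big)^2\Big) f_\theta$, gives the second identity.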

  • Expected value

    Note that

    $\ell(\theta) = \dfrac{Y\theta - b(\theta)}{\phi} + c(Y; \phi).$

    Therefore

    $\dfrac{\partial \ell}{\partial \theta} = \dfrac{Y - b'(\theta)}{\phi}.$

    It yields

    $0 = \mathbb{E}\Big(\dfrac{\partial \ell}{\partial \theta}\Big) = \dfrac{\mathbb{E}(Y) - b'(\theta)}{\phi},$

    which leads to $\mathbb{E}(Y) = \mu = b'(\theta)$.

    20/52

  • Variance

    On the other hand, we have

    $\dfrac{\partial^2 \ell}{\partial \theta^2} + \Big(\dfrac{\partial \ell}{\partial \theta}\Big)^2 = -\dfrac{b''(\theta)}{\phi} + \dfrac{(Y - b'(\theta))^2}{\phi^2},$

    and from the previous result,

    $\dfrac{Y - b'(\theta)}{\phi} = \dfrac{Y - \mathbb{E}(Y)}{\phi}.$

    Together with the second identity, this yields

    $0 = -\dfrac{b''(\theta)}{\phi} + \dfrac{\operatorname{var}(Y)}{\phi^2},$

    which leads to $\operatorname{var}(Y) = \phi\, b''(\theta)$.

    21/52

  • Example: Poisson distribution

    Consider a Poisson likelihood,

    $f_\mu(y) = \dfrac{\mu^y}{y!}\, e^{-\mu} = e^{\,y \log \mu \,-\, \mu \,-\, \log(y!)}.$

    Thus $\theta = \log \mu$, $b(\theta) = \mu$, $c(y, \phi) = -\log(y!)$,

    $\phi = 1,$

    $\mu = e^\theta,$

    $b(\theta) = e^\theta,$

    $b'(\theta) = e^\theta = \mu, \quad b''(\theta) = e^\theta = \mu.$

    22/52
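    A small numerical check (a sketch, not from the slides) that $\mathbb{E}(Y) = b'(\theta)$ and $\operatorname{var}(Y) = \phi\, b''(\theta)$ indeed give mean = variance = $\mu$ for the Poisson family:

      import numpy as np

      mu = 3.5                     # hypothetical mean
      theta = np.log(mu)           # canonical parameter
      phi = 1.0                    # dispersion
      # b(theta) = exp(theta), so b'(theta) = b''(theta) = exp(theta) = mu
      print(np.exp(theta), phi * np.exp(theta))   # 3.5 3.5

      rng = np.random.default_rng(0)
      y = rng.poisson(mu, size=200_000)
      print(y.mean(), y.var())                    # both close to 3.5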

  • Link function

    $\beta$ is the parameter of interest, and it needs to appear somehow in the likelihood function in order to use maximum likelihood.

    A link function $g$ relates the linear predictor $X^\top \beta$ to the mean parameter $\mu$:

    $X^\top \beta = g(\mu).$

    $g$ is required to be monotone increasing and differentiable,

    $\mu = g^{-1}(X^\top \beta).$

    23/52

  • Examples of link functions

    For the LM, $g(\cdot)$ = identity.

    Poisson data. Suppose $Y \mid X \sim \text{Poisson}(\mu(X))$.

    $\mu(X) > 0$; $\log(\mu(X)) = X^\top \beta$. In general, a link function for count data should map $(0, +\infty)$ to $\mathbb{R}$; the log link is a natural one.

    Bernoulli/Binomial data. $0 < \mu < 1$; $g$ should map $(0, 1)$ to $\mathbb{R}$. Three choices:

    1. logit: $\log\Big(\dfrac{\mu(X)}{1 - \mu(X)}\Big) = X^\top \beta$;

    2. probit: $\Phi^{-1}(\mu(X)) = X^\top \beta$, where $\Phi(\cdot)$ is the standard normal cdf;

    3. complementary log-log: $\log\big(-\log(1 - \mu(X))\big) = X^\top \beta$.

    The logit link is the natural choice.

    24/52
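    A short sketch (not from the slides) evaluating the three links and confirming that each maps $(0, 1)$ to $\mathbb{R}$ and is undone by the corresponding response function:

      import numpy as np
      from scipy.stats import norm

      mu = np.linspace(0.01, 0.99, 5)

      logit   = np.log(mu / (1.0 - mu))      # maps (0, 1) to R
      probit  = norm.ppf(mu)                 # inverse standard normal cdf
      cloglog = np.log(-np.log(1.0 - mu))    # complementary log-log

      print(np.allclose(1.0 / (1.0 + np.exp(-logit)), mu))     # inverse of the logit
      print(np.allclose(norm.cdf(probit), mu))                  # inverse of the probit
      print(np.allclose(1.0 - np.exp(-np.exp(cloglog)), mu))   # inverse of the cloglog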

  • Examples of link functions for Bernoulli response (1)

    [Figure: in blue, $f_1(x) = \dfrac{e^x}{1 + e^x}$; in red, $f_2(x) = \Phi(x)$ (Gaussian CDF); both plotted for $x \in [-5, 5]$.]

    25/52

  • Examples of link functions for Bernoulli response (2)

    [Figure: in blue, $g_1(x) = f_1^{-1}(x) = \log\big(\frac{x}{1-x}\big)$ (logit link); in red, $g_2(x) = f_2^{-1}(x) = \Phi^{-1}(x)$ (probit link); both plotted for $x \in (0, 1)$.]

    26/52

  • Canonical Link

    The function $g$ that links the mean $\mu$ to the canonical parameter $\theta$ is called the canonical link:

    $g(\mu) = \theta.$

    Since $\mu = b'(\theta)$, the canonical link is given by

    $g(\mu) = (b')^{-1}(\mu).$

    If $\phi > 0$, the canonical link function is strictly increasing. Why?

    27/52

  • Example: the Bernoulli distribution

    We can check that

    $b(\theta) = \log(1 + e^\theta).$

    Hence we solve

    $b'(\theta) = \dfrac{e^\theta}{1 + e^\theta} = \mu \iff \theta = \log\Big(\dfrac{\mu}{1 - \mu}\Big).$

    The canonical link for the Bernoulli distribution is the logit link.

    28/52
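    A quick numerical confirmation (a sketch, not from the slides) that inverting $b'(\theta) = e^\theta/(1 + e^\theta)$ recovers the logit link:

      import numpy as np

      mu = np.array([0.1, 0.3, 0.5, 0.9])
      theta = np.log(mu / (1.0 - mu))                                  # candidate theta from the logit
      print(np.allclose(np.exp(theta) / (1.0 + np.exp(theta)), mu))    # b'(theta) == mu, so True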

  • Other examples

                 $b(\theta)$             $g(\mu)$

    Normal       $\theta^2/2$            $\mu$
    Poisson      $\exp(\theta)$          $\log(\mu)$
    Bernoulli    $\log(1 + e^\theta)$    $\log\big(\frac{\mu}{1-\mu}\big)$
    Gamma        $-\log(-\theta)$        $-1/\mu$