Generalized Nonlinear Models in R

  • Generalized Nonlinear Models in R

    Heather Turner (1,2), David Firth (2) and Ioannis Kosmidis (3)

    (1) Independent consultant; (2) University of Warwick, UK; (3) UCL, UK

    Turner, Firth & Kosmidis GNM in R ERCIM 2013 1 / 30

  • Generalized Linear Models

    A GLM is made up of a linear predictor

    $$\eta = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$

    and two functions

    - a link function that describes how the mean, $E(Y) = \mu$, depends on the linear predictor

      $$g(\mu) = \eta$$

    - a variance function that describes how the variance, $\mathrm{Var}(Y)$, depends on the mean

      $$\mathrm{Var}(Y) = \phi V(\mu)$$

    where the dispersion parameter $\phi$ is a constant.

  • Generalized Nonlinear Models

    A generalized nonlinear model (GNM) is the same as a GLM except that we have

    $$g(\mu) = \eta(x; \beta)$$

    where $\eta(x; \beta)$ is nonlinear in the parameters $\beta$.

    Equivalently, a GNM is an extension of the nonlinear least squares model in which the variance of $Y$ is allowed to depend on the mean.

    Using a nonlinear predictor can produce a more parsimonious and interpretable model.

  • Example: Mental Health Status

    A study of 1660 children from Manhattan recorded their mental impairment and their parents' socioeconomic status (Agresti, 2002).

    [Figure: mental health status (MHS: well, mild, moderate, impaired) by parents' socioeconomic status (SES: A to F)]

  • Independence

    A simple analysis of these data might be to test for independence of MHS and SES using a chi-squared test.

    This is equivalent to testing the goodness-of-fit of the independence model

    $$\log(\mu_{rc}) = \alpha_r + \beta_c$$

    Such a test compares the independence model to the saturated model

    $$\log(\mu_{rc}) = \alpha_r + \beta_c + \gamma_{rc}$$

    which may be over-complex.
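
    The equivalence between the chi-squared test and the goodness-of-fit of the independence model can be checked numerically. The following is a minimal sketch in base R. It uses the built-in `HairEyeColor` table as a stand-in two-way table (an assumption, since the Mental Health data are not reproduced here), and the object names are illustrative.

```r
# Stand-in two-way contingency table (hair colour by eye colour),
# NOT the Mental Health data from the slides
tab <- margin.table(HairEyeColor, c(1, 2))
d <- as.data.frame(tab)   # columns: Hair, Eye, Freq

# Independence model as a Poisson log-linear GLM: log(mu_rc) = alpha_r + beta_c
indep <- glm(Freq ~ Hair + Eye, family = poisson, data = d)

# Pearson goodness-of-fit statistic of the fitted GLM
pearson_glm <- sum(residuals(indep, type = "pearson")^2)

# Classical chi-squared statistic for independence on the same table
pearson_chisq <- unname(chisq.test(tab)$statistic)

print(c(glm = pearson_glm, chisq.test = pearson_chisq))
print(c(df_glm = df.residual(indep),
        df_chisq = unname(chisq.test(tab)$parameter)))
```

    The two statistics agree because the fitted values of the independence model are the usual expected counts (row total times column total over the grand total).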

  • Row-column Association

    One intermediate model is the Row-Column association model:

    $$\log(\mu_{rc}) = \alpha_r + \beta_c + \phi_r \psi_c$$

    (Goodman, 1979), an example of a multiplicative interaction model.

    For the Mental Health data:

    ## Analysis of Deviance Table
    ##
    ## Model 1: Freq ~ SES + MHS
    ## Model 2: Freq ~ SES + MHS + Mult(SES, MHS)
    ## Model 3: Freq ~ SES + MHS + SES:MHS
    ##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
    ## 1        15       47.4
    ## 2         8        3.6  7     43.8  2.3e-07
    ## 3         0        0.0  8      3.6     0.89
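
    A fit of this kind might look like the sketch below. It assumes that the gnm package is installed and that its bundled `mentalHealth` data frame stores the counts in a column named `count` (the slides use `Freq`, presumably after reshaping); both points are assumptions rather than something the slides confirm. The whole block is skipped if gnm is not available.

```r
if (requireNamespace("gnm", quietly = TRUE)) {
  library(gnm)
  data("mentalHealth", package = "gnm")
  set.seed(1)  # gnm starts the nonlinear parameters at random values

  # Independence model: purely linear, so an ordinary GLM suffices
  indep <- glm(count ~ SES + MHS, family = poisson, data = mentalHealth)

  # Row-column association model: Mult() adds the product of row and column scores
  rc <- gnm(count ~ SES + MHS + Mult(SES, MHS),
            family = poisson, data = mentalHealth, verbose = FALSE)

  # Residual deviance and degrees of freedom of each model
  print(c(deviance = deviance(indep), df = df.residual(indep)))
  print(c(deviance = deviance(rc), df = df.residual(rc)))
}
```

    The drop in deviance between the two fits, set against the difference in residual degrees of freedom, corresponds to the comparison of Models 1 and 2 in the table above.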

  • Parameterisation

    The independence model was defined earlier in an over-parameterised form:

    $$\log(\mu_{rc}) = \alpha_r + \beta_c = (\alpha_r + 1) + (\beta_c - 1) = \alpha_r^* + \beta_c^*$$

    Identifiability constraints may be imposed

    - to fix a one-to-one mapping between parameter values and distributions

    - to enable interpretation of parameters

  • Standard Implementation

    The standard approach of all major statistical software packages is to apply the identifiability constraints in the construction of the model

    $$g(\mu) = X\beta$$

    so that rank(X) is equal to the number of parameters $p$.

    Then the inverse in the score equations of the IWLS algorithm

    $$\beta^{(r+1)} = \left(X^T W^{(r)} X\right)^{-1} X^T W^{(r)} z^{(r)}$$

    exists.

  • Alternative Implementation

    The gnm package for R works with over-parameterised models, where rank(X) < p, and uses the generalised inverse in the IWLS updates:

    $$\beta^{(r+1)} = \left(X^T W^{(r)} X\right)^{-} X^T W^{(r)} z^{(r)}$$

    This approach is more useful for GNMs, where it is much harder to define standard rules for specifying identifiability constraints.

    Rather, identifiability constraints can be applied post-fitting for inference and interpretation.
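
    The idea of iterating with a generalised inverse can be illustrated in base R. This is a sketch under stated assumptions, not the gnm implementation: it uses the Moore-Penrose inverse (via the SVD) as one particular generalised inverse, a Poisson log-linear model, a fixed number of iterations, and the built-in `HairEyeColor` table as stand-in data. The helper names `pinv` and `iwls_ginv` are hypothetical.

```r
# Moore-Penrose generalised inverse via the SVD (base R only)
pinv <- function(A, tol = 1e-10) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# IWLS for a Poisson log-linear model, using a generalised inverse in each update
iwls_ginv <- function(X, y, iter = 25) {
  mu <- y + 0.5                  # crude starting values (an assumption of this sketch)
  eta <- log(mu)
  for (r in seq_len(iter)) {
    z <- eta + (y - mu) / mu     # working response for the log link
    w <- mu                      # working weights for the Poisson family
    beta <- pinv(crossprod(X, w * X)) %*% crossprod(X, w * z)
    eta <- drop(X %*% beta)
    mu <- exp(eta)
  }
  list(coef = drop(beta), fitted = mu)
}

d <- as.data.frame(margin.table(HairEyeColor, c(1, 2)))

# Over-parameterised design: intercept plus a dummy for every level of both factors
X <- cbind(1, model.matrix(~ Hair - 1, d), model.matrix(~ Eye - 1, d))
fit <- iwls_ginv(X, d$Freq)

# Reference fit with the usual constrained (full-rank) design
ref <- glm(Freq ~ Hair + Eye, family = poisson, data = d)

print(c(columns = ncol(X), rank = qr(X)$rank))
print(max(abs(fit$fitted - fitted(ref))))
```

    The coefficients from the rank-deficient design are not unique, but the fitted values are, which is why they can be compared directly with the constrained fit.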

  • Estimation of GNMs

    GNMs present further technical difficulties compared with GLMs:

    - automatic generation of starting values is hard

    - the likelihood may have multiple optima

    The default approach of the gnm function in package gnm is to:

    - generate starting values randomly for nonlinear parameters and using a GLM fit for linear parameters

    - use a one-parameter-at-a-time Newton method to update the nonlinear parameters

    - use the generalized IWLS to update all parameters

    Consequently, the parameterisation returned is random.

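  • Parameterisation of RC Model

    The RC model is invariant to changes in scale or location of the interaction parameters:

    $$\begin{aligned}
    \log(\mu_{rc}) &= \alpha_r + \beta_c + \phi_r \psi_c \\
                   &= \alpha_r + \beta_c + (2\phi_r)(0.5\psi_c) \\
                   &= \alpha_r + (\beta_c - \psi_c) + (\phi_r + 1)(\psi_c)
    \end{aligned}$$

    One way to constrain these parameters is as follows

    $$\phi_r^* = \frac{\phi_r - \dfrac{\sum_r w_r \phi_r}{\sum_r w_r}}{\sqrt{\sum_r w_r \left(\phi_r - \dfrac{\sum_r w_r \phi_r}{\sum_r w_r}\right)^2}}$$

    where $w_r$ is the row probability, say, so that

    $$\sum_r w_r \phi_r^* = 0, \qquad \sum_r w_r (\phi_r^*)^2 = 1$$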
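
    The centring and scaling constraint can be checked with a few lines of base R. The scores and weights below are hypothetical values chosen only for illustration; they are not estimates from the Mental Health data.

```r
# Hypothetical unconstrained row scores and row probabilities (illustrative values only)
phi <- c(0.9, 1.0, 0.2, -0.3, -1.2, -2.1)
w <- c(0.16, 0.15, 0.17, 0.21, 0.18, 0.13)

# Centre at the weighted mean, then scale to unit weighted sum of squares
centre <- sum(w * phi) / sum(w)
phi_star <- (phi - centre) / sqrt(sum(w * (phi - centre)^2))

print(sum(w * phi_star))    # weighted mean: zero up to rounding
print(sum(w * phi_star^2))  # weighted sum of squares: one up to rounding

# A rescaled and shifted version of phi leads to the same constrained scores
phi2 <- 2 * phi + 3
centre2 <- sum(w * phi2) / sum(w)
phi2_star <- (phi2 - centre2) / sqrt(sum(w * (phi2 - centre2)^2))
print(max(abs(phi_star - phi2_star)))
```

    The last check shows why the constraint is useful: whichever random parameterisation the fit returns, the constrained scores are the same (up to a possible sign flip if the scale factor is negative).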

  • Row and Column Scores

    These scores and their standard errors can be obtained via the getContrasts function in the gnm package

    ##                   Estimate Std. Error
    ## Mult(., MHS).SESA     1.11       0.30
    ## Mult(., MHS).SESB     1.12       0.31
    ## Mult(., MHS).SESC     0.37       0.32
    ## Mult(., MHS).SESD    -0.03       0.27
    ## Mult(., MHS).SESE    -1.01       0.31
    ## Mult(., MHS).SESF    -1.82       0.28

    ##                           Estimate Std. Error
    ## Mult(SES, .).MHSwell          1.68       0.19
    ## Mult(SES, .).MHSmild          0.14       0.20
    ## Mult(SES, .).MHSmoderate     -0.14       0.28
    ## Mult(SES, .).MHSimpaired     -1.41       0.17

  • Stereotype Model

    The stereotype model (Anderson, 1984) is suitable for ordered categorical data. It is a special case of the multinomial logistic model:

    $$\mathrm{pr}(y_i = c \mid x_i) = \frac{\exp(\beta_{0c} + \beta_c^T x_i)}{\sum_r \exp(\beta_{0r} + \beta_r^T x_i)}$$

    in which only the scale of the relationship with the covariates changes between categories:

    $$\mathrm{pr}(y_i = c \mid x_i) = \frac{\exp(\beta_{0c} + \gamma_c \beta^T x_i)}{\sum_r \exp(\beta_{0r} + \gamma_r \beta^T x_i)}$$

  • Poisson Trick

    The stereotype model can be fitted as a GNM by re-expressing the categorical data as category counts $Y_i = (Y_{i1}, \dots, Y_{ik})$.

    Assuming a Poisson distribution for $Y_{ic}$, the joint distribution of $Y_i$ is Multinomial$(N_i, p_{i1}, \dots, p_{ik})$ conditional on the total count $N_i$.

    The expected counts are then $\mu_{ic} = N_i p_{ic}$ and the parameters of the stereotype model can be estimated through fitting

    $$\begin{aligned}
    \log \mu_{ic} &= \log(N_i) + \log(p_{ic}) \\
                  &= \alpha_i + \beta_{0c} + \gamma_c \sum_r \beta_r x_{ir}
    \end{aligned}$$

    where the nuisance parameters $\alpha_i$ ensure that the multinomial denominators are reproduced exactly, as required.
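
    The Poisson trick itself can be illustrated in base R with the simplest case of two categories, where the multinomial logit reduces to logistic regression. This is only a sketch of the trick, not a stereotype model fit; the built-in `mtcars` data and the variable names are stand-ins chosen for illustration.

```r
# Binary outcome (two categories) from the built-in mtcars data: am (0/1) against weight
d <- data.frame(id = seq_len(nrow(mtcars)), y = mtcars$am, x = mtcars$wt)

# Direct fit: a two-category multinomial logit is a logistic regression
logit <- glm(y ~ x, family = binomial, data = d)

# Expand each individual into one row per category, with a 0/1 count
long <- d[rep(seq_len(nrow(d)), each = 2), ]
long$cat <- rep(0:1, times = nrow(d))
long$count <- as.integer(long$y == long$cat)

# Poisson log-linear fit: factor(id) carries one nuisance parameter per individual,
# so each individual's total count (here 1) is reproduced exactly
pois <- glm(count ~ factor(id) + cat + cat:x, family = poisson, data = long)

print(coef(logit))
print(coef(pois)[c("cat", "cat:x")])
```

    The category intercept and the category-by-covariate coefficient of the Poisson fit match the logistic intercept and slope, at the cost of 32 extra nuisance parameters, one per individual.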

  • Augmented Least Squares

    A disadvantage of using the Poisson trick is that the number of nuisance parameters can be large, making computation slow.

    The algorithm can be adapted using augmented least squares.

    For an ordinary least squares model,

    $$\left[(y|X)^T (y|X)\right]^{-1} = \begin{pmatrix} y^T y & y^T X \\ X^T y & X^T X \end{pmatrix}^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$

    where $A_{11}$, $A_{12}$ and $A_{22}$ are functions of $y^T y$, $X^T y$ and $X^T X$.

    Then it can be shown that

    $$\hat{\beta} = (X^T X)^{-1} X^T y = -\frac{A_{21}}{A_{11}}$$

    requiring only the first row (column) of the inverse to be found.
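
    The augmented least squares identity can be verified numerically. The sketch below uses the built-in `mtcars` data purely as a convenient example; the choice of response and covariates is arbitrary.

```r
y <- mtcars$mpg
X <- cbind(1, mtcars$wt, mtcars$hp)   # design matrix with an intercept column

# Inverse of the augmented cross-product matrix (y|X)^T (y|X)
A <- solve(crossprod(cbind(y, X)))

# The first column of the inverse gives the coefficients: beta = -A21 / A11
beta_aug <- -A[-1, 1] / A[1, 1]

# Reference: ordinary least squares via lm()
ols <- lm(mpg ~ wt + hp, data = mtcars)
beta_lm <- coef(ols)

print(rbind(augmented = unname(beta_aug), lm = unname(beta_lm)))

# A11 is the reciprocal of the residual sum of squares
print(c(1 / A[1, 1], sum(residuals(ols)^2)))
```

    The identity follows from the block-inverse formula: $A_{11}$ is the reciprocal of the residual sum of squares and $A_{21} = -(X^T X)^{-1} X^T y \, A_{11}$.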

  • Application to Nuisance Parameters I

    The same approach can be applied to the IWLS algorithm, letting

    $$\tilde{X} = W^{1/2} (z|X)$$

    Now let

    $$X = (U|V)$$

    where $V$ is the part of the design matrix corresponding to the nuisance factor.

    $U$ is an $nk \times p$ matrix, where $n$ is the number of nuisance parameters, $k$ is the number of categories and $p$ is the number of model parameters, typically with $n \gg p$.

    $V$ is an $nk \times n$ matrix of dummy variables identifying each individual.

  • Application to Nuisance Parameters II

    Then

    $$(X^T X)^{-} = \begin{pmatrix} U^T U & U^T V \\ V^T U & V^T V \end{pmatrix}^{-} = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$

    Again, only the first row (column) of this generalised inverse is required to estimate $\beta$, so we are only interested in $B_{11}$ and $B_{12}$.

    $$B_{11} = \left(U^T U - U^T V (V^T V)^{-1} V^T U\right)^{-}$$

    $$B_{12}^T = -(V^T V)^{-1} V^T U B_{11}$$

  • Elimination of the Nuisance Factor

    $U^T U$ is $p \times p$, therefore not expensive to compute.

    $V^T V$ and $V^T U$ can be computed without constructing the large $nk \times n$ matrix $V$, due to the structure of $V$:

    - $V^T V$ is diagonal and the non-zero elements can be computed directly

    - $V^T U$ is equivalent to aggregating the rows of $U$ by levels of the nuisance factor

    Thus we only need to construct the $U$ matrix, saving memory and reducing the computational burden.

    This approach is invoked using the eliminate argument to gnm.
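
    The two shortcuts can be demonstrated in base R. The sketch below uses small simulated matrices (so the numbers carry no meaning), leaves out the IWLS weights for simplicity, and uses an ordinary inverse because the simulated design has full rank. The matrix $V$ is built only to check the result.

```r
set.seed(1)
n <- 5; k <- 3; p <- 2
id <- factor(rep(seq_len(n), each = k))   # nuisance factor: one level per individual
U <- matrix(rnorm(n * k * p), n * k, p)   # simulated stand-in for the model columns

# Computed WITHOUT forming V
VtV <- as.vector(table(id))               # diagonal of V^T V: number of rows per level
VtU <- rowsum(U, id)                      # V^T U: rows of U summed within each level

# B11 from the Schur complement, using only p x p and n x p objects
B11 <- solve(crossprod(U) - crossprod(VtU, VtU / VtV))

# Brute-force check that does build the nk x n dummy matrix V
V <- model.matrix(~ id - 1)
full_inv <- solve(crossprod(cbind(U, V)))
print(max(abs(B11 - full_inv[1:p, 1:p])))
```

    With many individuals the saving is substantial: `rowsum` and `table` work on vectors of length $nk$, whereas $V$ itself would have $nk \times n$ entries.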

  • Example: Back Pain Data

    For 101 patients, 3 prognostic variables were recorded at baseline, then after 3 weeks the level of back pain was recorded (Anderson, 1984).

    These data can be converted to counts using the expandCategorical function, giving for the first record:

    ##     x1 x2 x3                 pain count id
    ## 1    1  1  1                worse     0  1
    ## 1.1  1  1  1                 same     1  1
    ## 1.2  1  1  1   slight.improvement     0  1
    ## 1.3  1  1  1 moderate.improvement     0  1
    ## 1.4  1  1  1   marked.improvement     0  1
    ## 1.5  1  1  1      complete.relief     0  1

  • Back Pain Model

    The expanded data set has only 606 records and the total number of parameters is only 115 (9 nonlinear). So the model is quick to fit:

    system.time({m

  • Rasch Models

    Rasch models are used in Item Response Theory to model the binary responses of subjects over a set of items.

    The simplest one-parameter logistic (1PL) model has the form