GENERALIZED LINEAR MODELS- NEGATIVE BINOMIAL GLM and its application in R Malcolm Cameron Xiaoxu...

GENERALIZED LINEAR MODELS- NEGATIVE BINOMIAL GLM and its application in R

Malcolm Cameron Xiaoxu Tan

1. Table of Contents • Poisson GLM• Problems with Poisson• Solutions to this Problem• Negative Binomial GLM• Data Analysis• Summary

2. Poisson

• Let Yi be the random variable for claim count in the ith class, i = 1,2 ..... N, The mean and the variance are both equal to λ

Poisson Continued• Mean and variance are both equal• Memoryless Property• Commonly used because of its convenience and

appropriateness.• Many Statistical Packages available • Poisson Distribution can be used for

• Queuing theory• Insurance claims• Any count data

3. Problems with Poisson

• Over dispersion and heterogeneity of the distribution of residuals.

• MLE procedure used to derive estimates and provide the standard errors of those estimates make strong assumptions that every subject within a covariate group has the same underlying rate of outcome (homogeneity).

• The model assumes that the variability of counts within a covariate group is equal to the mean

• So if the variance is greater than the mean, this will lead to underestimated standard errors and overestimated significance of regression parameters

4. Solutions to This Problem

• But don’t worry!

(1) Fit a Poisson quasi likelihood.

(2) Fit Negative Binomial GLM.

We will be focusing on the second method

5. Negative Binomial

• The pmf is

k { 0, 1, 2, 3, … } — number of successes∈

Negative Binomial Continued

• Under the Poisson the mean, λi, is assumed to be constant within classes. But, if we define a specific distribution for λi, heterogeneity within classes can be used.

• One method is to assume λi to be Gamma with E(λi)= µi and Var(λi)=µi 2 / vi

• And Yi | λi to be the Poisson distribution with conditional mean E(Yi | λi )= λi

Negative Binomial Continued• It follows that the marginal distribution of Yi follows a Negative

Binomial distribution with PDF

iiiiiii dfyYPyYP )(|

•Where the mean is E(Yi)= µi and Var(Yi)= µi + µi2 vi

MLE for NB GLM• Different parameterization can result in different types of negative

binomial distributions. • For Example, by letting vi = a-1 , Y follows a NB with E(Yi)= µi , and

Var(Yi) = µi(1+a µi) where a denotes the dispersion parameter.

• Note: If a=0, there would be no over dispersion.• The log likelihood for this example would be

)1log()()log()!log()log()1log(),( 11

iiiiii

aayayyayarali

MLE for NB GLM Continued• And the Maximum likelihood estimates are obtained by solving the

following equations

j,...,2,1,0

)(),(~)1(

)()1log(

),(~)2(

Negative Binomial Continued• The MLE may be solved simultaneously, and the

procedure involves sequential iterations. • In (1), by using the initial value a, a(0), l(β,a) is maximized

w.r.t β, producing β(1).

• The First equation is equivalent to the weighted least squares, so with slight adjustments, the MLE can be found using Iterated Weighted Least Squares (IWLS) regression, similar to the Poisson.

• In (2), we treat β as a constant to solve for a(1) . This can be done using the Newton-Raphson algorithm

• By cycling between these two processes of updating our variables, the MLE for β and a will be obtained.

6 .Data Analysis

• Find data set with over dispersion.

• Do analysis using Poisson and NegBinomial

• Compare the models

Example: Students Attendance • School administrators study the attendance behavior of

high school juniors at two schools.• Predictors: • The type of program in which the student is enrolled • The students grades in a standardized math test

The Data• nb_data---The file of attendance data on 314 high school

juniors from two urban high schools • Daysabs---The response variable of interest is days

absent• Math ----The variable gives the standardized math score

for each student. • Prog --- is a three-level nominal variable indicating the

type of instructional program in which the student is enrolled.

Plot the data!

6.1 fit the Poisson model• Poisson1 <- glm(daysabs ~ math + prog, family = "poisson", data = students)• summary(Poisson1 )• Call:• glm(formula = daysabs ~ math + prog, family = "poisson", data = students)• Coefficients:• Estimate Std. Error z value Pr(>|z|) • (Intercept) 2.651974 0.060736 43.664 < 2e-16 ***• math -0.006808 0.000931 -7.313 2.62e-13 ***• progAcademic -0.439897 0.056672 -7.762 8.35e-15 ***• progVocational -1.281364 0.077886 -16.452 < 2e-16 ***• Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 • (Dispersion parameter for poisson family taken to be 1)• Null deviance: 2217.7 on 313 degrees of freedom• Residual deviance: 1774.0 on 310 degrees of freedom• AIC: 2665.3• Number of Fisher Scoring iterations: 5

• because the mean value of daysabs appears to vary by progress.

with(students, tapply (daysabs, prog, function(x) {

sprintf("Mean (Var) = %1.2f (%1.2f)", mean(x), var(x)) • General Academic Vocational

"M (SD) = 10.65 (8.20)" "M (SD) = 6.93 (7.45)" "M (SD) = 2.67 (3.73)"

Poisson regression has a very strong assumption, that is the conditional variance equals conditional mean. But The variance is much greater than the mean,

Plot the data

So we need to find a new model…

• Negative binomial regression can be used ,when the conditional variance exceeds the conditional mean.

6.2 fit the negative-binomial model• > NB1=glm.nb(daysabs ~ math + prog, data = students)• > summary(NB1)• Call:• glm.nb(formula = daysabs ~ math + prog, data = students, init.theta = 1.032713156, • link = log)• Deviance Residuals: • Min 1Q Median 3Q Max • -2.1547 -1.0192 -0.3694 0.2285 2.5273 • Coefficients:• Estimate Std. Error z value Pr(>|z|) • (Intercept) 2.615265 0.197460 13.245 < 2e-16 ***• math -0.005993 0.002505 -2.392 0.0167 * • progAcademic -0.440760 0.182610 -2.414 0.0158 * • progVocational -1.278651 0.200720 -6.370 1.89e-10 ***• ---• Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 • (Dispersion parameter for Negative Binomial(1.0327) family taken to be 1)• Null deviance: 427.54 on 313 degrees of freedom• Residual deviance: 358.52 on 310 degrees of freedom• AIC: 1741.3• Number of Fisher Scoring iterations: 1

Plot the data !

7. Check model assumptions• We use the likelihood ratio test

• Code:• Poisson1 <- glm (daysabs ~ math + prog, family = "poisson", data = students)• > pchisq(2 * ( logLik(poisson1) - logLik(NB1)), df = 1, lower.tail = FALSE)• [1] 2.157298e-203

• This strongly suggests the negative binomial model is more appropriate than the Poisson model !

8. Goodness of fit *for poissonresids1<-residuals(poisson1, type="pearson")

sum(resids1^2)

[1] 2045.656

1-pchisq(2045.656,310)

*for negative binomialresids2<-residuals(NB1, type="pearson")

sum(resids2^2)

[1] 339.8771

> 1-pchisq(339.8771,310)

[1] 0.1170337

9. AIC –which model is better?

> AIC(Poisson1)

[1] 2665.285

> AIC(NB1)

[1] 1741.258

For negative binomial, it has Much smaller AIC!

Thank you !

GENERALIZED LINEAR MODELS- NEGATIVE BINOMIAL GLM and its application in R Malcolm Cameron Xiaoxu...

Documents

Transcript of GENERALIZED LINEAR MODELS- NEGATIVE BINOMIAL GLM and its application in R Malcolm Cameron Xiaoxu...

Algebraic Topology Lectures by: Professor Cameron Gordon · 2019. 12. 11. · In these terms, algebraic topology is somehow a way of translating this into an algebraic question. More

GLM I: Introduction to Generalized Linear Models

Agostic Interactions - Princeton University · Agostic Interactions Reviews: Brookhart, M.; Green, M. L. H. J. Organometall. Chem. ... - Coined into chemistry as a term by Malcolm

Lecture 14: GLM Estimation and Logistic Regressionpeople.musc.edu/~bandyopd/bmtry711.11/lecture_14.pdfLecture 14: GLM Estimation and Logistic Regression – p. 13/6 2 • The row margins

Introduction to Engineering Materials ENGR2000 …engineering.armstrong.edu/cameron/ENGR2000_electricalproperties.pdfChapter 18: Electrical Properties Dr.Coates. 18.2 Ohm’s Law is

GLM for fMRI Emily Falk, Ph.D. University of Pennsylvania Thanks to Thad Polk and Elliot Berkman 1.

Econ 488 Lecture 5 – Hypothesis Testing Cameron Kaplan.

Introduction to General and Generalized Linear Modelshmad/GLM/Slides_2012/week03/lect03.pdf · 2012-02-13 · Introduction to General and Generalized Linear Models The Likelihood

Peter J. Cameron University of St Andrews Algebra ...pjc/talks/acda13/hco1.pdf · (model-completeness), Ramsey theory, and topological dynamics (extremely amenable groups). Example

Chemistry of C-C π-bonds Lectures 5-8: Aromatic …€œOrganic Chemistry”, Clayden, Greeves, Wothers and Warren, OUP, 2000. Chapter 22 [2]. “Aromatic Chemistry” by Malcolm

EDBI C 2010 10 · 2010. 11. 4. · 3060306 welser profile ag 3060325 basf plant science gmbh commonwealth scientific and industrial research organisation 3060411 parrish, malcolm

Επαναληπτικό μάθημα GLM › modules › document › file.php › MATH532 › Lab… · Επαναληπτικό μάθημα glm . glm: Πιθανοφάνεια,

Symmetry in mathematics and mathematics of symmetrypjc/slides/beamer/symmetry.pdf · Symmetry in mathematics and mathematics of symmetry Peter J. Cameron p.j.cameron@qmul.ac.uk ...

Plots for Generalized Additive Modelsparker.ad.siu.edu/Olive/ppgam.pdfIn the GLM and 1D regression literature, AP is replaced by SP and EAP by ESP, but the 1D regression models are

β2-adrenergic receptor activation on donor cells …...1 β2-adrenergic receptor activation on donor cells ameliorates acute GvHD Authors: Hemn Mohammadpour1, Joseph L. Sarow1, Cameron

Linear Models - GitHub · Generalized linear models - the glm() function I Some types of observations can never be transformed into normality I Example: binary data; ones and zeroes.

GLM 80 & R 60 Professional Telemetrul laser GLM 80 ... · GLM 80 & R 60 Professional. 3. Ecran mare, iluminat; afisajul se roteste. in functie de pozitia telemetrului – citire foarte

GLM 120 C - Contorion...3 | 1 609 92A 4F4 | (22.10.2018) Bosch Power Tools Language Tool Settings Bluetooth Settings CAL 23.198m CAL 05.05.2017 7:26:12 30ë 9.273 m 8.179 m 18.558

Special values of Riemann's zeta functionfrancc/files/zeta_talk.pdf · Special values of Riemann’s zeta function Cameron Franc UC Santa Cruz March 6, 2013 Cameron Franc Special

"Shutterless Vector-Scanned Data Collection Methods for the Pilatus 6MF" Malcolm Capel NECAT Cornell University.