GLM I: Introduction to Generalized Linear Models

By Curtis Gary Dean, Vice President and Actuary


Page 2: GLM I: Introduction to Generalized Linear Models

Classical Multiple Linear Regression

$Y_i = a_0 + a_1 X_{i1} + a_2 X_{i2} + \dots + a_m X_{im} + e_i$

$Y_i$ are the response variables and $X_{ij}$ are predictors

$e_i$ is a random error term: Normally distributed

$E[e_i] = 0$ and $\mathrm{Var}(e_i) = \sigma^2$ is constant

Page 3: GLM I: Introduction to Generalized Linear Models

Classical Multiple Linear Regression

$\mu_i = E[Y_i] = a_0 + a_1 X_{i1} + \dots + a_m X_{im}$

$Y_i$ is a Normally distributed random variable with variance $\sigma^2$

Want to estimate $\mu_i = E[Y_i]$ for each $i$

Page 4: GLM I: Introduction to Generalized Linear Models

Response $Y_i$ has Normal Distribution

[Figure: Normal density of $Y_i$ centered at $\mu_i$]

Page 5: GLM I: Introduction to Generalized Linear Models

Problems with Traditional Model

Number of claims is discrete

Claim sizes are skewed to the right

Probability of an event is in [0,1]

Variance is not constant across data points $i$

Nonlinear relationship between X’s and Y’s

Page 6: GLM I: Introduction to Generalized Linear Models

Generalized Linear Models - GLMs

Fewer restrictions

Y can model number of claims, probability of renewing, loss severity, loss ratio, etc.

Large and small policies can be put into one model

Y can be a nonlinear function of X’s

Classical linear regression model is a special case

Page 7: GLM I: Introduction to Generalized Linear Models

Generalized Linear Models - GLMs

$g(\mu_i) = a_0 + a_1 X_{i1} + \dots + a_m X_{im}$

$g(\cdot)$ is the link function

$E[Y_i] = \mu_i = g^{-1}(a_0 + a_1 X_{i1} + \dots + a_m X_{im})$

$Y_i$ can be Normal, Poisson, Gamma, Binomial, Compound Poisson, …

Variance can be modeled
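
To make the roles of g and its inverse concrete, here is a minimal Python sketch using a log link. The coefficient and predictor values are made up for illustration and do not come from the presentation.

```python
import math

def linear_predictor(coeffs, x):
    """Return a0 + a1*x1 + ... + am*xm; coeffs[0] is the intercept a0."""
    return coeffs[0] + sum(a * xj for a, xj in zip(coeffs[1:], x))

def log_link(mu):
    """g(mu) = ln(mu)."""
    return math.log(mu)

def inverse_log_link(eta):
    """g^{-1}(eta) = exp(eta)."""
    return math.exp(eta)

# Hypothetical coefficients (a0, a1, a2) and predictors (X_i1, X_i2).
coeffs = [0.5, 0.2, -0.1]
x_i = [1.0, 3.0]

eta_i = linear_predictor(coeffs, x_i)   # value of the linear predictor
mu_i = inverse_log_link(eta_i)          # E[Y_i] = g^{-1}(linear predictor)
print(eta_i, mu_i)
```

Applying g to mu_i recovers the linear predictor, which is all the link function does: it moves between the scale of the mean and the scale on which the model is linear.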

Page 8: GLM I: Introduction to Generalized Linear Models

Exponential Family of Distributions

$f(y; \theta, \Phi) = \exp\left[ \frac{y\theta - b(\theta)}{a(\Phi)} + c(y, \Phi) \right]$

$E[Y] = b'(\theta)$

$\mathrm{Var}[Y] = a(\Phi)\, b''(\theta)$

$a(\Phi) = \Phi / w$ is a common assumption

Page 9: GLM I: Introduction to Generalized Linear Models

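Normal Distribution in Exponential Family

$f(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y-\mu)^2}{2\sigma^2} \right]$

$= \exp\left[ \ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{y^2 - 2y\mu + \mu^2}{2\sigma^2} \right]$

$= \exp\left[ \frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2) \right]$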
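
A quick numerical check can confirm that the Normal density fits the exponential-family template exp{[yθ − b(θ)]/a(Φ) + c(y, Φ)}. The sketch below assumes the identification θ = μ, b(θ) = θ²/2, a(Φ) = σ², and c(y, Φ) = −y²/(2σ²) − ½ ln(2πσ²); the function names and test values are illustrative only.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Usual form of the Normal density with mean mu and variance sigma2."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def normal_pdf_expfam(y, mu, sigma2):
    """Same density written as exp{[y*theta - b(theta)]/a + c(y)}."""
    theta = mu                      # natural parameter (assumed identification)
    b = theta ** 2 / 2              # b(theta)
    a = sigma2                      # a(Phi)
    c = -y ** 2 / (2 * sigma2) - 0.5 * math.log(2 * math.pi * sigma2)
    return math.exp((y * theta - b) / a + c)

# Arbitrary illustrative values: mu = 1.0, sigma^2 = 2.5.
for y in (-2.0, 0.0, 1.5, 4.0):
    print(y, normal_pdf(y, 1.0, 2.5), normal_pdf_expfam(y, 1.0, 2.5))
```

The two columns should agree to floating-point precision for every y.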

Page 10: GLM I: Introduction to Generalized Linear Models

Some Math Rules: Refresher

(1) $\exp(x) = e^x$

(2) $\ln[\exp(x)] = \exp[\ln(x)] = x$

(3) $\ln(xy) = \ln x + \ln y$

(4) $\ln(x^r) = r \ln x$

(5) $\ln(1/x) = \ln(x^{-1}) = -\ln x$

(6) $\ln(x/y) = \ln x - \ln y$

Page 11: GLM I: Introduction to Generalized Linear Models

Poisson Distribution in Exponential Family

$\Pr[Y = y] = \frac{\mu^y e^{-\mu}}{y!}$

$\Pr[Y = y] = \exp\left[ \ln\frac{\mu^y e^{-\mu}}{y!} \right]$

$\Pr[Y = y] = \exp\left[ \frac{y \ln\mu - \mu}{1} - \ln(y!) \right]$

Page 12: GLM I: Introduction to Generalized Linear Models

Poisson Distribution in Exponential Family

$\theta = \ln\mu$

$b(\theta) = e^{\theta}$ and $a(\Phi) = 1$

$E[Y] = b'(\theta) = \frac{d}{d\theta} e^{\theta} = e^{\theta} = \mu$

$\mathrm{Var}[Y] = a(\Phi)\, b''(\theta) = \frac{d^2}{d\theta^2} e^{\theta} = e^{\theta} = \mu$
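
The Poisson results can be checked numerically. This sketch compares the usual Poisson probability with the exponential-family form (θ = ln μ, b(θ) = e^θ, a(Φ) = 1) and approximates b'(θ) and b''(θ) by finite differences. The value μ = 1.7 and the helper names are arbitrary choices for illustration.

```python
import math

def poisson_pmf(y, mu):
    """Pr[Y = y] = mu^y e^(-mu) / y!"""
    return mu ** y * math.exp(-mu) / math.factorial(y)

def poisson_pmf_expfam(y, mu):
    """exp{[y*theta - b(theta)]/1 - ln(y!)} with theta = ln(mu), b(theta) = e^theta."""
    theta = math.log(mu)
    return math.exp((y * theta - math.exp(theta)) / 1.0 - math.lgamma(y + 1))

def b(theta):
    return math.exp(theta)

def derivative(f, x, h=1e-4):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

mu = 1.7
theta = math.log(mu)
mean = derivative(b, theta)                               # approximates b'(theta)
variance = derivative(lambda t: derivative(b, t), theta)  # approximates b''(theta), a(Phi) = 1
print(mean, variance)   # both should be close to mu
```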

Page 13: GLM I: Introduction to Generalized Linear Models

Compound Poisson Distribution

$Y = C_1 + C_2 + \dots + C_N$

$N$ is a Poisson random variable

$C_i$ are i.i.d. with Gamma distribution

This is an example of a Tweedie distribution

Y is a member of the Exponential Family
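
A small simulation can illustrate the compound Poisson construction. The sketch below uses only Python's standard library; the Poisson sampler (Knuth's multiplication method) and the parameter values are assumptions chosen for illustration, not part of the presentation.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw N ~ Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    n, product = 0, rng.random()
    while product > threshold:
        n += 1
        product *= rng.random()
    return n

def sample_compound_poisson(lam, shape, scale, rng):
    """Y = C_1 + ... + C_N with N ~ Poisson(lam) and C_j ~ Gamma(shape, scale)."""
    n = sample_poisson(lam, rng)
    return sum(rng.gammavariate(shape, scale) for _ in range(n))

rng = random.Random(12345)
# Hypothetical parameters: 2 expected claims, mean claim size shape * scale = 1000.
lam, shape, scale = 2.0, 2.0, 500.0
draws = [sample_compound_poisson(lam, shape, scale, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
share_zero = sum(1 for y in draws if y == 0) / len(draws)
print(sample_mean, lam * shape * scale)   # sample mean vs. E[Y] = lam * E[C]
print(share_zero, math.exp(-lam))         # share of zeros vs. Pr[N = 0]
```

The simulated Y has a point mass at zero (policies with no claims) and a continuous, right-skewed positive part, which is why this distribution is a natural candidate for modeling total losses.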

Page 14: GLM I: Introduction to Generalized Linear Models

Members of the Exponential Family

• Normal
• Poisson
• Binomial
• Gamma
• Inverse Gaussian
• Compound Poisson

Page 15: GLM I: Introduction to Generalized Linear Models

Variance Structure

$E[Y_i] = \mu_i = b'(\theta_i) \rightarrow \theta_i = (b')^{-1}(\mu_i)$

$\mathrm{Var}[Y_i] = a(\Phi_i)\, b''(\theta_i) = a(\Phi_i)\, V(\mu_i)$

Common form: $\mathrm{Var}[Y_i] = \Phi\, V(\mu_i) / w_i$

$\Phi$ is constant across data, but weights $w_i$ are applied to data points

Page 16: GLM I: Introduction to Generalized Linear Models

Variance Functions V(μ)

Distribution        V(μ)
Normal              μ^0 (= 1)
Poisson             μ
Binomial            μ(1 − μ)
Tweedie             μ^p, 1 < p < 2
Gamma               μ^2
Inverse Gaussian    μ^3

Recall: $\mathrm{Var}[Y_i] = \Phi\, V(\mu_i) / w_i$
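
The variance functions can be written down directly in code. This is a minimal sketch with hypothetical names (`VARIANCE_FUNCTIONS`, `variance`), using the standard forms V(μ) = 1, μ, μ(1 − μ), μ^p, μ², μ³ and the common form Var[Y_i] = Φ V(μ_i)/w_i; the numeric inputs are invented.

```python
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "binomial": lambda mu: mu * (1.0 - mu),
    "gamma": lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
}

def tweedie_variance_function(p):
    """V(mu) = mu^p; the compound Poisson case has 1 < p < 2."""
    return lambda mu: mu ** p

def variance(family, mu, phi=1.0, weight=1.0):
    """Var[Y_i] = phi * V(mu_i) / w_i."""
    return phi * VARIANCE_FUNCTIONS[family](mu) / weight

print(variance("poisson", 0.8))                        # variance equals the mean
print(variance("gamma", 2000.0, phi=0.5, weight=4.0))  # grows with the square of the mean
print(tweedie_variance_function(1.5)(4.0))             # between Poisson and Gamma
```

A larger weight w_i (for example, more exposure) lowers the variance of that data point, which is how large and small policies can share one model.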

Page 17: GLM I: Introduction to Generalized Linear Models

Variance at Point and Fit

A Practitioner’s Guide to Generalized Linear Models: A CAS Study Note

Page 18: GLM I: Introduction to Generalized Linear Models

Why Exponential Family?

Distributions in Exponential Family can model a variety of problems

Standard algorithm for finding coefficients $a_0, a_1, \dots, a_m$

Page 19: GLM I: Introduction to Generalized Linear Models

Modeling Number of Claims

Policy   Sex   Territory   Number of Claims in 5 Years
1        M     02          0
2        F     01          0
3        F     01          0
4        F     02          1
5        F     01          0
6        F     02          1
7        M     02          2
8        M     02          2
9        M     02          1
10       F     01          1
:        :     :           :

Page 20: GLM I: Introduction to Generalized Linear Models

Assume a Multiplicative Model

$\mu_i$ = expected number of claims in five years

$\mu_i = B_{F,01} \times C_{Sex(i)} \times C_{Terr(i)}$

If $i$ is Female and Terr 01

→ $\mu_i = B_{F,01} \times 1.00 \times 1.00$

Page 21: GLM I: Introduction to Generalized Linear Models

Multiplicative Model

$\mu_i = \exp(a_0 + a_S X_S(i) + a_T X_T(i))$

$\mu_i = \exp(a_0) \times \exp(a_S X_S(i)) \times \exp(a_T X_T(i))$

$i$ is Female → $X_S(i) = 0$; Male → $X_S(i) = 1$

$i$ is Terr 01 → $X_T(i) = 0$; Terr 02 → $X_T(i) = 1$
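
The indicator coding can be sketched in a few lines of Python. The ten rows below are the policies listed on the "Modeling Number of Claims" slide, as read from the flattened table; the names `POLICIES` and `encode` are hypothetical.

```python
# The ten listed policies: (policy, sex, territory, claims in 5 years).
POLICIES = [
    (1, "M", "02", 0), (2, "F", "01", 0), (3, "F", "01", 0), (4, "F", "02", 1),
    (5, "F", "01", 0), (6, "F", "02", 1), (7, "M", "02", 2), (8, "M", "02", 2),
    (9, "M", "02", 1), (10, "F", "01", 1),
]

def encode(sex, territory):
    """Return (1, X_S, X_T): intercept, Male indicator, Terr 02 indicator."""
    x_s = 1 if sex == "M" else 0
    x_t = 1 if territory == "02" else 0
    return (1, x_s, x_t)

design_matrix = [encode(sex, terr) for _, sex, terr, _ in POLICIES]
claims = [y for _, _, _, y in POLICIES]
for row, y in zip(design_matrix, claims):
    print(row, y)
```

Female in Terr 01 is the base class: both indicators are zero, so its expected claim count is exp(a0) alone.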

Page 22: GLM I: Introduction to Generalized Linear Models

Natural Log Link Function

$\ln(\mu_i) = a_0 + a_S X_S(i) + a_T X_T(i)$

$\mu_i$ is in $(0, \infty)$

$\ln(\mu_i)$ is in $(-\infty, \infty)$

Page 23: GLM I: Introduction to Generalized Linear Models

Poisson Distribution in Exponential Family

$\Pr[Y = y] = \exp\left[ \frac{y \ln\mu - \mu}{1} - \ln(y!) \right]$

$\theta = \ln\mu$

$b(\theta) = e^{\theta}$

Page 24: GLM I: Introduction to Generalized Linear Models

Natural Log is Canonical Link for Poisson

$\theta_i = \ln(\mu_i)$

$\theta_i = a_0 + a_S X_S(i) + a_T X_T(i)$

Page 25: GLM I: Introduction to Generalized Linear Models

Estimating Coefficients $a_1, a_2, .., a_m$

Classical linear regression uses least squares

GLMs use Maximum Likelihood Method

Page 26: GLM I: Introduction to Generalized Linear Models

Likelihood and Log Likelihood

$L(y_1, ..; \theta_1, ..) = \prod_{i=1}^{n} f(y_i; \theta_i)$

$\ell(y_1, ..; \theta_1, ..) = \ln[L(y_1, ..; \theta_1, ..)]$

$\ell(y_1, ..; \theta_1, ..) = \sum_{i=1}^{n} \ln[f(y_i; \theta_i)]$

Page 27: GLM I: Introduction to Generalized Linear Models

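Find $a_0$, $a_S$, and $a_T$ for Poisson

$\ell(y_1, ..; \theta_1, ..) = \sum_{i=1}^{n} \left[ y_i \theta_i - e^{\theta_i} - \ln(y_i!) \right]$

Maximize:

$\sum_{i=1}^{n} \left[ y_i (a_0 + a_S X_S(i) + a_T X_T(i)) - e^{a_0 + a_S X_S(i) + a_T X_T(i)} \right]$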
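
The Poisson log likelihood for the example is easy to evaluate directly. The sketch below uses only the ten policies listed in the claims table (the full data set is not shown in the slides), so the numbers are illustrative; `DATA` and `poisson_log_likelihood` are hypothetical names.

```python
import math

# (X_S, X_T, y): Male indicator, Terr 02 indicator, claims, for the ten listed policies.
DATA = [
    (1, 1, 0), (0, 0, 0), (0, 0, 0), (0, 1, 1), (0, 0, 0),
    (0, 1, 1), (1, 1, 2), (1, 1, 2), (1, 1, 1), (0, 0, 1),
]

def poisson_log_likelihood(a0, a_s, a_t, data=DATA):
    """Sum of y_i*theta_i - exp(theta_i) - ln(y_i!), theta_i = a0 + aS*X_S(i) + aT*X_T(i)."""
    total = 0.0
    for x_s, x_t, y in data:
        theta = a0 + a_s * x_s + a_t * x_t
        total += y * theta - math.exp(theta) - math.lgamma(y + 1)
    return total

# All coefficients zero versus the coefficients quoted in the presentation's solution.
print(poisson_log_likelihood(0.0, 0.0, 0.0))
print(poisson_log_likelihood(-0.288, 0.262, 0.095))
```

The ln(y_i!) term does not depend on the coefficients, which is why it can be dropped when maximizing.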

Page 28: GLM I: Introduction to Generalized Linear Models

Iterative Numerical Procedure to Find $a_i$’s

Use statistical package or actuarial software

Specify link function and distribution type
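
The slides do not say which iterative procedure the software uses. One common choice is Newton-Raphson, which for a canonical link coincides with iteratively reweighted least squares. The sketch below applies it to a Poisson model with log link, fitted only to the ten policies listed in the claims table, so the fitted factors will not match the solution quoted in the presentation, which presumably rests on the full data set. All names are hypothetical.

```python
import math

# (X_S, X_T, y): Male indicator, Terr 02 indicator, claims, for the ten listed policies.
DATA = [
    (1, 1, 0), (0, 0, 0), (0, 0, 0), (0, 1, 1), (0, 0, 0),
    (0, 1, 1), (1, 1, 2), (1, 1, 2), (1, 1, 1), (0, 0, 1),
]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def fit_poisson_log_link(data, max_iter=50, tol=1e-10):
    """Newton-Raphson for the coefficients (a0, aS, aT), starting from zero."""
    coeffs = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        score = [0.0] * 3                      # gradient of the log likelihood
        info = [[0.0] * 3 for _ in range(3)]   # information matrix X'WX with W = diag(mu)
        for x_s, x_t, y in data:
            x = (1.0, x_s, x_t)
            mu = math.exp(sum(a * xj for a, xj in zip(coeffs, x)))
            for j in range(3):
                score[j] += (y - mu) * x[j]
                for k in range(3):
                    info[j][k] += mu * x[j] * x[k]
        step = solve(info, score)
        coeffs = [a + s for a, s in zip(coeffs, step)]
        if max(abs(s) for s in step) < tol:
            break
    return coeffs

a0, a_s, a_t = fit_poisson_log_link(DATA)
print(math.exp(a0), math.exp(a_s), math.exp(a_t))   # base rate, Male factor, Terr 02 factor
```

With only three occupied cells (Female/01, Female/02, Male/02) and three parameters, the fit on these ten rows simply reproduces the cell averages.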

Page 29: GLM I: Introduction to Generalized Linear Models

Solution to Our Example

$a_0 = -.288$ → $\exp(-.288) = .75$

$a_S = .262$ → $\exp(.262) = 1.3$

$a_T = .095$ → $\exp(.095) = 1.1$

$\mu_i = \exp(a_0) \times \exp(a_S X_S(i)) \times \exp(a_T X_T(i))$

$\mu_i = .75 \times 1.3^{X_S(i)} \times 1.1^{X_T(i)}$

$i$ is Male, Terr 01 → $\mu_i = .75 \times 1.3^1 \times 1.1^0$
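
The arithmetic behind the multiplicative factors can be reproduced in a few lines. The coefficients are the ones quoted in the solution; the function name `expected_claims` is a hypothetical helper.

```python
import math

a0, a_s, a_t = -0.288, 0.262, 0.095   # coefficients quoted in the solution

def expected_claims(x_s, x_t):
    """mu_i = exp(a0) * exp(aS)^X_S(i) * exp(aT)^X_T(i)."""
    return math.exp(a0 + a_s * x_s + a_t * x_t)

base, sex_factor, terr_factor = math.exp(a0), math.exp(a_s), math.exp(a_t)
print(round(base, 2), round(sex_factor, 2), round(terr_factor, 2))

for label, x_s, x_t in [("Female, Terr 01", 0, 0), ("Male, Terr 01", 1, 0),
                        ("Female, Terr 02", 0, 1), ("Male, Terr 02", 1, 1)]:
    print(label, round(expected_claims(x_s, x_t), 3))
```

For Male, Terr 01 the rounded factors give .75 x 1.3 = .975 expected claims in five years; the unrounded coefficients give a value slightly below that.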

Page 30: GLM I: Introduction to Generalized Linear Models

Other Common Link Functions

Identity link: canonical link for Normal

Logit link $\ln[\mu / (1 - \mu)]$: canonical link for Binomial; maps $(0, 1)$ onto $(-\infty, \infty)$

Reciprocal $1/\mu$: canonical link for Gamma

But, you do not have to match link and distribution. Choose link and distribution that make sense!
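
The link functions above and their inverses can be collected in a small table. This is a minimal sketch; the dictionary name `LINKS` and the test values are illustrative.

```python
import math

# Each entry maps a link name to the pair (g, g_inverse).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "reciprocal": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}

for name, mu in [("identity", -3.0), ("log", 2.5), ("logit", 0.2), ("reciprocal", 4.0)]:
    g, g_inv = LINKS[name]
    eta = g(mu)
    print(name, mu, eta, g_inv(eta))   # g_inv(g(mu)) should recover mu
```

Whatever distribution is chosen, the inverse link decides which values the fitted mean can take: exp keeps it positive, and the inverse logit keeps it in (0, 1).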

Page 31: GLM I: Introduction to Generalized Linear Models

Measuring Goodness of Fit

Deviance is a measure of goodness of fit

Compare different models

Compare effect of dropping/adding $X_{ij}$’s

Also called log likelihood ratio statistic

Page 32: GLM I: Introduction to Generalized Linear Models

More on Deviance

$D = 2[\ell(y_1, .., y_n; a_1, .., a_{max}) - \ell(y_1, .., y_n; a_1, .., a_m)]$

$a_1, .., a_{max}$: saturated model with as many parameters as can be estimated from data

$a_1, .., a_m$: parameters from your model

$D = 2[\ell(y_1, .., y_n; y_1, .., y_n) - \ell(y_1, .., y_n; \hat{\mu}_1, .., \hat{\mu}_n)]$
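
For a Poisson response the deviance has the closed form D = 2 Σ [y_i ln(y_i/μ̂_i) − (y_i − μ̂_i)], which follows from plugging the Poisson log likelihood into the definition (the saturated model sets each fitted mean equal to y_i). The sketch below evaluates it for the ten policies listed in the claims table under two illustrative sets of fitted means: an overall average, and the sex/territory cell averages of those ten rows.

```python
import math

def poisson_deviance(y, mu_hat):
    """D = 2 * sum[y_i*ln(y_i/mu_i) - (y_i - mu_i)], taking y*ln(y) = 0 when y = 0."""
    total = 0.0
    for y_i, mu_i in zip(y, mu_hat):
        log_term = y_i * math.log(y_i / mu_i) if y_i > 0 else 0.0
        total += log_term - (y_i - mu_i)
    return 2.0 * total

# Claim counts of the ten listed policies, in policy order.
claims = [0, 0, 0, 1, 0, 1, 2, 2, 1, 1]
# Fitted means under an intercept-only model: the overall average.
overall_mean = [sum(claims) / len(claims)] * len(claims)
# Fitted means equal to the cell averages (Male/02, Female/01, Female/02) of these rows.
cell_means = [1.25, 0.25, 0.25, 1.0, 0.25, 1.0, 1.25, 1.25, 1.25, 0.25]

print(poisson_deviance(claims, overall_mean))
print(poisson_deviance(claims, cell_means))
```

The drop in deviance from the first line to the second is the kind of comparison used to judge whether adding predictors improves the fit.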

Page 33: GLM I: Introduction to Generalized Linear Models

Generalized Linear Models - GLMs

Response Y can be discrete or continuous

Y can be number of claims, probability of renewing, loss severity, loss ratio, etc.

Large and small policies can be put into one model.

Y can be a nonlinear function of X’s

Measure goodness of fit using Deviance
