
Chapter 14

Logistic Regression Models

In the linear regression model $y = X\beta + \varepsilon$, there are two types of variables – the explanatory variables $X_1, X_2, \ldots, X_k$ and the study variable $y$. These variables can be measured on a continuous scale or as indicator variables. When the explanatory variables are qualitative, their values are expressed as indicator variables, and dummy variable models are used.

When the study variable is qualitative, its values can be expressed using an indicator variable taking only two possible values, 0 and 1. In such a case, logistic regression is used. For example, $y$ can denote outcomes like success or failure, yes or no, or like or dislike, which can be coded by the two values 0 and 1.

Consider the model

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i = x_i'\beta + \varepsilon_i, \quad i = 1, 2, \ldots, n,$

where $x_i' = [1, x_{i1}, x_{i2}, \ldots, x_{ik}]$ and $\beta' = [\beta_0, \beta_1, \beta_2, \ldots, \beta_k]$.

The study variable takes two values, $y_i = 0$ or 1. Assume that $y_i$ follows a Bernoulli distribution with parameter $\pi_i$, so its probability distribution is

$y_i = 1$ with $P(y_i = 1) = \pi_i$
$y_i = 0$ with $P(y_i = 0) = 1 - \pi_i$.

Assuming $E(\varepsilon_i) = 0$,

$E(y_i) = 1 \cdot \pi_i + 0 \cdot (1 - \pi_i) = \pi_i.$

From the model $y_i = x_i'\beta + \varepsilon_i$, we have

$E(y_i) = x_i'\beta$
$\Rightarrow E(y_i) = x_i'\beta = \pi_i = P(y_i = 1).$

Thus the response function $E(y_i)$ is simply the probability that $y_i = 1$.


Note that $\varepsilon_i = y_i - x_i'\beta$, so

- when $y_i = 1$, then $\varepsilon_i = 1 - x_i'\beta$
- when $y_i = 0$, then $\varepsilon_i = -x_i'\beta$.

Recall that earlier $\varepsilon_i$ was assumed to follow a normal distribution when $y$ was not an indicator variable. When $y$ is an indicator variable, $\varepsilon_i$ takes only two values, so it cannot be assumed to follow a normal distribution.

In the usual regression model, the errors are homoscedastic, i.e., $Var(\varepsilon_i) = \sigma^2$ and so $Var(y_i) = \sigma^2$. When $y$ is an indicator variable, then

$Var(y_i) = E[y_i - E(y_i)]^2$
$= (1 - \pi_i)^2 \pi_i + (0 - \pi_i)^2 (1 - \pi_i)$
$= \pi_i (1 - \pi_i)[(1 - \pi_i) + \pi_i]$
$= \pi_i (1 - \pi_i)$
$= E(y_i)[1 - E(y_i)]$
$= \sigma^2_{y_i}.$
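As a quick numerical check of this identity, a minimal Python sketch (with a hypothetical success probability) simulates Bernoulli responses and compares the sample variance with $\pi(1-\pi)$:

    import numpy as np

    rng = np.random.default_rng(0)
    pi = 0.3                                   # hypothetical success probability
    y = rng.binomial(n=1, p=pi, size=100_000)  # simulated indicator responses

    print(y.var())         # sample variance of y, close to 0.21
    print(pi * (1 - pi))   # theoretical Var(y) = pi(1 - pi) = 0.21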

Thus $Var(y_i)$ depends on $\pi_i$ and is a function of the mean of $y_i$. Moreover, since $E(y_i) = \pi_i$ and $\pi_i$ is a probability, $0 \le \pi_i \le 1$, and thus there is a constraint on $E(y_i)$: $0 \le E(y_i) \le 1$. This puts a big constraint on the choice of linear response function. One cannot fit a model in which the predicted values lie outside the interval from 0 to 1.

When $y$ is a dichotomous variable, empirical evidence suggests that a suitable function $E(y)$ mapping the whole real line to $[0,1]$ has a sigmoid shape. It is a nonlinear S-shaped curve, as sketched below:

[Figure: two S-shaped (sigmoid) curves of $E(y)$ against $x$, with $E(y)$ bounded between 0 and 1.]


A natural choice for $E(y)$ would be the cumulative distribution function of a random variable. In particular, the logistic distribution, whose cumulative distribution function is the simple logistic function, yields a good link and is given by

$E(y) = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)} = \frac{1}{1 + \exp(-x'\beta)}.$
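As an illustration, a minimal Python sketch of this response function (with hypothetical coefficients) shows that $E(y)$ always lies strictly inside $(0, 1)$:

    import numpy as np

    def logistic_response(x, beta):
        """E(y) = exp(x'beta) / (1 + exp(x'beta)) = 1 / (1 + exp(-x'beta))."""
        eta = x @ beta                  # linear predictor x'beta
        return 1.0 / (1.0 + np.exp(-eta))

    beta = np.array([-1.0, 0.5])        # hypothetical [beta_0, beta_1]
    x = np.array([1.0, 3.0])            # [1, x_1], first entry is the intercept
    print(logistic_response(x, beta))   # value strictly between 0 and 1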

Linear predictor and link functions:

The systematic component in $E(y)$ is the linear predictor, denoted as

$\eta_i = x_i'\beta = \sum_{j} \beta_j x_{ij}, \quad i = 1, 2, \ldots, n, \; j = 0, 1, 2, \ldots, k.$

The link function in a generalized linear model relates the linear predictor $\eta_i$ to the mean response $\mu_i$.

Thus

$g(\mu_i) = \eta_i$ or $\mu_i = g^{-1}(\eta_i).$

In the usual linear models based on a normally distributed study variable, the link $g(\mu_i) = \mu_i$ is used and is called the identity link. A link function maps the range of $\mu_i$ onto the whole real line, provides a good empirical approximation, and carries a meaningful interpretation in real applications.

In the case of logistic regression, the link function is defined as

$\eta = \ln\frac{\pi}{1-\pi}.$

This transformation is called the logit transformation of the probability $\pi$, and $\frac{\pi}{1-\pi}$ is called the odds. The link $\eta$ is therefore also called the log-odds. This link function is obtained as follows:


$\pi = \frac{1}{1 + \exp(-\eta)}$

or $\pi[1 + \exp(-\eta)] = 1$

or $e^{-\eta} = \frac{1-\pi}{\pi}$

or $\eta = \ln\frac{\pi}{1-\pi}.$
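The derivation shows that the logit and the logistic function are inverses of each other; a short Python sketch (illustrative values) verifies the round trip numerically:

    import numpy as np

    def logit(pi):
        """eta = ln(pi / (1 - pi)), the log-odds."""
        return np.log(pi / (1 - pi))

    def logistic(eta):
        """pi = 1 / (1 + exp(-eta)), the inverse of the logit."""
        return 1.0 / (1.0 + np.exp(-eta))

    pi = np.array([0.1, 0.5, 0.9])
    print(logistic(logit(pi)))   # recovers [0.1, 0.5, 0.9]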

Note: Similar to the logit function, there are other functions that have the same shape as the logistic function, and $\pi$ can be transformed through them as well. Two such popular functions are the probit transformation and the complementary log-log transformation. The probit transformation transforms $\pi$ using the cumulative distribution function of the normal distribution, and the probit regression model is based on it.

The complementary log-log transformation of $\pi$ is $\ln[-\ln(1-\pi)]$.
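For comparison, a short Python sketch (illustrative probabilities, using scipy's standard normal quantile function for the probit) evaluates the three transformations side by side:

    import numpy as np
    from scipy.stats import norm

    pi = np.array([0.1, 0.5, 0.9])

    logit = np.log(pi / (1 - pi))       # logit transformation (log-odds)
    probit = norm.ppf(pi)               # probit: inverse normal CDF of pi
    cloglog = np.log(-np.log(1 - pi))   # complementary log-log transformation

    # Each maps (0, 1) onto the whole real line with a sigmoid-shaped inverse.
    print(logit, probit, cloglog)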

Maximum likelihood estimation of parameters:

Consider the general form of the logistic regression model

$y_i = E(y_i) + \varepsilon_i$

where the $y_i$'s are independent Bernoulli random variables with parameter $\pi_i$, with

$E(y_i) = \pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}.$

The probability density function of $y_i$ is

$f_i(y_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}, \quad i = 1, 2, \ldots, n, \; y_i = 0 \text{ or } 1.$

The likelihood function is

$L(y_1, y_2, \ldots, y_n; \beta) = \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}.$


$\ln L = \ln \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}$
$= \sum_{i=1}^{n} \left[ y_i \ln \pi_i + (1 - y_i) \ln(1 - \pi_i) \right]$
$= \sum_{i=1}^{n} y_i \ln\frac{\pi_i}{1 - \pi_i} + \sum_{i=1}^{n} \ln(1 - \pi_i).$

Since

$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)},$
$1 - \pi_i = \frac{1}{1 + \exp(x_i'\beta)},$
$\frac{\pi_i}{1 - \pi_i} = \exp(x_i'\beta),$
$\ln\frac{\pi_i}{1 - \pi_i} = x_i'\beta,$

so

$\ln L = \sum_{i=1}^{n} y_i x_i'\beta - \sum_{i=1}^{n} \ln\left[1 + \exp(x_i'\beta)\right].$
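This form can be coded directly. A minimal Python sketch (variable names chosen for illustration) of the log-likelihood as a function of $\beta$ is:

    import numpy as np

    def log_likelihood(beta, X, y):
        """ln L(beta) = sum_i y_i x_i'beta - sum_i ln(1 + exp(x_i'beta)).

        X: n x (k+1) design matrix with a leading column of ones.
        y: vector of 0/1 responses.
        """
        eta = X @ beta                               # linear predictors x_i'beta
        return y @ eta - np.sum(np.log1p(np.exp(eta)))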

Suppose repeated observations are available at each level of the $x$-variables. Let $y_i$ be the number of 1's observed for the $i$th observation and $n_i$ be the number of trials at each observation. Then

$\ln L = \sum_{i=1}^{n} y_i \ln \pi_i + \sum_{i=1}^{n} n_i \ln(1 - \pi_i) - \sum_{i=1}^{n} y_i \ln(1 - \pi_i).$

The maximum likelihood estimate $\hat\beta$ of $\beta$ is obtained by numerical maximization.
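One way to carry out this numerical maximization is to minimize the negative log-likelihood with a general-purpose optimizer; the following Python sketch (simulated data, hypothetical true coefficients) illustrates the idea with scipy:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    n = 500
    beta_true = np.array([-1.0, 2.0])                      # hypothetical truth
    X = np.column_stack([np.ones(n), rng.normal(size=n)])  # rows are [1, x_i1]
    y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

    def neg_log_likelihood(beta):
        eta = X @ beta
        return -(y @ eta - np.sum(np.log1p(np.exp(eta))))

    beta_hat = minimize(neg_log_likelihood, x0=np.zeros(2)).x
    print(beta_hat)   # numerical MLE, close to beta_true for large n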

If $V(\varepsilon) = \Omega$, then asymptotically

$E(\hat\beta) = \beta$
$V(\hat\beta) = (X'\Omega^{-1}X)^{-1}.$
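For the Bernoulli logistic model, a common way to estimate the asymptotic covariance in practice is through the information matrix $X'WX$ with $W = \mathrm{diag}[\hat\pi_i(1-\hat\pi_i)]$; a minimal Python sketch (hypothetical design matrix and estimates, stated as an assumption rather than the notes' notation) is:

    import numpy as np

    # Hypothetical design matrix and MLE, for illustration only.
    X = np.array([[1.0, -1.0], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]])
    beta_hat = np.array([-0.5, 1.2])

    # Information matrix X'WX with W = diag(pi_hat_i (1 - pi_hat_i));
    # its inverse estimates the asymptotic covariance of beta_hat.
    pi_hat = 1 / (1 + np.exp(-(X @ beta_hat)))
    W = np.diag(pi_hat * (1 - pi_hat))
    cov_hat = np.linalg.inv(X.T @ W @ X)
    print(np.sqrt(np.diag(cov_hat)))   # asymptotic standard errors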

After obtaining $\hat\beta$, the linear predictor is estimated by

$\hat\eta_i = x_i'\hat\beta.$

The fitted value is

$\hat y_i = \hat\pi_i = \frac{\exp(\hat\eta_i)}{1 + \exp(\hat\eta_i)} = \frac{1}{1 + \exp(-\hat\eta_i)} = \frac{1}{1 + \exp(-x_i'\hat\beta)}.$
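In code, the fitted values are just the logistic function applied to the estimated linear predictors; a minimal Python sketch (hypothetical estimates) is:

    import numpy as np

    beta_hat = np.array([-0.9, 1.8])                     # hypothetical estimates
    X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 2.0]])  # rows are [1, x_i1]

    eta_hat = X @ beta_hat                # estimated linear predictors
    y_hat = 1 / (1 + np.exp(-eta_hat))    # fitted probabilities pi_hat_i
    print(y_hat)                          # each value lies in (0, 1)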


Interpretation of parameters:

To understand the interpretation of the $\beta$'s in the logistic regression model, first consider a simple case with only one explanatory variable:

$\eta(x) = \beta_0 + \beta_1 x.$

After fitting the model, $\hat\beta_0$ and $\hat\beta_1$ are obtained as the estimators of $\beta_0$ and $\beta_1$, respectively. Then the fitted linear predictor at $x = x_i$ is

$\hat\eta(x_i) = \hat\beta_0 + \hat\beta_1 x_i,$

which is the log-odds at $x = x_i$. The fitted linear predictor at $x = x_i + 1$ is

$\hat\eta(x_i + 1) = \hat\beta_0 + \hat\beta_1 (x_i + 1),$

which is the log-odds at $x = x_i + 1$.

Thus

$\hat\beta_1 = \hat\eta(x_i + 1) - \hat\eta(x_i)$
$= \ln[\text{odds}(x_i + 1)] - \ln[\text{odds}(x_i)]$
$= \ln\frac{\text{odds}(x_i + 1)}{\text{odds}(x_i)}$
$\Rightarrow \frac{\text{odds}(x_i + 1)}{\text{odds}(x_i)} = \exp(\hat\beta_1).$

This is termed the odds ratio; it is the estimated multiplicative change in the odds of success when the value of the explanatory variable increases by one unit.
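A short Python sketch (hypothetical fitted coefficients) confirms that the ratio of the odds at $x + 1$ and at $x$ equals $\exp(\hat\beta_1)$ regardless of $x$:

    import numpy as np

    beta0_hat, beta1_hat = -2.0, 0.7      # hypothetical fitted coefficients

    def odds(x):
        """odds(x) = exp(eta_hat(x)) = pi_hat / (1 - pi_hat) at x."""
        return np.exp(beta0_hat + beta1_hat * x)

    x = 1.5
    print(odds(x + 1) / odds(x))   # odds ratio: equals exp(beta1_hat)
    print(np.exp(beta1_hat))       # same value, for any choice of x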

When there is more than one explanatory variable in the model, the interpretation of the $\beta_j$'s is similar to the single-variable case. The odds ratio associated with explanatory variable $x_j$ is $\exp(\hat\beta_j)$, keeping the other explanatory variables constant. This is similar to the interpretation of $\beta_j$ in the multiple linear regression model.

If there is an $m$-unit change in the explanatory variable, then the estimated odds ratio is $\exp(m\hat\beta_j)$.


Test of hypothesis:

The test of hypothesis for the parameters in the logistic regression model is based on asymptotic theory. It is a large-sample likelihood ratio test based on a statistic termed the deviance.

A model with exactly $n$ parameters that perfectly fits the sample data is termed a saturated model.

The statistic that compares the log-likelihoods of the fitted and saturated models is called the model deviance. It is defined as

$\lambda(\hat\beta) = 2\ln L(\text{saturated model}) - 2\ln L(\hat\beta)$

where $\ln L(\cdot)$ is the log-likelihood and $\hat\beta$ is the maximum likelihood estimate of $\beta$.

In the case of the logistic regression model, $y_i = 0$ or 1 and the $\pi_i$'s are completely unrestricted. So the likelihood is maximized at $\pi_i = y_i$, and the maximum value of the likelihood of the saturated model is

$\max L(\text{saturated model}) = 1$
$\Rightarrow \max \ln L(\text{saturated model}) = 0.$

Let $\hat\beta$ be the maximum likelihood estimator of $\beta$; then the log-likelihood is maximized at $\beta = \hat\beta$, and

$\ln L(\hat\beta) = \sum_{i=1}^{n} y_i x_i'\hat\beta - \sum_{i=1}^{n} \ln\left[1 + \exp(x_i'\hat\beta)\right] \le \ln L(\text{saturated model}).$

Assuming that the logistic regression function is correct, the large-sample distribution of the likelihood ratio test statistic $\lambda(\hat\beta)$ is approximately $\chi^2(n-p)$ when $n$ is large.

A large value of $\lambda(\hat\beta)$ implies the model is incorrect; a small value of $\lambda(\hat\beta)$ implies the model is well fitted and is as good as the saturated model. Note that generally the fitted model has a smaller number of parameters than the saturated model, which is based on all the parameters. Thus, at the $\alpha\%$ level of significance, the fitted model is judged adequate when $\lambda(\hat\beta)$ is smaller than the $\chi^2(n-p)$ critical value, and inadequate otherwise.