Parametric regression models

STK4080 H16, Universitetet i Oslo

1. Likelihood for censored data

2. Parametric regression models

3. Poisson regression models

4. Accelerated failure time models

5. Martingale formulation of parametric survival models

Parametric regression models – p. 1/40

Parametric models without covariates

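Let $T_i$ have hazard $\alpha(t|\theta)$, cumulative hazard $A(t|\theta)$, density $f(t|\theta)$ and survival function $S(t|\theta)$.

We observe right-censored data $(\tilde T_i, D_i)$, where $\tilde T_i = \min(C_i, T_i)$ and $D_i = I(\tilde T_i = T_i)$, with $C_i$ a censoring time independent of $T_i$.

Such right-censored data have likelihood $L(\theta) \propto \prod_{i=1}^n L_i(\theta)$, where the likelihood contributions equal

$$L_i(\theta) = f(\tilde T_i|\theta)^{D_i}\, S(\tilde T_i|\theta)^{1-D_i} = \alpha(\tilde T_i|\theta)^{D_i} \exp(-A(\tilde T_i|\theta)),$$

since $f = \alpha S$ and $S = \exp(-A)$.

Under standard assumptions the MLE $\hat\theta$ that maximizes $L(\theta)$ is approximately

$$\hat\theta \sim N(\theta, I(\hat\theta)^{-1}),$$

where $I(\theta) = -\dfrac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}$ is the information matrix.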
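As a quick numerical check of the identity for $L_i(\theta)$, the sketch below evaluates the censored log-likelihood both in the density/survival form and in the hazard form. It is written in Python (the notes themselves use R) and assumes an illustrative Weibull model; the parametrization, the made-up data and all function names are hypothetical.

```python
import math

# Hypothetical Weibull model (shape k, scale lam), chosen only for illustration:
# hazard alpha(t) = (k/lam)(t/lam)^(k-1), cumulative hazard A(t) = (t/lam)^k
def hazard(t, k, lam):
    return (k / lam) * (t / lam) ** (k - 1)

def cumhaz(t, k, lam):
    return (t / lam) ** k

def survival(t, k, lam):
    return math.exp(-cumhaz(t, k, lam))

def density(t, k, lam):
    # f = alpha * S holds for any continuous lifetime distribution
    return hazard(t, k, lam) * survival(t, k, lam)

def loglik_density_form(times, events, k, lam):
    # sum_i [ D_i log f(T_i) + (1 - D_i) log S(T_i) ]
    return sum(d * math.log(density(t, k, lam)) + (1 - d) * math.log(survival(t, k, lam))
               for t, d in zip(times, events))

def loglik_hazard_form(times, events, k, lam):
    # sum_i [ D_i log alpha(T_i) - A(T_i) ]
    return sum(d * math.log(hazard(t, k, lam)) - cumhaz(t, k, lam)
               for t, d in zip(times, events))

times = [0.5, 1.2, 2.0, 3.1, 4.4]   # observed times min(C_i, T_i), made-up values
events = [1, 0, 1, 1, 0]            # D_i = 1 for an observed event, 0 if censored
print(loglik_density_form(times, events, 1.5, 2.0),
      loglik_hazard_form(times, events, 1.5, 2.0))
```

The two printed numbers agree up to rounding, for any parameter values.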

Example: Exponential lifetimes

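If the true survival times $T_i \sim \exp(\nu)$ and we have right-censored data $(\tilde T_i, D_i)$ with $\tau$ as the maximal observation time, we get the likelihood

$$L = \prod_{i=1}^n \nu^{D_i}\exp(-\tilde T_i\nu) = \nu^{N_\bullet(\tau)}\exp(-\nu R(\tau)),$$

where $N_\bullet(\tau) = \sum_{i=1}^n D_i$ is the number of observed events (occurrences) and $R(\tau) = \sum_{i=1}^n \tilde T_i$ the total exposure time.

Since $\log L = N_\bullet(\tau)\log\nu - \nu R(\tau)$, it follows that the MLE of $\nu$ equals

$$\hat\nu = \frac{N_\bullet(\tau)}{R(\tau)} = \frac{\text{Occurrence}}{\text{Exposure}},$$

and, with $I(\nu) = N_\bullet(\tau)/\nu^2$, the standard error of $\hat\nu$ is $1/\sqrt{I(\hat\nu)} = \hat\nu/\sqrt{N_\bullet(\tau)}$.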
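A minimal sketch of the occurrence/exposure estimate and its standard error, in Python with made-up data (the function names are hypothetical). As a cross-check it also computes the observed information numerically by a central second difference of $\log L$.

```python
import math

def exp_mle(times, events):
    """Occurrence/exposure MLE of a constant hazard nu with its standard error."""
    occurrences = sum(events)               # N_.(tau): number of observed events
    exposure = sum(times)                   # R(tau): total time at risk
    nu_hat = occurrences / exposure
    se = nu_hat / math.sqrt(occurrences)    # 1/sqrt(I(nu_hat)) with I(nu) = N/nu^2
    return nu_hat, se

def loglik(nu, times, events):
    # log L = N log(nu) - nu R
    return sum(events) * math.log(nu) - nu * sum(times)

times = [0.5, 1.2, 2.0, 3.1, 4.4, 0.9, 2.7]   # made-up observed times
events = [1, 0, 1, 1, 0, 1, 0]                # made-up event indicators
nu_hat, se = exp_mle(times, events)

# Numerical observed information -d^2 log L / d nu^2 at nu_hat
h = 1e-4
info_num = -(loglik(nu_hat + h, times, events) - 2 * loglik(nu_hat, times, events)
             + loglik(nu_hat - h, times, events)) / h ** 2
print(nu_hat, se, 1 / math.sqrt(info_num))
```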


Piecewise constant hazard

$\alpha(t) = \sum_{k=1}^K \theta_k I_k(t)$, where $I_k(t) = 1$ on the interval $t_{k-1} < t \le t_k$ and zero otherwise, and $0 = t_0 < t_1 < \dots < t_K = \tau$.

It can then be shown that, with $O_k$ the number of events and $R_k$ the total observation time (exposure) in interval $k$, the MLE for $\theta_k$ is given by

$$\hat\theta_k = \frac{O_k}{R_k}.$$

Furthermore, the information matrix becomes diagonal, and the large-sample distributions of the $\hat\theta_k$'s are independent normal distributions with standard errors $\hat\theta_k/\sqrt{O_k}$.

Also, the standard error of $\log(\hat\theta_k)$ becomes $1/\sqrt{O_k}$, and

$$\hat\theta_k \exp(\pm 1.96/\sqrt{O_k})$$

is generally a preferable confidence interval.
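A sketch of the interval-wise estimates, assuming the observed times are given together with a list of cut points; the function name and the data are hypothetical. Each interval is treated exactly like the exponential example, with its own occurrences $O_k$ and exposure $R_k$, and the confidence interval is formed on the log scale.

```python
import math

def piecewise_mle(times, events, cuts):
    """Occurrence/exposure estimates theta_k = O_k / R_k on (t_{k-1}, t_k],
    with the log-scale 95% interval theta_k * exp(+-1.96 / sqrt(O_k)).
    cuts = [t_0 = 0, t_1, ..., t_K]; every interval is assumed to have R_k > 0."""
    results = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        R = sum(max(0.0, min(t, hi) - lo) for t in times)           # exposure R_k
        O = sum(d for t, d in zip(times, events) if lo < t <= hi)    # events O_k
        theta = O / R
        if O > 0:
            half_width = 1.96 / math.sqrt(O)                         # se of log(theta_k)
            ci = (theta * math.exp(-half_width), theta * math.exp(half_width))
        else:
            ci = None                                                # undefined on log scale
        results.append((O, R, theta, ci))
    return results

times = [0.5, 1.2, 2.0, 3.1, 4.4, 0.9, 2.7]
events = [1, 0, 1, 1, 0, 1, 0]
for O, R, theta, ci in piecewise_mle(times, events, [0.0, 2.0, 5.0]):
    print(O, round(R, 3), round(theta, 4), ci)
```

The interval exposures add up to the total exposure $R(\tau)$, and the interval event counts add up to $N_\bullet(\tau)$.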


Parametric regression models, in general

We will assume that the distribution of $T_i$ depends on a linear predictor $\beta'x_i$ and, in addition, on a parameter $\theta$.

The distribution may always be specified via the hazard, thus

$$T_i \sim \alpha(t|\beta'x_i, \theta),$$

i.e. the hazard for individual $i$. With cumulative hazard $A(t|\beta'x_i, \theta)$ we may express the likelihood contribution from individual $i$ by

$$L_i(\beta, \theta) \propto \alpha(\tilde T_i|\beta'x_i, \theta)^{D_i}\exp(-A(\tilde T_i|\beta'x_i, \theta)),$$

where we assume that $T_i$ and $C_i$ are independent given $x_i$.


Proportional hazards model

1. The semi-parametric method is the most common (Cox-regression).

2. Poisson-regression is a numerically simpler variation.

3. Via accelerated failure time models, if the baseline is Weibull (or exponential)!

4. In general by likelihood optimization.

We have discussed 1. thoroughly and will here consider 2. and 3.


Additive hazard models

$\alpha(t|\beta'x, \theta) = \beta'x + \alpha_0(t|\theta)$

1. Semi-parametric: the Lin & Ying model.

2. Special case of Aalen's additive model, $\alpha(t|x) = \beta(t)'x + \beta_0(t)$.

3. Parametric models are possible, but apparently not implemented in standard software.

4. Poisson-regression is possible (Breslow, 1987) with a piecewise constant baseline.

We have discussed 2. and mentioned 1. Here we will briefly consider 4.


Poisson-regression

Assume $Y_i \sim \text{Po}(n_i\exp(\psi + \beta'x_i))$. This generates the likelihood

$$L = \prod_{i=1}^n\left[\frac{n_i^{Y_i}\exp(Y_i(\psi+\beta'x_i))}{Y_i!}\exp(-n_i\exp(\psi+\beta'x_i))\right] \propto \prod_{i=1}^n\left[\exp(Y_i(\psi+\beta'x_i))\exp(-n_i\exp(\psi+\beta'x_i))\right],$$

which may be maximized with a program for Poisson-regression, in R as a
• Generalized linear model: glm
• with Poisson family: family=poisson
• We need an "offset" for $\log(n_i)$ (or, alternatively, weighting)
• We may also fit link functions other than the default log link, e.g. the additive model $E[Y_i] = n_i(\psi + \beta'x_i)$
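To make the fitting step concrete, here is a minimal Newton-Raphson sketch of a Poisson regression with an offset, written in Python with only the standard library. For the canonical log link, Newton-Raphson coincides with the Fisher scoring that R's glm reports. The function names, the assumption that the first column of X is the intercept, and the data are all hypothetical.

```python
import math

def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [vi] for row, vi in zip(A, v)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def poisson_glm_offset(y, X, offset, max_iter=50, tol=1e-10):
    """Maximize sum_i [y_i eta_i - exp(eta_i)], eta_i = offset_i + X_i.b, by Newton-Raphson.
    Assumes the first column of X is the intercept (all ones)."""
    p = len(X[0])
    b = [0.0] * p
    # start the intercept at the crude occurrence/exposure rate on the log scale
    b[0] = math.log(sum(y) / sum(math.exp(o) for o in offset))
    for _ in range(max_iter):
        U = [0.0] * p                          # score vector
        info = [[0.0] * p for _ in range(p)]   # information matrix
        for yi, xi, oi in zip(y, X, offset):
            mu = math.exp(oi + sum(bj * xj for bj, xj in zip(b, xi)))
            for j in range(p):
                U[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * xi[j] * xi[k]
        step = solve(info, U)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    # standard errors from the diagonal of the inverse information
    se = [math.sqrt(solve(info, [1.0 if r == j else 0.0 for r in range(p)])[j])
          for j in range(p)]
    return b, se

# Censored lifetimes with a binary covariate: response D_i, offset log(T_i)
times = [2.0, 3.0, 5.0, 1.0, 4.0, 6.0, 2.0, 8.0]
events = [1, 0, 1, 1, 1, 0, 0, 1]
group = [0, 0, 0, 0, 1, 1, 1, 1]
X = [[1.0, float(g)] for g in group]
b, se = poisson_glm_offset(events, X, [math.log(t) for t in times])
print(b, se)
```

With a single binary covariate the model is saturated, so the estimates reduce to log occurrence/exposure rates per group, which gives an easy check of the iteration.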

Exponentially distributed lifetimes

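$T_i \sim \alpha(t|x_i) = \exp(\psi + \beta'x_i)$, with cumulative hazard $A(t|x_i) = t\exp(\psi + \beta'x_i)$.

With observations $\tilde T_i$ = right-censored lifetimes and $D_i$ = indicator for events, we get the likelihood

$$L = \prod_{i=1}^n \alpha(\tilde T_i|x_i)^{D_i}\exp(-A(\tilde T_i|x_i)) = \prod_{i=1}^n \exp(D_i(\psi+\beta'x_i))\exp(-\tilde T_i\exp(\psi+\beta'x_i)),$$

which is proportional to a Poisson likelihood under the assumption

$$D_i \sim \text{Po}(\tilde T_i\exp(\psi + \beta'x_i)).$$

We may thus fit this model by Poisson-regression, with $\log(\tilde T_i)$ as offset!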
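The proportionality claim can be checked numerically: the full Poisson log-likelihood and the exponential survival log-likelihood differ by $\sum_i D_i\log\tilde T_i - \sum_i \log D_i!$, which does not involve $(\psi, \beta)$ (and $\log D_i! = 0$ for 0/1 indicators). A Python sketch with made-up data and hypothetical function names:

```python
import math

def exp_regression_loglik(psi, beta, times, events, x):
    # sum_i [ D_i (psi + beta x_i) - T_i exp(psi + beta x_i) ]
    return sum(d * (psi + beta * xi) - t * math.exp(psi + beta * xi)
               for t, d, xi in zip(times, events, x))

def poisson_loglik(psi, beta, times, events, x):
    # full log-likelihood for D_i ~ Po(T_i exp(psi + beta x_i)), including -log(D_i!)
    total = 0.0
    for t, d, xi in zip(times, events, x):
        mu = t * math.exp(psi + beta * xi)
        total += d * math.log(mu) - mu - math.lgamma(d + 1)
    return total

times = [2.0, 3.0, 5.0, 1.0, 4.0, 6.0, 2.0, 8.0]
events = [1, 0, 1, 1, 1, 0, 0, 1]
x = [0, 0, 0, 0, 1, 1, 1, 1]
for psi, beta in [(-1.0, 0.0), (-2.0, 0.5), (0.3, -1.2)]:
    diff = poisson_loglik(psi, beta, times, events, x) - exp_regression_loglik(psi, beta, times, events, x)
    print(psi, beta, diff)   # the same constant for every (psi, beta)
```

Because the difference is constant, both likelihoods have the same maximizer and the same second derivatives, hence the same estimates and standard errors.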


Example: Melanoma data

> summary(glm(dead~offset(log(lifetime))+ulcer+logthick+age+sex,

family=poisson))

Deviance Residuals:

Min 1Q Median 3Q Max

-1.757198 -0.7833086 -0.4186261 0.5735247 2.35767

Coefficients:

Value Std. Error z value

(Intercept) -3.27166724 0.793644496 -4.122333

ulcer -0.96045856 0.324053150 -2.963892

logthick 0.51104055 0.177549752 2.878295

age 0.01301791 0.007918443 1.643999

sex 0.34101496 0.270580076 1.260311

(Dispersion Parameter for Poisson family taken to be 1 )

Null Deviance: 232.0768 on 204 degrees of freedom

Residual Deviance: 188.029 on 200 degrees of freedom

Number of Fisher Scoring Iterations: 5


Alternative to offset: Weighting

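Assume $Y_i \sim \text{Po}(n_i\mu_i)$, so that $E[Y_i/n_i] = \mu_i$ and

$$\operatorname{Var}\left[\frac{Y_i}{n_i}\right] = \frac{n_i\mu_i}{n_i^2} = \frac{\mu_i}{n_i}.$$

We may alternatively fit the model by
• Responses $Y_i/n_i$
• Weights $w_i = n_i$ (in R: the weights argument of glm)
• Link function $\log(\mu_i) = \psi + \beta'x_i$

The glm routine in R does not require integer responses (R warns about non-integer values for the Poisson family, but still computes the fit)!

It will also work for "quasi-survival-responses" $D_i/\tilde T_i$ with weights $\tilde T_i$.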
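Why the two approaches give the same estimates: for the Poisson family with log link, the weighted score is $\sum_i w_i(y_i - \mu_i)x_i$, and with $y_i = Y_i/n_i$, $w_i = n_i$ this equals the offset score $\sum_i (Y_i - n_i\mu_i)x_i$. A Python sketch with made-up data (the function names are hypothetical) evaluates both score vectors at a few parameter values.

```python
import math

# Made-up data: Y_i = event indicators, n_i = exposure times, x_i = binary covariate
Y = [1, 0, 1, 1, 1, 0, 0, 1]
n = [2.0, 3.0, 5.0, 1.0, 4.0, 6.0, 2.0, 8.0]
x = [0, 0, 0, 0, 1, 1, 1, 1]

def score_offset(b0, b1):
    # Poisson with offset log(n_i), log link: U = sum_i (Y_i - n_i mu_i) (1, x_i)
    s0 = s1 = 0.0
    for Yi, ni, xi in zip(Y, n, x):
        r = Yi - ni * math.exp(b0 + b1 * xi)
        s0 += r
        s1 += r * xi
    return s0, s1

def score_weighted(b0, b1):
    # response y_i = Y_i / n_i, prior weight w_i = n_i, log link:
    # U = sum_i w_i (y_i - mu_i) (1, x_i), since V(mu) = mu cancels dmu/deta = mu
    s0 = s1 = 0.0
    for Yi, ni, xi in zip(Y, n, x):
        r = ni * (Yi / ni - math.exp(b0 + b1 * xi))
        s0 += r
        s1 += r * xi
    return s0, s1

for b0, b1 in [(-1.0, 0.0), (-1.3, -1.0), (0.2, 0.7)]:
    print(score_offset(b0, b1), score_weighted(b0, b1))
```

Identical score equations imply identical solutions, which is why the offset and weighted melanoma fits agree so closely.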


Example: Melanoma data, weighting (compare pg. 19)

> summary(glm(I(dead/lifetime)~ulcer+logthick+age+sex,

family=poisson,weights=lifetime))

Deviance Residuals:

Min 1Q Median 3Q Max

-1.757189 -0.7833315 -0.4186496 0.5735243 2.357629

Coefficients:

Value Std. Error t value

(Intercept) -3.27167476 0.792273252 -4.129478

ulcer -0.96041516 0.323111908 -2.972392

logthick 0.51101978 0.177128703 2.885020

age 0.01301783 0.007912787 1.645164

sex 0.34101511 0.270233537 1.261927






Additive exponential regression model:

$T_i \sim \alpha(t|x_i) = \psi + \beta'x_i$

With censored data $(\tilde T_i, D_i)$ we get a likelihood corresponding to
• $D_i \sim \text{Po}(\tilde T_i\mu_i)$
• where $\mu_i = \psi + \beta'x_i$
• and such that $E[D_i/\tilde T_i] = \mu_i$ and $\operatorname{Var}[D_i/\tilde T_i] = \tilde T_i\mu_i/\tilde T_i^2 = \mu_i/\tilde T_i$

It is thus possible to fit this survival model by glm, with weighting and the identity link.

However: with the identity link, R needs a modification of the responses $D_i/\tilde T_i = 0$; we need to make a new response $D_i' = D_i + \varepsilon_i$ for a small $\varepsilon_i$, e.g. 0.0001.

This may still be unstable: in the example on the next page I needed to use grthick instead of logthick and omit age.
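For intuition about what the identity-link fit estimates, consider the simplest case of a single binary covariate. The additive model is then saturated in the two group rates, so the MLE has a closed form: $\hat\psi = O_0/R_0$ and $\hat\psi + \hat\beta = O_1/R_1$, i.e. $\hat\beta$ is a rate difference. The Python sketch below (made-up data, hypothetical names) computes this and checks that the score of $\sum_i[D_i\log\mu_i - \tilde T_i\mu_i]$ vanishes there. It does not replace the iterative glm fit needed with continuous covariates.

```python
# Made-up data: observed times, event indicators and a binary covariate x_i
times = [2.0, 3.0, 5.0, 1.0, 4.0, 6.0, 2.0, 8.0]
events = [1, 0, 1, 1, 1, 0, 0, 1]
x = [0, 0, 0, 0, 1, 1, 1, 1]

def additive_exp_binary(times, events, x):
    """MLE in alpha(t|x) = psi + beta*x for a binary x. The model is saturated in the
    two group rates, so psi_hat = O_0/R_0 and psi_hat + beta_hat = O_1/R_1."""
    O = {0: 0, 1: 0}
    R = {0: 0.0, 1: 0.0}
    for t, d, xi in zip(times, events, x):
        O[xi] += d
        R[xi] += t
    psi = O[0] / R[0]
    beta = O[1] / R[1] - psi        # a rate difference, not a log rate ratio
    return psi, beta

def score_identity(psi, beta, times, events, x):
    # derivative of sum_i [D_i log(mu_i) - T_i mu_i] with mu_i = psi + beta x_i
    s0 = s1 = 0.0
    for t, d, xi in zip(times, events, x):
        g = d / (psi + beta * xi) - t
        s0 += g
        s1 += g * xi
    return s0, s1

psi, beta = additive_exp_binary(times, events, x)
print(psi, beta, score_identity(psi, beta, times, events, x))
```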


Example: Melanoma data, additive exponential model

> deadx<-dead+0.00001

> summary(glm(I(deadx/lifetime)~ulcer+factor(grthick)+sex,

family=poisson(link=identity),weights=lifetime))

Deviance Residuals:

Min 1Q Median 3Q Max

-1.605446 -0.8411342 -0.3830547 0.6907776 2.426813

Coefficients:

Value Std. Error t value
(Intercept) 0.07537012 0.03424506 2.200905

ulcer -0.04302607 0.01545833 -2.783358

factor(grthick)2 0.03898369 0.01583326 2.462139

factor(grthick)3 0.03337742 0.02378281 1.403426

sex 0.01950260 0.01178936 1.654254






While we are at it: Exponential model, square root link!

> deadx<-dead+0.00001

> summary(glm(I(deadx/lifetime)~ulcer+logthick+age+sex,

family=poisson(link=sqrt),weights=lifetime))

Deviance Residuals:

Min 1Q Median 3Q Max

-1.678628 -0.7994057 -0.3830467 0.5879871 2.578291

Coefficients:

Value Std. Error t value
(Intercept) 0.174134346 0.0856882213 2.032185

ulcer -0.101541779 0.0336639635 -3.016335

logthick 0.053093298 0.0168588388 3.149286

age 0.001679106 0.0008935945 1.879047

sex 0.054402522 0.0305622598 1.780056






Piecewise constant hazards

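We may also use Poisson-regression under the assumption of piecewise constant hazards

$$\alpha(t|x) = \exp(\beta'x + \theta_j) \quad \text{when } t_{j-1} < t \le t_j,$$

where $0 = t_0 < t_1 < \dots < t_J$ (with $t_J = \infty$) is a partition of the positive real numbers.

Let
• $T_{ij} = \begin{cases} t_j - t_{j-1} & \text{when } \tilde T_i > t_j \\ \tilde T_i - t_{j-1} & \text{when } t_{j-1} < \tilde T_i \le t_j \\ 0 & \text{when } \tilde T_i \le t_{j-1} \end{cases}$
• $D_{ij} = D_i I(t_{j-1} < \tilde T_i \le t_j)$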


Piecewise constant hazards, contd.

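Thus $T_{ij}$ = "exposure time" for individual $i$ in the interval $(t_{j-1}, t_j]$, and $D_{ij}$ = indicator for an event for individual $i$ in the interval $(t_{j-1}, t_j]$.

The likelihood for the data becomes, with $\alpha_j = \exp(\theta_j)$ and $A(t|x)$ the cumulative hazard with covariate $x$,

$$L = \prod_{i=1}^n \alpha(\tilde T_i|x_i)^{D_i}\exp(-A(\tilde T_i|x_i)) = \prod_{i=1}^n\prod_{j=1}^J\left[\exp(\theta_j+\beta'x_i)^{D_{ij}}\exp(-T_{ij}\exp(\theta_j+\beta'x_i))\right].$$

This is proportional to a likelihood for

$$D_{ij} \sim \text{Po}(T_{ij}\exp(\theta_j + \beta'x_i)),$$

and we may again use Poisson-regression to fit the model, as long as we include a factor variable for the interval. Pairs $(i, j)$ with $T_{ij} = 0$ contribute a factor 1 and can be left out of the data.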
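A sketch of the data preparation and of the factorization above, in Python. The function names, the cut point at 3 (mirroring the 2-interval melanoma example) and the values of $\theta_j$ and $\beta'x_i$ are illustrative assumptions. Each observation $(\tilde T_i, D_i)$ is split into rows $(j, T_{ij}, D_{ij})$, and the direct piecewise log-likelihood is compared with the Poisson kernel summed over the rows.

```python
import math

def split_episodes(time, event, cuts):
    """Split one observed (T_i, D_i) into rows (j, T_ij, D_ij) for intervals (t_{j-1}, t_j].
    Rows with T_ij = 0 are dropped: they contribute a factor 1 to the likelihood."""
    rows = []
    for j, (lo, hi) in enumerate(zip(cuts[:-1], cuts[1:])):
        exposure = max(0.0, min(time, hi) - lo)
        if exposure > 0:
            d = event if lo < time <= hi else 0
            rows.append((j, exposure, d))
    return rows

def direct_loglik(time, event, eta, theta, cuts):
    # D_i log alpha(T_i|x) - A(T_i|x) with alpha = exp(theta_j + eta) on (t_{j-1}, t_j]
    loglik = 0.0
    for j, (lo, hi) in enumerate(zip(cuts[:-1], cuts[1:])):
        rate = math.exp(theta[j] + eta)
        loglik -= rate * max(0.0, min(time, hi) - lo)   # piece of the cumulative hazard
        if event and lo < time <= hi:
            loglik += theta[j] + eta                    # log hazard at the event time
    return loglik

def poisson_kernel(rows, eta, theta):
    # sum_j [ D_ij (theta_j + eta) - T_ij exp(theta_j + eta) ]
    return sum(d * (theta[j] + eta) - e * math.exp(theta[j] + eta) for j, e, d in rows)

cuts = [0.0, 3.0, float("inf")]   # two intervals, (0, 3] and (3, inf)
theta = [-3.2, -3.25]             # illustrative log baseline hazards
for time, event, eta in [(1.7, 1, 0.4), (4.2, 1, -0.3), (6.0, 0, 0.1)]:
    rows = split_episodes(time, event, cuts)
    print(rows, direct_loglik(time, event, eta, theta, cuts), poisson_kernel(rows, eta, theta))
```

The split rows play the role of dead2int, lifetime2 and intervall in the R code for the melanoma data.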


Melanoma data, 2 intervals

> dead2int<-c(dead*(lifetime<=3),dead[lifetime>3])

> lifetime2<-c(pmin(lifetime,3),lifetime[lifetime>3]-3)

> intervall<-c(rep(1,length(lifetime)),rep(2,sum(lifetime>3)))

> ulcer2<-c(ulcer,ulcer[lifetime>3])

> logthick2<-c(logthick,logthick[lifetime>3])

> sex2<-c(sex,sex[lifetime>3])

> age2<-c(age,age[lifetime>3])

> summary(glm(dead2int~offset(log(lifetime2))+intervall+ulcer2

+logthick2+age2+sex2,family=poisson),cor=F)

Coefficients:

Value Std. Error z value
(Intercept) -3.22116888 0.918476470 -3.5070783

intervall -0.02941870 0.270372123 -0.1088082

ulcer2 -0.95843701 0.324402947 -2.9544646

logthick2 0.51036993 0.177491307 2.8754644

age2 0.01287056 0.008027254 1.6033572

sex2 0.34053830 0.270448130 1.2591631




Number of Fisher Scoring Iterations: 5

More piecewise constant hazard

When we let the interval lengths $t_j - t_{j-1}$ become small, we get a very flexible, almost semi-parametric, model.

In fact, Cox-regression is the limit when all $t_j - t_{j-1} \to 0$ (Breslow, 1972).

But since Poisson-regression allows for more link functions, we also get alternatives to Cox-regression.


Aggregated data and piecewise constant hazard

Assume that the covariate vector $x$ may only attain a finite number of values $z_1, z_2, \dots, z_K$ (e.g. only categorical covariates).

Then we may aggregate the data to
• Total exposure time in interval $(t_{j-1}, t_j]$ with covariate $z_k$: $T_{\bullet j, z_k} = \sum_{i: x_i = z_k} T_{ij}$
• Total number of events in interval $(t_{j-1}, t_j]$ with covariate $z_k$: $D_{\bullet j, z_k} = \sum_{i: x_i = z_k} D_{ij}$

We may also express the likelihood as

$$L = \prod_{z_k}\prod_{j=1}^J\left[\exp(\theta_j + \beta'z_k)^{D_{\bullet j,z_k}}\exp(-T_{\bullet j,z_k}\exp(\theta_j+\beta'z_k))\right],$$

thus proportional to the likelihood for Poisson data $D_{\bullet j,z_k} \sim \text{Po}(T_{\bullet j,z_k}\exp(\theta_j + \beta'z_k))$.
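Aggregation works because, within a cell $(j, z_k)$, the log-likelihood kernel $D(\theta_j + \beta'z_k) - T\exp(\theta_j + \beta'z_k)$ is linear in $D$ and $T$, so only the cell totals matter. A Python sketch with made-up split rows and illustrative parameter values (the function names are hypothetical):

```python
import math
from collections import defaultdict

def aggregate(rows):
    """rows: (interval j, covariate value z, exposure T_ij, event D_ij).
    Returns {(j, z): [total exposure, total events]}."""
    cells = defaultdict(lambda: [0.0, 0])
    for j, z, e, d in rows:
        cells[(j, z)][0] += e
        cells[(j, z)][1] += d
    return cells

def kernel(records, theta, beta):
    # sum [ D (theta_j + beta z) - T exp(theta_j + beta z) ] over records (j, z, T, D)
    return sum(d * (theta[j] + beta * z) - e * math.exp(theta[j] + beta * z)
               for j, z, e, d in records)

rows = [(0, 0, 3.0, 0), (1, 0, 1.2, 1), (0, 1, 1.7, 1), (0, 1, 3.0, 0),
        (1, 1, 2.5, 0), (0, 0, 2.2, 1)]
cells = aggregate(rows)
aggregated = [(j, z, e, d) for (j, z), (e, d) in cells.items()]
theta, beta = [-1.5, -1.8], 0.6
print(len(rows), len(aggregated), kernel(rows, theta, beta), kernel(aggregated, theta, beta))
```

The full Poisson log-likelihoods of the individual and aggregated data differ only by terms free of $(\theta, \beta)$, so the estimates coincide while the aggregated data set can be much smaller.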


Use for Poisson-regression

Previously it was not possible to use Cox-regression on large data sets (e.g. $n > 500\,000$).

With many time-dependent covariates this continues to be a problem.

However, by first aggregating the data with suitable programs, one may instead use Poisson-regression.

The future use of Poisson-regression for survival data lies in the flexibility these models offer with respect to link functions, random effects models (frailty), smoothing techniques and multiple time scales.


Ex. Traditional use of Poisson-regression

Samuelsen, Magnus & Bakketeig, 1998, "Birth weight and mortality in childhood in Norway", AJE:
• ca. 1,250,000 children born in Norway, 1967-1990.
• Follow-up from 1 to 15 years of age, or until 1992.
• Covariates:
  • Birth weight (≤ 2500, > 2500 g)
  • Length of pregnancy (< 37 or ≥ 37 weeks)
  • Sex
  • Maternal age (grouped)
  • Previous births (yes/no, parity)
  • Birth cohort: 1967-1975, 1976-1984, 1985-1990.
• Age (one-year intervals) = time variable

Results for cancer mortality


(Intercept) -9.487010362 0.14513937 -65.36483242

vektkat < 5 -1.296055900 0.39594733 -3.27330379

factor(kjonn) -0.297092337 0.07739885 -3.83845908

factor(koho)2 -0.309147054 0.08575119 -3.60516350

factor(koho)3 -0.864394863 0.20015213 -4.31868928

factor(pari) -0.039848282 0.08261364 -0.48234509

factor(mald)2 0.013457555 0.15636110 0.08606716

factor(mald)3 0.274168093 0.12765561 2.14771672

factor(gest)2 0.038185223 0.21835460 0.17487712

factor(gest)3 -0.230881244 0.19316189 -1.19527326

factor(alder)2 0.157311796 0.17058730 0.92217766

factor(alder)3 0.234812145 0.16865818 1.39223694

factor(alder)4 0.165260296 0.17258180 0.95757660

factor(alder)5 0.188845360 0.17291289 1.09214161

factor(alder)6 -0.007195062 0.18305818 -0.03930478

factor(alder)7 -0.234062467 0.19694164 -1.18848644

factor(alder)8 -0.338458601 0.20586398 -1.64408849

factor(alder)9 -0.712831213 0.23678818 -3.01041726

factor(alder)10 -0.358587244 0.21339711 -1.68037532

factor(alder)11 -0.616976682 0.23701617 -2.60309957

factor(alder)12 -0.350074062 0.22032205 -1.58891980

Lexis diagrams, time scales

Individuals born from 1967 onwards, with the time until event/censoring following an exponential distribution.

Every person is represented as a line through the diagram, with age on the x-axis and calendar time on the y-axis:

[Figure: Lexis diagram; x-axis: age (alder), 0 to 25 years; y-axis: calendar time (kalendertid), 1970 to 2000.]

From the figure we may read off the exposure time in 5-year age intervals and 5-year calendar periods.


Age-period-cohort (APC) problem

For one point in the Lexis diagram
• x = age
• y = calendar time
• z = year of birth

Then $z + x = y$, thus a perfect linear dependency.

It would be tempting to include all of x, y and z as covariates; however, we are not able to use all of them in one model without further restrictions.

The APC problem is thus non-identifiable.


Accelerated failure time models

$\alpha(t|\beta'x, \theta) = \exp(\beta'x)\alpha_0(\exp(\beta'x)t|\theta)$

1. We will only discuss fully parametric models $\alpha_0(t|\theta)$.

2. Semi-parametric methods exist.

Translation models: $\alpha(t|\beta'x, \theta) = \alpha_0(t + \beta'x|\theta)$

1. Possible in R.

2. Semi-parametric methods exist.


Characterizations of accelerated failure time models

Characterization 1. Uncensored survival time $T$:

$$Y = \log(T) = \mu + \gamma'x + \sigma W,$$

where
• $W$ is a (standardized) random variable with a specified distribution
• $\sigma$ is the degree of variation in $Y$ compared to $W$
• $\mu$ is the center of the distribution of $Y$ if $x = 0$ or $\gamma = 0$.

This is thus a log-linear regression model for $T$, but contrary to standard linear regression models we assume a particular distribution for the error term $\sigma W$.


Alternative characterization

With $S_0(\cdot)$ the survival function of $\exp(\sigma W + \mu)$ (i.e. the survival function with $x = 0$) we get

$$P(T > t) = P(\exp(\sigma W + \mu + \gamma'x) > t) = S_0(t\exp(-\gamma'x)).$$

Thus the time scale for $T$ given $x$ equals the time scale for $T$ given $x = 0$ multiplied by $\exp(-\gamma'x)$.

We may call $\exp(-\gamma'x) = \exp(\beta'x)$ an acceleration factor (where $\beta = -\gamma$).


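Characterization 3. of accelerated failure time models

With $A_0(\cdot)$ the cumulative hazard of $\exp(\sigma W + \mu)$ we may alternatively write

$$P(T > t) = S_0(t\exp(\beta'x)) = \exp(-A_0(t\exp(\beta'x))),$$

which, by differentiating $A_0(t\exp(\beta'x))$ with respect to $t$, leads to the hazard for $T$ given $x$:

$$\alpha(t|x) = \exp(\beta'x)\alpha_0(t\exp(\beta'x)),$$

where $\alpha_0(t)$ is the hazard of $\exp(\sigma W + \mu)$.

Note: R will report $\gamma = -\beta$.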
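A numerical check of the hazard formula, sketched in Python for an illustrative log-logistic baseline ($\mu = 0$, logistic $W$, $\sigma = 0.8$; all values and names are assumptions). It compares $\exp(\beta'x)\alpha_0(t\exp(\beta'x))$ with a finite-difference derivative of $-\log S_0(t\exp(\beta'x))$, and shows that the hazard ratio then varies with $t$, so this AFT model is not a proportional hazards model.

```python
import math

SIGMA = 0.8   # illustrative scale; baseline exp(sigma*W) with logistic W and mu = 0

def S0(t):
    # log-logistic survival function of exp(sigma*W)
    return 1.0 / (1.0 + t ** (1.0 / SIGMA))

def alpha0(t):
    # its hazard -d/dt log S0(t), with u = t^(1/sigma)
    u = t ** (1.0 / SIGMA)
    return (u / (SIGMA * t)) / (1.0 + u)

def aft_hazard(t, lin_pred):
    # alpha(t|x) = exp(beta'x) * alpha0(t * exp(beta'x))
    c = math.exp(lin_pred)
    return c * alpha0(t * c)

def numeric_hazard(t, lin_pred, h=1e-6):
    # -d/dt log S0(t exp(beta'x)) by a central difference
    c = math.exp(lin_pred)
    return -(math.log(S0((t + h) * c)) - math.log(S0((t - h) * c))) / (2 * h)

for t in [0.5, 1.0, 2.5]:
    print(t, aft_hazard(t, 0.9), numeric_hazard(t, 0.9),
          aft_hazard(t, 0.9) / aft_hazard(t, 0.0))   # last column: hazard ratio
```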


For accelerated failure time models we specify the distribution of the error term $\sigma W$. The following are implemented in R:

• $\sigma W$ extreme-value (Gumbel) distributed and $\exp(\sigma W)$ Weibull distributed
• Special case: $\exp(W)$ exponentially distributed, $\sigma = 1$
• $\sigma W$ logistic and $\exp(\sigma W)$ log-logistic
• $\sigma W$ normal (Gaussian) and $\exp(\sigma W)$ log-normal

The Weibull is the default for these parametric survival models in R.


Accelerated failure time models fitted by the command survreg

> survreg(Surv(lifetime,dead)~ulcer+logthick+age+sex,data=mel)

Call:

survreg(formula = Surv(lifetime, dead) ~ ulcer + logthick + age + sex, data = mel)

Coefficients:

(Intercept) ulcer logthick age sex

3.119864 0.8198126 -0.4289668 -0.0129206 -0.2883985

Scale= 0.8231279

Loglik(model)= -207.8 Loglik(intercept only)= -230.9

Chisq= 46.23 on 4 degrees of freedom, p= 2.2e-09

n= 205

Note: opposite sign compared to Cox-regression, $\gamma = -\beta$!


More info from summary: Weibull = default

> summary(survreg(Surv(lifetime,dead)~ulcer+logthick+age+sex,

dist="weibull"))

Value Std. Error z p

(Intercept) 3.1199 0.65819 4.74 2.14e-06

ulcer 0.8198 0.27659 2.96 3.04e-03

logthick -0.4290 0.15343 -2.80 5.18e-03

age -0.0129 0.00655 -1.97 4.84e-02

sex -0.2884 0.22478 -1.28 1.99e-01

Log(scale) -0.1946 0.11522 -1.69 9.12e-02

Scale= 0.823

Weibull distribution



Number of Newton-Raphson Iterations: 6

n= 205


Exponential

> summary(survreg(Surv(lifetime,dead)~ulcer+logthick+age+sex,

dist="exponential",data=mel))

Value Std. Error z p

(Intercept) 3.272 0.79404 4.12 3.78e-05

ulcer 0.960 0.32419 2.96 3.05e-03

logthick -0.511 0.17762 -2.88 4.02e-03

age -0.013 0.00792 -1.64 1.00e-01

sex -0.341 0.27070 -1.26 2.08e-01

Scale fixed at 1

Exponential distribution



We may here test whether the exponential model is sufficient:

$$\text{LRT} = 2([\text{Log-lik. Weibull}] - [\text{Log-lik. Exponential}]) = 2(-207.8 - (-209.1)) = 2.6,$$

i.e. not significant when compared with the $\chi^2_1$ distribution (5% critical value 3.84).
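The test above as a few lines of Python, using only the two log-likelihood values quoted in the text. The p-value uses the identity $P(\chi^2_1 > x) = P(|Z| > \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$ for a standard normal $Z$.

```python
import math

loglik_weibull = -207.8       # from the survreg output above
loglik_exponential = -209.1   # value quoted in the text for the exponential fit

lrt = 2 * (loglik_weibull - loglik_exponential)
# chi-square with 1 df: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(lrt / 2))
print(round(lrt, 2), round(p_value, 3))
```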


Accelerated failure times with Weibull distributed T

If $\exp(\sigma W + \mu)$ has a Weibull distribution with hazard $\alpha_0(t) = bt^{k-1}$, then the hazard for $T$ with covariate $x$ becomes

$$\alpha(t|x) = \exp(\beta'x)\alpha_0(t\exp(\beta'x)) = b\exp(k\beta'x)t^{k-1},$$

and so $T$ is also Weibull distributed. More importantly,

$$\alpha(t|x) = \exp(k\beta'x)\alpha_0(t),$$

i.e. we have a proportional hazards model for $T$ with regression parameter $k\beta$ (here $k = 1/\sigma$).

There is in fact an equivalence here: if the distribution of the $T$'s can be represented both by a proportional hazards and by an accelerated failure time model, then $T$ is Weibull distributed (Cox & Oakes, 1984, Analysis of survival data).
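A numerical illustration of the Weibull case, sketched in Python. The shape uses $k = 1/\sigma$ with the scale 0.8231279 from the survreg output, $\beta = -\gamma$ uses the ulcer coefficient 0.8198126 from the same output, and $b$ is an arbitrary illustrative value. The AFT hazard ratio for $x = 1$ versus $x = 0$ is the same at every $t$ and equals $\exp(k\beta)$.

```python
import math

b = 0.05                       # arbitrary illustrative baseline constant
k = 1.0 / 0.8231279            # k = 1/sigma, sigma = Weibull scale from survreg
beta = -0.8198126              # beta = -gamma, gamma = ulcer coefficient from survreg

def alpha0(t):
    # baseline Weibull hazard alpha_0(t) = b t^(k-1)
    return b * t ** (k - 1)

def aft_hazard(t, x):
    # accelerated failure time hazard exp(beta x) alpha_0(t exp(beta x))
    c = math.exp(beta * x)
    return c * alpha0(t * c)

for t in [0.5, 2.0, 7.5]:
    # hazard ratio x = 1 versus x = 0, compared with exp(k * beta)
    print(t, aft_hazard(t, 1) / alpha0(t), math.exp(k * beta))
```

Note that $k\beta = -\gamma/\sigma \approx -0.996$, which is the translation to a log-hazard ratio used on the next slide.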


About parametrization.

After fitting an accelerated failure time model we may translate the results into log-hazard ratios by $\beta = -\gamma/\sigma$, where $\gamma$ are the estimated regression parameters in the accelerated failure time model and $\sigma$ is the scale estimate:

> survregfit<-survreg(Surv(lifetime,dead)~ulcer+logthick+age+sex)

> -survregfit$coef[2:5]/survregfit$scale

ulcer logthick age sex

-0.9959723 0.5211423 0.01569695 0.350369

> coxfit<-coxph(Surv(lifetime,dead)~ulcer+logthick+age+sex)

> coxfit$coef

ulcer logthick age sex

-0.9436516 0.5549527 0.01145059 0.363405

Thus there is a good correspondence between the semi-parametric and parametric estimates.
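The same translation done by hand in Python, using only the coefficients, scale and Cox estimates printed above; it reproduces the values of -survregfit$coef[2:5]/survregfit$scale and lists them next to the Cox estimates.

```python
# Coefficients, scale and Cox estimates copied from the R output above
gamma = {"ulcer": 0.8198126, "logthick": -0.4289668, "age": -0.0129206, "sex": -0.2883985}
sigma = 0.8231279
cox = {"ulcer": -0.9436516, "logthick": 0.5549527, "age": 0.01145059, "sex": 0.363405}

# log-hazard ratios implied by the Weibull AFT fit: beta = -gamma / sigma
log_hr = {name: -g / sigma for name, g in gamma.items()}
for name in gamma:
    print(f"{name:9s} Weibull {log_hr[name]:+.4f}  Cox {cox[name]:+.4f}")
```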


Likelihood and counting processes

We will now assume that the true survival times have hazard $\alpha_i(t;\theta)$, possibly given by some regression model.

We may have left-truncated and right-censored observations with likelihood contributions

$$L_i(\theta) = \exp\left(\int \log(\alpha_i(t;\theta))\,dN_i(t) - \int Y_i(t)\alpha_i(t;\theta)\,dt\right)$$

(check that this corresponds to the formula for right-censored data $(\tilde T_i, D_i)$ on the first page of these handouts).

This gives a total log-likelihood

$$l(\theta) = \sum_{i=1}^n \log(L_i(\theta)) = \sum_{i=1}^n\left[\int \log(\alpha_i(t;\theta))\,dN_i(t) - \int Y_i(t)\alpha_i(t;\theta)\,dt\right].$$
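To see what the two integrals reduce to, the Python sketch below evaluates them for a single individual at risk on $(V, \tilde T]$, i.e. left-truncated at $V$, with an illustrative Weibull hazard (parameters and names are assumptions). The $dN$ integral is just $D\log\alpha(\tilde T)$, and the at-risk integral, computed by the midpoint rule, should equal $A(\tilde T) - A(V)$. With $V = 0$ this is the right-censored formula from the first page.

```python
import math

K, LAM = 1.5, 2.0   # illustrative Weibull shape and scale

def alpha(t):
    return (K / LAM) * (t / LAM) ** (K - 1)

def A(t):
    return (t / LAM) ** K

def loglik_counting(entry, exit_time, event, steps=100000):
    """int log(alpha) dN - int Y(t) alpha(t) dt with Y(t) = 1 on (entry, exit_time]."""
    jump_part = event * math.log(alpha(exit_time))   # N jumps once, at exit_time, if event = 1
    h = (exit_time - entry) / steps
    # midpoint rule for the integral of alpha over the at-risk period
    at_risk_part = sum(alpha(entry + (m + 0.5) * h) for m in range(steps)) * h
    return jump_part - at_risk_part

def loglik_closed_form(entry, exit_time, event):
    # left truncation at `entry`: D log alpha(T) - (A(T) - A(V))
    return event * math.log(alpha(exit_time)) - (A(exit_time) - A(entry))

for v, t, d in [(0.0, 2.3, 1), (1.0, 3.5, 0), (0.5, 1.8, 1)]:
    print(v, t, d, loglik_counting(v, t, d), loglik_closed_form(v, t, d))
```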


Score function and martingale

To keep notation simple we will consider only a scalar $\theta$. The score function then becomes

$$U(\theta) = \frac{\partial l(\theta)}{\partial\theta} = \sum_{i=1}^n\left[\int \frac{\alpha_i'(t;\theta)}{\alpha_i(t;\theta)}\,dN_i(t) - \int Y_i(t)\alpha_i'(t;\theta)\,dt\right],$$

where $\alpha_i'(t;\theta) = \partial\alpha_i(t;\theta)/\partial\theta$. Then, since

$$dN_i(t) = Y_i(t)\alpha_i(t;\theta)\,dt + dM_i(t),$$

where $M_i(t)$ is a martingale, we get

$$U(\theta) = \sum_{i=1}^n \int \frac{\alpha_i'(t;\theta)}{\alpha_i(t;\theta)}\,dM_i(t),$$

i.e. a sum of integrals with respect to martingales, and $E[U(\theta)] = 0$ (this expectation is no surprise, since $U(\theta)$ is a score function).

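Var(U(θ)) and information $I(\theta) = -\partial U(\theta)/\partial\theta$

By standard martingale arguments (the predictable variation of a stochastic integral) we now get

$$\operatorname{Var}(U(\theta)) = E\left[\sum_{i=1}^n\int\left(\frac{\alpha_i'(t;\theta)}{\alpha_i(t;\theta)}\right)^2 Y_i(t)\alpha_i(t;\theta)\,dt\right].$$

But we also have that the observed information can be written

$$I(\theta) = -\sum_{i=1}^n\left[\int\left\{\frac{\alpha_i''(t;\theta)}{\alpha_i(t;\theta)} - \left(\frac{\alpha_i'(t;\theta)}{\alpha_i(t;\theta)}\right)^2\right\}dN_i(t) - \int Y_i(t)\alpha_i''(t;\theta)\,dt\right].$$

Again inserting $dN_i(t) = Y_i(t)\alpha_i(t;\theta)\,dt + dM_i(t)$, the two terms involving $\alpha_i''$ cancel in the $dt$ part, and we get

$$I(\theta) = \sum_{i=1}^n \int\left(\frac{\alpha_i'(t;\theta)}{\alpha_i(t;\theta)}\right)^2 Y_i(t)\alpha_i(t;\theta)\,dt + M,$$

where $M$ is a sum of integrals with respect to martingales and has expectation zero. Thus we obtain $\operatorname{Var}(U(\theta)) = E(I(\theta))$.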
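A Monte Carlo sketch of $E[U(\theta)] = 0$ and $\operatorname{Var}(U(\theta)) = E(I(\theta))$ for the simplest case, a constant hazard $\alpha_i(t;\nu) = \nu$ with independent exponential censoring (all settings are illustrative assumptions). Then $\alpha_i' = 1$ and $\alpha_i'' = 0$, so $U(\nu) = \sum_i (D_i/\nu - \tilde T_i)$ and $I(\nu) = \sum_i D_i/\nu^2$. With $\nu = 0.5$, censoring rate 0.3 and $n = 50$, theory gives $P(D_i = 1) = 0.5/0.8$ and hence $\operatorname{Var}(U) = E(I) = 50 \cdot 0.625/0.25 = 125$.

```python
import random
import statistics

def simulate(nu=0.5, cens_rate=0.3, n=50, reps=10000, seed=1):
    """Simulate the score U(nu) and observed information I(nu) at the true nu."""
    rng = random.Random(seed)
    scores, infos = [], []
    for _ in range(reps):
        U = 0.0
        events = 0
        for _ in range(n):
            t = rng.expovariate(nu)          # true lifetime with hazard nu
            c = rng.expovariate(cens_rate)   # independent censoring time
            d = 1 if t <= c else 0
            U += d / nu - min(t, c)          # int (1/nu) dN - int Y(t) dt
            events += d
        scores.append(U)
        infos.append(events / nu ** 2)       # I(nu) = -dU/dnu = N / nu^2
    return statistics.mean(scores), statistics.variance(scores), statistics.mean(infos)

mean_U, var_U, mean_I = simulate()
print(mean_U, var_U, mean_I)   # theory: E[U] = 0 and Var(U) = E[I] = 125 for these settings
```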


Estimated expected information

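Since $dN_i(t) = Y_i(t)\alpha_i(t;\theta)\,dt + dM_i(t)$, where the $dM_i(t)$ are martingale increments, and since $\partial\log(\alpha_i(t;\theta))/\partial\theta = \alpha_i'(t;\theta)/\alpha_i(t;\theta)$, we may estimate the expected information by

$$\hat I(\hat\theta) = \widehat{\operatorname{Var}}(U(\theta)) = \sum_{i=1}^n\int\left(\frac{\partial\log(\alpha_i(t;\theta))}{\partial\theta}\right)^2 dN_i(t),$$

where the MLE $\hat\theta$ is inserted for $\theta$.

The matrix version of this formula is

$$\hat I(\hat\theta) = \widehat{\operatorname{Var}}(U(\theta)) = \sum_{i=1}^n\int\left(\frac{\partial\log(\alpha_i(t;\theta))}{\partial\theta}\right)^{\otimes 2} dN_i(t),$$

where $a^{\otimes 2} = aa'$.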

