Survival Regression Models - University of California,...

Survival Regression Models

David M. Rocke

May 6, 2021

David M. Rocke Survival Regression Models May 6, 2021 1 / 33

Background on the Proportional HazardsModel

The exponential distribution has constant hazard

f (t) = λe−λt

S(t) = e−λt

h(t) = λ

Let’s make two generalizations. First, let’s let the hazarddepend on covariates x1, x2, . . . xp. Second, let’s let thebase hazard depend on t. We can do this using eitherparametric or semi-parametric approaches.


The Cox Model

The generalization is that the hazard function is

η = β1x1 + · · ·+ βpxph(t|covariates) = h0(t)eη

This has a log link as in a generalized linear model. It issemi-parametric because the linear predictor depends onestimated parameters but the base hazard function isunspecified. There is no constant term because it isabsorbed in the base hazard. Note that for two differentindividuals with possibly different covariates, the ratio ofthe hazard functions is exp(η1)/ exp(η2) = exp(η1 − η2)which does not depend on t.


The Cox Model

How do we fit this model? We need to estimate thecoefficients of the covariates, and we need to estimatethe base hazard h0(t). For the covariates, supposing forsimplicity that there are no tied event times, let theevent times for the whole data set be t1, t2, . . . , tD . Letthe risk set at time ti be R(ti) and

ηj = β1xj1 + · · ·+ βpxjpθj = eηj

h(t|covariates) = h0(t)eη = θh0(t)


The Cox Model

Conditional on a single failure at time ti , the probabilitythat the event is due to subject j ∈ R(ti) is

Pr(j fails|1 failure at ti) =h0(ti)e

ηj∑k∈R(ti ) h0(ti)eηk

=θj∑

k∈R(ti ) θk


The Cox Model

The logic behind this has several steps. We first fix(ex post) the failure times and note that in thisdiscrete context, the probability pj that a subject jin the risk set fails at time t is just the hazard ofthat subject at that time.

If all of the pj are small, the chance that exactly onesubject fails is

∑k∈R(t)

pk

∏m∈R(t),m 6=k

(1− pm)

≈ ∑k∈R(t)

pk


The Cox Model

If subject j(i) is the one who fails at time ti , then thepartial likelihood is

L(β|T ) =∏i

θj(i)∑k∈R(ti ) θk

and we can numerically maximize this with respect tothe coefficients. When there are tied event timesadjustments need to be made, but the likelihood is stillsimilar. Note that we don’t need to know the basehazard to solve for the coefficients.


The Cox Model

Once we have coefficient estimates β1, . . . , βp, this also

defines ηj and θj and then the estimated base cumulativehazard function is

H(t) =∑ti<t

di∑k∈R(ti ) θk

which reduces to the Nelson-Aalen estimate when thereare no covariates. There are numerous other estimatesthat have been proposed as well.


Adjustment for Ties

At each time ti at which more than one of the subjectshas an event, let di be the number of events at thattime, Di the set of subjects with events at that time, andlet si be a covariate vector for an artificial subjectobtained by adding up the covariate values for thesubjects with an event at time ti . Let

ηi = β1si1 + · · ·+ βpsip

and θi = exp(ηi).


Adjustment for Ties

Let si be a covariate vector for an artificial subjectobtained by adding up the covariate values for thesubjects with an event at time ti . Note that

ηi =∑j∈Di

β1xj1 + · · ·+ βpxjp

= β1si1 + · · ·+ βpsipθi = exp(ηi)

=∏j∈Di

θi


Adjustment for Ties

Breslow’s method estimates the partial likelihood as

L(β|T ) =∏i

θi[∑

k∈R(ti ) θk ]di

=∏i

∏j∈Di

θj∑k∈R(ti ) θk

This method is equivalent to treating each event asdistinct and using the non-ties formula. It works bestwhen the number of ties is small. It is the default inmany statistical packages, including PROC PHREG inSAS.


Adjustment for Ties

The other common method is Efron’s, which is thedefault in R.

L(β|T ) =∏i

θi∏dij=1[∑

k∈R(ti ) θk −j−1di

∑k∈Di

θk ]

This is closer to the exact discrete partial likelihood whenthere are many ties. The third option in R (and an optionalso in SAS as “discrete”) is the “exact” method, whichis the same one used for matched logistic regression.


Suppose as an example we have a time t where there are20 individuals at risk and three failures. Let the threeindividuals have risk parameters θ1, θ2, θ3 and let the sumof the risk parameters of the remaining 17 individuals beθR . Then the factor in the partial likelihood at time tusing Breslow’s method is(

θ1θR + θ1 + θ2 + θ3

)(θ2

θR + θ1 + θ2 + θ3

)(θ3

θR + θ1 + θ2 + θ3

)


If on the other hand, they had died in the order 1,2, 3,then the contribution to the partial likelihood would be(

θ1θR + θ1 + θ2 + θ3

)(θ2

θR + θ2 + θ3

)(θ3

θR + θ3

)as the risk set got smaller with each failure. The exactmethod roughly averages the results for the six possibleorderings of the failures.But we don’t know the order they failed in, so instead ofreducing the denominator by one risk coefficient eachtime, we reduce it by the same fraction. This is Efron’smethod.


Efron’s method:

(θ1

θR + θ1 + θ2 + θ3

)(θ2

θR + 2(θ1 + θ2 + θ3)/3

)(θ3

θR + (θ1 + θ2 + θ3)/3

)


The coxph() Function in R

coxph {survival} R Documentation

Fit Proportional Hazards Regression Model

Description

Fits a Cox proportional hazards regression model.

Time dependent variables, time dependent strata, multiple events per subject,

and other extensions are incorporated using the counting process formulation of

Andersen and Gill.

Usage

coxph(formula, data=, weights, subset,

na.action, init, control,

ties=c("efron","breslow","exact"),

singular.ok=TRUE, robust=FALSE,

model=FALSE, x=FALSE, y=TRUE, tt, method, ...)


The coxph() Function in Rcoxph(formula, data=, weights, subset,





Arguments

formula a formula object, with the response on the left of a ~ operator,

and the terms on the right. The response must be a survival object

as returned by the Surv function.

data a data.frame in which to interpret the variables named in the formula,

or in the subset and the weights argument.

weights vector of case weights. If weights is a vector of integers,

then the estimated coefficients are equivalent to estimating the model

from data with the individual cases replicated as many times as

indicated by weights.

subset expression indicating which subset of the rows of data

should be used in the fit. All observations are included by default.


The coxph() Function in R

coxph(formula, data=, weights, subset,





Arguments

ties a character string specifying the method for tie handling.

If there are no tied death times all the methods are equivalent.

Nearly all Cox regression programs use the Breslow method by default,

but not this one. The Efron approximation is used as the default here,

it is more accurate when dealing with tied death times, and is as

efficient computationally. The \exact partial likelihood" is

equivalent to a conditional logistic model, and is appropriate

when the times are a small set of discrete values. See further below.


The proportional hazards model is usually expressed in terms of a

single survival time value for each person, with possible censoring.

Andersen and Gill reformulated the same problem as a counting

process; as time marches onward we observe the events for a

subject, rather like watching a Geiger counter. The data for a

subject is presented as multiple rows or “observations,” each of

which applies to an interval of observation (start, stop].


There are special terms that may be used in the model equation. A

strata term identifies a stratified Cox model; separate baseline

hazard functions are fit for each stratum. The cluster term is used

to compute a robust variance for the model. The term + cluster(id)

where each value of id is unique is equivalent to specifying the

robust=T argument. If the id variable is not unique, it is assumed

that it identifies clusters of correlated observations. The robust

estimate arises from many different arguments and thus has had

many labels. It is variously known as the Huber sandwich estimator,

White’s estimate (linear models/econometrics), the

Horvitz-Thompson estimate (survey sampling), the working

independence variance (generalized estimating equations), the

infinitesimal jackknife, and the Wei, Lin, Weissfeld (WLW) estimate.


Cox Model for bmt Survival vs. Group

> dfsurv <- Surv(bmt$t2,bmt$d3)

> bmt.cox <- coxph(dfsurv~factor(group),data=bmt)

> summary(bmt.cox)

n= 137, number of events= 83

coef exp(coef) se(coef) z Pr(>|z|)

factor(group)2 -0.5742 0.5632 0.2873 -1.999 0.0457 *

factor(group)3 0.3834 1.4673 0.2674 1.434 0.1516

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95

factor(group)2 0.5632 1.7757 0.3207 0.989

factor(group)3 1.4673 0.6815 0.8688 2.478

Concordance= 0.625 (se = 0.031 )

Rsquare= 0.094 (max possible= 0.996 )

Likelihood ratio test= 13.45 on 2 df, p=0.001199

Wald test = 13.03 on 2 df, p=0.00148

Score (logrank) test = 13.81 on 2 df, p=0.001004



Hypothesis tests for factor levels comparing groups 2 and 3 to group 1.

Group 3 has the highest hazard so the most significant comparison

is not directly shown.

The coefficient 0.3834 is on the log-hazard-ratio scale, as in log-risk-ratio.

The next column gives the hazard ratio 1.4673, and a hypothesis (Wald) test.

The (not shown) group 3 vs. group 2 log hazard ratio is 0.3834 + 0.5742 = 0.9576.

The hazard ratio is then exp(0.9576) or 2.605. Inference on all coefficients

and combinations can be constructed using coef(bmt.cox) and vcov(bmt.cox)

as with logistic regression.

coef exp(coef) se(coef) z Pr(>|z|)

factor(group)2 -0.5742 0.5632 0.2873 -1.999 0.0457 *

factor(group)3 0.3834 1.4673 0.2674 1.434 0.1516

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



The next part of the output gives 95% confidence intervals for the

relative risk. For the difference between groups 2 and 3 we need to use

coef(bmt.cox) and vcov(bmt.cox).

exp(coef) exp(-coef) lower .95 upper .95

factor(group)2 0.5632 1.7757 0.3207 0.989

factor(group)3 1.4673 0.6815 0.8688 2.478

> c1 <- coef(bmt.cox)

> v1 <- vcov(bmt.cox)

> cont1 <- c(-1,1)

> t(cont1) %*% c1

[,1]

[1,] 0.9576104 log hazard ratio group 3 to group2

> t(cont1) %*% v1 %*% cont1

[,1]

[1,] 0.07040675

> sqrt(t(cont1) %*% v1 %*% cont1)

[,1]

[1,] 0.2653427 standard error of log hazard ratio



Concordance is agreement of first failure between pairs of subjects and

higher predicted risk between those subjects, omitting non-informative pairs.

The Rsquare value is Cox and Snell’s pseudo R-squared and is not very useful.

There are three tests for whether the model with the group covariate

is better than the one without

--Usual likelihood ratio chi-squared

--Wald test chi-squared, obtained by adding the squares of the z-scores

--score = log-rank test, as with comparison of survival functions.

The likelihood ratio test is probably best in smaller samples, followed

by the Wald test.

Concordance= 0.625 (se = 0.031 )

Rsquare= 0.094 (max possible= 0.996 )

Likelihood ratio test= 13.45 on 2 df, p=0.001199

Wald test = 13.03 on 2 df, p=0.00148

Score (logrank) test = 13.81 on 2 df, p=0.001004



The anova function performs the likelihood ratio test for comparing models.

One can use drop1(), add1(), step(), or compare two explicit models.

> anova(bmt.cox)

Analysis of Deviance Table

Cox model: response is dfsurv

Terms added sequentially (first to last)

loglik Chisq Df Pr(>|Chi|)

NULL -373.30

factor(group) -366.57 13.452 2 0.001199 **

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Survival Curves from the Cox Model

We defined the “base hazard” but then did not use it toestimate the coefficients. In fact, there is no meaningfulbase hazard. In this case, it is the hazard for group 1.We can use survfit to get survival functions, but bydefault it produces the hazard for an average individualwith average covariates. This is like the family with 1.5children—it does not exist. Always it is best to specifythe covariate level(s) for which you want the survivalcurve(s). In this case, we can plot the Cox modelsurvival curves, which are by definition proportional,along with the individual KM curves for the groups.


plot(survfit(dfsurv~group,data=bmt))

lines(survfit(bmt.cox,data.frame(group=1:3),conf.int=F),col="red")

legend("bottomleft",c("Kaplan-Meier","Cox Model"),col=c("black","red"),lwd=1)

title("Survival Functions for Three Groups by KM and Cox Model")

When we use survfit() with a Cox model, we should include a data frame

with the same named columns as the predictors in the Cox model and

one or more levels of each.

For example

> data.frame(group=1:3,age=50)

group age

1 1 50

2 2 50

3 3 50

> data.frame(group=rep(1:3,2),age=rep(c(50,60),each=3))

group age

1 1 50

2 2 50

3 3 50

4 1 60

5 2 60

6 3 60


0 500 1000 1500 2000 2500

0.0

0.2

0.4

0.6

0.8

1.0

Kaplan−MeierCox Model

Survival Functions for Three Groups by KM and Cox Model


Homework

The addicts data set from the text (see page 91) is froma study by Caplehorn et al. (“Methadone Dosage andRetention of Patients in Maintenance Treatment,” Med.J. Aust., 1991). These data comprise the times in daysspent by heroin addicts from entry to departure from oneof two methadone clinics. There are two furthercovariates, namely, prison record and methadone dose,believed to affect the survival times.


Homework

The data set and R input code are on the website. Thevariables are as follows:

id: Subject ID

clinic: Clinic (1 or 2)

status: Survival status (0 = censored, 1 = departedfrom clinic)

time: Survival time in days

prison: Prison record (0 = none, 1 = any)

methadone: Methadone dose (mg/day)


Homework

1 Plot the Kaplan-Meier survival curves for the twoclinics.

2 Test for whether the two survival curves could havecome from the same process using survdiff.

3 Plot the cumulative hazards from the Nelson-Aalenestimator.


Homework

4 A common comparison plot for proportional hazardsis the complimentary log-log survival plot whichplots ln(− ln[S(x)]) = ln H(t) against ln(t). If thehazards are proportional, then so are the cumulativehazards, and after taking logs, the curves should beparallel. If the lines are straight, then the Weibullmodel may be appropriate. Make this plot for thetwo clinics using the Nelson-Aalen estimator andcomment on the results. You will use the “cloglog”option in the plot command.


Homework

5 Construct a Cox model using only the clinicvariable. Is the “survival” different at the twoclinics? What is the estimated hazard (risk) ratio, atest for significance, and a 95% confidence interval?

6 Pick one test for the null hypothesis that the clinicsdo not differ. Why would you depend on this testmore than the others?

7 Consider adding the prison and methodone variables.Which of these covariates will improve the model?

8 Plot the two survival curves from your chosen Coxmodel and add the two KM survival curves. Whatdo you think?


Survival Regression Models - University of California,...

Documents

Transcript of Survival Regression Models - University of California,...