
Page 1: 2 Survival analysis - kustaff.pubhealth.ku.dk/~lts/varians_regression/overheads/backup/survival3.pdf · linear cumulative rate (and a Poisson model would make better use of the available

Survival analysis

Birthe Lykke Thomsen
H. Lundbeck A/S

1

Types of response data
• continuous data [simple plots; t-test, ANOVA, regression . . . ]

• discrete data [histograms, plots; tables, χ²-test, logistic regression, proportional odds, Poisson regression . . . ]

• censored data
  - event-time data (failure-time data, survival data) [Kaplan-Meier curves; Cox regression]
  - data with detection limit(s)

2

"Survival analysis" (= event-time analysis)

• Characteristics of event-time data

• Example

• Randomized studies: the "intention-to-treat" principle

• Non-parametric estimation (Kaplan-Meier, Nelson-Aalen)

• Comparison of two groups (log rank test)

• Regression model for event-time data (the "Cox model")

• Test in the Cox model

3


• Explanatory variables in the Cox model
  - covariates
  - stratification
  - interaction

• Transformation of explanatory variables

• Model checking for the Cox model

• Choice of time scale/time zero

• Delayed entry

• Competing risks

• Comparison with dichotomous response

• Matched case-control designs

4

Characteristics of event-time data
• response: time to the occurrence of a certain event (death, recurrence, pregnancy, . . . ); this makes specific models relevant

• censoring: for some of the individuals we only know that the event has not yet occurred (e.g., alive and healthy at the end of the study)

• left truncation/delayed entry: some of the individuals are not at risk in the study from time zero (if the event had occurred before a specific time point, it would not have been counted as an event in the study)

No assumptions are made about the form of the statistical distribution of the event times, but censoring must be independent of future failure given the covariates.

5

Example: Randomized study of the effect of sclerotherapy

A study of 187 patients with bleeding oesophageal varices (due to liver cirrhosis). During the hospital admission for the first variceal bleeding, the patients were randomized into one of two groups:

1. (standard) medical treatment (n=94)

2. medical treatment supplemented with sclerotherapy (n=93)

(EVASP, 1984)

We wish to investigate whether sclerotherapy affects the risk of rebleeding.

6


Intention-to-treat (randomized studies)

Some patients do not receive sclerotherapy although they were randomized to the sclerotherapy group. How should these patients be treated in the analyses?

All persons randomized to sclerotherapy must be included in the sclerotherapy group in the analyses to avoid bias.

Interpretation of the effect of the "treatment": the effect of the treatment regime.

7

Rebleeding in the two groups

(Piecewise) constant rate gives (piecewise) linear cumulative rate (and a Poisson model would make better use of the available data)

8

If the event studied is all-cause mortality, then the Kaplan-Meier curve estimates the survival probability as a function of time ("the survival curve").

The mathematical relation between the "survival probability" and the cumulative rate:

$$S_g(T) = \exp\bigl(-R_g(T)\bigr)$$

$$R_g(T) = -\ln\bigl(S_g(T)\bigr)$$

The rate is the number of events per time unit, but the cumulative rate has no immediate interpretation: it is not equal to the probability that the event has occurred at or before the given time point, which is 1 − exp(−R_g(T)). For small values, however, it is a good approximation, because 1 − exp(−R) ≈ R when R is small.
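To see how quickly the approximation breaks down, the sketch below (Python, used only for illustration; the slides themselves use SAS) compares a cumulative rate R with the probability 1 − exp(−R) implied by S = exp(−R). The values of R are arbitrary and not taken from the study.

```python
import math


def prob_event_by(cum_rate: float) -> float:
    """Probability that the event has occurred by time T when R(T) = cum_rate.

    Uses S(T) = exp(-R(T)), so P(event by T) = 1 - S(T).
    """
    return 1.0 - math.exp(-cum_rate)


if __name__ == "__main__":
    # Arbitrary illustrative values of the cumulative rate R(T).
    for cum_rate in (0.01, 0.05, 0.2, 1.0, 2.0):
        p = prob_event_by(cum_rate)
        print(f"R = {cum_rate:4.2f}   1 - exp(-R) = {p:.4f}   R - p = {cum_rate - p:.4f}")
```

For R = 0.05 the two differ by only about 0.001, whereas for R = 1 the probability is about 0.63, and R = 2 (a perfectly valid cumulative rate) corresponds to a probability of about 0.86.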

9


Calculations of survival curve and cumulative rate

10

Non-parametric estimation

On a given day t we observe the following in each group g (denoted "stratum/strata"):
1. n_g(t) individuals at risk in total
2. m_g(t) individuals starting to rebleed

which gives the daily rebleeding rates

$$r_g(t) = \frac{m_g(t)}{n_g(t)}$$

The Kaplan-Meier curve S_g(T) for group g is obtained by multiplying the terms 1 − r_g(t) for all days t before and including day T:

$$S_g(T) = \prod_{t \le T} \bigl(1 - r_g(t)\bigr)$$

The Nelson-Aalen estimate of the cumulative rebleeding rate R_g(T) for group g is obtained by adding the daily rebleeding rates for all days t before and including day T:

$$R_g(T) = \sum_{t \le T} r_g(t)$$
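The two estimators differ only in whether the daily rates are combined by a product or a sum. A minimal Python sketch, assuming the daily counts n_g(t) and m_g(t) are already available for the days on which events occur (days without events contribute a factor 1 and a term 0, so they can be skipped); the counts in the example are hypothetical, not from the SCL data set:

```python
import math


def km_and_nelson_aalen(at_risk, events):
    """Kaplan-Meier and Nelson-Aalen estimates for one stratum.

    at_risk[k] and events[k] are n_g(t) and m_g(t) for the k-th day on which
    at least one event occurs, in increasing time order. Returns a list of
    (S_g(T), R_g(T)) pairs, one per event day T.
    """
    surv, cum_rate = 1.0, 0.0
    estimates = []
    for n, m in zip(at_risk, events):
        rate = m / n            # daily rate r_g(t) = m_g(t) / n_g(t)
        surv *= 1.0 - rate      # Kaplan-Meier: product of (1 - r_g(t))
        cum_rate += rate        # Nelson-Aalen: sum of r_g(t)
        estimates.append((surv, cum_rate))
    return estimates


if __name__ == "__main__":
    # Hypothetical counts for three event days (not taken from the study).
    for s, r in km_and_nelson_aalen([10, 8, 5], [1, 2, 1]):
        print(f"S = {s:.3f}   R = {r:.3f}   exp(-R) = {math.exp(-r):.3f}")
```

Here S ends at 0.540 and R at 0.550; exp(−R) is always at least the Kaplan-Meier value, since exp(−r) ≥ 1 − r for each day, so the relation S = exp(−R) holds only approximately for the two estimates.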

11

Estimation of a Kaplan-Meier curve with point-wise 95% confidence intervals
Response: "The time when the event occurs".

But it does not occur for everyone, so 2 variables are needed to describe the response: "time" and "what happened". The data set SCL contains (among other things)
DAY: time of exit from the study
BLD: 1 if rebleeding occurs, 0 if censored
SCLERO: 1 for the sclerotherapy group, 0 for the medically treated group

PROC PHREG DATA=scl NOPRINT;
  TITLE 'Kaplan-Meier curves';
  MODEL day*bld(0) = ;              /* no covariates */
  STRATA sclero;                    /* one curve per treatment group */
  BASELINE OUT=km SURVIVAL=kmcurves
           LOWER=lowerb UPPER=upperb / CLTYPE=LOGLOG;
RUN;

12


The data set KM generated by PHREG

Obs  sclero  day  kmcurves   lowerb    upperb
  1     0      0   1.00000        .         .
  2     0      0   0.98936   0.92762   0.99848
  3     0      1   0.96809   0.90524   0.98949
  4     0      2   0.94388   0.87125   0.97609
  :     :      :      :          :         :
 37     0    245   0.30174   0.19512   0.41539
 38     0    444   0.27853   0.17385   0.39303
 39     0    589   0.24758   0.14428   0.36545
 40     1      0   1.00000        .         .
 41     1      2   0.96512   0.89769   0.98839
 42     1      3   0.94129   0.86634   0.97480
  :     :      :      :          :         :
 64     1    308   0.42846   0.30729   0.54401
 65     1    330   0.40466   0.28320   0.52269
 66     1    340   0.38086   0.25981   0.50087

13

Data modifications

Data modifications are necessary if we want the point-wise confidence limits in the figure:

DATA km; SET km;
  /* one output row per curve: estimate, lower limit, upper limit */
  IF sclero=0 THEN DO;
    type=1; curve=kmcurves; OUTPUT;
    type=2; curve=lowerb;   OUTPUT;
    type=3; curve=upperb;   OUTPUT;
  END;
  IF sclero=1 THEN DO;
    type=4; curve=kmcurves; OUTPUT;
    type=5; curve=lowerb;   OUTPUT;
    type=6; curve=upperb;   OUTPUT;
  END;
RUN;

14

The data set KM after the changes

sclero  day  kmcurves  lowerb  upperb  type  curve
   0      0   1.00000     .       .      1   1.000
   0      0   1.00000     .       .      2     .
   0      0   1.00000     .       .      3     .
   0      0   0.98936   0.928   0.998    1   0.989
   0      0   0.98936   0.928   0.998    2   0.928
   0      0   0.98936   0.928   0.998    3   0.998
   0      1   0.96809   0.905   0.989    1   0.968
   0      1   0.96809   0.905   0.989    2   0.905
   0      1   0.96809   0.905   0.989    3   0.989
   0      2   0.94388   0.871   0.976    1   0.944
   0      2   0.94388   0.871   0.976    2   0.871
   0      2   0.94388   0.871   0.976    3   0.976
   :      :      :        :       :      :     :
   1    330   0.40466   0.283   0.523    4   0.405
   1    330   0.40466   0.283   0.523    5   0.283
   1    330   0.40466   0.283   0.523    6   0.523
   1    340   0.38086   0.260   0.501    4   0.381
   1    340   0.38086   0.260   0.501    5   0.260
   1    340   0.38086   0.260   0.501    6   0.501

15


Plot of the estimated curves
Example of the GPLOT procedure

PROC GPLOT DATA=km;
  PLOT curve*day=type
       / HAXIS=AXIS1 VAXIS=AXIS2;
  SYMBOL1 R=1 V=NONE I=STEPLJ L=1  W=2 C=BLACK;   /* type 1: KM, medical */
  SYMBOL2 R=2 V=NONE I=STEPLJ L=33 C=BLACK;       /* types 2-3: limits, medical */
  SYMBOL3 R=1 V=NONE I=STEPLJ L=1  W=2 C=GRAYAA;  /* type 4: KM, sclerotherapy */
  SYMBOL4 R=2 V=NONE I=STEPLJ L=35 C=BLACK;       /* types 5-6: limits, sclerotherapy */
  AXIS1 LABEL = ('Days from randomization');
  AXIS2 LABEL = (A=90 R=0 'Kaplan-Meier curves');
RUN;

16

Log rank test
Groups may be compared using a log rank test.
Principle (for 2 groups):
We assume ("the null hypothesis") that there is no difference between the 2 groups, and condition, for each "time of death" t_i, on

• the observed total number of "deaths" m(t_i) (= m_1(t_i) + m_2(t_i))

• the number of individuals (presently) at risk in each of the groups, n_1(t_i) and n_2(t_i) (with n_1(t_i) + n_2(t_i) = n(t_i))

For group 1 we then calculate, for each "time of death" t_i,

• the expected number of "deaths"

$$E_1(t_i) = m(t_i) \cdot \frac{n_1(t_i)}{n(t_i)}$$

• the variance of the number of "deaths"

$$V_1(t_i) = \frac{n_1(t_i)\, n_2(t_i) \cdot m(t_i)\,\bigl(n(t_i) - m(t_i)\bigr)}{n(t_i)^2\,\bigl(n(t_i) - 1\bigr)}$$

17

Log rank test
The expected number of "deaths" E_1(t_i) and the variance V_1(t_i) are added over all "death times" t_i to give E_1 and V_1, respectively. Furthermore, we count the total number of "deaths" O_1 in group 1. The log rank test statistic

$$\chi^2_{\text{log rank}} = \frac{(O_1 - E_1)^2}{V_1}$$

is approximately χ²-distributed with 1 degree of freedom under the null hypothesis.

The result does not depend on which group we decide to label 1!

An approximation, which may also be applied for more than 2 groups, here G groups:

$$\chi^2_{\text{log rank}} \approx \sum_{g=1}^{G} \frac{(O_g - E_g)^2}{E_g}$$

which is approximately χ²-distributed with G − 1 degrees of freedom (note that all groups contribute).
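The bookkeeping behind the two statistics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PHREG implementation: groups are coded 1 and 2, and an individual censored at a death time is assumed to still be at risk at that time. The data in the usage example are hypothetical.

```python
from collections import Counter


def logrank_two_groups(times, events, groups):
    """Two-group log rank test (groups coded 1 and 2).

    Returns (O1, E1, V1, exact, approx) with exact = (O1 - E1)^2 / V1
    and approx = sum over both groups of (O_g - E_g)^2 / E_g.
    """
    death_times = sorted({t for t, d in zip(times, events) if d})
    o1 = e1 = v1 = 0.0
    for t in death_times:
        # risk set at t: everyone whose exit time is t or later
        at_risk = Counter(g for ti, g in zip(times, groups) if ti >= t)
        n1, n2 = at_risk[1], at_risk[2]
        n = n1 + n2
        m = sum(1 for ti, d in zip(times, events) if ti == t and d)
        m1 = sum(1 for ti, d, g in zip(times, events, groups)
                 if ti == t and d and g == 1)
        o1 += m1
        e1 += m * n1 / n                      # E_1(t_i)
        if n > 1:                             # V_1(t_i) is 0 when n = 1
            v1 += n1 * n2 * m * (n - m) / (n ** 2 * (n - 1))
    total = sum(1 for d in events if d)
    o2, e2 = total - o1, total - e1           # group 2 totals
    exact = (o1 - e1) ** 2 / v1
    approx = (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2
    return o1, e1, v1, exact, approx


if __name__ == "__main__":
    # Hypothetical data: four deaths, alternating between the groups.
    print(logrank_two_groups([1, 2, 3, 4], [1, 1, 1, 1], [1, 2, 1, 2]))
```

For this toy data set O_1 = 2, E_1 = 4/3 and V_1 = 13/18, so the statistic is 8/13 ≈ 0.615, and swapping the group labels gives the same value. The approximate version gives 0.5 here; in the two-group case it is never larger than the variance-based statistic.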

18


Log rank test using PROC PHREG

Calculation of the log rank test statistic as a score test using PROC PHREG:

PROC PHREG DATA=scl;
  MODEL day*bld(0) = sclero / TIES=DISCRETE;
RUN;

Here we compare the group with SCLERO=1 (the sclerotherapy group) to the group with SCLERO=0 (the medically treated group). If we have more groups, we need a 0-1 variable for each group except one, which then becomes the "reference group" (the log rank test does not depend on the choice of reference group).

19

Output from PROC PHREG

The PHREG Procedure
Model Information

Data Set              WORK.SCL
Dependent Variable    day
Censoring Variable    bld
Censoring Value(s)    0
Ties Handling         DISCRETE

Summary of the Number of Event and Censored Values

                              Percent
Total    Event    Censored    Censored
  187       91          96       51.34

Model Fit Statistics
                Without        With
Criterion    Covariates  Covariates
-2 LOG L        738.406     737.488
:                  :           :

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio       0.9175    1       0.3381
Score                  0.9174    1       0.3382
Wald                   0.9124    1       0.3395

The Score line (chi-square 0.9174, p = 0.3382) is the log rank test.

20

Output from PROC PHREG continued

Analysis of Maximum Likelihood Estimates
             Parameter   Standard
Variable DF   Estimate      Error  Chi-Sq.  Pr>ChiSq
sclero    1   -0.20261    0.21212   0.9124    0.3395

Analysis of Maximum Likelihood Estimates
            Hazard
Variable     Ratio
sclero       0.817

The log rank test is not suitable for detecting time-dependent differences, like a better short-term prognosis for one group and a better long-term prognosis for the other group.

21


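Proportional hazards (hazard = instantaneous rate)

Quantification of treatment effect:

$$r(t;\ \text{sclero}) = r(t;\ \text{medical}) \cdot B$$

Effect of ascites:

$$r(t;\ \text{ascites}) = r(t;\ \text{without ascites}) \cdot A$$

Combined:

$$\begin{aligned} r(t;\ \text{sclero, ascites}) &= r(t;\ \text{medical, ascites}) \cdot B \\ &= r(t;\ \text{medical, without ascites}) \cdot A \cdot B \\ &= r(t;\ \text{medical, without ascites}) \cdot e^{a+b} \end{aligned}$$

with a = ln(A) and b = ln(B).

Set

$$X_1 = \begin{cases} 0 & \text{without ascites} \\ 1 & \text{with ascites} \end{cases} \qquad X_2 = \begin{cases} 0 & \text{medical alone} \\ 1 & \text{sclerotherapy} \end{cases}$$

then

$$r(t;\ \text{sclero, ascites}) = r(t;\ X_1 = 1, X_2 = 1) = r_0(t) \cdot e^{aX_1 + bX_2}$$

where r_0(t) = r(t; medical, without ascites) is the rate when X_1 = X_2 = 0.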

22

Cox's regression model

This model is denoted Cox's regression model, generally formulated:

$$r(t;\ X_1, X_2, \ldots, X_k) = r_0(t) \cdot e^{b_1X_1 + b_2X_2 + \cdots + b_kX_k}$$

If we log-transform and use a(t) for log(r_0(t)), we get something that looks more like other regression models:

$$\log\bigl(r(t;\ X_1, X_2, \ldots, X_k)\bigr) = a(t) + b_1X_1 + b_2X_2 + \cdots + b_kX_k$$

The covariates X_1, X_2, ..., X_k may be continuous, like serum bilirubin.
A positive value of b_j means that large values of the covariate X_j increase the rate: for unwanted events, large values worsen the prognosis (be cautious with "positive/negative effect"; use, e.g., beneficial/harmful).

23

Example with several covariates
PROC PHREG DATA=scl;
  MODEL day*bld(0) = ascites bilirub sclero / RISKLIMITS;
RUN;
-----------------------------------------------
Summary of the Number of Event and Censored Values

                              Percent
Total    Event    Censored    Censored
  177       87          90       50.85

             Parameter   Standard
Variable DF   Estimate      Error   Chi-Sq.  Pr>ChiSq
ascites   1    0.18072    0.22721    0.6326    0.4264
bilirub   1    0.00476    0.00112   18.1500    <.0001
sclero    1   -0.21924    0.21801    1.0113    0.3146

            Hazard   95% Hazard Ratio
Variable     Ratio   Confidence Limits
ascites      1.198    0.768   1.870
bilirub      1.005    1.003   1.007
sclero       0.803    0.524   1.231
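The hazard ratio column is exp(estimate), and the limits printed by RISKLIMITS agree with Wald limits exp(estimate ± 1.96 · SE); the Wald chi-square is (estimate/SE)². A small Python sketch that reproduces the table from the printed estimates and standard errors (the helper name `wald_summary` is ours, not a SAS feature):

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # 97.5% quantile of N(0, 1), about 1.96


def wald_summary(estimate, std_err):
    """Hazard ratio, Wald 95% limits and Wald chi-square for one Cox coefficient."""
    hazard_ratio = math.exp(estimate)
    lower = math.exp(estimate - Z * std_err)
    upper = math.exp(estimate + Z * std_err)
    chi_sq = (estimate / std_err) ** 2
    return hazard_ratio, lower, upper, chi_sq


if __name__ == "__main__":
    # Estimates and standard errors copied from the output above.
    for name, est, se in [("ascites", 0.18072, 0.22721),
                          ("bilirub", 0.00476, 0.00112),
                          ("sclero", -0.21924, 0.21801)]:
        hr, lo, hi, chi = wald_summary(est, se)
        print(f"{name:8s} HR={hr:.3f} ({lo:.3f}, {hi:.3f})  chi-sq={chi:.4f}")
```

The chi-square for bilirub comes out as about 18.06 rather than 18.15 only because the printed estimate and standard error are rounded.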

24


Test of covariates in the Cox model

Wald's test:

Wald's test of a single covariate appears in the output, like p=0.4264 for ascites.

Wald's test for several covariates simultaneously may be performed by adding a TEST statement, for instance:

PROC PHREG DATA=scl;
  MODEL day*bld(0) = ascites bilirub sclero / RISKLIMITS;
  Asc_bili: TEST ascites=0, bilirub=0;
RUN;

which gives some extra lines of output:

Linear Hypotheses Testing Results
                  Wald
Label       Chi-Square   DF   Pr > ChiSq
Asc_bili       21.2800    2       <.0001

25

Explanatory variables in event-time analyses

A variable may enter the model in two very different ways:

• as a covariate
  - continuous variables may enter in the usual fashion
  - for categorical variables we have to supply dummy variables to be used as covariates in PROC PHREG

• as a stratification variable

These possibilities are fundamentally different, and the choice has consequences for the modelling, for the interpretation of the variable in question, and for the estimation of the effect of other variables.

26

Estimation in a Cox model
For the particular day on which patient j rebleeds, we calculate the probability that this happens precisely for patient j, given that a rebleeding occurs among the patients in the stratum to which patient j belongs:

$$\frac{\exp(bX_j)}{\sum_{i \in R_j} \exp(bX_i)}$$

where R_j denotes all those patients (the i's) who were at risk of rebleeding in the same stratum as j when j started to rebleed.

These contributions are multiplied together over all rebleeding time points, and b is estimated by the value, b̂, which maximizes this total product, called "Cox's partial likelihood".

If more than one patient rebleeds on the same day, we have ties in the data. Ties may be handled in several ways: TIES=EXACT (the correct method), TIES=DISCRETE (as in the log rank test), or TIES=BRESLOW (the quickest, and SAS's default).
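A minimal Python sketch of the estimation step for a single covariate in a single stratum, assuming Breslow handling of ties (every tied event day uses the same risk set) and a risk set R_j = {i : exit time ≥ event time of j}. Newton-Raphson is used to find b̂; this is a standard choice of maximiser, not necessarily what PHREG does internally. The data are hypothetical.

```python
import math


def _risk_set_sums(b, times, x, t):
    """Sums over the risk set {i : times[i] >= t} of w, x*w and x^2*w, w = exp(b*x)."""
    s0 = s1 = s2 = 0.0
    for ti, xi in zip(times, x):
        if ti >= t:
            w = math.exp(b * xi)
            s0 += w
            s1 += xi * w
            s2 += xi * xi * w
    return s0, s1, s2


def partial_loglik(b, times, events, x):
    """Log of Cox's partial likelihood for one covariate (Breslow ties)."""
    total = 0.0
    for tj, dj, xj in zip(times, events, x):
        if dj:  # one factor exp(b*x_j) / sum_{i in R_j} exp(b*x_i) per event
            s0, _, _ = _risk_set_sums(b, times, x, tj)
            total += b * xj - math.log(s0)
    return total


def fit_cox(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson maximisation of the partial likelihood; returns (b_hat, se)."""
    b = 0.0
    info = float("nan")
    for _ in range(max_iter):
        score = info = 0.0
        for tj, dj, xj in zip(times, events, x):
            if dj:
                s0, s1, s2 = _risk_set_sums(b, times, x, tj)
                mean = s1 / s0
                score += xj - mean            # first derivative of log-likelihood
                info += s2 / s0 - mean ** 2   # minus the second derivative
        step = score / info
        b += step
        if abs(step) < tol:
            break
    # standard error from the observed information at the last iterate
    return b, 1.0 / math.sqrt(info)


if __name__ == "__main__":
    # Hypothetical data: exit day, event indicator, binary covariate.
    times = [2, 3, 5, 7, 8, 11, 12, 15]
    events = [1, 1, 0, 1, 1, 1, 0, 1]
    x = [1, 0, 1, 0, 1, 0, 1, 0]
    b_hat, se = fit_cox(times, events, x)
    print(f"b_hat = {b_hat:.4f}, SE = {se:.4f}, hazard ratio = {math.exp(b_hat):.3f}")
```

A stratified model would simply restrict each risk set to the stratum of patient j before forming the sums.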

27


Explanatory variables in event-time analyses
Covariates enter only in the exponent, thus the rates are assumed proportional for different values of X:

$$r(t;\ X = 1) = r_0(t) \cdot e^{b} \quad\text{and}\quad r(t;\ X = 2) = r(t;\ X = 1) \cdot e^{b} = r_0(t) \cdot e^{2b}$$

Consequences:
1. the effect is described using a single number
2. but this quantity can only be interpreted if the assumption of proportional rates holds (approximately)

For stratification variables we let the underlying rate depend upon the value of the variable, thus the difference between individuals with X = 1 and individuals with X = 2 may change over time:

$$r(t;\ X = 1) = r_1(t) \quad\text{and}\quad r(t;\ X = 2) = r_2(t)$$

Consequences:
1. we do not get a simple measure of the effect
2. the stratification variable(s) must be categorical with only a few values

28

Interactions
Stratification must not be mistaken for interaction! The effects of the remaining variables are assumed to be identical in the different strata, in contrast to the epidemiological use of the term "stratified analyses"!

Interaction means that the effect of one variable, e.g., bilirubin, depends on the value of another variable, e.g., the treatment. We then have to estimate different associations with bilirubin in the two treatment groups. This requires dummy variables! Some SAS procedures can make these dummy variables automatically (CLASS and '*'), but the version of PHREG used here cannot (later SAS releases added a CLASS statement to PHREG), so it has to be done in a DATA step:

DATA scl; SET scl;
  IF sclero=1 THEN DO;
    scl_bili=bilirub; med_bili=0;
  END;
  IF sclero=0 THEN DO;
    scl_bili=0; med_bili=bilirub;
  END;
RUN;

29

Interactions cont.

As always, the interpretation of parameter estimates depends on which other covariates are included in the model:

SCL_BILI together with MED_BILI:
Variable   Parm.Est.   Std.Err.   Chi-Sq.   Pr>ChiSq
sclero      -0.08500    0.26087    0.1062     0.7445
med_bili     0.00578    0.00146   15.6066     <.0001
scl_bili     0.00423    0.00155    7.4938     0.0062
Here, we estimate separate linear relations (on the log-rate scale) in the two treatment groups. The SCLERO covariate must be included; otherwise the bilirubin relations are forced to meet at 0 (no difference between the treatment groups for bilirubin=0).

SCL_BILI together with BILIRUB:
Variable   Parm.Est.   Std.Err.   Chi-Sq.   Pr>ChiSq
sclero      -0.08500    0.26087    0.1062     0.7445
bilirub      0.00578    0.00146   15.6066     <.0001
scl_bili    -0.00155    0.00207    0.5621     0.4534
Here, SCL_BILI estimates the difference between the effects of BILIRUB in the two groups (0.00423 − 0.00578 = −0.00155)! This version may be used to test the interaction.
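The two fits are the same model written in two ways, since BILIRUB = MED_BILI + SCL_BILI. The Python sketch below checks this by computing the linear predictor (the log rate ratio relative to a medically treated patient with bilirubin 0) under both parametrisations, using the estimates printed above; the function names are illustrative only.

```python
# Estimates copied from the two outputs above.
FIT_SEPARATE = {"sclero": -0.08500, "med_bili": 0.00578, "scl_bili": 0.00423}
FIT_DIFFERENCE = {"sclero": -0.08500, "bilirub": 0.00578, "scl_bili": -0.00155}


def log_rate_ratio_separate(sclero, bilirub):
    """Linear predictor with SCLERO, MED_BILI and SCL_BILI (separate slopes)."""
    scl_bili = bilirub if sclero == 1 else 0.0
    med_bili = bilirub if sclero == 0 else 0.0
    return (FIT_SEPARATE["sclero"] * sclero
            + FIT_SEPARATE["med_bili"] * med_bili
            + FIT_SEPARATE["scl_bili"] * scl_bili)


def log_rate_ratio_difference(sclero, bilirub):
    """Linear predictor with SCLERO, BILIRUB and SCL_BILI (slope difference)."""
    scl_bili = bilirub if sclero == 1 else 0.0
    return (FIT_DIFFERENCE["sclero"] * sclero
            + FIT_DIFFERENCE["bilirub"] * bilirub
            + FIT_DIFFERENCE["scl_bili"] * scl_bili)


if __name__ == "__main__":
    for sclero in (0, 1):
        for bilirub in (10.0, 50.0, 200.0):
            a = log_rate_ratio_separate(sclero, bilirub)
            b = log_rate_ratio_difference(sclero, bilirub)
            print(f"sclero={sclero} bilirub={bilirub:5.0f}  {a:.5f}  {b:.5f}")
```

Both columns agree for every patient; only the meaning of the SCL_BILI coefficient changes, which is why the second version gives the interaction test directly.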

30


Need for transformation of explanatory variables

Criteria for choice of parametrisation/transformation
• Biologic/medical justification (best, but most often not possible). The rate increases exponentially with untransformed covariates, while a logarithmic transformation of a covariate means that the rate is increased by a fixed factor whenever the covariate increases by, e.g., 10%.

• Transformations used by others (comparability).

• The "best possible" transformation of the present data: take care when evaluating the significance and interpreting the effect (the significance will be exaggerated; do not put too much emphasis on the chosen transformation).

31

Need for transformation of explanatory variables

Criteria for choice of parametrisation/transformation, cont.

• A few extreme values of the explanatory variable may have too much influence on the result unless the variable is transformed [a few extremely large → log2(x), a few extremely small → exp(x/c) (rarely used)]

Trick:
By choosing log2 we get $e^{\hat b}$ (hazard ratio) to estimate the factor by which the rate is multiplied each time x is doubled, since $\log_2(x) + 1 = \log_2(x) + \log_2(2) = \log_2(2x)$. If log2 is not directly available (it is in SAS), it may be calculated as $\log_2(x) = \log(x)/\log(2)$. Likewise, we may estimate the factor corresponding to a 10% increase directly by using XX = log(x)/log(1.1) as the covariate.

32

Transformation of serum bilirubin
PROC UNIVARIATE DATA=scl PLOT;
  VAR bilirub;
RUN;
-----------------------------------------------
   Histogram                               #  Boxplot
525+*                                      1     *
   .*                                      3     *
   .*                                      1     *
   .*                                      3     *
   .
275+
   .*                                      3     *
   .**                                     6     0
   .***                                    7     0
   .****************                      46  +--+--+
 25+************************************ 107  *-----*
    ----+----+----+----+----+----+----+-
    * may represent up to 3 counts

33


Transformation of serum bilirubin
DATA scl; SET scl; log2bili=LOG2(bilirub); RUN;

PROC UNIVARIATE DATA=scl PLOT;
  VAR log2bili;
RUN;
-----------------------------------------------
Stem Leaf                                  #  Boxplot
   9 01                                     2     0
   8 566789                                 6     0
   8
   7 556779                                 6     |
   7 1222334                                7     |
   6 55566899                               8     |
   6 00000000001111122233344444            26  +-----+
   5 55556666666666667777888889999         29  |  +  |
   5 00000000000011222222222333444444444   35  *-----*
   4 55556666777777788899999               23  +-----+
   4 00011122233334444                     17     |
   3 66778888999                           11     |
   3 0003                                   4     |
   2 6                                      1     0
   2 03                                     2     0
     ----+----+----+----+----+----+----+

34

Need for transformation of explanatory variables

Simple numerical evaluations:

• Define the squared term X2=X**2; and include both X and X2 in order to test whether X2 gives a significant improvement (test for curvature/linearity, although not very powerful).

• Include both the untransformed and the transformed variable simultaneously to see whether there is a clear-cut answer as to which is the better predictor (requires a reasonable alternative).

Graphical evaluation with corresponding test:

Linear splines (Greenland 1995, Epidemiology, pp. 356-365)

35

Plot of a linear spline

36


Construction of the linear spline

37

Plot of the covariates needed

38

Plot of the covariates needed

39


Coding in SAS

40

Estimation and test of linear spline
Quartiles among rebleeders: 26, 47, 73
(PROC UNIVARIATE DATA=scl PCTLDEF=3; WHERE bld=1; VAR bilirub; RUN;)

Extra variables:

DATA scl; SET scl;
  IF bilirub NE . THEN DO;
    b_u26=MIN(bilirub-26,0);    /* piece below 26 (0 above) */
    b_o26=MAX(bilirub-26,0);
    b_o47=MAX(bilirub-47,0);
    b_26_47=b_o26-b_o47;        /* piece from 26 to 47 */
    b_o73=MAX(bilirub-73,0);    /* piece above 73 */
    b_47_73=b_o47-b_o73;        /* piece from 47 to 73 */
  END;
RUN;

are included in the model and tested:

PROC PHREG DATA=scl;
  MODEL day*bld(0) = b_u26 b_26_47 b_47_73 b_o73 sclero / RISKLIMITS;
  /* equal slopes in all four intervals, i.e. a linear relation */
  Testline: TEST b_u26=b_26_47=b_47_73=b_o73;
RUN;

41

Estimation and test of linear spline cont.

              Parameter   Standard
Variable  DF   Estimate      Error   Chi-Sq.  Pr>ChiSq
b_u26      1   -0.01390    0.03276    0.1801    0.6712
b_26_47    1    0.03250    0.02161    2.2618    0.1326
b_47_73    1    0.03483    0.01412    6.0838    0.0136
b_o73      1    0.00162    0.00161    1.0189    0.3128
sclero     1   -0.13197    0.21954    0.3613    0.5478

            Hazard   95% Hazard Ratio
Variable     Ratio   Confidence Limits
b_u26        0.986    0.925   1.052
b_26_47      1.033    0.990   1.078
b_47_73      1.035    1.007   1.065
b_o73        1.002    0.998   1.005
sclero       0.876    0.570   1.348

Linear Hypotheses Testing Results
                 Wald
Label      Chi-Square   DF   Pr > ChiSq
Testline      15.7811    3       0.0013

The "Parameter Estimates" are the slopes of ln(rate ratio) within each of the intervals. The "Hazard Ratio" is therefore a measure of the interval-specific dose-response relation (the rate ratio per unit increase in bilirubin within that interval).

42


Plot of linear spline

5th and 95th percentiles among rebleeders: 12 and 177

DATA plot;
  DO bili=12, 26, 47, 73, 177;
    b_u26=MIN(bili-26,0);
    b_o26=MAX(bili-26,0);
    b_o47=MAX(bili-47,0);
    b_26_47=b_o26-b_o47;
    b_o73=MAX(bili-73,0);
    b_47_73=b_o47-b_o73;
    /* log rate ratio relative to bilirubin=26, from the spline estimates */
    pi = -0.01390*b_u26 + 0.03250*b_26_47
         + 0.03483*b_47_73 + 0.00162*b_o73;
    rr = EXP(pi);
    OUTPUT;
  END;
RUN;
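The same calculation in Python, as a cross-check of how the four spline variables combine: knots and slopes are copied from the slides, and bilirubin 26 is the reference level because all four variables are 0 there.

```python
import math

# Knots at the quartiles among rebleeders; slopes from the PHREG output above.
KNOTS = (26.0, 47.0, 73.0)
SLOPES = {"b_u26": -0.01390, "b_26_47": 0.03250,
          "b_47_73": 0.03483, "b_o73": 0.00162}


def spline_basis(bili):
    """Same four variables as in the DATA steps: one piece per interval."""
    k1, k2, k3 = KNOTS
    b_u26 = min(bili - k1, 0.0)
    b_o26 = max(bili - k1, 0.0)
    b_o47 = max(bili - k2, 0.0)
    b_o73 = max(bili - k3, 0.0)
    return {"b_u26": b_u26, "b_26_47": b_o26 - b_o47,
            "b_47_73": b_o47 - b_o73, "b_o73": b_o73}


def rate_ratio(bili):
    """Rate ratio relative to bilirubin = 26 (all four variables are 0 there)."""
    basis = spline_basis(bili)
    log_rr = sum(SLOPES[name] * value for name, value in basis.items())
    return math.exp(log_rr)


if __name__ == "__main__":
    for bili in (12, 26, 47, 73, 177):
        print(f"bilirubin {bili:4d}: rate ratio {rate_ratio(bili):.3f}")
```

This gives rate ratios of about 1.21, 1, 1.98, 4.89 and 5.79 at the five bilirubin values. Each interval adds its own slope times the distance covered within that interval, which is what makes the fitted curve continuous at the knots.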

43

Plot of linear spline

SYMBOL1 V=CIRCLE I=JOIN L=1 C=BLACK;
AXIS1 LABEL=(F=CENTX 'Bilirubin');
AXIS2 LABEL=(F=CENTX A=90 R=0 'Rate ratio')
      LOGBASE=2 MINOR=(N=3);

PROC GPLOT DATA=plot;
  PLOT rr*bili / HAXIS=AXIS1 VAXIS=AXIS2;
RUN;

44

Transformation of serum bilirubin
Inclusion of serum bilirubin untransformed as well as transformed by the logarithm:

PROC PHREG DATA=scl;
  MODEL day*bld(0) = sclero bilirub log2bili;
RUN;
-----------------------------------------------
             Parameter   Standard
Variable DF   Estimate      Error  Chi-Sq.  Pr>ChiSq
bilirub   1 -0.0001959    0.00231   0.0072    0.9325
log2bili  1    0.48004    0.18152   6.9939    0.0082
sclero    1   -0.18290    0.21596   0.7172    0.3971

The bilirubin-related estimates cannot be readily interpreted ("change when bilirubin doubles, for a fixed value of bilirubin . . . "). If they are both significant, the conclusion is best illustrated in a graph. Here only log2bili is significant, so the logarithm is the better of the two predictors.

45


Estimation with log2(serum bilirubin)
PROC PHREG DATA=scl;
  MODEL day*bld(0) = sclero log2bili / RISKLIMITS;
RUN;
-----------------------------------------------
             Parameter   Standard
Variable DF   Estimate      Error  Chi-Sq.  Pr>ChiSq
sclero    1   -0.18373    0.21575   0.7252    0.3944
log2bili  1    0.46716    0.09706  23.1656    <.0001

Analysis of Maximum Likelihood Estimates
            Hazard   95% Hazard Ratio
Variable     Ratio   Confidence Limits
sclero       0.832    0.545   1.270
log2bili     1.595    1.319   1.930

The effect of serum bilirubin: a doubling of serum bilirubin corresponds to an approx. 60% increase in the rate of rebleeding (hazard ratio 1.595).
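The log2 trick can be checked numerically from the printed estimate. Since log2(1.1·x) = log2(x) + log2(1.1), a 10% increase in bilirubin adds log2(1.1) ≈ 0.1375 to LOG2BILI; and because XX = log(x)/log(1.1) = log2(x)/log2(1.1), the coefficient XX would get in the same model is the LOG2BILI estimate times log2(1.1). A short Python sketch (the variable names are ours):

```python
import math

b_log2 = 0.46716   # log2bili estimate from the output above
se_log2 = 0.09706  # its standard error

# Factor by which the rate is multiplied when bilirubin doubles.
hr_doubling = math.exp(b_log2)

# Coefficient that XX = log(x)/log(1.1) would get in the same model,
# i.e. the log rate ratio for a 10% increase in bilirubin.
b_xx = b_log2 * math.log2(1.1)
hr_10_percent = math.exp(b_xx)

# Wald 95% limits for the doubling effect.
lower = math.exp(b_log2 - 1.96 * se_log2)
upper = math.exp(b_log2 + 1.96 * se_log2)

print(f"doubling:     {hr_doubling:.3f} ({lower:.3f}, {upper:.3f})")
print(f"10% increase: {hr_10_percent:.4f}")
```

The first line reproduces 1.595 (1.319, 1.930) from the output; the second shows that a 10% higher bilirubin corresponds to roughly a 6.6% higher rate of rebleeding.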

46

Model control in the Cox model
The Cox model is based on the assumption of proportional rates, so $R(t;X) = R_0(t)\,e^{bX}$ and

$$\ln\bigl(R(t;X)\bigr) = \ln\bigl(R_0(t)\bigr) + bX$$

Graphical check of proportional rates: stratify on each variable separately and plot

$$\ln\bigl(R_{\text{stratum}}(t)\bigr) = \ln\bigl(-\ln(S_{\text{stratum}}(t))\bigr);$$

the curves should be approximately parallel (a constant vertical distance).

TITLE 'Graphical check of proportionality';
PROC PHREG DATA=scl NOPRINT;
  MODEL day*bld(0) = log2bili;
  STRATA sclero;
  BASELINE OUT=check LOGLOGS=lcumrate / METHOD=CH;
RUN;

47

Plot of log(cumulative rates)

Example of code for GPLOT

SYMBOL1 V=NONE I=STEPLJ L=1 C=BLACK;
SYMBOL2 V=NONE I=STEPLJ L=1 C=GRAYAA;
AXIS1 LABEL=(F=CENTX 'Days from randomization')
      LOGBASE=7 MINOR=(N=5);
AXIS2 LABEL=(F=CENTX A=90 R=0);

PROC GPLOT DATA=check;
  WHERE 0<day<=343;       /* day 0 cannot be shown on a log axis */
  PLOT lcumrate*day=sclero
       / HAXIS=AXIS1 VAXIS=AXIS2 NOLEGEND;
RUN;

48


Plot of log(cumulative rates) for graphical check of proportionality

49

Numerical test of proportionality using time-dependent variables

Choose appropriate time points (here 14 and 105 days), allow for different proportionality factors in each time interval through the use of dummy variables (here SCLFR14 and SCLFR105), and test whether they are significant:

PROC PHREG DATA=scl;
  MODEL day*bld(0) = log2bili sclero sclfr14 sclfr105;
  /* programming statements: evaluated at each rebleeding time */
  IF sclero=1 AND day>=14 THEN sclfr14=1;
  ELSE sclfr14=0;
  IF sclero=1 AND day>=105 THEN sclfr105=1;
  ELSE sclfr105=0;
  Testprop: TEST sclfr14, sclfr105;
RUN;

The variables inside PHREG are calculated for each rebleeding time point for all patients at risk at that particular time. The time variable, here DAY, is equal to the rebleeding time point for the patient in the numerator ("j", slide 27), while all other variables refer to the current patient in the denominator ("i", slide 27).
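A small Python sketch of that evaluation rule, assuming a risk set of all patients whose exit day is on or after the rebleeding day; the patients are hypothetical (exit day, SCLERO) pairs. The point is that DAY takes the value of the current rebleeding time for everyone in the risk set, not each patient's own exit day.

```python
def sclfr14(sclero, day):
    """Mirror of: IF sclero=1 AND day>=14 THEN sclfr14=1; ELSE sclfr14=0;"""
    return 1 if sclero == 1 and day >= 14 else 0


def covariates_at_event(event_day, patients):
    """Evaluate SCLFR14 for everyone at risk when a rebleeding occurs on event_day.

    patients: list of (exit_day, sclero). DAY is set to event_day for every
    patient in the risk set, not to the patient's own exit day.
    """
    return {pid: sclfr14(sclero, event_day)
            for pid, (exit_day, sclero) in enumerate(patients)
            if exit_day >= event_day}


if __name__ == "__main__":
    patients = [(5, 1), (20, 1), (30, 0), (40, 1)]  # hypothetical
    for event_day in (5, 20):
        print(event_day, covariates_at_event(event_day, patients))
```

Patient 3 (sclerotherapy, exit on day 40) has SCLFR14 = 0 in the day-5 risk set but 1 in the day-20 risk set, so the same patient contributes different covariate values at different event times.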

50

Part of output from PROC PHREG

             Parameter   Standard
Variable      Estimate      Error   Chi-Sq.  Pr>ChiSq
log2bili       0.45833    0.09654   22.5398    <.0001
sclero         0.11316    0.31284    0.1308    0.7176
sclfr14       -0.49295    0.48226    1.0448    0.3067
sclfr105      -0.30190    0.67860    0.1979    0.6564

Linear Hypotheses Testing Results
                 Wald
Label      Chi-Square   DF   Pr > ChiSq
Testprop       1.9445    2       0.3782

51


Time-dependent treatment effect
Would we expect a time-dependent effect?

PROC PHREG DATA=scl;
  MODEL day*bld(0) = sclero scl_ltid log2bili;
  /* (day=0) is 1 when day is 0, so LOG2 is never applied to 0 */
  IF sclero=1 THEN scl_ltid=LOG2(day+(day=0));
  ELSE scl_ltid=0;
RUN;
-----------------------------------------------
             Parameter   Standard
Variable      Estimate      Error   Chi-Sq.  Pr>ChiSq
sclero         0.10443    0.46763    0.0499    0.8233
scl_ltid      -0.06675    0.09608    0.4826    0.4872
log2bili       0.46349    0.09679   22.9323    <.0001
-----------------------------------------------

Omitting SCLERO gives us

             Parameter   Standard
Variable      Estimate      Error   Chi-Sq.  Pr>ChiSq
scl_ltid      -0.04774    0.04447    1.1526    0.2830
log2bili       0.46488    0.09675   23.0883    <.0001

52

Time scales

Examples of time scales
• age

• calendar time

• time since beginning of a disease

• time from some other event of great importance for the rate (here time from termination of the latest bleeding)

• time from randomization (often problematic)

• (pseudo)time from operation (very problematic if the comparison group has not been operated on)

The only difference for the single individual is the definition of time=0, but it may make a big difference for the results, because it influences which individuals are considered "at risk" when something happens.

53

Different time scales
Time since randomisation

54


Different time scales
Time since cessation of first bleeding

55

Choice of time scale

Choose a relevant time scale!
• The advantage of the Cox model is that it allows for an unspecified relation between the rate and the underlying time scale.

• The ratio between the rates for any two patients at any particular time point is only allowed to depend upon the covariates.

• Characteristic of a relevant time scale: there must be a good reason to assume that time since time=0 has a large (and "identical") effect on the rate for all patients; otherwise a constant underlying rate is the only meaningful possibility, and in that case the data can be better utilized by performing a Poisson regression.

Other time scales may enter as covariates in the Cox model. If the dependence on another time scale cannot be assumed to follow the pattern "one year more always means the same thing", then you must use time-dependent covariates or stratify.

56

Delayed entry

Reason: the individuals must experience a specific event before they are at risk in the study, and this happens at different time points for different individuals.

Examples:
• for some patients the randomization is performed later than time=0 on the chosen time scale (in the example some patients are randomized several days after the termination of the first bleeding)

• some covariates require a special examination, and some of the patients have to wait for this examination

• to be included, the patients must be alive and "well" at the start of the study

• cancer among siblings of children with cancer: siblings can enter the study only from the age they had when the proband (the child who was first diagnosed with cancer) got the cancer diagnosis

57


Delayed entry in SAS
PROC PHREG DATA=scl;
  MODEL tnotbld*bld(0) = log2bili sclero
        / ENTRYTIME=t_entry RISKLIMITS;
RUN;
-----------------------------------------------
Model Information
Data Set                 WORK.SCL
Entry Time Variable      t_entry
Dependent Variable       tnotbld
Censoring Variable       bld
Censoring Value(s)       0
Ties Handling            BRESLOW

                              Percent
Total    Event    Censored    Censored
  149       86          63       42.28
:

Analysis of Maximum Likelihood Estimates
             Parameter   Standard
Variable      Estimate      Error   Chi-Sq.  Pr>ChiSq
log2bili       0.43431    0.09580   20.5534    <.0001
sclero        -0.16470    0.21682    0.5770    0.4475
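What the entry time changes is the risk set: a patient only counts as at risk after entering the study. A minimal Python sketch, assuming the convention entry < t ≤ exit (how PHREG treats an event exactly at the entry time is not covered by the slides and should be checked in its documentation); the patients are hypothetical (entry, exit) pairs.

```python
def risk_set(t, patients, use_entry=True):
    """Indices of patients at risk at time t.

    patients: list of (entry_time, exit_time). With delayed entry a patient
    counts only after entering, assumed here to mean entry_time < t <= exit_time.
    Setting use_entry=False ignores the entry times.
    """
    return [i for i, (entry, exit_time) in enumerate(patients)
            if (not use_entry or entry < t) and t <= exit_time]


if __name__ == "__main__":
    patients = [(0, 10), (3, 8), (6, 12), (0, 4)]  # hypothetical
    print(risk_set(5, patients))                   # with delayed entry
    print(risk_set(5, patients, use_entry=False))  # entry times ignored
```

At t = 5, patient 2 (entering at 6) is wrongly counted as at risk if entry times are ignored, which would bias the comparison in the partial likelihood.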

58

Competing risks

Patients may exit for several reasons, here rebleeding or death.
Consequences:

• Technical: endpoints other than the event in focus are treated as censorings

• Interpretation: the rate, and therefore also the estimated effects, have the same interpretations as before, BUT

• the Kaplan-Meier curve cannot be interpreted as the probability of avoiding the event in focus (one minus the Kaplan-Meier curve overestimates the probability of experiencing it)

If several types of events are of interest, then each type of event must be analysed separately, treating the other types of events as censorings.

59

Separate analyses of the two events

PROC PHREG DATA=scl;
  MODEL tnotbld*bld(0) = sclero log2bili ascites / ENTRYTIME=t_entry;
RUN;

             Parameter   Standard
Variable  DF  Estimate      Error   Chi-Sq.  Pr>ChiSq
sclero     1  -0.19124    0.22021    0.7542    0.3851
log2bili   1   0.42240    0.09677   19.0542    <.0001
ascites    1   0.15762    0.22776    0.4789    0.4889
-----------------------------------------------
PROC PHREG DATA=scl;
  MODEL tnotbld*dead(0) = sclero log2bili ascites / ENTRYTIME=t_entry;
RUN;

             Parameter   Standard
Variable  DF  Estimate      Error   Chi-Sq.  Pr>ChiSq
sclero     1   0.17358    0.35173    0.2435    0.6217
log2bili   1   0.50353    0.14482   12.0890    0.0005
ascites    1   0.93763    0.38166    6.0354    0.0140

60


The probability of beingalive without rebleeding

If the various events are combined, we get anassessment of the e�ects of the covariates onthe time until the �rst of these events occur:

DATA scl; SET scl;
  status = bld + 2*dead;
RUN;

PROC PHREG DATA=scl;
  MODEL tnotbld*status(0) = sclero log2bili ascites
        / ENTRYTIME=t_entry;
RUN;
-----------------------------------------------
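With status(0), only the value 0 is a censoring, so both rebleeding (status 1) and death (status 2) count as events, which gives the analysis of the time to the first event. Since PHREG accepts a list of censoring values, the same variable could also drive the separate analyses, e.g. status(0,2) for rebleeding and status(0,1) for death. The Python sketch below mirrors that convention; it assumes bld and dead are 0/1 indicators that are never both 1 for the same patient (the function names are hypothetical).

```python
def combined_status(bld, dead):
    """Python analogue of the DATA step: status = bld + 2*dead.
    0 = censored, 1 = rebleeding, 2 = death without prior rebleeding.
    Assumes bld and dead are 0/1 and never both 1 (time is to the first event)."""
    if bld and dead:
        raise ValueError("bld and dead cannot both be 1")
    return bld + 2 * dead

def is_event(status, censoring_values=(0,)):
    """Like MODEL time*status(list): any value not listed as a censoring value is an event."""
    return status not in censoring_values

for bld, dead in [(0, 0), (1, 0), (0, 1)]:
    s = combined_status(bld, dead)
    print(f"status={s}  first event: {is_event(s)}  "
          f"rebleeding: {is_event(s, (0, 2))}  death: {is_event(s, (0, 1))}")
```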

              Parameter   Standard
Variable  DF   Estimate      Error   Chi-Sq.   Pr>ChiSq
sclero     1   -0.08715    0.18555    0.2206     0.6386
log2bili   1    0.44557    0.08044   30.6819     <.0001
ascites    1    0.37333    0.19303    3.7405     0.0531

61

Competing events

The probability (per time unit) of rebleeding at a particular time t equals

r(t; X) · S(t; X)

where r(t; X) is the rebleeding rate and S(t; X) is the probability of being alive at time t without having experienced rebleeding yet.
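Accumulating this expression over time gives the probability of having rebled by time t. In the display below, r_1 = r is the rebleeding rate and r_2 is the death rate (notation introduced here), assuming these are the only two ways of leaving the state "alive without rebleeding". The point is that S depends on both rates:

```latex
F_1(t; X) = \int_0^t r_1(u; X)\, S(u; X)\, du,
\qquad
S(u; X) = \exp\Big( -\int_0^u \big[ r_1(v; X) + r_2(v; X) \big]\, dv \Big)
```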

Consequence:

• Factors that do not affect the rate for a particular event may nevertheless have an effect on the probability of experiencing that event, through their influence on the rate for a competing event and thereby on the probability of being at risk.
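A small numerical illustration, assuming constant rates (a simplification): if r_1 is the rebleeding rate and r_2 the death rate, the probability of rebleeding before time t is r_1/(r_1 + r_2) · (1 − exp(−(r_1 + r_2)·t)). In the Python sketch below, two hypothetical groups share the same rebleeding rate, but group B has a higher death rate, so fewer patients in B live long enough to rebleed.

```python
import math

def prob_event_by(t, rate_event, rate_competing):
    """P(event of interest before t, and before the competing event),
    assuming both rates are constant over time."""
    total = rate_event + rate_competing
    return rate_event / total * (1.0 - math.exp(-total * t))

# Hypothetical rates per year: identical rebleeding rate, different death rates
p_a = prob_event_by(2.0, rate_event=0.3, rate_competing=0.1)
p_b = prob_event_by(2.0, rate_event=0.3, rate_competing=0.5)
print(f"P(rebleeding within 2 years): group A {p_a:.3f}, group B {p_b:.3f}")
```

The rebleeding rate ratio between the groups is exactly 1, yet the probability of rebleeding within two years is clearly lower in group B.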

62

Comparison to dichotomous response

• a dichotomous response is only concerned with status at a particular time (dead/alive; diseased/healthy):

1. you do not need to know the precise time of occurrence of death/illness; you only focus on the specific time point and register whether the event has happened (yet)

2. the result will depend upon the chosentime point

3. the comparison between studies may be problematic if the study periods are too different
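Point 2 can be made concrete with hypothetical constant rates. In the Python sketch below, the rate ratio between two groups is fixed at 2 at all times, but the ratio of the proportions who have died by time t equals 1 + exp(−0.1·t) for these particular rates and moves toward 1 as the chosen time point gets later.

```python
import math

BASE_RATE = 0.1    # hypothetical: events per person-year in the reference group
RATE_RATIO = 2.0   # hypothetical: constant rate ratio between the groups

def proportion_dead(t, rate):
    """Proportion who have had the event by time t under a constant rate."""
    return 1.0 - math.exp(-rate * t)

def risk_ratio_at(t):
    """Ratio of the proportions dead at time t (the dichotomized comparison)."""
    return proportion_dead(t, BASE_RATE * RATE_RATIO) / proportion_dead(t, BASE_RATE)

for t in (1, 5, 20):
    print(f"t={t:>2}: risk ratio {risk_ratio_at(t):.3f} (rate ratio {RATE_RATIO})")
```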

63


Comparison to dichotomous responses cont.

4. it is impossible to utilize any knowledge about the order of events (like whether or not the untreated died before the treated), so there is less power to detect possible effects

5. patients who leave the study early for other reasons (the censorings) cannot be used in the analyses, since we do not know whether or not they died/got sick after they left, but before the end of the study period

• if there are competing events, there may be problems with the definition as well as the interpretation of dichotomized data

64

Comparison to dichotomous responses cont.

• modelling the probability instead of the rate can make it harder to evaluate the effect of age: for a specific individual, the probability of having experienced a certain event will always increase with age, even though the event is most common among the very young

• if we have proportional rates (and complete follow-up to the chosen time point), the odds ratio (from logistic regression) will be further away from the neutral value of 1 than the rate ratio, which in turn is further away than the risk ratio (from a log-linear model): OR ≤ rate ratio ≤ risk ratio ≤ 1, or OR ≥ rate ratio ≥ risk ratio ≥ 1.
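The ordering can be checked numerically. Under proportional rates with rate ratio h, the survival probabilities satisfy S1(t) = S0(t)^h, so the risk ratio and odds ratio for "event by time t" follow directly from S0(t). The Python sketch below assumes no censoring and no competing events; the function name is hypothetical.

```python
def effect_measures(s0, h):
    """Risk ratio, rate ratio and odds ratio for 'event by time t' when the
    rates are proportional with ratio h, so that S1(t) = S0(t) ** h.
    s0 = S0(t), the reference group's probability of no event by time t."""
    s1 = s0 ** h
    p0, p1 = 1 - s0, 1 - s1
    risk_ratio = p1 / p0
    odds_ratio = (p1 / s1) / (p0 / s0)
    return risk_ratio, h, odds_ratio

for s0 in (0.95, 0.7, 0.3):
    rr, hr, or_ = effect_measures(s0, 2.0)
    print(f"S0={s0}: risk ratio={rr:.3f}  rate ratio={hr:.3f}  OR={or_:.3f}")
```

When the event is rare by time t (S0 close to 1), the three measures nearly coincide. They drift apart as the event becomes common.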

If we know the order (on the relevant time scale) of the events in focus, then the event-time analysis is preferable.

65

Individually matched case-control designs

If we match individually in case-control designs, the analysis should be performed using conditional logistic regression. This can be done with PROC PHREG, stratified by the variable that identifies the case-control pair/group, and using a dummy time variable:

DATA matched; SET rawdata;
  if case=1 then dum_time=1; * cases ;
  if case=0 then dum_time=2; * controls ;
RUN;

PROC PHREG DATA=matched NOSUMMARY;
  MODEL dum_time*case(0)=exposure;
  STRATA matchgrp;
RUN;

Here, the variable dum_time is set to 1 for cases and 2 for controls to ensure that the "event time" for the controls is later than the "event time" for the case. NOSUMMARY is not necessary, but may be included if a print-out of the number of cases ("Events") and the number of controls ("Censored") for each single matched pair is not needed.
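Because the case is the only subject with dum_time=1 in its stratum, the risk set at the case's "event time" is the whole matched set. The Cox partial likelihood then reduces to the conditional logistic likelihood, a product over sets of exp(β·x_case) / Σ_j exp(β·x_j). The Python sketch below maximizes that likelihood by Newton-Raphson for a single covariate on hypothetical 1:1 matched data. For pairs with a binary exposure, the estimated odds ratio should equal the ratio of the two kinds of discordant pairs, which serves as a check.

```python
import math

def conditional_logit_1d(matched_sets, tol=1e-10, max_iter=50):
    """Conditional logistic regression for one covariate (Newton-Raphson).
    matched_sets: list of (x_case, [x_control, ...]).
    Each set contributes beta*x_case - log(sum_j exp(beta*x_j)),
    the sum running over the case and its controls."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for x_case, x_controls in matched_sets:
            xs = [x_case] + list(x_controls)
            w = [math.exp(beta * x) for x in xs]
            tot = sum(w)
            mean = sum(wi * x for wi, x in zip(w, xs)) / tot
            var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs)) / tot
            score += x_case - mean   # concordant sets add 0 to score and info
            info += var
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical 1:1 pairs (exposure of case, [exposure of control])
pairs = ([(1, [0])] * 12 + [(0, [1])] * 4    # discordant pairs
         + [(1, [1])] * 5 + [(0, [0])] * 9)  # concordant pairs
beta = conditional_logit_1d(pairs)
print(f"conditional OR = {math.exp(beta):.4f}")  # compare with 12/4
```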

66


Interpretation in a matched case-control design

The interpretation of the parameter estimate depends upon the choice of matching variables!

• If the matching is not done on a meaningful time scale, then only the usual odds ratio interpretation is possible.

• In the sampling design "incidence density sampling" (epidemiologists) or "nested case-control within a cohort" (statisticians), you choose for each case one or more controls who are at risk at the exact time when the case got sick/died (but the controls may get sick/die immediately after). Here the odds ratio estimates the same rate ratio as a Cox regression analysis with the time scale that is used to define the "exact" time.
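The sampling step can be sketched as follows. This is a minimal Python illustration on a hypothetical cohort, with no further matching variables. For each case, controls are drawn among the subjects who are at risk at the case's event time, meaning they have entered and have not yet exited. A control may later become a case itself and may be drawn for several cases.

```python
import random

def risk_set_sample(subjects, n_controls=1, seed=1):
    """Incidence density (risk-set) sampling, a minimal sketch.
    subjects: list of (id, entry, exit, is_case) on the chosen time scale,
    where exit is the event time for cases and the censoring time otherwise.
    Returns a list of (case_id, event_time, [control_ids])."""
    rng = random.Random(seed)
    sets = []
    for cid, _, t_case, is_case in subjects:
        if not is_case:
            continue
        # at risk at t_case: entered before t_case and not yet exited
        at_risk = [sid for sid, entry, exit_, _ in subjects
                   if sid != cid and entry < t_case <= exit_]
        k = min(n_controls, len(at_risk))
        sets.append((cid, t_case, rng.sample(at_risk, k)))
    return sets

# Hypothetical cohort: (id, entry, exit, is_case)
cohort = [("a", 0, 5, True), ("b", 0, 8, False), ("c", 2, 3, False),
          ("d", 0, 9, True), ("e", 6, 12, False)]
for case_id, t, controls in risk_set_sample(cohort):
    print(case_id, t, controls)
```

Here "d" is eligible as a control for case "a" at time 5 even though "d" becomes a case at time 9. This is what lets the odds ratio estimate the rate ratio.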

67