#StackBounty: #r #machine-learning #mathematical-statistics #survival Accelerated Failure Time Regression Performance (Survival Analysis)

Bounty: 100

The issue is an unexpectedly high intercept in an AFT regression. Details below:

Suppose you are modelling the time to an event via an Accelerated Failure Time (AFT) regression, i.e. given survival time $T$, suppose we have observed covariate values $x_{i1}, \dots, x_{ip}$ and a possibly censored survival time $t_i$. Then:
$$ \log(t_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \sigma \epsilon_i $$

Suppose we are looking at a Weibull AFT, i.e. the $\epsilon_i$ are IID according to a Gumbel distribution (Extreme Value Type 1).
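As a short worked step (a sketch, assuming the minimum-type standard Gumbel with $P(\epsilon_i > u) = \exp(-e^{u})$, and writing $\mu_i$ for the linear predictor), this error distribution does give a Weibull survival time:

$$ \mu_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} $$
$$ P(T_i > t) = P\left(\epsilon_i > \frac{\log t - \mu_i}{\sigma}\right) = \exp\left(-\left(\frac{t}{e^{\mu_i}}\right)^{1/\sigma}\right) $$

This is the Weibull survival function with scale $e^{\mu_i}$ and shape $1/\sigma$, so the intercept acts on the log of the scale and $\sigma$ is the reciprocal of the shape.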

Now consider the case of time-varying covariates (assume just one for now), e.g. a dataset like the following example with a single time-dependent covariate (TDC_1). Here START is the entry time (period start), END is the exit time (period end), EVENT indicates whether the event occurred at the end of the period, and UNIT_ID is the ID of the entity in the study:

START  END  EVENT  UNIT_ID  TDC_1
0      1    0      1        0.1
1      2    0      1        0.2
2      3    0      1        0.3
19     20   1      1        1.9
0      1    0      2        0.1
1      2    0      2        0.2
2      3    0      2        0.3
19     20   1      2        1.9
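For anyone wanting to reproduce this layout, the rows shown above could be entered in R roughly as follows. This is only a sketch: it contains just the eight rows listed (the intervals between 3 and 19 are not shown in the example and are not filled in here), and the column names are assumed to match the ones used in the model formula.

```r
# Counting-process layout: one row per unit per interval (START, END].
# Only the rows shown in the example are included; the gaps are left as-is.
df <- data.frame(
  START   = c(0, 1, 2, 19, 0, 1, 2, 19),
  END     = c(1, 2, 3, 20, 1, 2, 3, 20),
  EVENT   = c(0, 0, 0, 1, 0, 0, 0, 1),
  UNIT_ID = c(1, 1, 1, 1, 2, 2, 2, 2),
  TDC_1   = c(0.1, 0.2, 0.3, 1.9, 0.1, 0.2, 0.3, 1.9)
)

# Sanity check: every interval must have positive length
stopifnot(all(df$END > df$START))

# Number of events recorded per unit
events_per_unit <- tapply(df$EVENT, df$UNIT_ID, sum)
print(events_per_unit)
```

Each unit contributes several rows, and EVENT is 1 only on the row whose interval ends at the event time.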

With the aftreg function from the eha package in R you can fit a Weibull AFT, e.g.:

library(eha)
model <- aftreg(Surv(START, END, EVENT) ~ TDC_1, dist = "weibull", data = df, id = UNIT_ID, param = "lifeExp")

Calling model$coefficients gives:

TDC_1        -0.905
log(scale)    9.393
log(shape)    0.046

The expected time to event when $T$ follows a Weibull distribution is given by:
$$ E(T|X_i) = \exp\left(\beta_0 + x_i \beta_1\right)\Gamma(1 + \sigma) = \exp\left(9.393 - 0.905 \cdot \text{TDC\_1}\right) \cdot 0.98 $$

since $\beta_0 = \log(\text{scale})$ and $\sigma = \frac{1}{\exp(\log(\text{shape}))}$.
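The arithmetic behind the 0.98 factor can be checked with a few lines of base R. This is a minimal sketch that simply plugs in the coefficients reported above; the helper name expected_time is made up for illustration.

```r
# Coefficients as reported in the model output
b0        <- 9.393   # log(scale), used as the intercept
b1        <- -0.905  # TDC_1 coefficient
log_shape <- 0.046

sigma        <- 1 / exp(log_shape)  # sigma is the reciprocal of the shape
gamma_factor <- gamma(1 + sigma)    # approximately 0.98

# Expected time to event for a given value of the covariate
# (hypothetical helper, not part of eha)
expected_time <- function(tdc_1) {
  exp(b0 + b1 * tdc_1) * gamma_factor
}

print(gamma_factor)
print(expected_time(c(0.1, 1.9)))
```

Because the Gamma factor is so close to 1, the size of the prediction is driven almost entirely by the exponentiated linear predictor, and therefore by the intercept.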

My question is about these parameter estimates, in particular the intercept term $\beta_0 = \log(\text{scale})$. No matter how I change the error term parameterisation, e.g. if the $\epsilon_i$ are normally distributed (so $T$ is lognormal) or if $\epsilon_i \sim$ Logistic etc., the intercept is exceptionally high and appears not to be optimal in terms of minimising the error on the time to event.

For example, if I manually subtract 2 from the intercept (9.393 - 2), I can reduce the root mean squared error on the time to event on the dataset used for the fit:

Intercept  TIME_TO_EVENT_RMSE
9.393      776 days
7.393      97 days

Here TIME_TO_EVENT_RMSE is calculated as (with a dataset that only contains non-censored events):

$$ RMSE = \sqrt{\sum_{i=1}^{n} \frac{\left(\exp\left(\beta_0 + x_i \beta_1\right)\Gamma(1 + \sigma) - t_i\right)^2}{n}} $$
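The RMSE formula can be sketched as a small base R function. The function name and the toy values below are made up for illustration (they are not the data behind the 776 and 97 day figures), so the numbers it prints are not comparable to those.

```r
# RMSE of the Weibull AFT expected time against observed event times.
# t: observed (non-censored) event times
# x: covariate value used for each observation
time_to_event_rmse <- function(t, x, b0, b1, sigma) {
  predicted <- exp(b0 + b1 * x) * gamma(1 + sigma)
  sqrt(mean((predicted - t)^2))
}

# Toy values, for illustration only
t_obs <- c(10, 20, 30)
x_obs <- c(0.5, 1.0, 1.5)
sigma <- 1 / exp(0.046)

# Same slope, intercept as fitted versus intercept lowered by 2
rmse_fitted  <- time_to_event_rmse(t_obs, x_obs, b0 = 9.393, b1 = -0.905, sigma = sigma)
rmse_shifted <- time_to_event_rmse(t_obs, x_obs, b0 = 7.393, b1 = -0.905, sigma = sigma)
print(c(fitted = rmse_fitted, shifted = rmse_shifted))
```

Lowering the intercept by 2 multiplies every predicted time by $e^{-2} \approx 0.135$, which is why a shift of this size changes the RMSE so much when the predictions start out far above the observed times.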

For further illustration, suppose you model the time to event directly using exponential regression (i.e. linear regression on the log of the target variable) with exactly the same dataset, using only non-censored events so the two are comparable. I know the two models minimise different loss functions and aren’t directly comparable; this is just for illustration purposes:

19             1      0.1 
18             1      0.2
17             1      0.3

Here we have:

$$ E(T|X_i) = \exp\left(\beta_0 + x_i \beta_1\right) = \exp\left(8.03 - 0.5 \cdot x_i\right) $$
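A minimal sketch of this kind of fit in base R is given here, using only the three rows shown in the table above. The column names TIME_TO_EVENT and TDC_1 are assumptions, and with so few rows the coefficients will not match the 8.03 and -0.5 quoted for the full dataset.

```r
# Toy data: the three rows shown above (column names are assumed)
d <- data.frame(
  TIME_TO_EVENT = c(19, 18, 17),
  TDC_1         = c(0.1, 0.2, 0.3)
)

# Linear regression on the log of the time to event
fit <- lm(log(TIME_TO_EVENT) ~ TDC_1, data = d)
print(coef(fit))

# Back-transformed prediction exp(beta_0 + beta_1 * x) and its RMSE
predicted <- exp(predict(fit, newdata = d))
rmse <- sqrt(mean((predicted - d$TIME_TO_EVENT)^2))
print(rmse)
```

Least squares on the log scale centres the fitted line on the observed log times, so the intercept cannot drift far from the data in the way described for the AFT fit.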

I know that AFT regression is not directly minimising RMSE, and that the AFT fit has a larger TDC_1 coefficient magnitude as well as a larger intercept. However, with the intercept as high as it is, the model isn’t particularly useful, because it significantly over-predicts the time to event.


  1. Has anyone experienced this before, and do you have any advice on how to improve the AFT model?
  2. Is there any way to fix the scale with time-varying covariates in AFT regression?
