#StackBounty: #r #survival-analysis #survival How can I avoid right-truncated subjects being dropped?

Bounty: 50

I’m running a survival analysis of how long individual components remain in the source code of a software project, but some of these components are being dropped by the survfit function.

This is what I’m doing:

library(survival)
data <- read.table(text = "component_id weeks removed
1              1       1
2              1       1
3              1       1
4              1       1
5              1       1
6              1       1
7              1       1
8              2       0
9              2       0
10              2       0
11              2       0
12              2       1
13              2       1
14              2       0
15              2       0
16              2       0
17              2       0
18              2       0
19              2       0
20              2       1
21              2       1
22              2       0
23              2       0
24              3       1
25              3       1
26              3       1
27              3       1
28              7       1
29              7       1
30             14       1
31             14       1
32             14       1
33             14       1
34             14       1
35             14       1
36             14       1
37             14       1
38             14       1
39             14       1
40             14       1
41             14       1
42             14       1
43             14       1
44             14       1
45             14       1
46             14       1
47             14       1
48             40       1
49             40       1
50             40       1
51             40       1
52             48       1
53             48       1
54             48       1
55             48       1
56             48       1
57             48       1
58             48       1
59             48       1
60             56       1
61             56       1
62             56       1
63             56       1
64             56       1
65             56       1
66             56       1
67             56       1
68             56       1
69             56       1", header = TRUE)

# Kaplan-Meier fit: weeks = follow-up time, removed = 1 if the component was removed
fit <- survfit(Surv(data$weeks, data$removed) ~ 1)
summary(fit, censored=TRUE)

And this is the output:

Call: survfit(formula = Surv(data$weeks, data$removed) ~ 1)

time n.risk n.event survival std.err lower 95% CI upper 95% CI
   1     69       7    0.899  0.0363        0.830        0.973
   2     62       4    0.841  0.0441        0.758        0.932
   3     46       4    0.767  0.0533        0.670        0.879
   7     42       2    0.731  0.0567        0.628        0.851
  14     40      18    0.402  0.0654        0.292        0.553
  40     22       4    0.329  0.0629        0.226        0.478
  48     18       8    0.183  0.0520        0.105        0.319
  56     10      10    0.000     NaN           NA           NA

I was expecting 69 events in total, but the n.event column only adds up to 57: 12 subjects are dropped.
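A quick check of where those 12 subjects end up: the sketch below rebuilds the same 69 rows compactly with rep() (instead of read.table), cross-tabulates weeks against removed, and sums the n.event and n.censor components of the survfit object. It assumes that removed = 0 means the component was still in the code base when observation stopped, i.e. an ordinary right-censored observation.

```r
library(survival)

# Rebuild the question's 69 rows compactly (same ids, weeks and status).
weeks <- rep(c(1, 2, 3, 7, 14, 40, 48, 56),
             times = c(7, 16, 4, 2, 18, 4, 8, 10))
removed <- c(rep(1, 7),                                           # week 1
             c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0),   # week 2
             rep(1, 46))                                          # weeks 3 to 56
data <- data.frame(component_id = seq_along(weeks), weeks, removed)

# Events (removed = 1) vs. right-censored rows (removed = 0) per week.
print(table(weeks = data$weeks, removed = data$removed))

# Same Kaplan-Meier fit as above, written with a data argument.
fit <- survfit(Surv(weeks, removed) ~ 1, data = data)

# Every subject is accounted for as either an event or a censoring.
total_events   <- sum(fit$n.event)
total_censored <- sum(fit$n.censor)
cat("events:", total_events, "censored:", total_censored, "\n")
```

On this data the totals should come out as 57 events and 12 censored observations, all 12 censored at week 2. That matches the step in n.risk from 62 at week 2 to 46 at week 3 (62 - 4 events - 12 censored = 46).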

I initially thought I was misusing the package functions, so, based on a similar situation, I tried a type="interval2" approach. The drops keep happening, though, and now the numbers of subjects at risk and of events come out as odd non-integer values:

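# t2 is the removal week for removed components and NA otherwise;
# with type="interval2", an NA upper bound marks a right-censored subject
as.t2 <- function(i, data) if (data$removed[i] == 1) data$weeks[i] else NA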
size  <- length(data$weeks)
t1    <- data$weeks
t2    <- sapply(1:size, as.t2, data = data)
interval_fit <- survfit(Surv(t1, t2, type="interval2") ~ 1)
summary(interval_fit, censored=TRUE)
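To see what the interval2 coding actually encodes, the sketch below builds t2 with a vectorised ifelse() (equivalent to the sapply() helper above) and reads the status column of the resulting Surv object. It assumes the status codes documented for interval-type Surv objects in the survival package (0 = right-censored, 1 = exact event, 2 = left-censored, 3 = interval-censored).

```r
library(survival)

# Same 69 components as in the question, rebuilt compactly.
weeks <- rep(c(1, 2, 3, 7, 14, 40, 48, 56),
             times = c(7, 16, 4, 2, 18, 4, 8, 10))
removed <- c(rep(1, 7),
             c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0),
             rep(1, 46))

# interval2 coding: an NA upper bound means "still present after t1"
# (right-censored); equal bounds mean the removal week is known exactly.
t1 <- weeks
t2 <- ifelse(removed == 1, weeks, NA)
s_int <- Surv(t1, t2, type = "interval2")

# For interval-type Surv objects the third column holds the status code.
status <- unclass(s_int)[, 3]
print(table(status))
```

If the status vector matches removed exactly, the interval2 version carries the same 57 exact removals and 12 right-censored components as the first fit, so the recoding by itself does not change which subjects count as events.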

Next, I found what I’d call a mid-air explanation, which clarified the situation a bit further. I understand this is caused by non-censored subjects appearing after a “constant censoring time”, but again, why?

That led me to dig deeper and read about right truncation, and I realized that this type of study maps very closely to the drops I’m experiencing. Here’s Klein & Moeschberger:

Truncation of survival data occurs when only those individuals whose event time lies within a certain observational window $(Y_L,Y_R)$ are observed. An individual whose event time is not in this interval is not observed and no information on this subject is available to the investigator.

Right truncation occurs when $Y_L$ is equal to zero. That is, we observe the survival time $X$ only when $X \leq Y_R$.
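For contrast, here are the standard textbook likelihood contributions under the two designs (notation assumed here: $f$ is the density of $X$, $S$ its survival function, $t_i$ the observed time of subject $i$). A right-censored subject still enters the likelihood through $S(t_i)$, whereas under right truncation a subject with $X > Y_R$ never enters the sample at all, and each observed event time is conditioned on $X \leq Y_R$:

$$
L_i^{\text{censoring}} =
\begin{cases}
  f(t_i) & \text{if the event is observed at } t_i,\\
  S(t_i) & \text{if the subject is right-censored at } t_i,
\end{cases}
\qquad
L_i^{\text{truncation}} = \frac{f(t_i)}{1 - S(Y_R)} .
$$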

From my perspective, these dropped subjects carry important information for my study, regardless of their time of entry.

How can I stop the drops?

