*Bounty: 100*

I am trying to solve a survival analysis problem in which every observation is either left-censored or right-censored. My objective function involves the CDF of the Gumbel distribution.

I have $m$ features and $m+1$ coefficients to learn. The scale of the distribution, $\lambda$, is modeled by a linear regression. Since the scale must be positive, I apply a softplus transformation. (I suspect an $\exp$ transformation would overflow too easily.)

$\lambda = \operatorname{softplus}\left(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\right) = \ln\left[1 + \exp\left(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\right)\right]$
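For concreteness, here is a small numpy sketch of that mapping (the `theta` and `x` values are made up for illustration), using the standard numerically stable rewriting of softplus:

```python
import numpy as np

def softplus(z):
    # softplus(z) = ln(1 + exp(z)), computed as
    # max(z, 0) + log1p(exp(-|z|)) to avoid overflow for large z
    return np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))

theta = np.array([0.5, -1.2, 0.3])   # hypothetical theta_0, theta_1, theta_2
x = np.array([1.0, 2.0])             # hypothetical feature vector
lin = theta[0] + theta[1:] @ x       # linear predictor
lam = softplus(lin)                  # scale: strictly positive
print(lam > 0)                       # True
```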

The scale is fed into a Gumbel distribution.

$h(t) = e^{-e^{-(t-\mu)/\lambda}}$, where the location $\mu$ is pre-specified.

$h(t)$ is the probability that the patient dies before time $t$, i.e., left-censored; $1 - h(t)$ is the probability that the patient dies after $t$, i.e., right-censored.
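A quick numpy check of this CDF (with an arbitrary location and scale) confirms the two probabilities are complementary and that $h(t)$ grows with $t$:

```python
import numpy as np

mu, lam = 1.0, 0.6            # hypothetical location and scale
t = np.array([0.5, 1.0, 3.0])

# Gumbel CDF: probability of death before t (left-censored)
h = np.exp(-np.exp(-(t - mu) / lam))
print(h)                      # increases with t, stays in (0, 1)
print(1.0 - h)                # probability of death after t (right-censored)
```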

In the ground truth, the binary target $y^{(i)}$ indicates whether the patient is left-censored. Since the model outputs the probability that the patient is left-censored at $t$, I use the log loss to measure the model's error.
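The log loss here is just binary cross-entropy between the censoring indicator and $h(t)$; a minimal numpy sketch (with made-up labels and predictions):

```python
import numpy as np

def log_loss(y, p, eps=1e-7):
    # binary cross-entropy; eps guards against log(0)
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 0.0, 1.0])   # 1 = left-censored, 0 = right-censored
p = np.array([0.9, 0.2, 0.6])   # model's h(t) for each patient
print(log_loss(y, p))
```

`tf.losses.log_loss` applies the same clipping idea via its `epsilon` argument.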

I use TensorFlow to implement the model:

```
input_vectors = tf.placeholder(tf.float32,
                               shape=[None, num_features],
                               name='input_vectors')
time = tf.placeholder(tf.float32, shape=[None], name='time')
event = tf.placeholder(tf.int32, shape=[None], name='event')
weights = tf.Variable(tf.truncated_normal(shape=(num_features, 1),
                                          mean=0.0, stddev=0.02))
scale = tf.nn.softplus(self.regression(input_vectors, weights))

# if event == 0, right-censored
# if event == 1, left-censored
not_survival_proba = self.distribution.left_censoring(time, scale)  # the left area
logloss = tf.losses.log_loss(labels=event, predictions=not_survival_proba)
```

The implementation of the Gumbel distribution:

```
class GumbelDistribution:
    def __init__(self, location=1.0):
        self.location = location

    def left_censoring(self, time, scale):
        return tf.exp(-1 * tf.exp(time - self.location) / scale)

    def right_censoring(self, time, scale):
        return 1 - self.left_censoring(time, scale)
```

However, the batch loss becomes NaN after several iterations. When I switch the distribution to Weibull, it works, so I suspect the problem is the two nested exponentials in the CDF of the Gumbel distribution.
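One way to see how the nested exponentials could blow up (a numpy sketch with an arbitrary time value, not my actual data): `float32` overflows to `inf` once the argument of `exp` exceeds roughly 88.7, and the zeros/infinities that result then poison the log loss.

```python
import numpy as np

t, mu = np.float32(90.0), np.float32(1.0)
inner = np.exp(t - mu)   # e^89 exceeds the float32 max (~3.4e38) -> inf
cdf = np.exp(-inner)     # exp(-inf) = 0.0 exactly
print(inner, cdf)        # inf 0.0
# the log loss then evaluates log(0) = -inf, and 0 * -inf = nan,
# which matches the NaN batch losses below
```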

```
Epoch 1 - Batch 1/99693: batch loss = 16.3606
Epoch 1 - Batch 2/99693: batch loss = 25.5445
Epoch 1 - Batch 3/99693: batch loss = 17.1181
Epoch 1 - Batch 4/99693: batch loss = 10.6815
Epoch 1 - Batch 5/99693: batch loss = 17.2127
Epoch 1 - Batch 6/99693: batch loss = 28.7549
Epoch 1 - Batch 7/99693: batch loss = 13.8332
Epoch 1 - Batch 8/99693: batch loss = 19.3377
Epoch 1 - Batch 9/99693: batch loss = 19.7385
Epoch 1 - Batch 10/99693: batch loss = 17.7479
Epoch 1 - Batch 11/99693: batch loss = 13.1403
Epoch 1 - Batch 12/99693: batch loss = 15.0979
Epoch 1 - Batch 13/99693: batch loss = 17.5434
Epoch 1 - Batch 14/99693: batch loss = 21.5072
Epoch 1 - Batch 15/99693: batch loss = 10.4660
Epoch 1 - Batch 16/99693: batch loss = 26.9554
Epoch 1 - Batch 17/99693: batch loss = nan
Epoch 1 - Batch 18/99693: batch loss = nan
Epoch 1 - Batch 19/99693: batch loss = nan
Epoch 1 - Batch 20/99693: batch loss = nan
Epoch 1 - Batch 21/99693: batch loss = nan
Epoch 1 - Batch 22/99693: batch loss = nan
Epoch 1 - Batch 23/99693: batch loss = nan
```

Any idea how to solve this problem?