#StackBounty: #regression #bayesian #multiple-regression #censoring #rstan How best to deal with a left-censored predictor (because of …

Bounty: 200

Context: I’m new to Bayesian stats and am trying to fit a multiple regression with rstan. All variables are continuous and there is no hierarchical structure.

One of my predictors is left-censored because it falls below the detection limit for a chemical assay. What is the best way to deal with this in a multiple regression? So far, I can see a few possibilities:

  1. A substitution rule, such as ‘replace all values below the detection limit with a constant such as detection limit/2’. This is clearly not rigorous.
  2. Multiple imputation, but (i) I don’t know how to deal with the fact that the imputation process is likely to generate values above the detection limit, which I know with high probability to be false, and (ii) I’m not sure how well multiple imputation plays with Bayesian approaches, since I can’t think of a good way to aggregate the posterior distributions from fits to the different imputed datasets.
  3. Simulate data from a distribution that makes sense based on prior knowledge and the data, and randomly assign the simulated below-limit values to the relevant points. This suffers from problems similar to those of #2, since I would have to simulate many sets of values, model them separately, and then figure out how to integrate the posteriors.
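
To make options 1 to 3 concrete, here is a minimal, stdlib-only Python sketch (not the asker's code). It assumes, hypothetically, that values below the detection limit are encoded as `None` and that the predictor is log-normal with known parameters `mu_log` and `sigma_log`; in practice these would have to be estimated. Drawing each imputed value from a distribution truncated above at the detection limit addresses concern (i), because no imputation can exceed the limit. For concern (ii), one commonly used way to pool Bayesian fits to several imputed datasets is to concatenate (mix) equal numbers of posterior draws from each fit, which `pool_draws` illustrates.

```python
import math
import random
from statistics import NormalDist

# Hypothetical encoding: None marks a value below the detection limit (DL).

def substitute(values, dl, fraction=0.5):
    """Option 1: replace every censored value with a constant, DL * fraction."""
    return [dl * fraction if v is None else v for v in values]

def impute_below_dl(values, dl, mu_log, sigma_log, rng):
    """Options 2/3: draw each censored value from a log-normal distribution
    truncated above at DL, so no imputed value can exceed the limit.

    mu_log and sigma_log (mean and sd of log(x)) are assumed known here.
    """
    dist = NormalDist(mu_log, sigma_log)
    p_max = dist.cdf(math.log(dl))  # probability mass below log(DL)
    out = []
    for v in values:
        if v is None:
            # Inverse-CDF sampling restricted to (0, p_max]
            u = p_max * (1.0 - rng.random())
            out.append(math.exp(dist.inv_cdf(u)))
        else:
            out.append(v)  # quantified values are kept unchanged
    return out

def pool_draws(draws_per_fit):
    """Pool posterior draws from fits to several imputed datasets by concatenation."""
    return [d for draws in draws_per_fit for d in draws]

rng = random.Random(1)
data = [None, 40.0, None, 120.0, 31.5]
dl = 25.0
print(substitute(data, dl))  # [12.5, 40.0, 12.5, 120.0, 31.5]
imputed_sets = [impute_below_dl(data, dl, math.log(30.0), 1.0, rng) for _ in range(5)]
```

Each of the imputed datasets would then be fitted separately (e.g. with rstan), and the resulting draws pooled. A fully Bayesian alternative is to treat the censored values as parameters inside the model itself, but that requires a model for the predictor's distribution.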

Am I missing better options? Are there useful Bayesian tricks that can help deal with this problem? I’m also open to non-Bayesian options.

The histogram below shows the distribution of values. The plot is on a log scale because that is most natural for this variable. For visual clarity, I have treated values below the detection limit (~25% of the data) as being 1/10 of the detection limit, and added a red line to separate them from the remaining points. Note that the red line is not the precise detection limit; the smallest quantified values to the right of the red line are at the putative limit. The fact that there are very few values exactly at the limit suggests that there may have been some variation in the detection limit between measurements, but I don’t mind if that is ignored for the purposes of this question.
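[Figure: histogram of the predictor's values on a log scale; values below the detection limit are plotted at 1/10 of the limit, to the left of a red line]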
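
The plotting convention described above can be sketched as follows. This is a minimal illustration under the same hypothetical encoding (`None` for a value below the detection limit); the function names are made up for this example. Keeping the censoring indicator alongside the data, rather than overwriting censored values with DL/10, preserves the information any censoring-aware model will need.

```python
import math

def plot_positions(values, dl):
    """Return log10 positions for the histogram; censored values (None) are
    drawn at DL/10 so they form a separate spike left of the quantified data."""
    return [math.log10(dl / 10 if v is None else v) for v in values]

def censored_fraction(values):
    """Share of observations below the detection limit."""
    return sum(v is None for v in values) / len(values)

sample = [None, 30.0, 100.0, 45.0]
positions = plot_positions(sample, dl=25.0)
```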

