#StackBounty: #r #logistic #prediction #regression-coefficients R: Reconstructing a logistic regression model with splines using publis…

Bounty: 50

What I would like to do

I would like to reconstruct a logistic regression model with splines (Lymph Node Involvement (Cores)) using published coefficients and spline knots. All sources that I posted here are from: https://www.mskcc.org/nomograms/prostate/pre_op/coefficients

Why I would like to do that

My aim is to reconstruct the model "Lymph node involvement (Cores)" in R, so that I can apply it to a large number of patients in a clinical study (without having to type all the data into an online calculator for every subject) and predict their probability of lymph node involvement.

The published information about the model

The model definition is as follows:

[Image: model definition]

The restricted cubic spline terms are as follows:

[Images: restricted cubic spline term definitions]

The intercept and coefficients are as follows:
[Image: intercept and coefficient table]

Question

  1. Is it possible to reconstruct this model with the published data?
  2. If yes, how can this be achieved using R?

What I have already done

I found the following source; however, it is slightly different (I do not want to change anything about the model):
Reconstructing a logistic regression model from literature using published coefficients

I understand that I somehow have to reconstruct the model with something like:

#reconstruct model
copylogit <- ...

And then apply it to my data:

#make test data
newdata <- data.frame(
  age = 80,
  psa = 10,
  gleason_grade = factor(4),
  clinical_stage = "2A",
  no_of_positive_cores = 2,
  no_of_negative_cores = 10
)

#apply model to test data
predict(copylogit, newdata = newdata, type = "response")

And the expected result for this example would be: 17%
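Until the exact published values are wired in, the spline machinery itself can be sketched. Below is a minimal R sketch of how restricted cubic spline terms (in Harrell's parameterization, which published knot tables typically use; scaling conventions vary between sources, so check against the MSKCC formulas) can be computed from knots and combined with coefficients into a linear predictor. All knots and coefficients here are placeholders, not the actual MSKCC values.

```r
# Restricted cubic spline basis (Harrell's parameterization) for one
# variable with knots t1..tk: returns the k-2 nonlinear terms.
rcs_terms <- function(x, knots) {
  k   <- length(knots)
  tk  <- knots[k]
  tk1 <- knots[k - 1]
  pos <- function(u) pmax(u, 0)^3  # truncated cubic (x - t)+^3
  sapply(1:(k - 2), function(j) {
    pos(x - knots[j]) -
      pos(x - tk1) * (tk - knots[j]) / (tk - tk1) +
      pos(x - tk)  * (tk1 - knots[j]) / (tk - tk1)
  })
}

# PLACEHOLDER knots and coefficients -- substitute the published ones.
psa_knots <- c(0.3, 3.4, 5.4, 27.0)
intercept <- -2.5
b_psa     <- 0.05
b_psa_sp1 <- 0.001
b_psa_sp2 <- -0.002

# Linear predictor for a single (hypothetical) PSA-only model,
# then the inverse-logit to get a probability.
psa  <- 10
sp   <- rcs_terms(psa, psa_knots)
lp   <- intercept + b_psa * psa + b_psa_sp1 * sp[1] + b_psa_sp2 * sp[2]
prob <- plogis(lp)
```

With the real intercept, coefficients, and knots for every predictor substituted in, `plogis(lp)` gives the predicted probability directly, with no need for a fitted `glm` object.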

Update:
Since I am not sure whether it is possible to reconstruct the model from the published data, I thought about generating a prediction equation so that I can calculate the probabilities of lymph node involvement in a larger dataset. Therefore, I opened a new question: Calculate spline terms of a logistic regression using published knots and formula


Get this bounty!!!

#StackBounty: #r #logistic #simulation #accuracy Simulation of logistic regression's accuracy

Bounty: 50

I ran a Monte Carlo simulation of $10^5$ GLM fits (logistic regression, logit link) in R. To do so, I assumed:

  • The outcomes ($y$) were repeatedly sampled from a Bernoulli distribution ($N=1000$)
  • The single explanatory variable ($x$) was sampled, independently of $y$, from a half-normal distribution ($x≥0$)
  • I then calculated the accuracy of predictions (with a 0.5 cutoff) as true positives plus true negatives, divided by $N$

Naïvely, I was perhaps expecting a mean/median accuracy of 0.5, but that wasn't true. The average accuracy was around 51.5%. Is there a good intuition or theoretical result for this?
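The setup described above can be reproduced with a small sketch (fewer replications than the $10^5$ in the post, and assuming $p = 0.5$ for the Bernoulli outcome and a standard half-normal for $x$, since neither parameter is stated):

```r
set.seed(1)
n_sims <- 200    # the post uses 1e5 replications
n      <- 1000
acc <- replicate(n_sims, {
  y <- rbinom(n, 1, 0.5)   # Bernoulli outcomes (assumed p = 0.5)
  x <- abs(rnorm(n))       # half-normal predictor, independent of y
  fit  <- glm(y ~ x, family = binomial())
  pred <- as.integer(fitted(fit) > 0.5)
  mean(pred == y)          # (TP + TN) / N, computed in-sample
})
mean(acc)   # typically a bit above 0.5
```

One hedged intuition: accuracy here is computed in-sample, and with a 0.5 cutoff the fitted model can always do at least as well as the realized majority class, whose fraction averages slightly above 1/2 in any finite sample, so a mean accuracy somewhat above 50% is expected even with no true signal.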


Get this bounty!!!

#StackBounty: #r #regression #logistic #bayesian log-odds and it's standard error as priors in logistic regression

Bounty: 50

I’m attempting to fit a Bayesian logistic regression with the outcome of whether or not a crash occurred. I have various covariates in my model that are widely used to predict crash occurrence. As such, I’m using informative priors from prior publications that report the odds ratio and its 95% CI for each covariate.

Here’s an example of a prior provided by the model I’m pulling from:

crash at night (OR 13.1; 95% CI 5.0 to 31.5): log-odds $(1.12, 0.20)$, with the SE from $$\frac{\log(31.5)-\log(5.0)}{3.92}$$

I wanted to use the log-odds of these results and their standard errors as priors in my updated model. My first thought was to place the log-odds and its standard error on a normal prior. I’m using logic from sources 1 & 2 listed at the end of the post.

My question: if my assumptions about placing these log-odds and SEs on a normal prior are correct, can I simply transform the SE of the log-odds to a variance and implement that?

a normal prior:

$$\beta_k \sim N(\mu_{\beta_k}, \sigma^2_{\beta_k})$$

requires a variance rather than an SE. According to citation 3, the log-odds SE can be transformed into the log-odds variance:

$$SE[\log(OR)] = \sqrt{VAR[\log(OR)]} \implies SE^2 = VAR[\log(OR)]$$

therefore, if I square the standard error, then I should be able to apply this as my final prior:

$$\beta_k \sim N(1.12, 0.04)$$

Is this assumption correct, or am I way off? Is there a better way of implementing log-odds priors and their SEs in a logistic regression model?
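The CI-to-prior arithmetic can be checked directly in R. One caveat: with natural logarithms (the usual convention for log-odds) the example OR of 13.1 gives a mean near 2.57 and an SE near 0.47, whereas the 1.12 and 0.20 above match base-10 logarithms, so it is worth confirming which base the downstream software expects:

```r
# Recover normal-prior parameters from a published OR and 95% CI,
# then square the SE to get the variance (the step asked about above).
or <- 13.1; lo <- 5.0; hi <- 31.5
mu        <- log(or)                     # prior mean on the log-odds scale
se        <- (log(hi) - log(lo)) / 3.92  # 3.92 = 2 * qnorm(0.975)
prior_var <- se^2                        # SE^2 = VAR[log(OR)]
c(mu = mu, se = se, var = prior_var)
```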

Thanks!

  1. AdamO (https://stats.stackexchange.com/users/8013/adamo), Prior for Bayesian multiple logistic regression, URL (version: 2016-03-16): https://stats.stackexchange.com/q/202046

"Basically, you have the flexibility to parametrize estimation however
you see fit, but using a model which is linear on the log odds scale
makes sense for many reasons. Furthermore, using a normal prior for
log odds ratios should give you very approximately normal posteriors."

  2. Sander Greenland, Bayesian perspectives for epidemiological research: I. Foundations and basic methods, International Journal of Epidemiology, Volume 35, Issue 3, June 2006, Pages 765–775, https://doi.org/10.1093/ije/dyi312

"To start, suppose we model these a priori ideas by placing 2:1 odds
on a relative risk (RR) between ½ and 2, and 95% probability on RR
between ¼ and 4. These bets would follow from a normal prior for the
log relative risk ln (RR) that satisfies…"

  3. StatsStudent (https://stats.stackexchange.com/users/7962/statsstudent), How do I calculate the standard deviation of the log-odds?, URL (version: 2020-04-19): https://stats.stackexchange.com/q/266116


Get this bounty!!!
