#StackBounty: #r #mathematical-statistics #logarithm How to recover the original standard deviation from log and standardized data in t…

Bounty: 50

In order to preserve the anonymity of a client, we receive data in three forms: in one, the outcome variable is mean-centered, in another the outcome variable is first logged and then mean-centered, and in another the outcome variable is first logged and then standardized (mean-centered and scaled by standard deviation). We do not have access to the “raw” values for the outcome variable.

We run regression specifications of the form,

lm(outcome_log_s ~ x1_s + I(x1_s^2))

Here the outcome variable is logged and standardized. We want to be able to convey what, for example, a coefficient of 0.1 on one of our covariates means in terms of the original y-scale. How would we go about doing this, given the standard deviations of the mean-centered and log mean-centered versions of the same variable?

Outcome variable centered:

> sd(data$outcome_m)
[1] 1624277398

> mean(data$outcome_m)
[1] 1992805

Outcome variable logged, and then centered:

> mean(data$outcome_log_m)
[1] 0.005048307

> sd(data$outcome_log_m)
[1] 0.8339012 

Clearly the mean above should be 0 (as it’s centered). Is there any way to say how much a 0.1 standard deviation increase on the log scale translates to on the unlogged scale? Currently, we interpret the coefficient to mean that a 1 unit change in X is associated with a 0.1 standard deviation change in the log outcome, which is difficult to relate to.
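
One partial way to frame this is on a multiplicative scale. The sketch below is not a full answer. It assumes the log is the natural log, and that the standard deviation used to standardize the logged outcome is the same 0.8339012 reported for outcome_log_m. The names sd_log, beta, delta_log, ratio and pct_change are made up for illustration. Under those assumptions, a 0.1 SD change in the standardized log outcome is a 0.1 * 0.8339012 change in log(outcome), which corresponds to multiplying the outcome by exp(0.1 * 0.8339012):

```r
# Values reported in the question
sd_log <- 0.8339012   # sd(data$outcome_log_m); assumed to be the scaling SD
beta   <- 0.1         # example coefficient on the standardized log outcome

# A change of `beta` SDs in the standardized log outcome equals a change of
# beta * sd_log in log(outcome) (assuming natural log)
delta_log <- beta * sd_log

# Implied multiplicative and percentage change in the unlogged outcome
ratio      <- exp(delta_log)
pct_change <- 100 * (ratio - 1)

round(c(delta_log = delta_log, ratio = ratio, pct_change = pct_change), 4)
```

With these numbers the ratio is about 1.087, i.e. roughly an 8.7% change in the outcome. This gives a relative change only; it does not recover an absolute change in original units or the raw standard deviation. With a squared term in the model, the change associated with a covariate also depends on where that covariate is evaluated.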

My hunch is that this isn’t possible. If this is the answer, please just leave a comment rather than posting an answer, so that you don’t automatically claim the bounty.
