I am looking to create a function that simulates data arising from a mediation process, where a predictor (X) has an indirect effect on the outcome (Y) through the mediator (M).
I consulted the answers to the following questions:
I would like the function to simulate:
1. the mediator and outcome if the user inputs the predictor,
2. the predictor and outcome if the user inputs the mediator, or
3. the predictor and mediator if the user inputs the outcome.
I would like the user to be able to specify various conditions for simulating the data arising from mediation, including a, b, c, and the proportion of the effect mediated. The proportion of the effect mediated (Pm) is the ratio of the indirect effect (ab) to the total effect (c) (Wen & Fan, 2015). I would like the function to simulate data that would yield a mediation model with the conditions specified by the user.
For instance, I would like the function to estimate:
1. the total effect (c, i.e., cTotal) if the user inputs a, b, and proportionMediated (Pm)
2. proportionMediated if the user inputs a, b, and cTotal
3. a and b (assuming they are equal) if the user inputs cTotal and proportionMediated
4. a if the user inputs b, cTotal, and proportionMediated
5. b if the user inputs a, cTotal, and proportionMediated
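Each of these cases is a rearrangement of the definition of Pm. As a sketch, assuming c is nonzero, that a and b are nonzero where they appear in a denominator, and (for the equal-paths case) that the product of Pm and c is non-negative so the square root is real; c' denotes the direct effect, i.e., cPrime in the code:

```latex
\begin{aligned}
P_m &= \frac{ab}{c}, \qquad c = c' + ab \\
c &= \frac{ab}{P_m} \\
a = b &= \sqrt{P_m \, c} \\
a &= \frac{P_m \, c}{b} \\
b &= \frac{P_m \, c}{a}
\end{aligned}
```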
I used the answer to the first link (above) in writing the beginnings of a function:
simulateIndirectEffect <- function(x, m, y, a, b, cTotal, proportionMediated, seed){
  if(missing(seed)){
    seed <- round(runif(1, 0, 1000)*100)
  }
  if(missing(cTotal) == TRUE){
    cTotal <- (a * b) / proportionMediated
  } else if(missing(proportionMediated) == TRUE){
    proportionMediated <- (a * b) / cTotal
  } else if(missing(a) == TRUE & missing(b) == TRUE){
    a <- sqrt(proportionMediated * cTotal)
    b <- sqrt(proportionMediated * cTotal)
  } else if(missing(a) == TRUE){
    a <- (proportionMediated * cTotal) / b
  } else if(missing(b) == TRUE){
    b <- (proportionMediated * cTotal) / a
  }
  ab <- a * b
  cPrime <- cTotal - ab
  if(missing(x) == FALSE){
    sampleSize <- length(x)
    set.seed(seed + 1)
    m <- a*x + sqrt(1 - a^2) * rnorm(sampleSize) #what should I change error term to?
    error <- 1 - (cPrime^2 + b^2 + 2*a*cPrime*b)
    set.seed(seed + 2)
    y <- cPrime*x + b*m + error*rnorm(sampleSize) #what should I change error term to?
  } else if(missing(m) == FALSE){
    sampleSize <- length(m)
    set.seed(seed + 1)
    #x <- #Not sure what to put here
    set.seed(seed + 2)
    #y <- #Not sure what to put here
  } else if(missing(y) == FALSE){
    sampleSize <- length(y)
    set.seed(seed + 1)
    #x <- #Not sure what to put here
    set.seed(seed + 2)
    #m <- #Not sure what to put here
  }
  simulatedData <- as.data.frame(cbind(x, m, y))
  return(simulatedData)
}
I have three questions:
1. How can we simulate m and y given x (and the conditions specified) in the above function?
2. How can we simulate x and y given m (and the conditions specified) in the above function?
3. How can we simulate x and m given y (and the conditions specified) in the above function?
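One way to treat all three cases uniformly, sketched here under the assumption that a, b, and the direct effect are standardized path coefficients: the paths imply a 3 x 3 correlation matrix, and the two missing variables can be drawn from their conditional distribution given the supplied one. The names impliedCor and simulateGiven are hypothetical, and the supplied variable is returned in standardized form.

```r
# Hypothetical helper: correlation matrix implied by standardized paths.
impliedCor <- function(a, b, cPrime) {
  rXM <- a                 # x -> m
  rXY <- cPrime + a * b    # total effect of x on y
  rMY <- b + a * cPrime    # m -> y plus the path shared through x
  vars <- c("x", "m", "y")
  matrix(c(1,   rXM, rXY,
           rXM, 1,   rMY,
           rXY, rMY, 1), nrow = 3, dimnames = list(vars, vars))
}

# Hypothetical helper: simulate the two missing variables given one of x, m, y.
simulateGiven <- function(values, given = c("x", "m", "y"), a, b, cPrime, seed = 1) {
  given <- match.arg(given)
  R <- impliedCor(a, b, cPrime)
  others <- setdiff(rownames(R), given)
  z <- as.numeric(scale(values))                 # standardize the supplied variable
  r <- R[others, given]                          # correlations with the supplied variable
  condCov <- R[others, others] - tcrossprod(r)   # covariance of the others given z
  set.seed(seed)
  # Rows of `noise` have covariance condCov (chol() returns the upper factor)
  noise <- matrix(rnorm(2 * length(z)), ncol = 2) %*% chol(condCov)
  sim <- outer(z, r) + noise                     # conditional mean plus correlated noise
  out <- data.frame(z, sim)
  names(out) <- c(given, others)
  out[, c("x", "m", "y")]
}

# Usage: supply an outcome and simulate x and m (sample correlations will vary)
simData <- simulateGiven(rnorm(1000), given = "y",
                         a = sqrt(0.24), b = sqrt(0.24), cPrime = 0.36)
round(cor(simData), 2)
```

This relies on chol() succeeding, which requires the implied correlation matrix to be positive definite; parameter combinations that leave y with a non-positive residual variance will fail at that step.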
Note that the function above does not appear to simulate mediation data that match the specified conditions. For instance, when I simulate data based on a total effect of .6 and a proportion of the effect mediated of .4, the correlations are far too high. I want the correlation between x and y to be .6 (i.e., the total effect), but it is .99 in the simulated data (see below). I suspect that the noise from rnorm(), which has a mean of 0 and an SD of 1, is too small to serve as the error term, but I am not sure what to use instead.
> predictor <- rnorm(1000, mean = 50, sd = 10)
> myData <- simulateIndirectEffect(x = predictor, cTotal = .6, proportionMediated = .4, seed = 12345)
> round(cor(myData), 2)
     x    m    y
x 1.00 0.98 0.99
m 0.98 1.00 0.99
y 0.99 0.99 1.00
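Two things likely contribute to the inflated correlations above. The supplied x has an SD of 10, so a*x swamps noise whose SD is below 1, and error holds a residual variance but is used where an SD is expected. A minimal sketch of the x-given case follows, assuming the paths are meant as standardized coefficients, so x is standardized before use and the residual SDs are chosen to give m and y unit variance. The function name simulateFromX is hypothetical.

```r
# Hypothetical helper: simulate m and y from a supplied x with standardized paths.
simulateFromX <- function(x, a, b, cPrime, seed = 1) {
  n <- length(x)
  xz <- as.numeric(scale(x))            # mean 0, SD 1, so the paths act as correlations
  set.seed(seed)
  # Residual SD that gives m unit variance: var(m) = a^2 + (1 - a^2)
  m <- a * xz + sqrt(1 - a^2) * rnorm(n)
  # Residual variance that gives y unit variance
  residVarY <- 1 - (cPrime^2 + b^2 + 2 * a * b * cPrime)
  stopifnot(residVarY > 0)              # otherwise the requested paths are not attainable
  y <- cPrime * xz + b * m + sqrt(residVarY) * rnorm(n)
  # x is returned on its original scale; correlations are unaffected by rescaling
  data.frame(x = x, m = m, y = y)
}

# Usage with the conditions above: cTotal = .6 and proportionMediated = .4
# give a = b = sqrt(.24) and cPrime = .6 - .24 = .36.
predictor <- rnorm(1000, mean = 50, sd = 10)
myData <- simulateFromX(predictor, a = sqrt(0.24), b = sqrt(0.24), cPrime = 0.36, seed = 12345)
round(cor(myData), 2)
```

Under these assumptions the population correlation between x and y is cPrime + a*b = .6. Sample correlations will vary around that value.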
References:
Wen, Z., & Fan, X. (2015). Monotonicity of effect sizes: Questioning kappa-squared as mediation effect size measure. Psychological Methods, 20, 193-203. doi: 10.1037/met0000029