#StackBounty: #r #raster Weighted mean calculation with zonal command

Bounty: 50

I’d like to compute, for each category, the weighted mean of a variable (each pixel has a weight). The code below only gives unweighted averages. Any suggestions would be greatly appreciated!

    # sample rasters
    library(raster)
    r <- raster(ncols=10, nrows=10)
    category <- raster(ncols=10, nrows=10)
    weight <- raster(ncols=10, nrows=10)
    r[] <- runif(ncell(r)) * 1:ncell(r)
    category[] <- runif(ncell(category)) * 1:ncell(category)
    weight[] <- 1:ncell(weight)

    # mean for each category
    zonalstats <- zonal(r, z=category, fun='mean', digits=0, na.rm=TRUE)
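
Since a weighted mean is a ratio of two sums, one possible workaround (an editor’s sketch, not part of the original question) is to combine two zonal sums. Note that the category raster is replaced here by integer zones, which `zonal()` expects:

```r
library(raster)

set.seed(1)
r <- raster(ncols = 10, nrows = 10)
category <- raster(ncols = 10, nrows = 10)
weight <- raster(ncols = 10, nrows = 10)
r[] <- runif(ncell(r)) * 1:ncell(r)
category[] <- sample(1:3, ncell(category), replace = TRUE)  # integer zone ids
weight[] <- 1:ncell(weight)

# weighted mean per zone = zonal sum of (value * weight) / zonal sum of weight
num <- zonal(r * weight, z = category, fun = "sum", na.rm = TRUE)
den <- zonal(weight, z = category, fun = "sum", na.rm = TRUE)
wmean <- cbind(zone = num[, 1], weighted_mean = num[, 2] / den[, 2])
wmean
```

The same identity works for any zonal aggregation, so it also carries over to `terra::zonal()` if the asker later migrates packages.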


Get this bounty!!!

#StackBounty: #r #mixed-model #lme4-nlme Including a term as random intercept and random slope in mixed model

Bounty: 50

I am fitting a mixed model of crop yield against rainfall collected over multiple years and locations. Usually in the agriculture domain, there is a strong time trend i.e. increase in yield with time because of changes in technology. I specified my model like this:

 mdl <- lmer(yield ~ rainfall + (1|year) + (1|location), data = dat)

However, if I understand the above specification correctly, it assumes that the same time trend exists across all locations, which is not correct. So if I want to include a term that allows the time trend to vary by location, which of the following specifications is correct:

mdl <- lmer(yield ~ rainfall + (1 + year|year) + (1|location), data = dat)
mdl <- lmer(yield ~ rainfall + (1|year) + (1 + year|location), data = dat)
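
For context, a location-specific time trend is usually expressed as a random slope for year within location; below is a minimal editor’s sketch on simulated data (variable names follow the question; whether to also keep a crossed `(1 | year)` intercept is part of what is being asked):

```r
library(lme4)

set.seed(42)
dat <- data.frame(
  location = factor(rep(1:20, each = 10)),
  year     = rep(1:10, times = 20),
  rainfall = runif(200)
)
# simulate location-specific deviations from an overall time trend
slope_dev <- rnorm(20, 0, 0.05)
dat$yield <- 2 + 0.5 * dat$rainfall +
  (0.1 + slope_dev[dat$location]) * dat$year + rnorm(200, sd = 0.3)

# year enters as a fixed overall trend plus a random slope by location,
# so each location may deviate from the common trend
mdl <- lmer(yield ~ rainfall + year + (1 + year | location), data = dat)
```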



#StackBounty: #r R: How to center boxes on top of lines in the legend of a plot?

Bounty: 200

I currently have a plot and I would like to create a legend. In it, I would like to superimpose transparent boxes on top of lines. An example would be the following code:

xdata <- c(1, 2, 3, 4, 5, 6, 7)
y1 <- c(1, 4, 9, 16, 25, 36, 49)
y2 <- c(1, 5, 12, 21, 34, 51, 72)
y3 <- c(1, 6, 14, 28, 47, 73, 106)
y4 <- c(1, 7, 17, 35, 60, 95, 140)
plot(xdata, y1, type = 'l', col = "red")
lines(xdata, y2, type = 'l', col = "red", lty = 3)
lines(xdata, y3, type = 'l', col = "blue")
lines(xdata, y4, type = 'l', col = "blue", lty = 3)
legend("topleft", legend = c("Plot 1", "Plot 2", "Plot 3", "Plot 4"),
       lty = c(1, 3, 1, 3), lwd = rep(1.3, 4),
       col = c("blue", "blue", "red", "red"), pch = rep(NA, 4),
       cex = 0.8, y.intersp = 1.2, bty = 'n', x.intersp = 0.7)
legend("topleft", legend = c("", "", "", ""), lty = rep(0, 4),
       col = c(adjustcolor(blues9[3], alpha.f = 0.6),
               adjustcolor(blues9[3], alpha.f = 0.6),
               adjustcolor("red", alpha.f = 0.1),
               adjustcolor("red", alpha.f = 0.1)),
       pch = rep(15, 4), bty = 'n', cex = 0.8, pt.cex = rep(2, 4),
       y.intersp = 1.2, x.intersp = 0.7)

which produces:

[image: the resulting plot, with the legend boxes offset from the line symbols]

The boxes are not centered on top of the line symbols in the legend. Is there a way to shift them? Thanks!
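
One thing worth trying (an editor’s sketch, not verified on every graphics device): give the overlay legend the same labels as the first call so both calls compute identical column widths, hide the duplicate text with `text.col = NA`, and supply `lwd` with `lty = 0` so `legend()` still reserves the line-segment space and, with `merge = TRUE`, centres each box on the (invisible) segment:

```r
labs <- c("Plot 1", "Plot 2", "Plot 3", "Plot 4")
legend("topleft", legend = labs, lty = c(1, 3, 1, 3), lwd = 1.3,
       col = c("blue", "blue", "red", "red"), cex = 0.8,
       y.intersp = 1.2, x.intersp = 0.7, bty = "n")
# same labels (so spacing matches), invisible text, blank line segments
legend("topleft", legend = labs, text.col = NA, lty = 0, lwd = 1.3,
       pch = 15, pt.cex = 2, merge = TRUE, bty = "n", cex = 0.8,
       col = adjustcolor(c("blue", "blue", "red", "red"), alpha.f = 0.15),
       y.intersp = 1.2, x.intersp = 0.7)
```

This assumes an open plot (e.g. the one produced by the code above); the colours here are illustrative, not the asker’s exact choices.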



#StackBounty: #r #statistical-significance #inference #standard-error #bradley-terry-model Inference on and quasi variances in Bradley-…

Bounty: 50

Upfront

I am not a statistician but a medical doctor. I have a working knowledge of statistical methods in my field, but this is my first time working with pairwise comparisons, and for lack of formal mathematical training I am having a hard time getting into this new field. This is one of those questions that comes with a plea for simply worded answers.

Object of research

I have 8 dietary supplements and want to know whether they differ in taste and, if so, which tastes best. So I asked 37 people to do pairwise comparisons of random pairs, which gave me 224 observations/votes.

Methods

Internet research suggests that this should be evaluated via a Bradley–Terry analysis. I found the BradleyTerry2 package for R and followed its vignette. The result of a simple call to the BTm modelling function is this:

Call:
BTm(outcome = adm$winner == "A", player1 = adm$A, player2 = adm$B)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3883  -0.6068   0.2195   0.7277   2.5585  

Coefficients:
    Estimate Std. Error z value Pr(>|z|)    
..2  -2.2017     0.5024  -4.382 1.17e-05 ***
..3  -1.3414     0.4659  -2.879 0.003991 ** 
..4   1.4892     0.5862   2.540 0.011070 *  
..5   1.1937     0.5293   2.255 0.024112 *  
..6  -1.5988     0.4932  -3.241 0.001189 ** 
..7  -1.7452     0.4896  -3.564 0.000365 ***
..8  -2.5201     0.5231  -4.817 1.46e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 310.53  on 224  degrees of freedom
Residual deviance: 185.07  on 217  degrees of freedom
AIC: 199.07

Number of Fisher Scoring iterations: 5

So the function chose “1” as the reference category, fixed its ability to 0, and positioned all other supplements, with estimates and standard errors, around it. As all of these estimates are significantly different from 0, I can conclude that the supplements vary significantly in taste.

Now the next step according to the package’s vignette (chapter 4, page 11, in https://cran.r-project.org/web/packages/BradleyTerry2/vignettes/BradleyTerry.pdf) is to calculate “quasi variances”, which leads to the following table:

   estimate        SE   quasiSE   quasiVar
1  0.000000 0.0000000 0.3657419 0.13376715
2 -2.201721 0.5023954 0.3183952 0.10137549
3 -1.341350 0.4659285 0.2914081 0.08491867
4  1.489221 0.5861939 0.5199500 0.27034804
5  1.193664 0.5292600 0.4656436 0.21682395
6 -1.598792 0.4932263 0.3113142 0.09691653
7 -1.745216 0.4896299 0.3092022 0.09560602
8 -2.520092 0.5231470 0.3418877 0.11688718

I am having a hard time understanding what is happening here and why we do it. Apparently “1” now has a standard error too, and all the other standard errors have become a little smaller. This allows me to draw a plot of the estimates with error bars of 1.96 times their quasi standard errors:

[image: plot of the eight ability estimates with ±1.96 quasi-SE error bars]

In the original model the estimate of “4” was significantly different from 0, but now the error bars of “1” and “4” overlap a lot. This appears to be a contradiction: are the estimates of “1” and “4” significantly different, or are they not?

Specific questions

  1. How do I properly do an omnibus test on the Bradley–Terry model to prove that it performs significantly better than a null model, i.e. that my supplements differ in taste?
  2. Is the original model or the “quasi-variances thing” the better way to present my model, and how can I do inference on whether the differences between two arbitrary supplements are significant?
  3. Please give, or point me to, an explanation of why (not how) we compute quasi variances in this context, in as simple terms as possible.
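
Regarding question 1, an editor’s note (a sketch, not from the original post): since the BTm() summary above reports both a null and a residual deviance, the standard omnibus likelihood-ratio test can be read off directly:

```r
# omnibus likelihood-ratio test: null model vs. fitted Bradley-Terry model,
# using the deviances printed in the summary above
null_dev <- 310.53; null_df <- 224   # null deviance and its df
res_dev  <- 185.07; res_df  <- 217   # residual deviance and its df

lr_stat <- null_dev - res_dev        # chi-squared test statistic
df_diff <- null_df - res_df          # 7 estimated ability parameters
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
```

A tiny p-value here would support the claim that the supplements differ in taste overall, without relying on the individual coefficient tests.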



#StackBounty: #r #hypothesis-testing #repeated-measures #nested-data #plm comparing groups in repeated measures FE models, with a neste…

Bounty: 50

I have estimated some repeated-measures fixed-effects models, with a nested error component, using plm. I am now interested in the following:

  1. testing whether the full models are significantly different, i.e. $$H_0: \beta_{Female} = \beta_{Male}$$ where $\beta_{Female}$ is the full model for females and $\beta_{Male}$ is the full model for males, and
  2. subsequently testing selected regression coefficients between the two groups, i.e. $$H_0: \beta_{Female,\,year1.5} = \beta_{Male,\,year1.5}$$ where $\beta_{Female,\,year1.5}$ is the regression coefficient for females at year1.5, and $\beta_{Male,\,year1.5}$ is the regression coefficient for males at year1.5.

I will illustrate the situation using the below working example,

First, some packages needed,

# install.packages(c("plm","texreg","tidyverse","lmtest"), dependencies = TRUE)
library(plm); library(lmtest); require(tidyverse)

Second, some data preparation,

data(egsingle, package = "mlmRev")
dta <-  egsingle %>% mutate(Female = recode(female,.default = 0L,`Female` = 1L))

Third, I estimate a set of models for each gender in data

MoSpc <- as.formula(math ~ Female + size + year)
dfMo = dta %>% group_by(female) %>%
    do(fitMo = plm(update(MoSpc, . ~ . -Female), 
       data = ., index = c("childid", "year", "schoolid"), model="within") )

Fourth, let’s look at the two estimated models,

texreg::screenreg(dfMo[[2]], custom.model.names = paste0('FE: ', dfMo[[1]]))
#> ===================================
#>            FE: Female   FE: Male   
#> -----------------------------------
#> year-1.5      0.79 ***     0.88 ***
#>              (0.07)       (0.10)   
#> year-0.5      1.80 ***     1.88 ***
#>              (0.07)       (0.10)   
#> year0.5       2.51 ***     2.56 ***
#>              (0.08)       (0.10)   
#> year1.5       3.04 ***     3.17 ***
#>              (0.08)       (0.10)   
#> year2.5       3.84 ***     3.98 ***
#>              (0.08)       (0.10)   
#> -----------------------------------
#> R^2           0.77         0.79    
#> Adj. R^2      0.70         0.72    
#> Num. obs.  3545         3685       
#> ===================================
#> *** p < 0.001, ** p < 0.01, * p < 0.05

Now I want to test whether these two (linear OLS) models are significantly different, cf. point 1 above. I looked around SO and the internet; some suggest that I need to use plm::pFtest() (also suggested here), which I have tried, but I’m not convinced and wonder whether someone here has experience and could possibly help me.

I tried,

plm::pFtest(dfMo[[1,2]], dfMo[[2,2]])
# >
# > F test for individual effects
# >
# >data:  update(MoSpc, . ~ . - Female)
# >F = -0.30494, df1 = 113, df2 = 2693, p-value = 1
# >alternative hypothesis: significant effects

Second, I am interested in comparing regression coefficients between the two groups. Say, is the estimate for year1.5 of 3.04 significantly different from 3.17? Cf. point 2 above.
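
For the coefficient comparison, a common quick check (an editor’s sketch; it treats the two fits as independent, which seems defensible here since they are estimated on disjoint groups) is a Wald z-test built from the printed estimates and standard errors:

```r
# year1.5 coefficient: females vs. males (values from the screenreg table above)
b_f <- 3.04; se_f <- 0.08
b_m <- 3.17; se_m <- 0.10

# Wald z-test for the difference of two independently estimated coefficients
z <- (b_f - b_m) / sqrt(se_f^2 + se_m^2)
p <- 2 * pnorm(-abs(z))
```

With these rounded values the difference is well within one standard error of zero; with the fitted plm objects in hand, the same test could use the exact coefficients and cluster-robust standard errors instead.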

Please ask if any of the above is not clear and I will be happy to elaborate. Any help will be greatly appreciated!

I realize this question is a bit programming-like, but I initially posted it on SO. However, DWin was kind enough to point out that the question belonged on CrossValidated and migrated it here.




#StackBounty: #r #inference Statistical analyses for choices behaviour, free and restricted context evolution

Bounty: 50

I would like some help with a complex system.
Data is my dataset; for one travel Type I am testing seating-choice behaviour. I record the walked distance 'Effort' from the subject’s entrance to each Type_s, which is the type of seat.

There is in fact a possible crowding effect that could impact these distances: the more a subject perceives an agglomeration of people (taken places), the more distance he may walk in order to be alone and have more choice. Naturally, there may also be a slight effect of seats being exhausted in order of nearest distance. We also have two Floor levels with almost the same architecture, but one implies additional Effort.

P_exhaustion is a measure of the exhaustion of all the places; when 100% is reached, no more places remain.

ind is an indicator of chronological order, so a value of 1, for example, says that this was the first person to choose a distance/place. For every distance we have the same set of places, "f (door), m (middle), c (corridor)", except for the "s (solo)" place, which was installed at the extremity of the bus.

Questions :

- Will a subject provide more `Effort` according to `Type_s`?
- Does seat exhaustion happen nearest-seat-first?
- Does a higher `P_exhaustion` imply more effort? Or is there a crossing effect, where well-distributed choices (to be alone) neutralize the imposed-distance effect of the last remaining places?
- Could we in some way test how behaviour differs between free choice, restricted choice, and no choice?

Here is my only attempt at explaining the variation in Effort; I am not at all sure it is the best test to use:

aov.ex1 = aov(Effort ~ Type_s + P_exhaustion + Floor, data = Data)  # do the analysis of variance
summary(aov.ex1)

I’m looking for help to formalize these effects and to test them statistically.

Data=structure(list(Type = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "29v1", 
"29v2", "80v1", "80v2"), class = "factor"), ind = 1:82, Effort = c(3, 
10, 1, 5, 4, 2, 1, 8, 2, 13, 1, 6, 3, 5, 2, 1, 3, 8, 3, 6, 1, 
3, 7, 1, 2, 2, 2, 1, 2, 5, 4, 1, 6, 3, 6, 8, 2, 3, 4, 7, 3, 2, 
6, 2, 3, 7, 1, 5, 4, 1, 4, 3, 2, 3, 5, 5, 2, 1, 1, 5, 8, 7, 2, 
2, 4, 3, 4, 4, 2, 2, 10, 7, 5, 3, 3, 5, 7, 5, 3, 4, 5, 4), Type_s = structure(c(4L, 
6L, 4L, 4L, 4L, 4L, 6L, 3L, 6L, 4L, 3L, 4L, 4L, 4L, 6L, 6L, 4L, 
4L, 3L, 4L, 6L, 4L, 4L, 6L, 4L, 3L, 4L, 4L, 6L, 4L, 4L, 3L, 4L, 
3L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 3L, 3L, 4L, 3L, 3L, 
3L, 3L, 3L, 6L, 4L, 3L, 3L, 3L, 6L, 4L, 3L, 4L, 4L, 3L, 3L, 4L, 
3L, 3L, 3L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 3L, 3L, 3L, 
4L), .Label = c("", "*", "c", "f", "m", "s"), class = "factor"), 
    P_exhaustion = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 
    5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 
    7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 
    9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
    10L), .Label = c("(0,10]", "(10,20]", "(20,30]", "(30,40]", 
    "(40,50]", "(50,60]", "(60,70]", "(70,80]", "(80,90]", "(90,100]"
    ), class = "factor"), Floor = structure(c(2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 2L, 2L, 
    3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
    3L, 3L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 
    3L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 3L, 
    3L, 3L, 3L, 2L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L), .Label = c("", "down", "up"), class = "factor")), .Names = c("Type", 
"ind", "Effort", "Type_s", "P_exhaustion", "Floor"), class = "data.frame", row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 
68L, 69L, 70L, 71L, 73L, 75L, 76L, 79L, 80L, 81L, 82L, 84L, 85L, 
86L, 91L))
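
As one possible starting point beyond the additive ANOVA (an editor’s sketch using the Data object above; with only 82 rows the interaction is weakly identified, so treat this as exploratory):

```r
# additive model vs. a model letting the exhaustion effect differ by seat type,
# compared with an F-test
m0 <- aov(Effort ~ Type_s + P_exhaustion + Floor, data = Data)
m1 <- aov(Effort ~ Type_s * P_exhaustion + Floor, data = Data)
anova(m0, m1)
```

A significant F here would suggest the effort–exhaustion relationship differs by seat type, which speaks to the third question above.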



#StackBounty: #r #regression #non-linear-regression #piecewise How to perform piece wise/spline regression for longitudinal temperature…

Bounty: 50

Here I have temperature time-series panel data, and I intend to run a piecewise regression or cubic-spline regression on it. I first looked quickly into piecewise-regression concepts and their basic implementation in R on SO, which gave me an initial idea of how to proceed with my workflow. In my first attempt, I tried to run a spline regression using the splines::ns function from the splines package, but I didn’t get the right plot. Baseline regression, piecewise regression, or spline regression would all work for me.

Based on the SO post about piecewise regression, I understand that I need to find appropriate breakpoints or knots before running a piecewise regression, which could produce a plausible trend line for the data. I adopted the piecewise solution, but it is still not efficient and I can’t get my desired plot.

Here is the general picture of my panel-data specification: the first row, shown below, holds my dependent variables, which are presented in natural-log terms, and my independent variables: average temperature, total precipitation, and 11 temperature bins, where each bin width (AKA the bin’s window) is 3 degrees Celsius (<-6, -6~-3, -3~0, …, >21).

reproducible example:

Here are reproducible data, simulated to resemble the actual temperature time-series panel data:

dat= data.frame(index = rep(c('dex111', 'dex112', 'dex113','dex114','dex115'), each = 30), year =1980:2009,
                region= rep(c('Berlin','Stuttgart','Böblingen','Wartburgkreis','Eisenach'), each=30),
                ln_gdp_percapita=rep(sample.int(40, 30), 5),ln_gva_agr_perworker=rep(sample.int(45, 30), 5),
                temperature=rep(sample.int(50, 30), 5), precipitation=rep(sample.int(60, 30), 5),
                bin_1=rep(sample.int(32, 30), 5), bin_2=rep(sample.int(34, 30), 5), bin_3=rep(sample.int(36, 30), 5),
                bin_4=rep(sample.int(38, 30), 5), bin_5=rep(sample.int(40, 30), 5), bin_6=rep(sample.int(42, 30), 5),
                bin_7=rep(sample.int(44, 30), 5),bin_8=rep(sample.int(46, 30), 5), bin_9=rep(sample.int(48, 30), 5),
                bin_10=rep(sample.int(50, 30), 5), bin_11=rep(sample.int(52, 30), 5))

update:

note that each bin covers an equally divided temperature interval, except for the extreme-temperature bins; each bin gives the number of days that fall in the respective temperature interval.

Basically, I want to fit a spline regression to my data (say, choose one dependent variable such as ln_gdp_percapita, and multiple independent variables such as bin_1, bin_2, …, bin_11).

my attempt to run spline regression:

Here is what I did so far to run spline regression on temperature time series data:

fit_sp <- smooth.spline(dat$bin_6, dat$ln_gdp_percapita, nknots = 4)
pred <- predict(fit_sp, dat$bin_6)$y  # predict() returns a list with $x and $y
summary(pred)

but this is not what I want to do. So I also tried the following:

library(dplyr); library(ggplot2)
sp2 <- lm(ln_gva_agr_perworker ~ splines::ns(bin_6, df = 2, knots = 3), data = dat)
df_ <- mutate(dat, smooth = fitted(sp2))
stats::predict(sp2)
ggplot(df_, aes(bin_6, ln_gva_agr_perworker)) + geom_line(aes(bin_6, smooth))

but it didn’t render the plot that I want. My understanding of piecewise regression is that one finds a breakpoint and then regresses on the observations to the left and right of that breakpoint, which gives a gradual, piecewise-linear trend. I did not get such a result; perhaps I should focus more on finding an efficient workflow.

In general, I want to see how the agriculture or industry sector responds to daily temperature. Running a simple regression on the temperature-bin panel data is not sufficient to reach a conclusion. I believe a restricted spline regression, a smoothing spline, or a baseline regression might produce better estimates.

desired regression output metrics:

Here is the general picture of my desired regression output metric (inspired by a related paper’s appendix):

        interval          b_hi     b       b_low   st error   t_stat  p-value
bin1    (-23.87:-18.44)  -0.0129  -0.0306  -0.0483 0.0090257  -3.39   0.001
bin2    (-18.44:-13.02)  -0.0050  -0.0096  -0.0141 0.0023336  -4.10   0.000
bin3    (-13.02:-7.59)   -0.0040  -0.0057  -0.0075 0.0008904  -6.44   0.000
bin4    (-7.59:-2.17)     0.0030   0.0021   0.0011 0.0004920   4.23   0.000
bin5    (-2.17:3.26)     -0.0007  -0.0012  -0.0018 0.0002781  -4.48   0.000
bin6    (3.26:8.69)       0.0000   0.0000   0.0000
bin7    (8.69:14.11)      0.0008   0.0001  -0.0005 0.0003502   0.41   0.681
bin8    (14.11:19.54)     0.0010   0.0000  -0.0010 0.0005107   0.06   0.956
bin9    (19.54:24.96)     0.0028   0.0016   0.0005 0.0005737   2.85   0.004
bin10   (24.96:30.39)    -0.0031  -0.0057  -0.0083 0.0013075  -4.36   0.000

desired scatter plot:

Here is the desired scatter plot that I want to achieve (a simulated plot inspired by the related paper’s figure):

[image: desired coefficient plot with point estimates and dotted confidence bands]

In this plot, the black point-line is the estimated regression coefficient (from either the baseline or the restricted spline regression), and the dotted blue line is the 95% confidence interval based on clustered standard errors.

How do I accomplish my desired output with minimal code and efficiently? Could someone point me in the right direction? Any more thoughts? Thanks in advance!
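
A minimal route toward that coefficient plot (an editor’s sketch on the simulated dat above; bin_6 is arbitrarily taken as the omitted reference bin, mirroring the zero row in the table, and plain lm() standard errors stand in for the clustered ones):

```r
library(ggplot2)

bins <- paste0("bin_", c(1:5, 7:11))   # bin_6 left out as the reference bin
m <- lm(reformulate(bins, response = "ln_gdp_percapita"), data = dat)

# collect point estimates and 95% confidence bounds per bin
est <- coef(summary(m))[bins, , drop = FALSE]
cf <- data.frame(bin   = factor(bins, levels = bins),
                 b     = est[, "Estimate"],
                 b_low = est[, "Estimate"] - 1.96 * est[, "Std. Error"],
                 b_hi  = est[, "Estimate"] + 1.96 * est[, "Std. Error"])

ggplot(cf, aes(bin, b, group = 1)) +
  geom_point() + geom_line() +
  geom_line(aes(y = b_low), linetype = "dotted", colour = "blue") +
  geom_line(aes(y = b_hi),  linetype = "dotted", colour = "blue") +
  labs(x = "temperature bin", y = "estimated coefficient")
```

For the published-style table, the lm() fit could be swapped for a panel estimator with clustered standard errors; the plotting step stays the same.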



#StackBounty: #r #logistic #logit #effect-size #odds-ratio Comparison of two odds ratios: Take 2

Bounty: 50

I would like to test the difference of two odds ratios given the following R-output.

To do so, I would need to take the difference of the log odds and obtain the standard error (outlined here: Statistical test for difference between two odds ratios?).

One of the predictor variables is continuous, and I am not sure how I could compute the values required for SE(logOR).

Could someone please explain whether the output I have is conducive to this method?
Results

f = with(data = imp, glm(Y ~ X1 + X2, family = binomial(link = "logit")))

s01 = summary(pool(f))

s01

                     est        se         t       df   Pr(>|t|) 

   (Intercept) -1.7805826 0.1857663 -9.585070 391.0135 0.00000000 
   X1           0.2662796 0.1308970  2.034268 390.4602 0.04259997  
   X2           0.6757952 0.3869652  1.746398 395.6098 0.08151794 

cbind(exp(s01[, c("est", "lo 95", "hi 95")]), pval = s01[, "Pr(>|t|)"])

                             est     lo 95     hi 95       pval
              (Intercept) 0.1685399 0.1169734 0.2428389 0.00000000
              X1          1.3051000 1.0089684 1.6881459 0.04259997
              X2          1.9655955 0.9185398 4.2062035 0.08151794
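
For what it’s worth, here is an editor’s sketch of the linked method applied to the printed output. Note the two coefficients come from the same model, so this independence-assuming standard error omits their covariance; with access to the pooled fit, vcov() would supply the missing term:

```r
b1 <- 0.2662796; se1 <- 0.1308970   # X1, already on the log-odds scale
b2 <- 0.6757952; se2 <- 0.3869652   # X2, already on the log-odds scale

# Wald z for the difference of the two log odds, assuming independence
z <- (b1 - b2) / sqrt(se1^2 + se2^2)
p <- 2 * pnorm(-abs(z))
```

Since the `est` column is already log odds, no extra computation from the continuous predictor is needed beyond its printed standard error; the covariance caveat is the main gap.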



#StackBounty: #r #lme4-nlme #random-effects-model The mathematical representation of a nested random effect term

Bounty: 50

Suppose that a dependent variable $y$ is measured at the unit level (level $1$), that these units are nested within units of type $A$ (level $2$), and that units of type $A$ are nested within units of type $B$ (level $3$).

Suppose that I fit the following formula:

y ~ "FIXED EFFECTS [my syntax]" + (1 + x | B/A)

where $x$ is some predictor at level $1$.

My understanding is that the mathematical representation of such a formula is the following. Is it correct?


In what follows, $y_{b,a,i}$ is the output of the $i$th data point in unit $a$ of $A$ nested in unit $b$ of $B$. This data point has a corresponding predictor $x_{b,a,i}$.

$y_{b,a,i} = \text{“fixed effects”} + u_{b} + u_{b,1,a} + (\beta_{b} + \beta_{b,1,a})x_{b,a,i}$

where

$u_{b} \sim N(0, \sigma_{B})$

$u_{b,1,a} \sim N(0, \sigma)$

$\beta_{b} \sim N(0, \rho_{B})$

$\beta_{b,1,a} \sim N(0, \rho)$

That is, $\sigma_{B}$ is the standard deviation of the intercept terms that vary across level $3$. On the other hand, given any unit $b$ at level $3$ and any unit $a$ at level $2$ contained in it, the standard deviation term for $a$ is $\sigma$; that is, $\sigma$ is constant across all level-$2$ units.


Is this correct? (I based this reasoning on a related presentation on page 136 of Linear Mixed Models: A Practical Guide Using Statistical Software.) If it is correct, is there any way to make $\sigma$ depend on which level-$2$ unit of $A$ the data point belongs to?
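
For reference (an editor’s note, hedged): lme4 expands the nesting shorthand into two separate random-effect terms, which can make the correspondence with the equations above easier to check:

```r
library(lme4)

# the slash shorthand B/A expands to a term for B plus one for the B:A
# interaction; findbars() shows the expanded random-effect terms
findbars(y ~ x + (1 + x | B/A))
# expected to list (1 + x | B) and (1 + x | B:A)
```

So the model should be equivalent to `y ~ ... + (1 + x | B) + (1 + x | B:A)`, assuming the levels of $A$ are not already uniquely coded across units of $B$.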



#StackBounty: #r #dplyr dplyr 0.7.5 change in select() behavior

Bounty: 50

select() in dplyr 0.7.5 returns a different result from dplyr 0.7.4 when using a named vector to specify columns.

library(dplyr)                               
df <- data.frame(a = 1:5, b = 6:10, c = 11:15)
print(df)                                     
#>   a  b  c
#> 1 1  6 11
#> 2 2  7 12
#> 3 3  8 13
#> 4 4  9 14
#> 5 5 10 15

# a named vector
cols <- c(x = 'a', y = 'b', z = 'c')          
print(cols)                                   
#>  x   y   z 
#> "a" "b" "c"

# with dplyr 0.7.4
# returns column names with vector values
select(df, cols)                              
#>   a  b  c
#> 1 1  6 11
#> 2 2  7 12
#> 3 3  8 13
#> 4 4  9 14
#> 5 5 10 15

# with dplyr 0.7.5
# returns column names with vector names
select(df, cols)                              
#>   x  y  z
#> 1 1  6 11
#> 2 2  7 12
#> 3 3  8 13
#> 4 4  9 14
#> 5 5 10 15

Is this a bug or a feature?
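
Either way, if the 0.7.4-style result (original column names) is wanted under 0.7.5, dropping the vector’s names should restore it, and `!!` makes the renaming explicit (an editor’s sketch, not verified against every dplyr release):

```r
library(dplyr)
df <- data.frame(a = 1:5, b = 6:10, c = 11:15)
cols <- c(x = 'a', y = 'b', z = 'c')

# unnamed character vector: selects a, b, c without renaming
select(df, unname(cols))

# named vector under tidy evaluation: selects and renames to x, y, z
select(df, !!cols)
```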

