#StackBounty: #r #mixed-model #anova #repeated-measures Running a simple anova test in R (repeated measures)

Bounty: 50

We are studying 3 different proteins, each under 9 different conditions, at 3 different timepoints (Days 1, 2 and 3), with 3 biological replicates. That gives 81 different experiments (3 proteins * 9 conditions * 3 replicates), and for each experiment we have one reading on each of three consecutive days. This gives us 243 observations in a balanced design.
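To make the layout concrete, here is a minimal sketch that rebuilds the design grid and checks the counts. The level names (P1, C1, ...) and the objects design and long are placeholders for illustration, not the labels used in the real dataset.

```r
# Hypothetical reconstruction of the design grid; level names are placeholders.
design <- expand.grid(
  bioreplicate = factor(1:3),
  condition    = factor(paste0("C", 1:9)),
  protein      = factor(paste0("P", 1:3))
)

# One experiment ID per protein x condition x replicate combination.
design$experiment <- factor(seq_len(nrow(design)))

# Each experiment is read on three consecutive days (the repeated measure).
# The two data frames share no columns, so merge() returns their cross join.
long <- merge(design, data.frame(day = factor(1:3)))

n_experiments  <- nlevels(design$experiment)   # 3 * 9 * 3
n_observations <- nrow(long)                   # experiments * 3 days

# Balanced means every protein x condition x day cell holds 3 replicates.
cells <- table(long$protein, long$condition, long$day)
is_balanced <- all(cells == 3)
```

On the real data, the same balance check would be table(exp.data$protein, exp.data$condition, exp.data$day).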

We would like to show which of these proteins and conditions are statistically different from each other. We would like a comparison between proteins, and a comparison between the conditions within each protein. For this we were thinking of using a repeated-measures ANOVA (in R).

I have put together an MWE of the dataset and analysis here:

library(RCurl)   # getURL()
library(dplyr)

# Download the anonymised CSV and read the design columns as factors
raw.data <- getURL("https://gist.githubusercontent.com/jp-um/1849ac4ac61411d0751cdbec4406e0cd/raw/4b014f986085665e75806c38a25f39093b2d19df/anon.csv")
exp.data <- read.csv(text = raw.data, colClasses=c("experiment"="factor", 
                                                   "protein"="factor",
                                                   "condition"="factor",
                                                   "day"="factor",
                                                   "bioreplicate"="factor"))
summary(exp.data, maxsum=10)

# Repeated-measures ANOVA: day is the within-experiment factor
aov.model <- aov(density ~ protein*condition*day + Error(experiment/day), data = exp.data)
summary(aov.model)

The output is:

Error: experiment
                  Df   Sum Sq Mean Sq F value Pr(>F)    
protein            2 10729989 5364994  1166.3 <2e-16 ***
condition          8 16430568 2053821   446.5 <2e-16 ***
protein:condition 16 29649758 1853110   402.8 <2e-16 ***
Residuals         54   248404    4600                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: experiment:day
                       Df  Sum Sq Mean Sq F value  Pr(>F)    
day                     2   92976   46488   13.24 7.2e-06 ***
protein:day             4 1776592  444148  126.49 < 2e-16 ***
condition:day          16 3419459  213716   60.87 < 2e-16 ***
protein:condition:day  32 7415908  231747   66.00 < 2e-16 ***
Residuals             108  379221    3511                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I have a few questions, please:

  1. Is repeated-measures ANOVA the way to go here (as opposed to mixed models)?
  2. Is this the correct way to specify the formula? Will this give me repeated measures over the timepoints (day)? Specifically, what does Error(experiment/day) mean? My interpretation is that we have a random effect based on the biological replicate (experiment in my case) and repeated readings for the timepoint (day).
  3. What is the equivalent way to write this with ezANOVA?
  4. Is the above ANOVA equivalent to the linear mixed-effects model lme(density ~ protein*condition*day, random = ~1 | experiment/day, data = exp.data)?
  5. The output tells me that there are differences, but I would like to know which combinations differ. I know I can use a post-hoc test for this, and I found that TukeyHSD does not work on repeated measures. I have found that I can use glht for this, but I am unable to interpret its output. (I tried glht(lme.model, linfct=mcp(protein="Tukey", condition="Tukey")), but I am not sure this is correct.)
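To make question 4 concrete, here is a minimal sketch that fits both models to simulated data, so the two tables can be put side by side. The data frame sim and its density values are made up; only the column names follow exp.data. It also assumes a random intercept per experiment only (random = ~ 1 | experiment) rather than experiment/day. With one reading per experiment per day, a day-within-experiment random effect could not be separated from the residual. Whether this is the right counterpart to Error(experiment/day) is exactly what I would like confirmed.

```r
library(nlme)  # a recommended package, bundled with standard R installations

set.seed(42)

# Simulated stand-in for exp.data: same column names, made-up values.
design <- expand.grid(bioreplicate = factor(1:3),
                      condition    = factor(1:9),
                      protein      = factor(1:3))
design$experiment <- factor(seq_len(nrow(design)))
sim <- merge(design, data.frame(day = factor(1:3)))   # cross join: 81 x 3 rows

# density = baseline + per-experiment shift (shared across days) + noise
shift <- rnorm(nlevels(sim$experiment), sd = 60)
sim$density <- 1000 + shift[as.integer(sim$experiment)] +
  rnorm(nrow(sim), sd = 30)

# Repeated-measures ANOVA with the same formula as in the question.
aov.fit <- aov(density ~ protein * condition * day + Error(experiment / day),
               data = sim)

# Mixed model with a random intercept for each experiment (assumption above).
lme.fit <- lme(density ~ protein * condition * day,
               random = ~ 1 | experiment, data = sim)

lme.tab <- anova(lme.fit)   # sequential F tests with numDF and denDF columns

print(summary(aov.fit))
print(lme.tab)
```

With a balanced design I would expect the F values and denominator degrees of freedom in the two tables to line up term by term, but I have not verified this on the real data.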

Apologies for the (many, and rather basic I am afraid) questions. I would appreciate your time and help.

Many thanks,
