I have some longitudinal data. I’ve done longitudinal analysis before, but I have never changed the time metric, so I wanted to run the process by you.
Edits for clarity:
I have repeated-measures data collected over about 2 months, but the study has to do with COVID, so time (and time passing) is an important component. People beginning the study on May 14th, for example, may be quite different from people coming in on June 1st in terms of our variables. I want to restructure the analysis to examine the effects of time. That is, I want to go from a relatively time-balanced design (time 1, time 2, time 3, and so on) that is agnostic to the actual intake date, to one that takes into account the specific dates on which each individual’s 5 time points were collected: an individually varying times of observation scenario. I propose recoding each participant’s 5 timepoints into ‘days since the beginning of the study’ and using that as my time metric. I plan on using a linear mixed-effects model with this new time metric as the ‘time’ covariate.
I go into a few more details below about the specific way I want to restructure the data. But TLDR: I want to know a) whether this is defensible and b) whether my method of doing so (below) makes sense.
5 data collections, spaced equally every 7 days (counted from each participant’s own intake). So t1 = intake, t2 = day 7, t3 = day 14, t4 = day 21, t5 = day 28.
Sample size ~1500, of course some missing data due to attrition as time goes on.
Participants were allowed to begin the study over the course of approximately a month, and intakes are fairly well distributed across the month during which the survey was open.
Instead of analyzing change just across measurement occasions, where the X-axis is t1, t2, t3, t4, t5, I would like to rescale the time metric to capture the actual day within the whole data-collection period, rather than being agnostic to the actual date. This turns the X-axis into "Day 1, Day 2…, Day 60". I want to do this because I have reason to believe that change in my outcome variable will be a function of time passing.
But as you might imagine, when time is conceptualized this way (as days), not every day will be common to all participants (i.e., some started on day 3, some on day 30, and everything in between). So this is more like a time-unstructured data set; thus I will examine change over time with a growth curve fitted as a mixed-effects model.
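For concreteness, the model I have in mind could be written roughly as follows. The notation is my own for this post (an assumption, not taken from a specific reference): $y_{ti}$ is the outcome for person $i$ at occasion $t$, $\mathrm{day}_{ti}$ is the study day on which that observation was collected, and $b_{0i}$, $b_{1i}$ are the person-specific deviations in intercept and slope.

```latex
\begin{aligned}
y_{ti} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,\mathrm{day}_{ti} + \varepsilon_{ti},\\
(b_{0i},\, b_{1i})^\top &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}), \qquad
\varepsilon_{ti} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

This is only the linear version; whether a linear trend over study days is adequate is a separate question.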
Here is how I intend to go about doing this time metric change:
Step 1: create variables that show y scores across all ~60 possible days.
Step 2: recode the existing 5 measurement occasions from each participant into data organized by ‘day’ rather than by occasion (t1, t2, t3, t4, t5), based on date of intake. E.g., someone who began the study on day 1 has their first timepoint now labelled as ‘day 1 y’, whereas someone who began the study on day 15 has their first timepoint labelled as ‘day 15 y’ in the data set (and their subsequent timepoints at 7-day intervals, i.e., the second at ‘day 22’).
Step 3: restructure data to person period format (using participant IDs).
Step 4: run the growth curve (with time now representing day and ranging from 1 to 60) as a mixed-effects model, with intercept and time as random effects.
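The recoding in Steps 1 to 3 can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the study start date (`STUDY_START`), the field names (`intake_date`, `y`), and the two example participants are all hypothetical, and each occasion is assumed to have been completed exactly on schedule (if the actual completion dates were recorded, those would be the better source for `day`). It goes straight to person-period format, which gives the same rows as building the ~60 wide ‘day’ columns first and then restructuring.

```python
from datetime import date

STUDY_START = date(2020, 5, 14)  # hypothetical first day the survey was open
SPACING_DAYS = 7                 # occasions are 7 days apart, per the design


def to_person_period(wide_rows, study_start=STUDY_START):
    """Turn one row per participant into one row per (participant, occasion)."""
    long_rows = []
    for row in wide_rows:
        # Study start counts as day 1, so an intake on the start date is day 1.
        intake_day = (row["intake_date"] - study_start).days + 1
        for k, y in enumerate(row["y"]):  # k = 0 for t1, 1 for t2, ...
            if y is None:  # occasion missed (attrition): emit no row
                continue
            long_rows.append({
                "id": row["id"],
                "occasion": k + 1,
                "day": intake_day + SPACING_DAYS * k,
                "y": y,
            })
    return long_rows


# Made-up example data: one completer, one participant who dropped out after t2.
wide = [
    {"id": 1, "intake_date": date(2020, 5, 14), "y": [3.0, 3.5, 4.0, 4.2, 4.1]},
    {"id": 2, "intake_date": date(2020, 5, 28), "y": [2.0, 2.4, None, None, None]},
]
person_period = to_person_period(wide)
```

The resulting `day` column is what would enter the mixed-effects model as the time covariate, with `id` as the grouping variable.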
TLDR: I want to switch to an ‘individually varying time metric’ (Grim et al., 2017). I’ve recoded my data to change the time-metric from measurement occasion to ‘day’ to capture change over time. Is what I have done appropriate/correct?
OR would it just make more sense to include date (operationalized as day1, day2…etc.) as a covariate using the original metric?
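To make the two options easier to compare, here is a rough sketch in my own notation (an assumption, and leaving out the random effects for brevity). Let $s_i$ be the intake day of person $i$ and $w_{ti} = 7(t-1)$ the number of days since that person's intake, so that the study day is $\mathrm{day}_{ti} = s_i + w_{ti}$:

```latex
\begin{aligned}
\text{day as time metric:}\quad
  y_{ti} &= \beta_0 + \beta_1\,\mathrm{day}_{ti} + \varepsilon_{ti}
          = \beta_0 + \beta_1 s_i + \beta_1 w_{ti} + \varepsilon_{ti},\\
\text{intake day as covariate:}\quad
  y_{ti} &= \gamma_0 + \gamma_1 w_{ti} + \gamma_2 s_i + \varepsilon_{ti}.
\end{aligned}
```

Written this way, and assuming the occasions really fall exactly 7 days apart, the fixed part of the first option looks like the second with the constraint $\gamma_1 = \gamma_2$, i.e., a day of calendar time and a day of time-in-study are assumed to have the same effect.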
Any help would be very much appreciated!
Below is a visual example of what I did, using some made-up random numbers:
Then pairwise restructure.