I have an experimental design in which I took the left and right brain hemispheres from mice at several time points (hence it's cross-sectional) following a treatment applied at time 0 to the left hemisphere only. The right hemisphere therefore serves as a paired control for the left hemisphere at each time point. Here's the samples data.frame:
library(dplyr)
df <- data.frame(animal_id = c("id1","id1","id2","id2","id3","id3","id4","id4","id5","id5","id6","id6","id7","id7","id8","id8","id9","id9","id10","id10","id11","id11","id12","id12","id13","id13","id14","id14","id15","id15","id16","id16","id17","id17","id18","id18","id19","id19","id20","id20","id21","id21","id22","id22","id23","id23","id24","id24","id25","id25","id26","id26","id27","id27","id28","id28","id29","id29","id30","id30"),
time_point = c(0,0,0,0,2,2,2,2,2,2,3,3,3,3,3,3,6,6,6,6,6,6,9,9,9,9,14,14,14,14,14,14,14,14,26,26,26,26,26,26,26,26,50,50,50,50,50,50,50,50,74,74,170,170,170,170,170,170,170,170),
hemisphere = c("L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R"),
cov1 = c(99,99,98,98,42.2,42.2,47.6,47.6,38.7,38.7,73.5,73.5,40.6,40.6,37.4,37.4,29.9,29.9,35.1,35.1,38.9,38.9,33.5,33.5,37.9,37.9,38,38,37.3,37.3,45.2,45.2,40.4,40.4,40.3,40.3,39.6,39.6,38.3,38.3,38.9,38.9,37.7,37.7,41.1,41.1,42.8,42.8,37.5,37.5,41.1,41.1,40.8,40.8,42.8,42.8,39,39,40.9,40.9),
cov2 = c(28,28,27.3,27.3,28.2,28.2,28.1,28.1,25.6,25.6,30,30,30.1,30.1,30.3,30.3,30.2,30.2,28.3,28.3,31.6,31.6,28.9,28.9,31.5,31.5,26.4,26.4,26.1,26.1,27.5,27.5,26.4,26.4,23.6,23.6,26.5,26.5,25,25,22.1,22.1,27.8,27.8,23.2,23.2,23.2,23.2,21.1,21.1,25.9,25.9,25.7,25.7,25.4,25.4,26.7,26.7,19,19),
stringsAsFactors = FALSE) %>%
dplyr::mutate(sample_id = paste0(animal_id,"_",hemisphere,"_",time_point))
As you can see, I have 30 animals, each sampled at a single time point with both hemispheres collected; however, the number of animals per time point varies between 1 and 4.
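A quick way to confirm the animals-per-time-point counts is sketched below. It uses base R only and re-enters the animal_id and time_point values from the data.frame above so that it runs on its own; the object names animal_id, time_point and n_animals are just for illustration.

```r
# Same values as the animal_id and time_point columns above (two rows per animal)
animal_id  <- paste0("id", rep(1:30, each = 2))
time_point <- c(0,0,0,0,2,2,2,2,2,2,3,3,3,3,3,3,6,6,6,6,6,6,9,9,9,9,
                14,14,14,14,14,14,14,14,26,26,26,26,26,26,26,26,
                50,50,50,50,50,50,50,50,74,74,170,170,170,170,170,170,170,170)

# Number of distinct animals observed at each time point
n_animals <- tapply(animal_id, time_point, function(x) length(unique(x)))
print(n_animals)

# Smallest and largest group size
print(range(n_animals))
```

Time point 74 is the one with a single animal, which may matter for estimating any time-point-specific effect.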
The values that I measured are gene expression levels (which are positive integers).
For example, this simulated matrix has such counts for 10000 genes for each of the 60 samples.
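In case the linked matrix is not at hand, a stand-in of the same shape can be simulated roughly as follows. This is only a sketch: the negative binomial distribution and the values mu = 100 and size = 2 are arbitrary assumptions for illustration, not properties of my data, and the gene and sample names are placeholders.

```r
set.seed(1)  # for reproducibility of the illustration only

n_genes   <- 10000
n_samples <- 60

# Hypothetical counts: negative binomial with assumed mean 100 and dispersion size 2
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 100, size = 2),
  nrow = n_genes,
  ncol = n_samples,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Genes in rows, samples in columns
print(dim(counts))
```

Note that counts simulated this way can include zeros, so they are non-negative rather than strictly positive integers.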
What I'm interested in is how the difference in gene expression levels between the left and right hemispheres changes at each time point relative to the first time point, while controlling for the covariates (cov1 and cov2). For this reason I convert time_point to a factor, as well as animal_id and hemisphere (setting hemisphere R as baseline):
df$time_point <- factor(df$time_point)
df$animal_id <- factor(df$animal_id)
df$hemisphere <- factor(df$hemisphere, levels = c("R","L"))
What would be the right lm or glm model for my question and data?
Seems like animal_id is nested within time_point, so perhaps:
gene_expression ~ cov1 + cov2 + hemisphere + (time_point/animal_id)?
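To make the nesting operator concrete, here is a small check of how R expands that right-hand side. It uses only base R (terms() from the stats package) and needs no data; the names rhs and term_labels are just for illustration.

```r
# Proposed right-hand side of the model formula; no data are needed to expand it
rhs <- ~ cov1 + cov2 + hemisphere + (time_point/animal_id)

# a/b is shorthand for a + a:b, so the nested term splits into two terms
term_labels <- attr(terms(rhs), "term.labels")
print(term_labels)
# "cov1" "cov2" "hemisphere" "time_point" "time_point:animal_id"
```

This only shows how the formula is parsed, not whether it is the right model. Note that the expansion contains no hemisphere-by-time_point term.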