Dear R users,
I have a data frame in the form below, and I would like to run normality tests on the values in the ExpressionLevel column.
  ID Plant Tissue Gene ExpressionLevel
1  1    p1     t1   g1          366.53
2  2    p1     t1   g2            0.57
3  3    p1     t1   g3           11.81
4  4    p1     t2   g1          498.43
5  5    p1     t2   g2            2.14
6  6    p1     t2   g3            7.85
I would like to make the tests on every group according to the content of the Plant, Tissue and Gene columns.
My first problem is how to run a function for all these sub groups.
I first thought of making subsets:
group1 <- subset(df, Plant=="p1" & Tissue=="t1" & Gene=="g1")
group2 <- subset(df, Plant=="p1" & Tissue=="t1" & Gene=="g2")
group3 <- subset(df, Plant=="p1" & Tissue=="t1" & Gene=="g3")
group4 <- subset(df, Plant=="p1" & Tissue=="t2" & Gene=="g1")
group5 <- subset(df, Plant=="p1" & Tissue=="t2" & Gene=="g2")
group6 <- subset(df, Plant=="p1" & Tissue=="t2" & Gene=="g3")
# etc.
But that would be very time consuming and I would like to be able to use the code for other data frames...
I have also tried to store these in a list, which I am looping through, running the tests, something like this:
library(nortest)  # provides pearson.test()
alist <- list(group1, group2, group3, group4, group5, group6)
for (i in alist)
{
  print(shapiro.test(i$ExpressionLevel))
  print(pearson.test(i$ExpressionLevel))
  print(pearson.test(i$ExpressionLevel, adjust=FALSE))
}
But there must be an easier and more elegant way of doing this... I found the example below at http://stackoverflow.com/questions/4716152/why-do-r-objects-not-print-in-a-function-or-a-for-loop. I think it might be useful for printing the results, but I do not know how to adapt it to my data frame, since there the function is applied to several columns rather than to certain rows of one column.
DF <- data.frame(A = rnorm(100), B = rlnorm(100))
obj2 <- lapply(DF, shapiro.test)
tab2 <- lapply(obj2, function(x) c(W = unname(x$statistic), p.value = x$p.value))
tab2 <- data.frame(do.call(rbind, tab2))
printCoefmat(tab2, has.Pvalue = TRUE)
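To make clearer what I am after, here is a rough sketch of how that example might be adapted to groups of rows instead of columns. It is only a guess on my part: it uses split() to cut ExpressionLevel into one vector per Plant/Tissue/Gene combination and then applies shapiro.test to each piece. The data are simulated, and the Rep column and the 10 replicates per group are assumptions made just for the illustration (shapiro.test needs at least 3 values per group).

```r
# Simulated data in the same layout as my data frame, but with 10
# replicate values per Plant/Tissue/Gene combination (an assumption;
# Rep is a made-up column used only to generate the replicates).
set.seed(1)
df <- expand.grid(Rep    = 1:10,
                  Plant  = c("p1", "p2"),
                  Tissue = c("t1", "t2"),
                  Gene   = c("g1", "g2", "g3"))
df$ExpressionLevel <- rlnorm(nrow(df), meanlog = 2)

# One numeric vector per group; names look like "p1.t1.g1".
groups <- split(df$ExpressionLevel,
                list(df$Plant, df$Tissue, df$Gene), drop = TRUE)

# Run the test on every group, keeping the statistic and the p-value.
res <- lapply(groups, function(x) {
  st <- shapiro.test(x)
  c(W = unname(st$statistic), p.value = st$p.value)
})

# Bind the per-group results into one table and print it.
tab <- data.frame(do.call(rbind, res))
printCoefmat(tab, has.Pvalue = TRUE)
```

If this is a sensible approach, I suppose the same pattern would work for pearson.test by swapping the function inside lapply, and for other data frames by changing the grouping columns passed to split().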
Finally, I have found several different functions for testing for normality, but which one(s) should I choose? As far as I can see in the help files they only differ in the minimum number of samples required.
Thanks in advance!
Kind regards,
Joel