I faced a similar problem. Here's what I did:

    tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                      B = sample(letters[1:5], 10, replace = TRUE),
                      C = rnorm(10))
    # sums per observed (A, B) pair; empty combinations are dropped here
    tmp1 <- with(tmp, aggregate(C, list(A = A, B = B), sum))
    tmp2 <- expand.grid(A = sort(unique(tmp$A)), B = sort(unique(tmp$B)))
    # keep every row of tmp2; combinations without data get NA
    merge(tmp2, tmp1, all.x = TRUE)

At least it takes fewer than 10 extra lines of code. Anyone with a simpler solution?

Cheers,
Hans
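One possibly simpler route, offered as a sketch rather than a definitive answer: tapply() already keeps the empty cells (as NA), and as.data.frame() applied to a table flattens an array of any dimension into one row per cell. The data below reuse the names tmp, A, B and C from the example; the fixed seed and the names tab and res are assumptions added only for illustration.

```r
# Hypothetical example data, same shape as the tmp data frame in the example.
set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))

# tapply returns one cell per combination of the observed levels of A and B;
# combinations with no data come back as NA.
tab <- with(tmp, tapply(C, list(A = A, B = B), sum))

# as.table + as.data.frame turn the array into a long data frame with
# one row per cell and the sums in a column named C.
res <- as.data.frame(as.table(tab), responseName = "C")
res
```

The same two calls should carry over to the four-variable case, e.g. tapply(y, list(var1, var2, var3, var4), sum) followed by as.data.frame(as.table(...)). Note that, like the expand.grid() approach, this only covers levels that actually occur in the data unless the grouping variables are factors with all levels declared.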
lebouton wrote:
Dear all,

I'm wanting to do a series of comparisons among 4 categorical variables:

    a <- aggregate(y, list(var1, var2, var3, var4), sum)

This gets me a very nice 2-dimensional data frame with one column per variable, BUT, as help for aggregate says, <<empty subsets are removed>>. I don't see in help(aggregate) how I can change this.

In contrast,

    a <- tapply(y, list(var1, var2, var3, var4), sum)

gives me results for everything including empty subsets, but in an awkward 4-dimensional array that takes me another 10 lines of inefficient code to turn into a 2D data.frame.

Is there a way to directly do this calculation INCLUDING results for empty subsets, and still obtain a 2D array, matrix, or data.frame? OR alternatively is there a simple way to mush the 4D result from the tapply into a 2D matrix/data.frame?

thanks very much in advance for any help!

-jlb

--
************************************
Joseph P. LeBouton
Forest Ecology PhD Candidate
Department of Forestry
Michigan State University
East Lansing, Michigan 48824

Office phone: 517-355-7744
email: lebouton at msu.edu

<https://stat.ethz.ch/mailman/listinfo/r-help>

*********************************
Hans Gardfjell
Ecology and Environmental Science
Umeå University
90187 Umeå, Sweden
email: hans.gardfjell at emg.umu.se
phone: +46 907865267
mobile: +46 705984464