avoiding too many loops - reshaping data
Here is a summary of the suggested methods, each timed over 1000 runs. tapply is the fastest!
library(reshape)
system.time(for (i in 1:1000) cast(melt(mydf, measure.vars = "value"),
                                   city ~ brand, fun.aggregate = sum))
   user  system elapsed
  18.40    0.00   18.44

library(reshape2)
system.time(for (i in 1:1000) dcast(mydf, city ~ brand, sum))
   user  system elapsed
  12.36    0.02   12.37

system.time(for (i in 1:1000) xtabs(value ~ city + brand, mydf))
   user  system elapsed
   2.45    0.00    2.47

system.time(for (i in 1:1000) tapply(mydf$value, mydf[c('city', 'brand')], sum))
   user  system elapsed
   0.78    0.00    0.79
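
As a rough cross-check of the two base-R approaches timed above, the sketch below confirms that tapply and xtabs produce the same sums before their speed is compared. It rebuilds the sample data from the original question; the helper time_reps and its default of 100 repetitions are hypothetical names and values introduced only for this illustration, and the timings will differ from machine to machine.

```r
# Sample data from the original question (city written with rep() for brevity).
mydf <- data.frame(
  city  = rep(c("a", "b"), each = 8),
  brand = c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
  value = c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116)
)

# Two base-R ways of summing `value` within each city/brand cell.
by_tapply <- tapply(mydf$value, mydf[c("city", "brand")], sum)
by_xtabs  <- xtabs(value ~ city + brand, data = mydf)

# Both are city-by-brand tables; compare the raw numbers only,
# because the two objects carry different classes and attributes.
same_sums <- all(as.vector(by_tapply) == as.vector(by_xtabs))
print(same_sums)

# Hypothetical helper (not from the thread): elapsed seconds for n calls of f.
time_reps <- function(f, n = 100) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}
elapsed_tapply <- time_reps(function() tapply(mydf$value, mydf[c("city", "brand")], sum))
elapsed_xtabs  <- time_reps(function() xtabs(value ~ city + brand, data = mydf))
print(c(tapply = elapsed_tapply, xtabs = elapsed_xtabs))
```

One difference to keep in mind when choosing between the two: for a city/brand combination with no rows, tapply returns NA while xtabs returns 0.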
Dimitri
On Wed, Nov 3, 2010 at 4:32 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
Try this:
xtabs(value ~ city + brand, mydf)

On Wed, Nov 3, 2010 at 6:23 PM, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:
Hello!
I have a data frame like this one:
mydf <- data.frame(city = c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b"),
                   brand = c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
                   value = c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116))
(mydf)
What I need to get is a data frame like the one below - cities as
rows, brands as columns, and the sums of the "value" within each
city/brand combination in the body of the data frame:
city   x    y    z
a      3   23  450
b     12   42  231
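
One possible way to get a table of this shape as an actual data frame, without explicit loops, is sketched below in base R. It assumes the sample data shown above; the object names sums and wide are made up for the illustration.

```r
# Sample data as posted in the question.
mydf <- data.frame(
  city  = c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b"),
  brand = c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
  value = c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116)
)

# Sum `value` within each city/brand cell. The result is a matrix with
# cities as row names and brands as column names.
sums <- tapply(mydf$value, mydf[c("city", "brand")], sum)

# Convert the matrix to a data frame and move the row names into a
# regular `city` column; row.names = NULL gives plain 1, 2, ... row names.
wide <- data.frame(city = rownames(sums), as.data.frame(sums), row.names = NULL)
print(wide)
```

The grouping is done inside tapply, so no subindexing by hand is needed; the final step only changes the container from a matrix to a data frame.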
I have written code that involves multiple loops and subindexing,
but it is taking too long.
I am sure there must be a more efficient way of doing it.
Thanks a lot for your hints!
--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O