
Yet another set of codes to optimize

I am having trouble converting my dataset from long to wide format. Previous attempts using the reshape package and the aggregate function failed because they took too long. Unfortunately, my simplified solution below is just as slow.
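To make the target layout concrete: here "long to wide" means one output row per (Name, Type) pair and one column per Predictor level, with each cell counting how many observations fall into it. The sketch below uses base R's `table()` on made-up toy data (the `long` data frame is hypothetical). It only illustrates the shape of the result and is not the poster's method.

```r
# Hypothetical toy data in long format: one row per observation
long <- data.frame(Name      = c(1, 1, 2, 2, 2),
                   Type      = c(5, 5, 7, 7, 7),
                   Predictor = c("A", "B", "A", "A", "C"),
                   stringsAsFactors = FALSE)

# Wide format: rows are Name.Type keys, columns are Predictor levels,
# cells are counts of matching observations
wide <- table(paste(long$Name, long$Type, sep = "."), long$Predictor)
print(wide)
```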
 
My complete code is given below. With sample.size = 10000 it takes about 20 seconds to run, but with sample.size = 100000 it seems to take forever. My actual sample size is 15000000, i.e. 15 million rows.
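A rough extrapolation from the 20-second figure shows why the loop cannot reach 15 million rows. The cost models here are assumptions, not measurements: if run time grew linearly with the number of rows $n$, and, alternatively, if it grew quadratically (for example because every iteration does a name lookup over all existing rows), the estimates would be:

$$t(n) \approx 20\,\mathrm{s} \times \frac{n}{10^4} \;\Rightarrow\; t(1.5\times10^7) \approx 20 \times 1500 = 3\times10^4\,\mathrm{s} \approx 8.3\ \text{hours}$$

$$t(n) \approx 20\,\mathrm{s} \times \left(\frac{n}{10^4}\right)^2 \;\Rightarrow\; t(1.5\times10^7) \approx 20 \times 1500^2 = 4.5\times10^7\,\mathrm{s} \approx 1.4\ \text{years}$$

The jump from 20 seconds to "forever" at 100000 rows suggests worse-than-linear growth. Even in the linear case, however, a per-row R loop is far too slow at this size.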
 
 
 
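sample.size <- 10000

# Long-format data: one row per observation
m <- data.frame(Name = sample(1:100000, sample.size, TRUE),
                Type = sample(1:1000, sample.size, TRUE),
                Predictor = sample(LETTERS[1:10], sample.size, TRUE))

res <- function(m) {
    # Unique (Name, Type) pairs sorted by Name, then Type: one output row each
    m.12.unique <- unique(m[, 1:2])
    m.12.unique <- m.12.unique[order(m.12.unique[, 1], m.12.unique[, 2]), ]
    v1 <- paste(m.12.unique[, 1], m.12.unique[, 2], sep = ".")
    # Use Predictor labels as character: in older R versions c() on a factor
    # returns the integer codes, not the labels
    pred <- as.character(m[, 3])
    v2 <- sort(unique(pred))
    counts <- matrix(0, nrow = length(v1), ncol = length(v2),
                     dimnames = list(v1, v2))
    m.ids <- paste(m[, 1], m[, 2], sep = ".")
    # Count each observation in its (Name.Type, Predictor) cell, one row at a time
    for (i in seq_len(nrow(m))) {
        x <- m.ids[i]
        y <- pred[i]
        counts[x, y] <- counts[x, y] + 1
    }
    out <- data.frame(m.12.unique[, 1], m.12.unique[, 2], counts, row.names = NULL)
    colnames(out) <- c("Name", "Type", v2)
    return(out)
}

res(m)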
R version 2.8.0 (2008-10-20) 
i386-pc-mingw32 
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
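One way to avoid the per-row loop entirely is to turn each observation into a single integer cell index and count all of them in one call to `tabulate()`. The sketch below is a swapped-in vectorized technique, not the poster's method. The function name `res.fast` is hypothetical. It assumes the same three-column layout as `m` above (Name, Type, Predictor) and builds numeric keys instead of pasted strings, which avoids creating 15 million character keys.

```r
# Hypothetical vectorized version of res(): same output layout, no loop.
res.fast <- function(m) {
  # Integer codes for Name and Type, in sorted order of their values
  name.lev <- sort(unique(m[, 1]))
  type.lev <- sort(unique(m[, 2]))
  name.id  <- match(m[, 1], name.lev)
  type.id  <- match(m[, 2], type.lev)
  n.type   <- length(type.lev)

  # One numeric key per (Name, Type) pair; sorting the keys sorts by Name, then Type
  key     <- (name.id - 1) * n.type + type.id
  key.lev <- sort(unique(key))
  row.id  <- match(key, key.lev)

  # Column index for each observation's Predictor label
  pred     <- as.character(m[, 3])
  pred.lev <- sort(unique(pred))
  col.id   <- match(pred, pred.lev)

  # Cell (r, c) of a column-major nr x nc matrix has linear index r + (c - 1) * nr;
  # tabulate() counts how many observations land in each cell
  nr     <- length(key.lev)
  counts <- tabulate(row.id + (col.id - 1) * nr, nbins = nr * length(pred.lev))
  counts <- matrix(counts, nrow = nr, ncol = length(pred.lev))

  # Decode each key back into its Name and Type values
  Name <- name.lev[(key.lev - 1) %/% n.type + 1]
  Type <- type.lev[(key.lev - 1) %% n.type + 1]

  out <- data.frame(Name = Name, Type = Type, counts, row.names = NULL)
  colnames(out) <- c("Name", "Type", pred.lev)
  out
}
```

Before running it on the full data, it is worth checking its counts against the loop version on the 10000-row sample. The resulting dense table is still (number of unique Name/Type pairs) × (number of Predictor levels), so at 15 million rows memory, rather than speed, may become the limit on a 32-bit build.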