loop in a data.table

Hi,

Maybe this helps:

dat1 <- read.table(text = "
a  b  c  d  e
1  12 15 65 6
1  65 85 36 5
2  69 84 35 8
2  45 78 65 8
", sep = "", header = TRUE)
library(data.table)
dat2 <- data.table(dat1)
# Within each group of a, .SD holds b, c, d and e. Every column sum is divided
# by the sum of e (the 4th column), and head(..., -1) drops e's own ratio.
dat2[, head(sapply(.SD, sum) / sapply(.SD, sum)[4], -1), by = "a"]
#   a        V1
#1: 1  7.000000
#2: 1  9.090909
#3: 1  9.181818
#4: 2  7.125000
#5: 2 10.125000
#6: 2  6.250000
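The ratios can also be kept in wide format, one column per input column, by extending the lapply(.SD, sum) call from your message so that each column sum is divided by sum(e). A minimal sketch, assuming a reasonably recent data.table in which a column named in j (here e) is still available inside each group even though it is not listed in .SDcols; the vectors `cols` and `res` are names introduced here for illustration:

```r
library(data.table)

# Same example data as above, built directly as a data.table
dat2 <- data.table(a = c(1, 1, 2, 2),
                   b = c(12, 65, 69, 45),
                   c = c(15, 85, 84, 78),
                   d = c(65, 36, 35, 65),
                   e = c(6, 5, 8, 8))

# Columns to aggregate; e is only used as the denominator
cols <- c("b", "c", "d")

# For each value of a, divide the sum of every column in .SDcols by the
# group's sum of e. e is referenced by name in j, so data.table makes it
# available in each group alongside .SD.
res <- dat2[, lapply(.SD, function(x) sum(x) / sum(e)),
            by = "a", .SDcols = cols]
print(res)
```

Here res has one row per value of a and columns b, c and d holding the ratios, so no get() call is involved.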

outputdat <- list()
ColNames <- colnames(dat2)  # a, b, c, d, e
x <- ncol(dat2) - 1         # index of the last column to process (d)
for (z in 2:x)
{
  outputdat[[z]] <- dat2[, sum(get(ColNames[z])) / sum(e), by = "a"]
}

do.call(rbind,outputdat)
#   a        V1
#1: 1  7.000000
#2: 2  7.125000
#3: 1  9.090909
#4: 2 10.125000
#5: 1  9.181818
#6: 2  6.250000
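If the long layout of the loop's result is wanted, but with a label saying which column each ratio belongs to, a different technique is to reshape with melt() first and then group by both the column label and a. A minimal sketch, assuming a data.table version that provides melt() for data.tables; the names `long` and `res` are introduced here for illustration:

```r
library(data.table)

dat2 <- data.table(a = c(1, 1, 2, 2),
                   b = c(12, 65, 69, 45),
                   c = c(15, 85, 84, 78),
                   d = c(65, 36, 35, 65),
                   e = c(6, 5, 8, 8))

# Long format: one row per (original row, measured column). a and e are id
# columns, so each original row's e is repeated for b, c and d.
long <- melt(dat2, id.vars = c("a", "e"), measure.vars = c("b", "c", "d"))

# Within each (variable, a) group, sum(e) runs over exactly the original rows
# of that a group, so this gives sum(col) / sum(e) for every column.
res <- long[, .(ratio = sum(value) / sum(e)), by = .(variable, a)]
print(res)
```

The rows come out in the same order as the loop's output (b for a = 1 and 2, then c, then d), with the extra variable column identifying the source column.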
A.K.

----- Original Message -----
From: Camilo Mora <cmora at DAL.CA>
To: r-help at r-project.org
Cc: 
Sent: Wednesday, March 13, 2013 11:27 PM
Subject: [R] loop in a data.table

I would like to clarify my previous email about using data.table.

Imagine the following data.frame called "data":

a   b   c   d   e
1   12  15  65  6
1   65  85  36  5
2   69  84  35  8
2   45  78  65  8

I want to aggregate columns b:d by the values of column a. For each of these columns, the aggregation is sum(col)/sum(e).
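Written out, with x_{i,k} standing for the value of column k in row i (notation added here only for illustration), the quantity for each group g of column a is:

$$
r_{g,k} = \frac{\sum_{i:\,a_i = g} x_{i,k}}{\sum_{i:\,a_i = g} e_i}, \qquad k \in \{b, c, d\}
$$

For example, for a = 1 and column b this is (12 + 65) / (6 + 5) = 77 / 11 = 7.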
For this I am using a data.table with a loop of the form:

##########################################

library(data.table)

ColNames <- colnames(data)  # gets the names of the columns

x <- ncol(data) - 1  # index of the last column to process (d); e is excluded

data <- data.table(data)  # converts to data.table

for (z in 2:x)  # I start the loop at the second column (b) and finish at column d
{
  # outputdata is overwritten on every pass, so only column d's result remains
  outputdata <- data[, sum(get(ColNames[z])) / sum(e), by = "a"]
}
############################################

This works fine, but using get() slows the aggregation down by about 20 times. I wonder if there is an alternative to get(), or another way to aggregate all columns at once. I have been reading about .SD but have not yet figured out how to apply more than one operation with it.

Right now I have:
###############
outputdata=data[, lapply(.SD, sum), by="a", .SDcols=2:x]

##############
This latter code aggregates all columns at once, but only by summing. Eventually I also need to divide the sum of each column by the sum of column e.

Any help will be greatly appreciated.

Thanks,

Camilo

Camilo Mora, Ph.D.
Department of Geography, University of Hawaii
Currently available in Colombia
Phone:   Country code: 57
         Provider code: 313
         Phone 776 2282
         From the USA or Canada you have to dial 011 57 313 776 2282
http://www.soc.hawaii.edu/mora/

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
