The code below works, which is why I didn't include the data to
reproduce it. The loop runs about 500 times, and each time a zoo object
with 1400 rows and 4 columns gets created (the rows represent minutes,
so each file is one day's worth of data). Inside the loop, I keep
rbinding the newly created zoo object to the current zoo object, so it
gets bigger and bigger over time. Eventually, the full zoo object,
fullaggfxdata, containing all the days of data, is created.

I was just wondering if there is a more efficient way of doing this. I
do know at the beginning how many times the loop will run, so maybe
creating a matrix or data frame up front and putting the daily results
into it would make it faster. The problem with this is that I
eventually do need a zoo object. I ask because at around iteration 250
of the loop, things start to slow down significantly, and I think I
remember reading somewhere that rbinding something to itself is not a
good idea. Thanks.
#=======================================================================
start<-1
for (filecounter in (1:length(datafilenames))) {
print(paste("File Counter = ", filecounter))
datafile= paste(datadir,"/",datafilenames[filecounter],sep="")
aggfxdata<-clnaggcompcurrencyfile(fxfile=datafile,aggminutes=aggminutes,
fillholes=1)
logbidask<-log(aggfxdata[,"bidask"])
aggfxdata<-cbind(aggfxdata,logbidask)
if ( start == 1 ) {
fullaggfxdata<-aggfxdata
start<-0
} else {
fullaggfxdata<-rbind(fullaggfxdata,aggfxdata)
}
}
#=======================================================================
--------------------------------------------------------
This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}
any way to make the code more efficient ?
6 messages · Leeds, Mark (IED), Ravi Varadhan, Charles C. Berry +2 more
Using "rbind" almost always slows things down significantly. You should define the objects "aggfxdata" and "fullaggfxdata" before the loop and then assign appropriate values to the corresponding rows and/or columns.

Ravi.

----------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
----------------------------------------------------------------------------

(In reply to Leeds, Mark (IED), Friday, December 08, 2006 4:17 PM.)

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Ravi: I appreciate your help, but could you be a little more specific about what you mean? I can just stack aggfxdata below the current full object (rbind works out the ordering by date because it's a zoo object), so it's not a question of where to put the new one. It's a question of how to avoid rbind. I apologize, because I don't think I understand what you are saying. Or maybe it's not possible to avoid rbind? Thanks.

(In reply to Ravi Varadhan, Friday, December 08, 2006 5:21 PM.)
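One possible reading of the suggestion to define the result before the loop is sketched below. It is only an illustration under stated assumptions: every day has the same number of rows, read_one_day() is a hypothetical stand-in for clnaggcompcurrencyfile(), and the column names other than bidask and logbidask are invented. Plain matrices are used so the sketch runs without the zoo package.

```r
rows_per_day <- 1400
n_days <- 5   # the thread mentions about 500 files; kept small here

# Hypothetical stand-in for clnaggcompcurrencyfile(): returns one day of
# data as a numeric matrix with 4 columns, one of them named "bidask".
read_one_day <- function(day) {
  m <- matrix(as.numeric(day), nrow = rows_per_day, ncol = 4)
  colnames(m) <- c("v1", "v2", "v3", "bidask")
  m
}

# Allocate the whole result once: 4 data columns plus logbidask.
full <- matrix(NA_real_, nrow = n_days * rows_per_day, ncol = 5)
colnames(full) <- c("v1", "v2", "v3", "bidask", "logbidask")

for (day in seq_len(n_days)) {
  one_day <- read_one_day(day)
  # Row block that belongs to this day in the preallocated matrix.
  rows <- seq.int((day - 1) * rows_per_day + 1, day * rows_per_day)
  full[rows, 1:4] <- one_day
  full[rows, "logbidask"] <- log(one_day[, "bidask"])
}
```

Since a zoo object is needed in the end, the time index would have to be collected in the same way (for example in a preallocated vector, here called timestamps as an assumption) and the matrix converted once after the loop with something like zoo(full, order.by = timestamps).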
Save your intermediate results as a list of matrices. Then rbind them all at once using do.call. It looks like this will save about 23 seconds (see below) if you are running on a PC like mine (AMD 2GHz, WinXP). But I wonder: if a mere 23 seconds is all you save, is this really worth worrying about? Maybe you are losing time elsewhere. If so, you need to profile this run and/or track memory usage.
> amat <- NULL
> mat.1400.by.4 <- matrix(1:(1400*4), ncol=4)
> system.time(for (i in 1:500) amat <- rbind(amat, mat.1400.by.4))
[1] 20.05  1.53 23.24    NA    NA
> list.of.matrices <- rep(list(mat.1400.by.4), 500)
> system.time(amat2 <- do.call(rbind, list.of.matrices))
[1] 0.08 0.00 0.08   NA   NA
> all.equal(amat, amat2)
[1] TRUE
Charles C. Berry                        (858) 534-2098
                                        Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu         UC San Diego
http://biostat.ucsd.edu/~cberry/        La Jolla, San Diego 92093-0717
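A rough way to see where the timing gap comes from is to count how many rows get written under each approach. The arithmetic below uses a simplified cost model (each rbind allocates a new object and copies everything accumulated so far), which is an assumption made for illustration and not a measurement.

```r
rows_per_day <- 1400
n_days <- 500

# Growing the result: iteration i writes a new object of i * 1400 rows,
# so the total is 1400 * (1 + 2 + ... + 500).
rows_written_growing <- sum(rows_per_day * seq_len(n_days))

# Binding once at the end: every row is written a single time.
rows_written_once <- rows_per_day * n_days

ratio <- rows_written_growing / rows_written_once
```

Under this model the incremental loop writes 175,350,000 rows against 700,000 for a single bind, a factor of about 250. The work per iteration also grows linearly, which would be consistent with the slowdown becoming noticeable partway through the run.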
I don't know about efficiency, but at least for readability, you may want to do the following:

1. Indent your code.
2. Create a list of appropriate length, and populate the list with the objects you're creating in the loop.
3. After the loop, use do.call(rbind, list).

HTH,
Andy
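Applied to the shape of the original loop, the three steps might look like the sketch below. It is not the poster's actual code: fake_day() is a hypothetical stand-in for clnaggcompcurrencyfile(), the file names are made up, and plain matrices replace zoo objects so the example runs without extra packages.

```r
# Hypothetical file names; the real ones come from datafilenames.
datafilenames <- c("day1.txt", "day2.txt", "day3.txt")

# Hypothetical stand-in for clnaggcompcurrencyfile(): ignores the path's
# contents and returns a 1400 x 4 matrix with a "bidask" column.
fake_day <- function(path) {
  m <- matrix(nchar(path) + 1, nrow = 1400, ncol = 4)
  colnames(m) <- c("v1", "v2", "v3", "bidask")
  m
}

# Step 2: a list of the right length, filled inside the loop.
daily <- vector("list", length(datafilenames))
for (filecounter in seq_along(datafilenames)) {
  aggfxdata <- fake_day(datafilenames[filecounter])
  logbidask <- log(aggfxdata[, "bidask"])
  daily[[filecounter]] <- cbind(aggfxdata, logbidask)
}

# Step 3: a single rbind over all the pieces.
fullaggfxdata <- do.call(rbind, daily)
```

With zoo objects in the list, the same do.call(rbind, daily) call should dispatch to zoo's rbind method, so the final result would still be a zoo object ordered by its index.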
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/r-help/attachments/20061208/7421e22b/attachment-0005.pl