Tip for performance improvement while handling huge data?
Ok, thank you. For now, the vectorization option is feasible; I was not sure it could be handled this way, but I will try. Regards, Suresh
Philipp Pagel wrote:
For certain calculations, I have to handle a data frame with, say, 10 million rows and multiple columns of different data types. When I try to perform calculations on certain elements in each row, the program goes into "busy" mode for a really long time. To avoid this, I split the data frame into subsets of 10,000 rows, and the calculation then completed within a reasonable time. Is there any other tip to improve performance?
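For reference, a minimal sketch of the chunking approach described above; the data frame 'df', its columns 'x' and 'y', and the per-row calculation are hypothetical placeholders, not the poster's actual code:

  df <- data.frame(x = runif(1e5), y = runif(1e5))  # stand-in data
  chunk_size <- 10000
  starts <- seq(1, nrow(df), by = chunk_size)
  results <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    # rows belonging to the current chunk
    rows <- starts[i]:min(starts[i] + chunk_size - 1, nrow(df))
    chunk <- df[rows, ]
    # placeholder per-chunk calculation
    results[[i]] <- chunk$x * chunk$y
  }
  out <- unlist(results)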
Depending on what exactly you are doing and what causes the slowdown, there are a number of useful strategies:

- Buy RAM (lots of it) - it's cheap
- Vectorize whatever you are doing
- Don't use all the data you have; draw a random sample of reasonable size
- ...

To be more helpful, we'd have to know:

- What are the computations involved?
- How are they implemented at the moment? -> example code
- What is the range of "really long time"?

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
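To illustrate the vectorization suggestion: a hypothetical element-by-element loop (the data frame 'df' and its columns are assumptions for the example) can usually be replaced by a single whole-column expression, which avoids R's per-iteration interpreter overhead:

  df <- data.frame(x = runif(1e5), y = runif(1e5))  # stand-in data

  # Row-by-row loop: slow for millions of rows
  res <- numeric(nrow(df))
  for (i in seq_len(nrow(df))) {
    res[i] <- df$x[i] * df$y[i]
  }

  # Vectorized equivalent: one operation over entire columns
  res2 <- df$x * df$y
  stopifnot(all.equal(res, res2))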