Tip for performance improvement while handling huge data?
Ok, thank you. For now, the vectorization option is feasible; I was not sure it could be handled this way, but I will try. Regards, Suresh
Philipp Pagel wrote:
For certain calculations, I have to handle a data frame with, say, 10 million rows and multiple columns of different data types. When I try to perform calculations on certain elements in each row, the program goes into "busy" mode for a really long time. To avoid this, I split the data frame into subsets of 10,000 rows, and the calculation then completed within a reasonable time. Is there any other tip to improve performance?
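For reference, a minimal sketch of the chunking approach described above; the data frame 'df', its columns 'x' and 'y', and the per-row calculation are hypothetical placeholders, not the poster's actual code:

  df <- data.frame(x = runif(1e5), y = runif(1e5))  # stand-in data
  chunk_size <- 10000
  starts <- seq(1, nrow(df), by = chunk_size)
  results <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    # rows belonging to the current chunk
    rows <- starts[i]:min(starts[i] + chunk_size - 1, nrow(df))
    chunk <- df[rows, ]
    # placeholder per-chunk calculation
    results[[i]] <- chunk$x * chunk$y
  }
  out <- unlist(results)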
Depending on what exactly you are doing and what causes the slowdown, there are a number of useful strategies:

- Buy RAM (lots of it) - it's cheap
- Vectorize whatever you are doing
- Don't use all the data you have; draw a random sample of reasonable size
- ...

To be more helpful, we'd have to know:

- What are the computations involved?
- How are they implemented at the moment? -> example code
- What is the range of "really long time"?

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
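To illustrate the vectorization suggestion: a hypothetical element-by-element loop (the data frame 'df' and its columns are assumptions for the example) can usually be replaced by a single whole-column expression, which avoids R's per-iteration interpreter overhead:

  df <- data.frame(x = runif(1e5), y = runif(1e5))  # stand-in data

  # Row-by-row loop: slow for millions of rows
  res <- numeric(nrow(df))
  for (i in seq_len(nrow(df))) {
    res[i] <- df$x[i] * df$y[i]
  }

  # Vectorized equivalent: one operation over entire columns
  res2 <- df$x * df$y
  stopifnot(all.equal(res, res2))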