Tip for performance improvement while handling huge data?

3 messages · Suresh_FSFM, Philipp Pagel

#
Hello All,

For certain calculations, I have to handle a dataframe with, say, 10 million
rows and multiple columns of different datatypes.
When I try to perform calculations on certain elements in each row, the
program just goes into "busy" mode for a really long time.
To avoid this "busy" mode, I split the dataframe into subsets of 10000 rows.
The calculation then finished quickly, within a reasonable time.
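The chunking approach described above can be sketched roughly as follows (a Python/NumPy illustration, since the poster's actual environment and computation aren't shown; `chunk_size` and the doubling step are placeholders):

```python
import numpy as np

def process_in_chunks(values, chunk_size=10_000):
    """Apply a per-element computation one fixed-size slice at a time,
    instead of holding every intermediate result for the full data at once."""
    results = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        # Placeholder computation; the real per-row work would go here.
        results.append(chunk * 2.0)
    return np.concatenate(results)
```

Processing in fixed-size slices keeps each intermediate allocation small, which can avoid the long "busy" periods caused by memory pressure on very large inputs.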

Are there any other tips to improve the performance?

Regards,
Suresh
#
Depending on what exactly you are doing and what causes the slowdown,
there may be a number of useful strategies:

 - Buy RAM (lots of it) - it's cheap
 - Vectorize whatever you are doing
 - Don't use all the data you have but draw a random sample of reasonable size
 - ...
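The vectorization and sampling points above can be sketched like this (a hedged Python/NumPy illustration, since the original computation isn't shown; the arithmetic and sample size are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.random(1_000_000)

# Loop-style version: one Python-level operation per element (slow).
# Shown only on the first few elements to keep it cheap.
looped = np.array([v * 2.0 + 1.0 for v in values[:5]])

# Vectorized version: a single NumPy expression over the whole array,
# which runs in compiled code instead of an interpreted per-row loop.
vectorized = values * 2.0 + 1.0

# Alternatively, work on a random sample of manageable size
# instead of the full 10 million rows.
sample = rng.choice(values, size=10_000, replace=False)
```

The same idea carries over to dataframe libraries: replacing a per-row loop with a whole-column operation is usually the single biggest win.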

To be more helpful, we'd have to know:

 - what are the computations involved?
 - how are they implemented at the moment?
  -> example code
 - what is the range of "really long time"?

cu
	Philipp
#
Ok. Thank you.
As of now, the vectorization option is feasible. I was not sure about
handling it this way; I will try it.

Regards,
Suresh