What is the best package for large data cleaning (not statistical analysis)?

Hi Sean,

You should think about storing the data externally in an SQL database.
This makes you very flexible, and you can do a lot of the manipulation
directly in the db. With stored procedures, for example in
a PostgreSQL db, you can use almost any language you prefer to
manipulate the data before loading it into R. There's also PL/R, a
procedural language based on R, with which you can do a lot of things
already inside PostgreSQL databases.
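
To give a rough idea of what "cleaning in the db" can look like, here is a minimal sketch. It is not your setup: it assumes Python's built-in sqlite3 module as a stand-in for a real PostgreSQL server, and the table `raw`, its columns, and the sample rows are all made up for illustration. The SQL itself (trim, lowercase, empty string to NULL, cast, drop duplicates) is the kind of work the database can do before anything is loaded into R.

```python
import sqlite3

# Hypothetical raw data: padded/mixed-case names, empty strings for
# missing scores, and one duplicated row.
raw_rows = [
    (1, "  Alice ", "42"),
    (2, "BOB", ""),
    (3, "carol", "17"),
    (3, "carol", "17"),
]


def clean_in_db(rows):
    """Load rows into an in-memory db and return them cleaned by SQL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE raw (id INTEGER, name TEXT, score TEXT)")
    con.executemany("INSERT INTO raw VALUES (?, ?, ?)", rows)
    # All cleaning happens inside the database:
    #   trim + lower   -> normalise names
    #   NULLIF + CAST  -> empty strings become NULL, the rest integers
    #   DISTINCT       -> drop exact duplicates (after normalisation)
    cleaned = con.execute(
        """
        SELECT DISTINCT
               id,
               lower(trim(name))                  AS name,
               CAST(NULLIF(score, '') AS INTEGER) AS score
        FROM raw
        ORDER BY id
        """
    ).fetchall()
    con.close()
    return cleaned


cleaned = clean_in_db(raw_rows)
print(cleaned)
```

With a real PostgreSQL db the query would stay much the same, and only the cleaned result would need to be pulled into R.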

And keep in mind: learning SQL is no more difficult than learning R.

best,

josuah


On 15.03.2009 at 13:13, Sean Zhang wrote: