Do you use R for data manipulation?
I second what Zeljko wrote. In addition, see the data manipulation section in Chapter 4 of http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RS/sintro.pdf Frank
Zeljko Vrba wrote:
Sorry for replying to the wrong person; I lost the original email.
Farrel Buchinsky wrote:
Is R an appropriate tool for data manipulation, data reshaping, and data organizing? I think so, but someone who recently joined our group thinks not. The new recruit believes that Python or another language is a far better tool for developing data manipulation scripts that can then be used by several members of our research group. Her assessment is that R is useful only when it comes to data analysis and working with statistical models.
I personally started to use R because I got tired of manually writing scripts for data manipulation and processing. The argument of your new recruit smells of ignorance and resistance to learning something new. Ask her _how_ she assessed R, how much time she spent on her assessment, and whether she actually tried to run it and perform some concrete simple tasks. (Yes, R is somewhat "different" and it has a steep learning curve, but the effort of learning it is worth it. And yes, R can be used in the same way as any other scripting language, i.e., it is not restricted to interactive work.)

Take a look at the plyr and reshape packages (http://had.co.nz/); I have a hunch that they would have saved me a lot of headaches had I found out about them earlier :) I would also recommend investing in Phil Spector's book "Data Manipulation with R"; it will get you started much faster.

I also find R's image files very convenient for sharing data (and code!) in a very compact format (single file, portable across architectures). When you quit your R session and choose to save the workspace, all the variables and functions get saved in the image file, which you can take with you (or send to somebody else). You can then start R again, load the image into a new session, and continue from where you left off. You won't get this kind of automatic persistence in any scripting language out of the box.
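To make the image-file point concrete, here is a minimal sketch using base R's save() and load(). The object names (measurements, scale_values) and the file name are made up for illustration; save.image() would write the whole workspace instead of selected objects.

```r
# Hypothetical example objects; the names are illustrative only.
measurements <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))
scale_values <- function(x, factor = 10) x * factor

# Write the selected objects (data and code) to a single file.
# save.image(file = ...) would store every object in the workspace instead.
image_file <- file.path(tempdir(), "session.RData")
save(measurements, scale_values, file = image_file)

# Simulate a fresh session by loading the file into a new, empty environment.
restored <- new.env()
load(image_file, envir = restored)

# Both the data frame and the function come back and can be used right away.
restored_scaled <- restored$scale_values(restored$measurements$value)
print(ls(restored))
print(restored_scaled)
```

In an interactive session you would normally call load(image_file) directly, which restores the objects into the current workspace; the separate environment is used here only to show that nothing depends on the original session.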
So what do you think:
1) R is a phenomenally powerful and flexible tool and since you are going to do analyses in R you might as well use it to read data in and merge it and reshape it to whatever you need.
OR
2) Are you crazy? Nobody in their right mind uses R to pipe the data around their lab and assemble it for analysis.
I'd go with 1). R also has interfaces to databases through RODBC, so you do not have to go through several conversions when you're about to process or plot data in R.
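As a rough illustration of option 1), the sketch below merges and reshapes two small data frames using only base R (merge, reshape, aggregate). The data and column names are invented for the example; the plyr and reshape packages offer more convenient interfaces for the same kind of task, and the input data frames could just as well come from files or a database query.

```r
# Hypothetical data: one row per subject, and repeated measurements in long format.
subjects <- data.frame(
  id = c(1, 2, 3),
  group = c("control", "treated", "treated"),
  stringsAsFactors = FALSE
)
visits <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  visit = c(1, 2, 1, 2, 1, 2),
  score = c(10, 12, 9, 15, 11, 14)
)

# Merge the two tables on their shared key.
combined <- merge(subjects, visits, by = "id")

# Reshape from long (one row per visit) to wide (one row per subject).
wide <- reshape(
  combined,
  idvar = c("id", "group"),
  timevar = "visit",
  v.names = "score",
  direction = "wide"
)

# Summarise: mean score per group, computed from the long-format data.
group_means <- aggregate(score ~ group, data = combined, FUN = mean)

print(wide)
print(group_means)
```

The wide table has one score column per visit, which is the layout many modelling and plotting functions expect, so the same script can carry the data from raw tables through to analysis.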
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University