Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Farrel Buchinsky
> Sent: Tuesday, May 05, 2009 10:23 PM
> To: R
> Cc: Ross; gregory_warnes at urmc.rochester.edu; greg at warnes.net
> Subject: [R] Do you use R for data manipulation?
>
> Is R an appropriate tool for data manipulation, data reshaping, and
> data organizing? I think so, but someone who recently joined our group
> thinks not. The new recruit believes that Python or another language is
> a far better tool for developing data manipulation scripts that can then
> be used by several members of our research group. Her assessment is that
> R is useful only when it comes to data analysis and working with
> statistical models.
> So what do you think:
> 1) R is a phenomenally powerful and flexible tool, and since you are
> going to do analyses in R you might as well use it to read data in,
> merge it, and reshape it to whatever you need.
> OR
> 2) Are you crazy? Nobody in their right mind uses R to pipe the data
> around their lab and assemble it for analysis.
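For illustration, option 1 (reading, reshaping, and merging in R) can be sketched with functions that ship with R, `reshape()` and `merge()`. This is a minimal sketch only: the two tables, their column names (`patient_id`, `snp`, `call`), and all values are hypothetical toy data standing in for a clinical table and a long-format genotyping export, not the lab's actual schema.

```r
# Hypothetical clinical table: one row per patient
clinical <- data.frame(
  patient_id = c(101, 102, 103),
  age        = c(34, 51, 47),
  sex        = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical genotyping output in "long" form: one row per patient per SNP
genotypes <- data.frame(
  patient_id = c(101, 101, 102, 102, 103, 103),
  snp        = c("rs1", "rs2", "rs1", "rs2", "rs1", "rs2"),
  call       = c("AA", "AG", "AG", "GG", "AA", "AA"),
  stringsAsFactors = FALSE
)

# Reshape to "wide" form: one row per patient, one column per SNP
# (reshape() names the new columns call.rs1, call.rs2)
geno_wide <- reshape(genotypes, idvar = "patient_id", timevar = "snp",
                     direction = "wide")

# Merge with the clinical table on the shared patient identifier
analysis <- merge(clinical, geno_wide, by = "patient_id")
print(analysis)
```

In practice the two data frames would come from `read.csv()` on the genotyping files and from a database query rather than being typed in.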
>
> Your insights would be appreciated.
>
> Details if you are interested:
>
> Our setup: hundreds of patients recorded as cases, with about 60
> variables, entered and stored in a Sybase relational database.
> High-throughput SNP genotyping platforms saved their output to CSV or
> Excel tables. Previously, not knowing any SQL, I had used Microsoft
> Access to write queries to get the data that I needed and to merge the
> genotyping data with the clinical database. It was horrible. I could not
> even use it on anything other than my desktop machine at work. When I
> realized that I was going to need to learn R to handle the genetic
> analyses, I decided to keep Sybase as the data repository for the
> clinical information and then do all the data manipulation, merging,
> and piping with R using RODBC. I was and am a very amateur coder.
> Nevertheless, many, many hours later I have scripts that do what I need
> them to do, and I understand R code and can tinker with it as needed.
> My scripts work for me, but they are not exactly user-friendly for
> others in the laboratory to just run. For instance, depending on which
> machine the script is run from, one may need to change the file name or
> file path and tinker under the hood to accomplish that. My bias is to do
> all our data manipulation and reshaping with R. Since I am the principal
> investigator, I am the one who stays constant; coders and analysts may
> come and go.
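One common way to avoid editing file paths by hand on each machine is to read the data location from an environment variable and build paths with `file.path()`. A minimal sketch, assuming a hypothetical variable name `LAB_DATA_DIR` (each user would set it once, for example in their `.Renviron` file) and hypothetical helper names `data_dir()` and `lab_file()`; this is one option among several, not the approach described in the message.

```r
# Hypothetical helper: find the lab's data directory without hard-coding a
# machine-specific path. LAB_DATA_DIR is an assumed variable name; if it is
# unset or empty, fall back to a "data" folder in the working directory.
data_dir <- function(var = "LAB_DATA_DIR") {
  dir <- Sys.getenv(var, unset = "")
  if (!nzchar(dir)) dir <- file.path(getwd(), "data")
  dir
}

# Hypothetical helper: build the full path to a data file and fail early
# with a clear message if the file is missing on this machine.
lab_file <- function(name, dir = data_dir()) {
  path <- file.path(dir, name)
  if (!file.exists(path)) stop("Data file not found: ", path)
  path
}

# Usage (file name is illustrative):
# genotypes <- read.csv(lab_file("snp_calls.csv"), stringsAsFactors = FALSE)
```

The shared script then never changes; only the one-line setting differs between machines.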
>
> I am even more enamored with R for data manipulation since reading a
> book about it.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.