
Do you use R for data manipulation?

6 messages · John Kane, Laura Arsanto, Simon Pickett +3 more

#
--- On Wed, 5/6/09, Farrel Buchinsky <fjbuch at gmail.com> wrote:

            
I only do small-scale projects and am by no means a programmer. Isn't Perl something for earrings?

That said, I find R to be extremely useful at data manipulation and have used it exclusively in my last three projects.  The different data structures alone are worth their weight in gold, if for nothing else than making it harder to make stupid mistakes in coding.
Any reason that she thinks this?  How well does she know R?  It is not exactly a language that one picks up in a week, especially if one is coming from using a stats package like SAS or SPSS. As an ex-SAS and SYSTAT user it took me weeks to just get comfortable with the power of subscripting and the ability to do all kinds of calculations "in-line".
Definitely. I am not a computer scientist or a statistician. I usually work as a single contractor, normally with small datasets as part of a larger project.  R does what I want, usually very elegantly (albeit perhaps after a lot of headbanging and calls for help to the R-list) and it would be stupid for me to use more than one language when it is not needed.

Another plus is that I can easily leave my data analysis work and a working copy of R with the client.  He/she may have a problem seeing what I did but it is clearly readable & replicable by either the client or another consultant.
Well I don't work in a lab but why complicate things? If everyone is using the same tools then you have a good situation.  Others who do work in labs can address this point more cogently.
#
My institute uses SAS religiously, I am the only R "heathen".

I have resisted learning to use SAS because I don't see the point after years 
of using R and I like being able to do everything using one program. 
However, my colleagues maintain that SAS is "better" for programming without 
ever really giving me a good reason why, other than memory issues.

I don't want to hijack the thread but would be interested in hearing some 
other views, especially since my organisation spends (wastes?) a lot of money 
every year on SAS licences...

Simon.

----- Original Message ----- 
From: "Laura Arsanto" <ghina84 at hotmail.it>
To: <jrkrideau at yahoo.ca>; <r-help at stat.math.ethz.ch>; <fjbuch at gmail.com>
Cc: <ross.lazarus at gmail.com>; <gregory_warnes at urmc.rochester.edu>; 
<greg at warnes.net>
Sent: Wednesday, May 06, 2009 2:53 PM
Subject: Re: [R] Do you use R for data manipulation?




I used R for my master's thesis (with great effort, anyway) and now I find 
it difficult to use R in my daily work, because it has really serious problems 
with large datasets, both in the data manipulation step and in 
the analysis step.

But I really would love to use it, as I like its transparency compared to 
other software.

Laura

#
There is a book on data manipulation using R.

Data manipulation with R.

http://www.springer.com/statistics/computational/book/978-0-387-74730-9

It shows how comprehensive R's data manipulation capabilities can be.

Regards,

CH
On Wed, May 6, 2009 at 10:01 PM, Simon Pickett <simon.pickett at bto.org> wrote:

  
    
#
I work in cognitive science where we collect one or more data files
per participant in an experiment then merge those files to perform
subsequent analyses. Sometimes some files are in wide format and
others are in long format, necessitating reshaping. I've found R
entirely satisfactory for this.*
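
As a rough illustration of the one-file-per-participant workflow described above, here is a minimal base R sketch. The file names, column names, and the demographics table are all invented for the example, and the files are written to a temporary directory only so the snippet runs on its own.

```r
# Hypothetical setup: write two small per-participant files to a temp
# directory so the sketch is self-contained (names and values are invented).
data_dir <- file.path(tempdir(), "raw_data_sketch")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(participant = 1, trial = 1:2, rt = c(512, 498)),
          file.path(data_dir, "p01.csv"), row.names = FALSE)
write.csv(data.frame(participant = 2, trial = 1:2, rt = c(610, 587)),
          file.path(data_dir, "p02.csv"), row.names = FALSE)

# Read every participant file and stack the rows into one data frame.
files <- list.files(data_dir, pattern = "^p[0-9]+\\.csv$", full.names = TRUE)
all_trials <- do.call(rbind, lapply(files, read.csv))

# Merge in participant-level information kept in a separate (hypothetical) table.
demographics <- data.frame(participant = c(1, 2),
                           group = c("control", "treatment"))
combined <- merge(all_trials, demographics, by = "participant")
```

The raw files are only read, never modified, so the script itself is the record of how `combined` was produced.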

Indeed, I would be wary of an approach that attempts data manipulation
*outside* of R as I'm of the "raw data in, results out" school of
thought that it's dangerous to isolate your data manipulation from
your record of analysis. If you leave your raw data files untouched
and perform all manipulation & analysis in one system (R) then there
is a complete record of what's happened to the data from start to
finish and it's easier to catch/correct errors.

The reshape package is great for reshaping between long & wide data
formats, and the plyr package is great for computing summary statistics
within cells of the design.
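
For readers without those packages installed, the same two operations can be sketched with base R's `reshape()` and `aggregate()`. This is a stand-in, not the packages' own interface, and the data frame and its column names are made up for illustration.

```r
# Hypothetical long-format data: one row per participant per condition.
long <- data.frame(
  participant = rep(1:3, each = 2),
  condition   = rep(c("congruent", "incongruent"), times = 3),
  rt          = c(480, 530, 510, 575, 495, 560)
)

# Long to wide: one row per participant, one rt column per condition.
wide <- reshape(long, idvar = "participant", timevar = "condition",
                direction = "wide")

# Summary statistic within each cell of the design
# (here, the mean rt for each condition).
cell_means <- aggregate(rt ~ condition, data = long, FUN = mean)
```

Here `wide` gets the columns `rt.congruent` and `rt.incongruent`, and `cell_means` holds one row per condition.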

Mike

*note: I typically use Python for data collection (showing visual
stimuli, recording responses, etc), but have it spit out raw text
files of the trial-by-trial data, and thus use it for only a bare
minimum of processing.

--
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~
#
Simon Pickett wrote:
Put quite simply, your colleagues' opinions are humbug.

Frank