
Do you use R for data manipulation?

18 messages · Farrel Buchinsky, Wensui Liu, milton ruser +14 more

#
take a look at sqldf package(http://code.google.com/p/sqldf/), you
will be amazed.
On Wed, May 6, 2009 at 12:22 AM, Farrel Buchinsky <fjbuch at gmail.com> wrote:

#
Well, I am less proficient in R compared with other tools/languages.
Therefore my biased opinion is: it is possible in R, but it may be
easier if you use other tools, especially if you have to build a
user-friendly GUI.

The most accessible (although limited to MS Windows) method would
be building the GUI with an HTA (HTML Application) and JavaScript, which is
nearly the same as creating a web page and calling R from there when
necessary. Less limited, but with a steeper learning curve: Python, Perl,
Tcl/Tk - all open-source tools that can communicate with R, and all with
decent GUI-building tools. Then there are the proprietary Adobe Flex, Flash
and AIR (the latter somewhat resembles HTA), and Runtime Revolution (RR),
which all make it easy to build cross-platform eye candy; these are not free,
although not too expensive either if you can allocate some resources to your
project. I usually hide all the command-line utilities behind GUIs built
with RR. All the tools listed above can easily do any kind of data
manipulation and reshaping, but each has its strengths: Python - tidy
object-oriented syntax and tons of third-party modules; Perl - powerful
regular expressions and tons of modules; RR - database connectivity, chunk
expressions (item, char, word, line, etc.) and a syntax that makes data
manipulation much, much easier.

But I may be wrong, so let me ask here another related question
(new thread?..) for the group - what do you use to build graphical user
interfaces for end-users of your tools in R?

All the best
Viktoras
Farrel Buchinsky wrote:
#
Sorry for replying to the wrong person, I lost the original email.
I personally started to use R because I got tired of manually writing scripts
for data manipulation and processing.  The argument of your new recruit smells
of ignorance and resistance to learning something new.  Ask her _how_ she
assessed R, how much time she spent on her assessment, and whether she
actually tried to run it and perform some concrete simple tasks.

(Yes, R is somewhat "different", it has a steep learning curve, but the effort
of learning it is worth it.  And yes, R can be used in the same way as any
other scripting language, i.e., it is not restricted to interactive work.)

Take a look at the plyr and reshape packages (http://had.co.nz/), I have a hunch
that they would have saved me a lot of headache had I found out about them
earlier :)
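plyr's core idea is split-apply-combine: split the data by a grouping variable, apply a function to each piece, and combine the results. A minimal sketch of that pattern in plain Python, on invented data (a group mean per key):

    from collections import defaultdict

    records = [("a", 1), ("b", 4), ("a", 3), ("b", 2)]

    # Split by group key, apply a summary, combine into one result.
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)

    summary = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    print(summary)  # {'a': 2.0, 'b': 3.0}
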

I would also recommend investing in Phil Spector's book "Data manipulation with
R", it will get you started much faster.

I also find R's image files very convenient for sharing data (and code!) in a
very compact format (a single file, portable across architectures).  When you
quit your R session, all the variables and functions get saved in the image
file, which you can take with you (or send to somebody else); start R again,
load the image into a new session and continue from where you left off.  You
won't get this kind of automatic persistence in any scripting language out of
the box.
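The closest thing in a general scripting language is pickling your workspace by hand; R's .RData image does the equivalent automatically on quit. A rough Python sketch of what you would have to script yourself (the variable names and file name are arbitrary):

    import os
    import pickle
    import tempfile

    workspace = {"x": [1, 2, 3], "scale": 10}  # stand-in for session variables

    path = os.path.join(tempfile.mkdtemp(), "workspace.pkl")
    with open(path, "wb") as f:          # "quit": save everything
        pickle.dump(workspace, f)

    with open(path, "rb") as f:          # "start again, load the image"
        restored = pickle.load(f)
    print(restored["x"])  # [1, 2, 3]
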
I'd go with 1).  R also has interfaces to databases through RODBC, so you
do not have to go through several conversions when you're about to process or
plot data in R.
#
On Wednesday 6 May 2009 at 00:22 -0400, Farrel Buchinsky wrote:
[ Large Snip ! ... ]

Depends on what you have to do.

I've done what can be more or less termed "data management" with almost
uncountable tools (from Excel (sigh...) to R, with SQL, APL, Pascal, C,
Basic (in 1982!), Fortran and even Lisp in passing...).

SQL has strong points: a join is, to my taste, more easily expressed in
SQL than in most languages, and projection and aggregation are natural.

However, in SQL there is no "natural" ordering of table rows, which
makes it difficult to express algorithms that use this order. Try, for
example, to express the differences of a time series... (it can be done,
but it is *not* a pretty sight).
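To make that point concrete: first-differencing a series in classic SQL (no window functions) needs a correlated subquery to find each row's predecessor, where R would just use diff(x). A sketch via Python's sqlite3, with an invented table:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ts (t INTEGER, val REAL)")
    con.executemany("INSERT INTO ts VALUES (?, ?)",
                    [(1, 10.0), (2, 13.0), (3, 11.0)])

    # First differences without window functions: not a pretty sight.
    diffs = con.execute("""
        SELECT a.t, a.val - (SELECT b.val FROM ts b
                             WHERE b.t < a.t ORDER BY b.t DESC LIMIT 1)
        FROM ts a WHERE a.t > (SELECT MIN(t) FROM ts) ORDER BY a.t
    """).fetchall()
    print(diffs)  # [(2, 3.0), (3, -2.0)]
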

On the other hand, R has some unique expressive possibilities (reshape()
comes to mind).

So I tend to use a combination of tools: except for very small samples,
I manage my data in SQL and with associated tools (think data
editing, for example; a simple form in OpenOffice's Base is quite easy
to create, can handle anything for which an ODBC driver exists, and
won't crap out at more than a few hundred lines...). Finer manipulation
is usually done in R with native tools and sqldf.

But, at least in my trade, the ability to handle Excel files is a must
(it is considered a standard for data entry. Sigh...). So the
first task is usually to a) import the data into an SQL database, and b)
prepare some routines to dump SQL tables / R data frames to Excel for
returning data to the original author...

HTH

					Emmanuel Charpentier
#
On Wed, May 06, 2009 at 12:22:45AM -0400, Farrel Buchinsky wrote:
I happily use both approaches depending on the original format the
data come in:

For data that are not in a "well behaved" format and require actual
parsing, I tend to use Python scripts for transmogrifying the data
into nice and tidy tables (and maybe some very basic filtering). For
everything after that I prefer R. I also use Python if the relevant
data need to be harvested and assembled from many different sources
(e.g. data files + web + databases).
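As an example of that kind of transmogrification, here is a sketch that turns an irregular "key: value" export (the format is invented for illustration) into a tidy CSV table:

    import csv
    import io

    raw = """\
    id: 1
    weight: 72.5

    id: 2
    weight: 81.0
    """

    # Parse blank-line-separated records into dicts, then write a tidy CSV.
    records, current = [], {}
    for line in raw.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        key, value = line.split(":", 1)
        current[key.strip()] = value.strip()
    if current:
        records.append(current)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "weight"])
    writer.writeheader()
    writer.writerows(records)
    print(out.getvalue())
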

Once the data files are easy to read (csv, tab-separated, database,
...) and the task is to reshape, filter and clean the data, I usually
do it in R. R has true advantages here:

 - After reading a table into a data frame I can immediately tell if all
   measurements are what they are supposed to be (integer, numeric,
   factor, boolean), and functions like read.table even do quite a bit of
   error checking for me (equal number of columns etc.)

 - Finding out if factors have the right (or plausible) number of levels is easy
 
 - Filtering by logical indexing

 - Powerful and reliable reshaping (reshape package)

 - Very convenient diagnostics: str(), dim(), table(), summary(),
   plotting the data in various ways, ...
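To appreciate what read.table checks for free, here is roughly the sort of thing one ends up scripting by hand elsewhere - a minimal sketch of column-count and type validation in Python (the data are invented):

    rows = [["a", "1.5"], ["b", "2.0"], ["c", "oops"]]

    # read.table-style checks done manually: rectangular shape, numeric column.
    assert all(len(r) == 2 for r in rows), "ragged rows"

    bad = []
    for i, (label, value) in enumerate(rows):
        try:
            float(value)
        except ValueError:
            bad.append(i)
    print(bad)  # indices of rows whose second field is not numeric
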

cu
	Philipp
#
I also use the approach Philipp describes below.  I use Python and shell 
scripts for processing thousands of input files and getting all the data 
into one tidy csv table.  From that point onwards it's R all the way 
(often with the reshape package).
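A minimal sketch of that first stage, concatenating many small CSV inputs into one tidy table (the in-memory "files" here stand in for a hypothetical directory of per-run outputs):

    import csv
    import io

    # Stand-ins for thousands of per-experiment input files.
    inputs = ["id,value\n1,10\n", "id,value\n2,20\n"]

    combined = io.StringIO()
    writer = None
    for text in inputs:
        reader = csv.reader(io.StringIO(text))
        header = next(reader)          # skip each file's header
        if writer is None:
            writer = csv.writer(combined)
            writer.writerow(header)    # keep the first header only
        writer.writerows(reader)
    print(combined.getvalue())
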

Paul
Philipp Pagel wrote:
#
I second what Zeljko wrote.  In addition, see the data manipulation 
section in Chapter 4 of 
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RS/sintro.pdf

Frank
Zeljko Vrba wrote:

#
As the author of these two packages, I'm admittedly biased, but I
think R is unparalleled for data preparation, manipulation, and
cleaning (with the small caveat that your data needs to fit in
memory).  The R data frame is a fantastic abstraction that most other
programming languages lack, and vectorised subscripting makes it
possible to express many transformations in an elegant and efficient
manner.  On top of the facilities provided by base R, there are a huge
number of packages available to load data from just about every data
format, as well as a number of packages (plyr, reshape, sqldf, doBy,
gdata, scope, ...) for data manipulation - just pick the metaphor that
is most natural to you.

Hadley
#
In my opinion, no statistician's toolbox should contain only one tool (even if it is as amazing a tool as R).  Learning different tools helps you appreciate when each is most appropriate to use, and teaches different ways of looking at problems.  There are some tasks that I (it could easily differ for others) find quickest to handle by doing some data extraction in Perl, then loading the results into R.

Having said the above, I do admit that the percentage of time I spend using tools other than R for working with data has gone down quite a bit over time.  Three possible reasons:

1. my clients are getting better at giving me the data in appropriate forms
2. my proficiency with R continues to grow, so I can better see how to do something using R
3. R continues to grow, with more and more tools to help manage data.

And a possible fourth: 4. I am getting too lazy in my old age to switch to other programs.

While I like to think that I am having success at educating my clients, number 1 contributes very little overall; number 3 is definitely a big contributor, and hopefully number 2 is part of the reason as well.
#
Another tool I find useful is Matthew Dowle's data.table package. It
has very fast indexing, can have much lower memory requirements than a
data frame, and has some built-in data manipulation capability.
Especially with a 64-bit OS, you can use this to keep things in memory
where you otherwise would have to use a database.

See here: http://article.gmane.org/gmane.comp.lang.r.packages/282

- Tom
1 day later
#
+1. I worked with Matthew for a while and saw in practice just how 
powerful that package is.
I'm surprised it isn't more widely used.

Martin
Tom Short wrote:
3 days later
#
2009/5/6 Emmanuel Charpentier <charpent at bacbuc.dyndns.org>:
I don't think Excel is a standard tool for data entry. EpiData Entry
is much more professional.

#
I am not a statistician and not a computer scientist by education. I
consider myself an R novice and came to R - thanks to my boss - from
an SPSS background. I work for a market research company and the most
typical data files we deal with are not huge - up to several thousand
rows and up to a thousand variables.
I would say, on certain projects, most of what we do in R (if you look
at the number of lines in R we devote to a given task) is data
manipulation. The actual statistical method is frequently just a line
- all the rest is getting the data shaped right and then spitting out
the results of the analysis in a way that is usable (i.e.,
presentable).
I find R to be excellent for the data manipulations that we perform. First
of all, it's great that you can always grab the variables/cases you need
and ignore all the rest. In SPSS you just keep staring at all those
data and variables that you don't need, trying to find the one you
do.
Second - I find R to be incredibly fast (as opposed to SPSS or Excel)
with the amounts of data we are dealing with.
And third - nothing is "written in stone" and your original data are
always untouched - you can always read them in again and again. For
example, if I create a new variable and make a mistake, I can always
fix the code, rerun that piece of the code, and that gives me the
corrected object containing the new variable. I never touch the
original data and hence never "spoil" them.

Dimitri
On Mon, May 11, 2009 at 11:20 AM, ronggui <ronggui.huang at gmail.com> wrote:

#
On Monday 11 May 2009 at 23:20 +0800, ronggui wrote:

[ Snip... ]
[ Re-snip... ]
Irony squared ?

This *must* go in the fortunes file !
					Emmanuel Charpentier
1 day later
#
Farrel Buchinsky wrote:
It's hard to shift people's individual preferences, but impressive 
objective comparisons are easy to come by.  Ask her how many lines it 
would take to do this trivial R task in Python:

	data <- read.csv('original-data.csv')
	write.csv(data * 10, 'scaled-data.csv')

R's ability to do something to an entire data structure -- or a slice of 
it, or some other subset -- in a single operation is very useful when 
cleaning up data for presentation and analysis.  Also point out how easy 
it is to get data *out* of R, as above, not just into it, so you can 
then hack on it in Python, if that's the better language for further 
manipulation.

If she gives you static about how a few more lines are no big deal,
remind her that bug counts tend to scale with line count, a finding
that goes back to the 1970s.

While making your points, remember that she has a good one, too: R is 
not the only good language out there.  You should learn Python while 
she's learning R.
#
Warren Young wrote:
You might want to learn that this is a question of appropriate
libraries.  In R, read.csv and write.csv reside in the utils package.
In Python, you'd use numpy:

    from numpy import loadtxt, savetxt
    savetxt('scaled.csv', loadtxt('original.csv', delimiter=',') * 10,
            delimiter=',')

That makes two lines, together with importing the library.
But this is really *hardly* R-specific.  You can do it in many, many
languages, be assured.  Just look around.
That "bug count" line is a slogan, especially when you think of how compact
(but unreadable, and thus error-prone) code written in Perl can be.  Often,
more lines of code make it easier to maintain, and thus help avoid bugs.
+1