Merging rows in dataframes
I've added an example to FAQ 3 on the home page that illustrates group_concat. http://sqldf.googlecode.com On Wed, Mar 25, 2009 at 9:06 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
In the Links box to the right on the sqldf home page click on "SQLite - aggregate functions" and lookup group_concat. On Wed, Mar 25, 2009 at 9:05 AM, Schragi Schwartz <schragas at post.tau.ac.il> wrote:
Thank you, your answer was extremely helpful. One last problem though: one
of the aggregate functions I'd like to apply on the columns is
concatentation (equivalent to the paste() function). So if I have a given
character column in three separate rows sharing the same ids with the value
"apple" in the first, "banana" in the second, and "orange" in the third, in
the summarizing row I'd like to receive output in the form
"apple|banana|orange". Is there any way to do this?
Thanks again,
Schragi
-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
Sent: Tuesday, March 24, 2009 12:50 AM
To: Schraga Schwartz
Cc: r-help at r-project.org
Subject: Re: [R] Merging rows in dataframes
Using sqldf you only need two statements, infile <- file(...) and
DF <- sqldf("select min(a), max(b), mean(c), ... from infile group by id").
The file statement identifies the filename and the second reads it
into sqlite (without
going through R), summarizes it and then reads the summarized version
into R. ?You may also need to provide info on its format if its not in the
default format. ?See example 4a on home page and the other examples
there:
http://sqldf.googlecode.com
On Mon, Mar 23, 2009 at 5:58 PM, Schraga Schwartz
<schragas at post.tau.ac.il> wrote:
Hello, I have a dataframe with 40 columns and around 450,000 rows. The first
column
in each row is a factor id and the remaining are numeric. Some rows have
the
same ids. What I want to do is to merge each set of rows sharing the same ids (id set) into one single row (summarizing row) with that id. To create the summarizing row, I'd like to apply a different function on each of the original columns in the id set. Some columns within the summarizing row
will
equal the mean of the columns in the id set, others will equal the
minimum,
others the maximum. To do this, I tried using the by() function. However, this was extremely slow (it ran for more than two hours before I stopped it). Also, it used
up
all of 16 GB of memory on my machine. Is there any more efficient
function,
both in terms of time and memory, to do this sort of thing? Thank you very much, Schraga Schwartz
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.