Does SQL group by have a heavy duty equivalent in R

7 messages · Farrel Buchinsky, Hadley Wickham, Charles C. Berry

#
The reshape package will work if all your data is numeric, or all of
it is character; it doesn't work with a mix.  I will try to make
this clearer in the documentation.
However, depending on the size and structure of your data it may not
be any faster than tapply or aggregate.

Hadley
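
For readers comparing against SQL, here is a minimal base R sketch of the tapply and aggregate alternatives mentioned above. The data frame and its column names (sample_id, value) are made up for illustration and do not come from the thread.

```r
# Toy data; sample_id and value are hypothetical column names.
df <- data.frame(
  sample_id = c("s1", "s1", "s2", "s2", "s2"),
  value     = c(1, 2, 3, 4, 5)
)

# Roughly: SELECT sample_id, SUM(value) FROM df GROUP BY sample_id
# tapply() returns a named vector, one element per group.
by_tapply <- tapply(df$value, df$sample_id, sum)

# aggregate() returns a data frame with one row per group.
by_aggregate <- aggregate(value ~ sample_id, data = df, FUN = sum)

print(by_tapply)
print(by_aggregate)
```

Which of the two is more convenient probably depends on whether a vector or a data frame is wanted downstream.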
#
I converted the whole data frame to character by using
as.matrix,

and then used a posting that explained how to restore the names
(which had been lost when converting to matrix).

Anything that I did not list among the ids, it insisted on including 
with the measured variables. In other words, it would not let me drop them,

despite

melted<-melt(BigDF, id=c("SAMPLE_ID","ASSAY_ID"), 
measured=c("GENOTYPE_ID","DESCRIPTION"))

unique(melted$variable)
 [1] CUSTOMER       PROJECT        PLATE          EXPERIMENT     CHIP 
WELL_POSITION  GENOTYPE_ID    DESCRIPTION    ENTRY_OPERATOR
[10] INTERACT       PLATEc
Levels: CUSTOMER PROJECT PLATE EXPERIMENT CHIP WELL_POSITION GENOTYPE_ID 
DESCRIPTION ENTRY_OPERATOR INTERACT PLATEc


I should have got only GENOTYPE_ID and DESCRIPTION.

"hadley wickham" <h.wickham at gmail.com> wrote in message 
news:f8e6ff050612310758p11f96c0dl256ac5b15d11dc2c at mail.gmail.com...
#
You shouldn't need to do that.
The argument should be measure=c(...), not measured=c(...).

Hadley
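
A small sketch of the corrected call, assuming the reshape package is installed. The toy data frame reuses a few column names from the thread, but its values are invented. Presumably measure= works because it partially matches the full argument name of melt's data frame method, whereas measured= matches nothing and is silently ignored; that explanation is an assumption, not something stated in the thread.

```r
library(reshape)  # assumes the reshape package is installed

# Toy all-character data; the values are made up for illustration.
BigDF <- data.frame(
  SAMPLE_ID   = c("s1", "s2"),
  ASSAY_ID    = c("a1", "a1"),
  PLATE       = c("p1", "p1"),
  GENOTYPE_ID = c("AA", "AG"),
  DESCRIPTION = c("ok", "ok"),
  stringsAsFactors = FALSE
)

# With measure= (not measured=), only the listed columns are melted;
# PLATE is dropped because it is neither an id nor a measured variable.
melted <- melt(BigDF,
               id      = c("SAMPLE_ID", "ASSAY_ID"),
               measure = c("GENOTYPE_ID", "DESCRIPTION"))

print(unique(melted$variable))
```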
#
On Sun, 31 Dec 2006, Farrel Buchinsky wrote:

            
Why not use  duplicated() ?

For a data.frame with 200 rows, of which about 50 are duplicates, and 201 
columns, finding the (non-)duplicates takes little time on my year-old AMD 
64 running Windows XP:
[1] 0.03 0.00 0.03   NA   NA
Finding the non-duplicated rows for which there is at least one 
replication:
[1] 0.05 0.00 0.05   NA   NA
Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717
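
The code behind the timings above is not preserved in the archive. The following is only a sketch of how such a check might look on simulated data of the same shape (200 rows, 50 of them duplicates, 201 columns). The genotype codes, the seed, and the use of fromLast = TRUE are all assumptions, and timings will differ by machine.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# 150 distinct rows x 201 columns of made-up genotype codes 0/1/2
base_rows <- as.data.frame(
  matrix(sample(0:2, 150 * 201, replace = TRUE), nrow = 150)
)

# Append copies of 50 of those rows, giving 200 rows in total
df <- rbind(base_rows, base_rows[sample(150, 50), ])

# Time the duplicate search; dup is TRUE for repeated rows
print(system.time(dup <- duplicated(df)))

# The non-duplicated rows
non_dup <- df[!dup, ]

# Non-duplicated rows that have at least one replicate elsewhere
print(system.time(has_rep <- !dup & duplicated(df, fromLast = TRUE)))
```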
#
On Sun, 31 Dec 2006, Charles C. Berry wrote:

            
More specifically:

 	unique( IDs[ duplicated( IDs ) & ! duplicated ( cbind (IDs, SNPs ) ) ] )

gives a list of those IDs for which the SNPs in all replicates of an ID 
are not the same.
Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717
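
To see what the expression above selects, here is a toy run. IDs and SNPs follow the names used in the message, but the values are invented, and SNPs is assumed to be a matrix with one row per entry of IDs.

```r
# Made-up example: "a" and "b" are each replicated, "c" appears once.
IDs  <- c("a", "a", "b", "b", "c")
SNPs <- rbind(
  c("AA", "AG"),  # a
  c("AA", "AG"),  # a, identical replicate
  c("AA", "GG"),  # b
  c("AG", "GG"),  # b, replicate that disagrees at the first SNP
  c("AA", "AA")   # c, no replicate
)

# duplicated(IDs): rows that repeat an earlier ID.
# !duplicated(cbind(IDs, SNPs)): rows whose ID + SNP combination is new.
# Both together: a repeated ID carrying a SNP pattern not seen before.
inconsistent <- unique(IDs[duplicated(IDs) & !duplicated(cbind(IDs, SNPs))])

print(inconsistent)
```

Here only "b" is reported, since the two "a" rows agree and "c" has no replicate.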