Better way of Grouping?

4 messages · Charles Determan Jr, arun, Jeff Newmiller +1 more

#
Hi,
You can also use grep() to subset:


# Labels combine group number and L/D: four each of "3L", "4L", "3D", "4D"
LD<-paste0(rep(rep(c(3,4),each=4),2),c(rep("L",8),rep("D",8)))
set.seed(1)
dat1<-data.frame(LD=LD,value=sample(1:15,16,replace=TRUE))
dat2<-within(dat1,{LD<-as.character(LD)})
dat2[grepl(".*L",dat2$LD),] # subset all L values
dat2[grepl(".*D",dat2$LD),] # subset all D values
dat2[grepl("3D",dat2$LD),] # subset group 3 within D
dat2[grepl("4D",dat2$LD),] # subset group 4 within D


A.K.




----- Original Message -----
From: Charles Determan Jr <deter088 at umn.edu>
To: r-help at r-project.org
Cc: 
Sent: Friday, September 28, 2012 2:59 PM
Subject: [R] Better way of Grouping?

Hello R users,

This is more of a convenience question that I hope others might find useful
if there is a better answer. I work with large datasets that require
multiple parsing stages for different analyses. For example, compare group
3 vs. group 4. A more complicated comparison would be time B in group 3 of
group L with B in group 4 of group L. I normally subset each group with
the following type of code.

data=read(...)

#L v D
L=data[LvD %in% c("L"),]
D=data[LvD %in% c("D"),]

#Groups 3 and 4 within L and D
group3L=L[group %in% c("3"),]
group4L=L[group %in% c("3"),]

group3D=D[group %in% c("3"),]
group4D=D[group %in% c("3"),]

#Times B, S45, FR2, FR8
you get the idea


Is there a more efficient way to subset groups? Thanks for any insight.

Regards,
Charles

#
You have not specified the objective function you are trying to optimize with your term "efficient", or what you do with all of these subsets once you have them. 

For notational simplification and completeness of coverage (not necessarily computational speedup) you might want to look at "tapply" or ddply/dlply from the plyr package. If you build lists of subsets you can index into them according to grouping value. You can use expand.grid to build all combinations of grouping values to use as indexes into those lists of subsets.
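
A minimal sketch of the list-of-subsets idea, using base R only. The data frame `dat`, its `value` column, and the `key` lookup column are made up for illustration, since the original post included no data; only the column names `LvD` and `group` come from the question.

```r
# Hypothetical data: the values are invented for illustration
set.seed(1)
dat <- data.frame(
  LvD   = rep(c("L", "D"), each = 8),
  group = rep(rep(c("3", "4"), each = 4), times = 2),
  value = sample(1:15, 16, replace = TRUE),
  stringsAsFactors = FALSE
)

# One list element per LvD/group combination, named "L.3", "D.4", ...
subsets <- split(dat, list(dat$LvD, dat$group))

# Index into the list by grouping value instead of creating
# separate objects such as group3L, group4L, ...
group3L <- subsets[["L.3"]]

# expand.grid enumerates every combination of grouping values
combos <- expand.grid(LvD = c("L", "D"), group = c("3", "4"),
                      stringsAsFactors = FALSE)
combos$key <- paste(combos$LvD, combos$group, sep = ".")

# Look up each subset by its key and summarise it
combos$mean_value <- sapply(combos$key,
                            function(k) mean(subsets[[k]]$value))

# tapply computes the same per-group summary without building the subsets
means <- tapply(dat$value, list(dat$LvD, dat$group), mean)
```

Here `combos` holds one row per grouping combination, so any comparison can be driven from that table rather than from hand-named objects.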

To reiterate, you have not indicated what you want to do with these subsets, so there could be special-purpose functions that do what you want.  As always, reproducible code leads to reproducible answers. :)
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
Charles Determan Jr <deter088 at umn.edu> wrote:

            
#
On Sep 28, 2012, at 11:59 AM, Charles Determan Jr wrote:

            
[On group4L=L[group %in% c("3"),]:] Assume you meant to have a "4" there.
[On group4D=D[group %in% c("3"),]:] Ditto. Only makes sense with a "4".



The usual way is to use:

lapply( split(data, interaction(data$LvD, data$group)) ,
         function(subdf) {
           # <do something with subdf>
         } )

That way you do not end up littering your workspace with subsidiary subsets of your main data object.
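
As a runnable sketch of this pattern, here is the same idiom on a small made-up data frame. The column names `LvD` and `group` follow the original post; the data, the `value` column, and the per-subset summary are assumptions for illustration.

```r
# Hypothetical data; only the column names LvD and group come from the post
set.seed(1)
data <- data.frame(
  LvD   = rep(c("L", "D"), each = 8),
  group = rep(rep(c("3", "4"), each = 4), times = 2),
  value = sample(1:15, 16, replace = TRUE)
)

# Split on every LvD/group combination and summarise each piece,
# without creating a named object per subset
results <- lapply(
  split(data, interaction(data$LvD, data$group)),
  function(subdf) {
    c(n = nrow(subdf), mean = mean(subdf$value))
  }
)

# Summary for group 3 within L
results[["L.3"]]

# Comparing group 3 vs. group 4 within L is then a matter of indexing
diff_L <- results[["L.3"]][["mean"]] - results[["L.4"]][["mean"]]
```

If some LvD/group combinations never occur in the data, passing `drop = TRUE` to `interaction` (or to `split`) leaves the empty pieces out of the result.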