expand.grid
3 messages · Berwin A Turlach, Nick Sabbe
G'day Nick,
On Wed, 19 Jan 2011 09:43:56 +0100
"Nick Sabbe" <nick.sabbe at ugent.be> wrote:
Given a dataframe
dfr<-data.frame(c1=c("a", "b", NA, "a", "a"), c2=c("d", NA, "d", "e",
"e"), c3=c("g", "h", "i", "j", "k"))
I would like to have a dataframe with all (unique) combinations of
all the factors present.
Easy:
R> expand.grid(lapply(dfr, levels))
c1 c2 c3
1 a d g
2 b d g
3 a e g
4 b e g
5 a d h
6 b d h
7 a e h
8 b e h
9 a d i
10 b d i
11 a e i
12 b e i
13 a d j
14 b d j
15 a e j
16 b e j
17 a d k
18 b d k
19 a e k
20 b e k
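A note for anyone running this on a current R: the call above relies on the columns of dfr being factors, which was the data.frame() default in 2011. Since R 4.0.0 strings stay character unless you ask otherwise, and levels() of a character vector is NULL. A minimal sketch, assuming a recent R, with the conversion requested explicitly (the names lev and all_combs are mine, not from the thread):

```r
# Since R 4.0.0, data.frame() keeps strings as character by default,
# so the factor conversion has to be requested explicitly.
dfr <- data.frame(c1 = c("a", "b", NA, "a", "a"),
                  c2 = c("d", NA, "d", "e", "e"),
                  c3 = c("g", "h", "i", "j", "k"),
                  stringsAsFactors = TRUE)

# One vector of levels per column; NA is not a level by default.
lev <- lapply(dfr, levels)

# Every combination of those levels: 2 * 2 * 5 rows.
all_combs <- expand.grid(lev)
nrow(all_combs)
```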
In fact, I would like a simple solution for these two cases: given the three factor columns above, I would like both all _possible_ combinations of the factor levels, and all _present_ combinations of the factor levels (e.g. if I would do this for the first 4 rows of dfr, it would contain no combinations with c3="k").
R> dfrpart <- lapply(dfr[1:4,], factor)
R> expand.grid(lapply(dfrpart, levels))
c1 c2 c3
1 a d g
2 b d g
3 a e g
4 b e g
5 a d h
6 b d h
7 a e h
8 b e h
9 a d i
10 b d i
11 a e i
12 b e i
13 a d j
14 b d j
15 a e j
16 b e j
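"Present combinations" can be read two ways, so here is a small sketch of both, assuming a recent R. The first is the reading used in the reply: the grid of levels that still occur in the subset. The second is my own alternative reading, namely the rows that actually occur together, which unique() gives directly. droplevels() is used here in place of lapply(..., factor) only because it keeps the data frame; the variable names are illustrative.

```r
dfr <- data.frame(c1 = c("a", "b", NA, "a", "a"),
                  c2 = c("d", NA, "d", "e", "e"),
                  c3 = c("g", "h", "i", "j", "k"),
                  stringsAsFactors = TRUE)
part <- dfr[1:4, ]

# Reading 1: all combinations of the levels that occur in the subset.
# droplevels() discards the unused level "k" of c3.
grid <- expand.grid(lapply(droplevels(part), levels))
nrow(grid)               # 2 * 2 * 4

# Reading 2: only the combinations observed as rows of the subset.
observed <- unique(part)

# The same, restricted to rows without any NA.
observed_complete <- unique(na.omit(part))
```

Which one you want depends on whether the grid is meant for prediction over a design (reading 1) or for tabulating what was seen (reading 2).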
It would also be nice to be able to choose whether or not NA's are included.
R> expand.grid(lapply(dfrpart, function(x) c(levels(x),
+ if(any(is.na(x))) NA else NULL)))
c1 c2 c3
1 a d g
2 b d g
3 <NA> d g
4 a e g
5 b e g
6 <NA> e g
7 a <NA> g
8 b <NA> g
9 <NA> <NA> g
10 a d h
11 b d h
....
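As a possible alternative to appending NA by hand, base R's addNA() with ifany = TRUE adds NA as a level only to factors that contain missing values. A minimal sketch under the same assumptions as above (recent R, explicit stringsAsFactors; part, lev_na and grid_na are illustrative names):

```r
dfr <- data.frame(c1 = c("a", "b", NA, "a", "a"),
                  c2 = c("d", NA, "d", "e", "e"),
                  c3 = c("g", "h", "i", "j", "k"),
                  stringsAsFactors = TRUE)
part <- droplevels(dfr[1:4, ])

# addNA(x, ifany = TRUE) leaves factors without missing values unchanged,
# so c3 keeps its four levels while c1 and c2 each gain an NA level.
lev_na <- lapply(part, function(x) levels(addNA(x, ifany = TRUE)))

grid_na <- expand.grid(lev_na)
nrow(grid_na)            # 3 * 3 * 4
```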
HTH.
Cheers,
Berwin
========================== Full address ============================
Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019) +61 (8) 6488 3383 (self)
The University of Western Australia FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009 e-mail: berwin at maths.uwa.edu.au
Australia http://www.maths.uwa.edu.au/~berwin
<slaps self in forehead/>
I appear to have misinterpreted the help: considering that it explicitly
makes note of factors, I wrongly assumed that it would use the levels of a
factor automatically. My bad.
For completeness' sake, my final solution:
getLevels<-function(vec, includeNA=FALSE, onlyOccurring=FALSE)
{
  if(onlyOccurring)
  {
    # re-factoring drops levels that do not occur in vec
    rv<-levels(factor(vec))
  }
  else
  {
    rv<-levels(vec)
  }
  #cat("levels so far: ", rv, "\n")
  # append NA only if asked for and vec really has missing values
  if(includeNA && any(is.na(vec)))
  {
    rv<-c(rv,NA)
  }
  #cat("levels with na: ", rv, "\n")
  return(rv)
}
expand.combs<-function(dfr, includeNA=FALSE, onlyOccurring=FALSE)
{
  # includeNA and onlyOccurring are passed on to getLevels for each column
  expand.grid(lapply(dfr, getLevels, includeNA, onlyOccurring))
}
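For reference, a short usage sketch of the two functions on the example data. The definitions are repeated in compact form only so the block runs on its own; it assumes a recent R, hence the explicit stringsAsFactors = TRUE, and the n_* names are illustrative.

```r
# Compact copies of the functions from the message above.
getLevels <- function(vec, includeNA = FALSE, onlyOccurring = FALSE) {
  rv <- if (onlyOccurring) levels(factor(vec)) else levels(vec)
  if (includeNA && any(is.na(vec))) rv <- c(rv, NA)
  rv
}
expand.combs <- function(dfr, includeNA = FALSE, onlyOccurring = FALSE) {
  expand.grid(lapply(dfr, getLevels, includeNA, onlyOccurring))
}

dfr <- data.frame(c1 = c("a", "b", NA, "a", "a"),
                  c2 = c("d", NA, "d", "e", "e"),
                  c3 = c("g", "h", "i", "j", "k"),
                  stringsAsFactors = TRUE)

# All possible combinations, without and with NA: 2*2*5 and 3*3*5 rows.
n_all    <- nrow(expand.combs(dfr))
n_all_na <- nrow(expand.combs(dfr, includeNA = TRUE))

# Only levels occurring in the first four rows: 2*2*4 and 3*3*4 rows.
n_part    <- nrow(expand.combs(dfr[1:4, ], onlyOccurring = TRUE))
n_part_na <- nrow(expand.combs(dfr[1:4, ], includeNA = TRUE,
                               onlyOccurring = TRUE))
```

Note that dfr[1:4, ] on its own keeps the unused level "k", so without onlyOccurring = TRUE the subset still yields the full 20-row grid.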
Thx.
Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove