Selecting all rows of factors which have at least one positive value?

5 messages · Stephan Lindner, Nutter, Benjamin, Patrizio Frederic +2 more

Stephan Lindner

Thu, Apr 2, 2009 8:26 AM

Dear all,

I'm trying to select from a dataframe all rows which correspond to a
factor (the id variable) for which there exists at least one positive
value of a certain variable. As an example:

x <- data.frame(matrix(c(rep(11,4),rep(12,3),rep(13,3),rep(0,3),1,rep(0,4),rep(1,2)),ncol=2))

   X1 X2
1  11  0
2  11  0
3  11  0
4  11  1
5  12  0
6  12  0
7  12  0
8  13  0
9  13  1
10 13  1


and I want to select all rows pertaining to factor levels of X1 for
which there exists at least one "1" for X2. To be clear, I want rows 1:4
(since there exists at least one observation for X1==11 for which
X2==1) and rows 8:10 (likewise).

It is easy to obtain the corresponding factor levels (i.e.,
unique(x$X1[x$X2==1])), but I got stuck selecting the corresponding
rows. I tried grep, but then I have to loop and concatenate the
resulting vectors. Any ideas?


Thanks a lot!


	Stephan

-----------------------
Stephan Lindner
University of Michigan
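For anyone reproducing the example, the same data frame can be written with named columns, which makes the grouping easier to read. This sketch checks it against the original data.frame(matrix(...)) call and computes the ids the question already knows how to find; what remains is turning those ids into a row selection. The names `x_matrix` and `ids` are only illustrative.

```r
# The question's example data, written with named columns:
# X1 is the id, X2 the 0/1 value being tested
x <- data.frame(X1 = rep(c(11, 12, 13), times = c(4, 3, 3)),
                X2 = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 1))

# Check that it matches the original data.frame(matrix(...)) construction
x_matrix <- data.frame(matrix(c(rep(11, 4), rep(12, 3), rep(13, 3),
                                rep(0, 3), 1, rep(0, 4), rep(1, 2)),
                              ncol = 2))
stopifnot(identical(x$X1, x_matrix$X1), identical(x$X2, x_matrix$X2))

# The ids with at least one X2 == 1 (the part the question already solves)
ids <- unique(x$X1[x$X2 == 1])
print(ids)
```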

Nutter, Benjamin

Thu, Apr 2, 2009 8:35 AM

x <- data.frame(matrix(c(rep(11,4),rep(12,3),rep(13,3),rep(0,3),1,rep(0,4),rep(1,2)),ncol=2))

id.keep <- unique(subset(x,X2>0)$X1)

x2 <- subset(x,X1 %in% id.keep)

x2
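A step-by-step sketch of the reply above on the question's data: the inner subset() collects the ids with a positive X2, and %in% turns those ids into a logical row mask, which replaces the loop-and-concatenate approach from the question. The intermediate name `keep` is only for illustration.

```r
x <- data.frame(matrix(c(rep(11, 4), rep(12, 3), rep(13, 3),
                         rep(0, 3), 1, rep(0, 4), rep(1, 2)), ncol = 2))

# Step 1: ids that have at least one positive X2
id.keep <- unique(subset(x, X2 > 0)$X1)
print(id.keep)

# Step 2: one TRUE/FALSE per row, TRUE where X1 is one of those ids
keep <- x$X1 %in% id.keep
print(keep)

# Step 3: index the rows with the mask; no loop or concatenation needed
x2 <- x[keep, ]
print(x2)
```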

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


===================================

Please consider the environment before printing this e-mail

Cleveland Clinic is ranked one of the top hospitals
in America by U.S. News & World Report (2008).
Visit us online at http://www.clevelandclinic.org for
a complete listing of our services, staff and
locations.


Confidentiality Note: This message is intended for use...

Patrizio Frederic

Thu, Apr 2, 2009 9:43 AM

or the exactly equivalent form:

x[x$X1 %in% unique(x[x$X2>0,"X1"]), ]
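A quick check, on the question's data, that this bracket-indexing form selects the same rows and values as the subset() form in the previous reply. Note that x[x$X2 > 0, "X1"] returns a plain vector, because selecting a single column drops to a vector by default. The names `a`, `b` and `same` are only illustrative.

```r
x <- data.frame(matrix(c(rep(11, 4), rep(12, 3), rep(13, 3),
                         rep(0, 3), 1, rep(0, 4), rep(1, 2)), ncol = 2))

# subset() form from the earlier reply
a <- subset(x, X1 %in% unique(subset(x, X2 > 0)$X1))

# Bracket-indexing form; x[x$X2 > 0, "X1"] is a numeric vector here
b <- x[x$X1 %in% unique(x[x$X2 > 0, "X1"]), ]

# Compare the row names and both columns of the two results
same <- identical(rownames(a), rownames(b)) &&
        identical(a$X1, b$X1) && identical(a$X2, b$X2)
print(same)
```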

Patrizio

2009/4/2 Nutter, Benjamin <NutterB at ccf.org>:

David Winsemius

Thu, Apr 2, 2009 6:53 PM

I think the unique function is superfluous:

 > x[x$X1 %in% x$X1[x$X2==1], ]
    X1 X2
1  11  0
2  11  0
3  11  0
4  11  1
8  13  0
9  13  1
10 13  1
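%in% only asks whether each element on its left appears anywhere on its right, so duplicate ids on the right (13 appears twice here) cannot change the result. A small check on the question's data, comparing the row masks with and without unique(). Since the question asks about positive values, X2 > 0 is the more general test; it agrees with X2 == 1 here only because X2 is 0/1 in this example.

```r
x <- data.frame(matrix(c(rep(11, 4), rep(12, 3), rep(13, 3),
                         rep(0, 3), 1, rep(0, 4), rep(1, 2)), ncol = 2))

# Ids of rows with X2 == 1; id 13 occurs twice
ids_dup <- x$X1[x$X2 == 1]

# Row masks built with and without removing the duplicates
with_unique    <- x$X1 %in% unique(ids_dup)
without_unique <- x$X1 %in% ids_dup
print(identical(with_unique, without_unique))
print(x[without_unique, ])
```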

--  

David Winsemius

On Apr 2, 2009, at 12:43 PM, Patrizio Frederic wrote:

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

Hadley Wickham

Thu, Apr 2, 2009 7:34 PM

Here's one way using plyr:

library(plyr)
ddply(x, "X1", subset, any(X2 == 1))

See http://had.co.nz/plyr for more details.
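If plyr is not installed, the same group-wise test can be sketched in base R with ave(). This is a different technique from the plyr call above: ave() applies any() within each X1 group and spreads that group's answer back onto every row of the group, so the result is one TRUE/FALSE per row and the original row order and row names are kept. The name `has_one` is only illustrative.

```r
x <- data.frame(matrix(c(rep(11, 4), rep(12, 3), rep(13, 3),
                         rep(0, 3), 1, rep(0, 4), rep(1, 2)), ncol = 2))

# For each row: TRUE if its X1 group contains at least one X2 == 1.
# ave() returns a vector the same length as its first argument.
has_one <- ave(x$X2 == 1, x$X1, FUN = any)
print(has_one)

# Keep the rows whose group passed the test
print(x[has_one, ])
```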

Hadley

http://had.co.nz/