
Deleting specific rows from a dataframe

3 messages · Chirag Gupta, arun

#
Hi,
If I understand it correctly,
df1<- read.table(text="
sample1 sample2 sample3 sample4 sample5
a P P I P P
b P A P P A
c P P P P P
d P P P P P
e M P M A P
f P P P P P
g P P P A P
h P P P P P
",sep="",header=TRUE,stringsAsFactors=FALSE)
df1[rowSums(df1=="P")==ncol(df1),]
#  sample1 sample2 sample3 sample4 sample5
#c       P       P       P       P       P
#d       P       P       P       P       P
#f       P       P       P       P       P
#h       P       P       P       P       P
A.K.
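
The one-liner above does three things at once. The following is a minimal sketch that spells out the intermediate steps on the same example data; the names `is_p`, `n_p`, `keep` and `kept` are introduced here for illustration only and are not part of the original reply.

```r
# Rebuild the example data from the message above. The header has one
# field fewer than the data lines, so read.table() uses the first
# column (a..h) as row names.
df1 <- read.table(text = "
sample1 sample2 sample3 sample4 sample5
a P P I P P
b P A P P A
c P P P P P
d P P P P P
e M P M A P
f P P P P P
g P P P A P
h P P P P P
", header = TRUE, stringsAsFactors = FALSE)

is_p <- df1 == "P"         # logical matrix: TRUE where a cell equals "P"
n_p  <- rowSums(is_p)      # number of "P" cells in each row
keep <- n_p == ncol(df1)   # TRUE only for rows that are "P" in every column

kept <- df1[keep, ]        # same rows as the one-liner selects
print(rownames(kept))
```

If the data could contain missing values, `rowSums()` would return `NA` for those rows, so they would need separate handling before being used as a row index.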



----- Original Message -----
From: Chirag Gupta <cxg040 at email.uark.edu>
To: r-help at r-project.org
Cc: 
Sent: Monday, July 15, 2013 9:10 PM
Subject: [R] Deleting specific rows from a dataframe

I have a data frame like the one shown below:

  sample1 sample2 sample3 sample4 sample5
a P P I P P
b P A P P A
c P P P P P
d P P P P P
e M P M A P
f P P P P P
g P P P A P
h P P P P P

I want to keep only those rows which have "P" in every column.

Since the matrix is large (about 20,000 rows), I cannot do it in Excel.

Is there any special function that I can use?
#
You mentioned a data.frame in one place and a matrix in another. A matrix would be faster.

#Speed comparison
set.seed(1454)
dfTest<- as.data.frame(matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5))

system.time(res<-dfTest[rowSums(dfTest=="P")==ncol(dfTest),])
#   user  system elapsed 
#  0.628   0.020   0.649 
dim(res)
#[1] 952   5


set.seed(1454)
mat1<- matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5)
system.time(res1<-mat1[rowSums(mat1=="P")==ncol(mat1),])
#   user  system elapsed 
#  0.188   0.004   0.194 
dim(res1)
#[1] 952   5
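
If the data already lives in a data frame, one way to use the matrix route is to convert it once with `as.matrix()` and build the row filter from that. This is only a sketch on a small made-up data frame (`df_small` and the other names are hypothetical, not from the thread). It also uses `drop = FALSE`, because indexing a matrix with a filter that matches a single row would otherwise return a plain vector.

```r
# Small hypothetical data frame; only row "a" is "P" in every column.
df_small <- data.frame(
  s1 = c("P", "P", "A"),
  s2 = c("P", "I", "P"),
  s3 = c("P", "P", "P"),
  row.names = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

m    <- as.matrix(df_small)             # character matrix, row names kept
keep <- rowSums(m == "P") == ncol(m)    # TRUE for rows that are all "P"

res_mat <- m[keep, , drop = FALSE]         # stays a 1 x 3 matrix
res_df  <- df_small[keep, , drop = FALSE]  # same rows, as a data frame
```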

#Other options include
system.time(res3<- dfTest[apply(sweep(dfTest,1,"P","=="),1,all),])
#   user  system elapsed 
#  5.988   0.120   6.120 
identical(res,res3)
#[1] TRUE



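#Note: length(table(x))==ncol(dfTest) is never TRUE here (4 distinct letters, 5 columns),
#so the test reduces to all(names(table(x))=="P")
system.time(res2<- dfTest[apply(dfTest,1, function(x) all(length(table(x))==ncol(dfTest) | names(table(x))=="P")), ])
#   user  system elapsed 
#351.492   0.040 352.164 
#Align the row.names attributes so that identical() compares only the contents
row.names(res2)<- row.names(res3)
attr(res3,"row.names")<- attr(res2,"row.names")
identical(res2,res3)
#[1] TRUE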


A.K.
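
Another option, not timed in this thread, is to work column by column: compare each column to "P" and combine the resulting logical vectors with `&`. This is a sketch under the assumption that every column is a character vector. `df_small`, `all_p` and `res_reduce` are hypothetical names used for illustration.

```r
# Small hypothetical data frame; only row "a" is "P" in every column.
df_small <- data.frame(
  s1 = c("P", "P", "A"),
  s2 = c("P", "I", "P"),
  s3 = c("P", "P", "P"),
  row.names = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# One logical vector per column, then an element-wise AND across columns.
all_p <- Reduce(`&`, lapply(df_small, function(col) col == "P"))

res_reduce <- df_small[all_p, , drop = FALSE]
print(rownames(res_reduce))
```

This selects the same rows as the `rowSums()` filter without first building a logical matrix of the whole data frame. Whether it is faster on a given data set would need to be checked with `system.time()`.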

----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Chirag Gupta <cxg040 at email.uark.edu>
Cc: R help <r-help at r-project.org>
Sent: Monday, July 15, 2013 9:23 PM
Subject: Re: [R] Deleting specific rows from a dataframe
