Deleting specific rows from a dataframe
3 messages · Chirag Gupta, arun
Hi,
If I understand it correctly,

df1<- read.table(text="
  sample1 sample2 sample3 sample4 sample5
a P P I P P
b P A P P A
c P P P P P
d P P P P P
e M P M A P
f P P P P P
g P P P A P
h P P P P P
",sep="",header=TRUE,stringsAsFactors=FALSE)
df1[rowSums(df1=="P")==ncol(df1),]
#  sample1 sample2 sample3 sample4 sample5
#c       P       P       P       P       P
#d       P       P       P       P       P
#f       P       P       P       P       P
#h       P       P       P       P       P
A.K.

----- Original Message -----
From: Chirag Gupta <cxg040 at email.uark.edu>
To: r-help at r-project.org
Cc:
Sent: Monday, July 15, 2013 9:10 PM
Subject: [R] Deleting specific rows from a dataframe

I have a data frame like shown below

  sample1 sample2 sample3 sample4 sample5
a P P I P P
b P A P P A
c P P P P P
d P P P P P
e M P M A P
f P P P P P
g P P P A P
h P P P P P

I want to keep only those rows which have all "P" across all the columns. Since the matrix is large (about 20,000 rows), I cannot do it in Excel. Is there any special function that I can use?

*Chirag Gupta*

[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
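For readers following along, the one-liner in the reply above can be unpacked into its intermediate steps. This is only a sketch: the data frame is rebuilt with data.frame() instead of read.table(), and the names is_p, n_p, keep and all_p are made up for illustration and do not appear in the thread.

```r
# Toy data mirroring the example in the thread (row names a-h).
df1 <- data.frame(
  sample1 = c("P", "P", "P", "P", "M", "P", "P", "P"),
  sample2 = c("P", "A", "P", "P", "P", "P", "P", "P"),
  sample3 = c("I", "P", "P", "P", "M", "P", "P", "P"),
  sample4 = c("P", "P", "P", "P", "A", "P", "A", "P"),
  sample5 = c("P", "A", "P", "P", "P", "P", "P", "P"),
  row.names = letters[1:8],
  stringsAsFactors = FALSE
)

is_p  <- df1 == "P"                    # logical matrix, same shape as df1
n_p   <- rowSums(is_p, na.rm = TRUE)   # number of "P" calls per row
keep  <- n_p == ncol(df1)              # TRUE only when every column is "P"
all_p <- df1[keep, , drop = FALSE]     # rows c, d, f and h
```

The na.rm = TRUE argument is an addition, not part of the original answer. Without it, a row containing NA would get an NA count and show up as a row of NAs in the subset. With it, such a row simply counts fewer "P" values than there are columns and is dropped.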
You mentioned data.frame at one place and matrix at another. Matrix would be faster.

# Speed comparison
set.seed(1454)
dfTest<- as.data.frame(matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5))
system.time(res<-dfTest[rowSums(dfTest=="P")==ncol(dfTest),])
#   user  system elapsed
#  0.628   0.020   0.649
dim(res)
#[1] 952   5

set.seed(1454)
mat1<- matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5)
system.time(res1<-mat1[rowSums(mat1=="P")==ncol(mat1),])
#   user  system elapsed
#  0.188   0.004   0.194
dim(res1)
#[1] 952   5

# Other options include
system.time(res3<- dfTest[apply(sweep(dfTest,1,"P","=="),1,all),])
#   user  system elapsed
#  5.988   0.120   6.120
identical(res,res3)
#[1] TRUE

# With only four distinct letters in five columns, length(table(x))==ncol(dfTest)
# is never TRUE here, so the test reduces to names(table(x))=="P".
system.time(res2<- dfTest[apply(dfTest,1, function(x) all(length(table(x))==ncol(dfTest) | names(table(x))=="P") ), ])
#   user  system elapsed
#351.492   0.040 352.164
row.names(res2)<- row.names(res3)
attr(res3,"row.names")<- attr(res2,"row.names")
identical(res2,res3)
#[1] TRUE
A.K.
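If the data already sit in a data frame, one way to get onto the matrix path from the comparison above is to convert once with as.matrix() and filter the matrix. The following is a small sketch under that assumption. The object names dfSmall, m, res_df and res_mat, the seed and the row count are arbitrary choices for illustration, and no timings are claimed for it.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 1e4     # much smaller than the 1e6 rows timed in the thread

dfSmall <- as.data.frame(
  matrix(sample(LETTERS[15:18], 5 * n, replace = TRUE), ncol = 5),
  stringsAsFactors = FALSE
)

# Data-frame route, as in the thread.
keep_df <- rowSums(dfSmall == "P") == ncol(dfSmall)
res_df  <- dfSmall[keep_df, ]

# Matrix route: convert once, then filter; drop = FALSE keeps a matrix
# even if only a single row survives.
m        <- as.matrix(dfSmall)
keep_mat <- rowSums(m == "P") == ncol(m)
res_mat  <- m[keep_mat, , drop = FALSE]

# Both routes should flag exactly the same rows.
stopifnot(identical(unname(keep_df), unname(keep_mat)))
```

The result of the matrix route is a character matrix, not a data frame, so wrap it in as.data.frame() if later code expects columns accessed with $.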