Identifying and Removing NA Columns and factor Columns with more than x Levels
Hi Bert,

Thanks! These worked perfectly.

mydata <- mydata[, !(i1)]

Dan

-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com]
Sent: Thursday, August 30, 2012 8:54 AM
To: Lopez, Dan
Cc: R help (r-help at r-project.org)
Subject: Re: [R] Identifying and Removing NA Columns and factor Columns with more than x Levels

If d is your data frame:

i1 <- sapply(d, function(x) is.factor(x) && length(levels(x)) > 31)
## a vector of length ncol(d) that is TRUE only for factor columns with >31 levels

i2 <- sapply(d, function(x) any(is.na(x)))
## You can figure it out.

-- Bert
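For anyone reading the archive, here is a minimal sketch of how the two index vectors above might be used for both subsetting and removal. The toy data frame d and the result names (only_na, only_big, cleaned) are made up for illustration and are not from the thread.

```r
# Toy data frame (hypothetical; not from the original thread)
d <- data.frame(
  id  = factor(sprintf("id%02d", 1:40)),  # factor with 40 levels (> 31)
  grp = factor(rep(c("a", "b"), 20)),     # factor with 2 levels
  x   = c(NA, 2:40),                      # integer column with one NA
  y   = 1:40                              # integer column, no NA
)

# TRUE only for factor columns with more than 31 levels
i1 <- sapply(d, function(x) is.factor(x) && length(levels(x)) > 31)
# TRUE only for columns containing at least one NA
i2 <- sapply(d, function(x) any(is.na(x)))

# drop = FALSE keeps a data frame even when a single column is selected
only_na  <- d[, i2, drop = FALSE]          # keep only columns with NAs
only_big <- d[, i1, drop = FALSE]          # keep only the large factors
cleaned  <- d[, !(i1 | i2), drop = FALSE]  # remove both kinds of column
```

Using && inside the function is safe here because is.factor(x) returns a single TRUE or FALSE per column.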
On Thu, Aug 30, 2012 at 8:38 AM, Lopez, Dan <lopez235 at llnl.gov> wrote:
Hi,
How do you subset a dataframe so that you only have columns:
1. that contain one or more NAs?
2. that contain factors with greater than or equal to 32 levels?
How do you remove from a dataframe columns**
3. with one or more NA's?
4. that contain factors with greater than or equal to 32 levels?
** I know how to remove columns at a basic level but I am trying to figure out a more efficient way of performing these particular tasks (my data set has 60 columns).
For NA's I essentially used summary(mtcars) and manually made a note of where NA's appeared, then used:
mtcars1 <- mtcars1[, !(names(mtcars1) %in% c("hp","wt","vs"))]
I did something similar for factors with more than x levels, only I used str(mtcars) to help me identify them.
BTW I know mtcars doesn't have any of these issues. I just used it as a quick reference.
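As a rough sketch, the manual summary()/str() step described above could be replaced by computing the offending column names and passing them to the same %in% idiom. The NAs and the model factor added to the mtcars copy below are introduced on purpose for illustration, since mtcars itself has neither problem; na_cols and big_cols are made-up names.

```r
# Copy of mtcars with problems introduced on purpose (illustration only)
mtcars1 <- mtcars
mtcars1$hp[3] <- NA
mtcars1$wt[5] <- NA
mtcars1$model <- factor(rownames(mtcars))  # one level per row, 32 in total

# Names of columns with at least one NA
na_cols <- names(mtcars1)[colSums(is.na(mtcars1)) > 0]
# Names of factor columns with 32 or more levels
# (nlevels() returns 0 for non-factor columns)
big_cols <- names(mtcars1)[sapply(mtcars1, nlevels) >= 32]

# Same removal idiom as in the question, without hand-typed names
mtcars1 <- mtcars1[, !(names(mtcars1) %in% c(na_cols, big_cols))]
```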
Dan
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm