deleting columns from a dataframe where NA is more than 15 percent of the column length
Thank you. It was very informative and helpful. It works.
On Aug 5, 2012, at 10:21 PM, arun <smartpink111 at yahoo.com> wrote:
Hi,
Try this:
dat1 <- data.frame(x = c(NA, NA, rnorm(6, 15), NA),
                   y = c(NA, rnorm(8, 15)),
                   z = c(rnorm(7, 15), NA, NA))
# keep only the columns whose fraction of NA is at most 15%
dat1[which(colMeans(is.na(dat1)) <= .15)]
         y
1       NA
2 13.53085
3 12.89453
4 15.02625
5 14.00387
6 15.34618
7 15.69293
8 15.62377
9 14.76479
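To see why only y survives, it may help to look at the per-column NA fractions directly. The sketch below uses fixed values instead of rnorm() so the result is reproducible; the data frame `d` and the names `na_frac` and `kept` are made up for illustration and are not part of the example above.

```r
# Hypothetical data with fixed values (same NA pattern as dat1)
d <- data.frame(
  x = c(NA, NA, 1, 2, 3, 4, 5, 6, NA),  # 3 of 9 missing
  y = c(NA, 1, 2, 3, 4, 5, 6, 7, 8),    # 1 of 9 missing
  z = c(1, 2, 3, 4, 5, 6, 7, NA, NA)    # 2 of 9 missing
)

# is.na(d) is a logical matrix; colMeans() turns it into the
# fraction of NA values in each column
na_frac <- colMeans(is.na(d))
print(round(na_frac, 3))   # x = 0.333, y = 0.111, z = 0.222

# Keep only the columns whose NA fraction is at most 15%
kept <- d[na_frac <= 0.15]
print(names(kept))         # "y"
```

Indexing with the logical vector directly gives the same columns as wrapping it in which(), as long as none of the fractions is itself NA.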
# You can also use apply, sapply, etc.
dat2 <- data.frame(x = c(NA, NA, rnorm(6, 15), NA),
                   y = c(NA, rnorm(8, 15)),
                   z = c(rnorm(7, 15), NA, NA),
                   u = c(rnorm(9, 15)))
dat2[apply(dat2, 2, function(x) mean(is.na(x)) <= .15)]
# dat2[sapply(dat2, function(x) mean(is.na(x)) <= .15)]
# dat2[which(colMeans(is.na(dat2)) <= .15)]
         y        u
1       NA 14.56278
2 16.49940 16.25761
3 14.11368 14.08768
4 14.95139 14.01923
5 14.99517 15.91936
6 14.46359 14.07573
7 15.09702 13.94888
8 15.99967 14.97171
9 15.51924 15.59981
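If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the name `drop_na_cols` and the argument `max_na` are made up here, and it assumes the data frame has at least one row (with zero rows the NA fraction would be NaN).

```r
# Hypothetical helper: drop columns whose fraction of NA exceeds max_na
drop_na_cols <- function(df, max_na = 0.15) {
  stopifnot(is.data.frame(df), nrow(df) > 0, max_na >= 0, max_na <= 1)
  keep <- colMeans(is.na(df)) <= max_na   # TRUE for columns to keep
  df[, keep, drop = FALSE]                # drop = FALSE keeps a data frame
}

# Usage with fixed values: a has 25% NA, b has 50%, c has none
d2 <- data.frame(a = c(1, NA, 3, 4), b = c(NA, NA, 3, 4), c = 1:4)
res_default <- drop_na_cols(d2)           # keeps only c
res_quarter <- drop_na_cols(d2, 0.25)     # keeps a and c
print(names(res_default))
print(names(res_quarter))
```

One thing to keep in mind when choosing between the variants: apply() first converts the data frame to a matrix, while sapply() and colMeans(is.na()) work column by column. For counting NAs all three give the same answer.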
A.K.
----- Original Message -----
From: Faz Jones <jonesfaz4 at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Sunday, August 5, 2012 9:04 PM
Subject: [R] deleting columns from a dataframe where NA is more than 15 percent of the column length
I have a dataframe with 10 columns (each column has the same
length). I want to eliminate any column in which more than 15% of
the values are NA. Do I first need to write a function that
calculates the percentage of NA for each column and then make another
dataframe where I apply the function? What's the best way to do this?