I have a dataframe of 10 different columns (each column has the same length). I want to eliminate any column in which more than 15% of the values are NA. Do I first need to write a function that calculates the percentage of NA for each column, and then build another dataframe by applying that function? What's the best way to do this?
deleting columns from a dataframe where NA is more than 15 percent of the column length
5 messages · Jorge I Velez, arun, Faz Jones
Hi,

Try this:

dat1<-data.frame(x=c(NA,NA,rnorm(6,15),NA),y=c(NA,rnorm(8,15)),z=c(rnorm(7,15),NA,NA))
dat1[which(colMeans(is.na(dat1))<=.15)]
         y
1       NA
2 13.53085
3 12.89453
4 15.02625
5 14.00387
6 15.34618
7 15.69293
8 15.62377
9 14.76479

#You can also use apply, sapply etc.
dat2<-data.frame(x=c(NA,NA,rnorm(6,15),NA),y=c(NA,rnorm(8,15)),z=c(rnorm(7,15),NA,NA),u=c(rnorm(9,15)))
dat2[apply(dat2,2,function(x) mean(is.na(x))<=.15)]
#dat2[sapply(dat2,function(x) mean(is.na(x))<=.15)]
#dat2[which(colMeans(is.na(dat2))<=.15)]
         y        u
1       NA 14.56278
2 16.49940 16.25761
3 14.11368 14.08768
4 14.95139 14.01923
5 14.99517 15.91936
6 14.46359 14.07573
7 15.09702 13.94888
8 15.99967 14.97171
9 15.51924 15.59981

A.K.

----- Original Message -----
From: Faz Jones <jonesfaz4 at gmail.com>
To: r-help at r-project.org
Sent: Sunday, August 5, 2012 9:04 PM
Subject: [R] deleting columns from a dataframe where NA is more than 15 percent of the column length

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
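For anyone who wants the reply above as a reusable function, here is a minimal sketch. The function name drop_na_columns, the argument max_na, the seed, and the extra character column g are made up for illustration and do not come from the thread. It uses the same colMeans(is.na(...)) idea and keeps a column only when its share of NA is at most the threshold, so a column with more than 15% NA is dropped.

```r
# Hypothetical helper (the name is not from the thread): keep only the
# columns whose share of NA values does not exceed `max_na`.
drop_na_columns <- function(df, max_na = 0.15) {
  stopifnot(is.data.frame(df), max_na >= 0, max_na <= 1)
  # is.na() on a data frame returns a logical matrix, so colMeans()
  # gives the fraction of NA values in each column.
  na_share <- colMeans(is.na(df))
  # drop = FALSE keeps the result a data frame even if one column survives.
  df[, na_share <= max_na, drop = FALSE]
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
dat <- data.frame(
  x = c(NA, NA, rnorm(6, 15), NA),  # 3 of 9 values missing (about 33%)
  y = c(NA, rnorm(8, 15)),          # 1 of 9 values missing (about 11%)
  z = c(rnorm(7, 15), NA, NA),      # 2 of 9 values missing (about 22%)
  u = rnorm(9, 15),                 # no missing values
  g = c("a", NA, "b", "c", "d", "e", "f", "g", "h")  # character, 1 of 9 missing
)

kept <- drop_na_columns(dat)
names(kept)  # "y" "u" "g"
```

Because the NA share is computed from is.na() on the whole data frame, the sketch works for non-numeric columns as well. A different cutoff can be passed directly, for example drop_na_columns(dat, max_na = 0.25) also keeps z.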
Thank you. It was very informative and helpful. It works.