
Identifying and Removing NA Columns and factor Columns with more than x Levels

6 messages · Bert Gunter, arun, Lopez, Dan +2 more

#
If d is your data frame

i1 <- sapply(d,function(x)is.factor(x)&&length(levels(x))>31)
## a vector of length ncol(d) that is TRUE only for factor columns with >31 levels

i2 <- sapply(d,function(x)any(is.na(x)))
## You can figure it out.

-- Bert
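
For anyone following along, here is a minimal sketch of how the two logical vectors above might be used for all four tasks in the question. The sample data frame and the names `only_big_factors`, `only_na_cols`, and `cleaned` are made up for illustration and are not from the original post.

```r
# Hypothetical sample data (not from the thread)
d <- data.frame(
  id  = factor(sprintf("id%02d", 1:40)),  # factor with 40 levels
  grp = factor(rep(c("a", "b"), 20)),     # factor with 2 levels
  x   = c(NA, 2:40),                      # numeric with one NA
  y   = 1:40                              # numeric, no NA
)

# TRUE only for factor columns with more than 31 levels
i1 <- sapply(d, function(x) is.factor(x) && length(levels(x)) > 31)
# TRUE only for columns containing at least one NA
i2 <- sapply(d, function(x) any(is.na(x)))

# Keep only the flagged columns; drop = FALSE keeps a data frame
# even when a single column is selected
only_big_factors <- d[, i1, drop = FALSE]
only_na_cols     <- d[, i2, drop = FALSE]

# Remove every column flagged by either test
cleaned <- d[, !(i1 | i2), drop = FALSE]
names(cleaned)
```

Negating the index (`!i1`, `!i2`, or `!(i1 | i2)`) is what turns "select" into "remove".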
On Thu, Aug 30, 2012 at 8:38 AM, Lopez, Dan <lopez235 at llnl.gov> wrote:

  
    
#
Hi,
For the first part in the two questions, do this:
dat1<-data.frame(Temp=c(5,10,9,15,NA,14,25,21,24,23,21,24,35,35,36,34,32,33),
                 Temp2=c(5,10,9,15,15,14,25,21,24,23,21,24,35,35,36,34,32,33),
                 Month=rep(c("January","February","March","April","May","June"),each=3),
                 Roof=as.factor(rep(1:6,times=3)))

#columns that contain NAs
dat1[,colMeans(is.na(dat1))!=0]
#columns without NAs
dat1[,colMeans(is.na(dat1))==0]
#or
dat1[,complete.cases(t(dat1))]

#Second part of the two questions: the example uses 4 as the threshold; in your case, it is 32.
dat1[unlist(lapply(dat1,function(x) length(levels(x))>=4))]
#or,
dat1[sapply(dat1,function(x) length(levels(x))>=4)]

#and
dat1[sapply(dat1,function(x) length(levels(x))<4)]

I guess you wanted these as separate solutions.
A.K.
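
A note for readers on current R versions, offered as a sketch rather than a correction: since R 4.0.0, data.frame() no longer converts strings to factors by default, so `Month` above would be a character column with no levels. The version below assumes `stringsAsFactors = TRUE` to reproduce the 2012 behaviour. The names `has_na` and `many_levels` are illustrative only.

```r
dat1 <- data.frame(
  Temp  = c(5, 10, 9, 15, NA, 14, 25, 21, 24, 23, 21, 24, 35, 35, 36, 34, 32, 33),
  Temp2 = c(5, 10, 9, 15, 15, 14, 25, 21, 24, 23, 21, 24, 35, 35, 36, 34, 32, 33),
  Month = rep(c("January", "February", "March", "April", "May", "June"), each = 3),
  Roof  = as.factor(rep(1:6, times = 3)),
  stringsAsFactors = TRUE   # assumption: mimic the pre-4.0.0 default
)

# One logical flag per column
has_na      <- colSums(is.na(dat1)) > 0
many_levels <- sapply(dat1, nlevels) >= 4   # nlevels() is 0 for non-factors

# drop = FALSE keeps the result a data frame even for a single column
with_na    <- dat1[, has_na, drop = FALSE]
without_na <- dat1[, !has_na, drop = FALSE]
few_levels <- dat1[, !many_levels, drop = FALSE]
```

Without `drop = FALSE`, an expression such as `dat1[, has_na]` returns a plain vector when only one column matches.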

----- Original Message -----
From: "Lopez, Dan" <lopez235 at llnl.gov>
To: "R help (r-help at r-project.org)" <r-help at r-project.org>
Cc: 
Sent: Thursday, August 30, 2012 11:38 AM
Subject: [R] Identifying and Removing NA Columns and factor Columns with more than x Levels

Hi,

How do you subset a dataframe so that you only have columns:

1. that contain one or more NAs?

2. that contain factors with greater than or equal to 32 levels?

How do you remove from a dataframe columns**

3. with one or more NA's?

4. that contain factors with greater than or equal to 32 levels?

** I know how to remove columns at a basic level but I am trying to figure out a more efficient way of performing these particular tasks (my data set has 60 columns).
For NA's I essentially used summary(mtcars) and manually made a note of where NA's appeared, then used:
mtcars1<-mtcars1[,!(names(mtcars1)%in% c("hp","wt","vs"))]
I did something similar for factors with greater than x levels, only I used str(mtcars) to help me identify them.
BTW I know mtcars doesn't have any of these issues. I just used it as a quick reference.


Dan


[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hi Bert,

Thanks! These worked perfectly.

mydata<-mydata[,!(i1)]

Dan

-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com] 
Sent: Thursday, August 30, 2012 8:54 AM
To: Lopez, Dan
Cc: R help (r-help at r-project.org)
Subject: Re: [R] Identifying and Removing NA Columns and factor Columns with more than x Levels

If d is your data frame

i1 <- sapply(d,function(x)is.factor(x)&&length(levels(x))>31)
## a vector of length ncol(d) that is TRUE only for factor columns with >31 levels

i2 <- sapply(d,function(x)any(is.na(x)))
## You can figure it out.

-- Bert
On Thu, Aug 30, 2012 at 8:38 AM, Lopez, Dan <lopez235 at llnl.gov> wrote:

  
    
#
Hello everyone,

a hopefully easy to solve problem from an R novice...

I am trying to calculate a number of correlation matrices that should finally be combined in a three-dimensional array.
Here is my code, with a built-in R dataset as an example.

-----------------------------------

## Create an array of correlation matrices from a rolling-window application

TS <- EuStockMarkets
# Load internal dataset

n <- 30
# Choose size of rolling time window

T <- c(1:nrow(TS))
# Define number of steps

X <- array(data = NA, dim = c(ncol(TS), ncol(TS), nrow(TS)))
# Create data array

for (t in T[1:(length(T)-n)]){
 X[t] = cor(TS[t:(t+n), 1:ncol(TS)], use = "pairwise.complete.obs")
}
# Calculate correlation matrices

---------------------------------

Unfortunately, I only get a warning that the dimensions do not fit... Where is the mistake?

THANKS A LOT!

Nico
#
Hello,

You create a 3d array X and then index it as if it were 1d.
Correction:

TS <- EuStockMarkets

[...etc...]

for (t in T[1:(length(T)-n)]){
  X[ , , t] <- cor(TS[t:(t+n), 1:ncol(TS)], use = "pairwise.complete.obs")
}
# Calculate correlation matrices


Also, 't' and 'T' are not good names: the first is R's matrix transpose 
function and the second is another name for TRUE.

Hope this helps,

Rui Barradas
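
For reference, a self-contained sketch of the corrected loop, with the variables renamed so they no longer shadow t() and T. The names `prices`, `window`, `n_windows`, and `corr` are illustrative, not from the posts above. Unlike the original, the array here is sized to the number of windows actually computed, so no all-NA slices are left at the end; that sizing is a choice made for this sketch.

```r
prices <- EuStockMarkets              # built-in dataset, one column per index
window <- 30                          # rolling window size, as in the question
n_windows <- nrow(prices) - window    # number of windows the loop visits

# One k x k correlation matrix per window, stacked along the third dimension
corr <- array(NA_real_,
              dim = c(ncol(prices), ncol(prices), n_windows),
              dimnames = list(colnames(prices), colnames(prices), NULL))

for (i in seq_len(n_windows)) {
  rows <- i:(i + window)              # window + 1 consecutive rows, as in the original loop
  corr[, , i] <- cor(prices[rows, ], use = "pairwise.complete.obs")
}

dim(corr)
```

`corr[, , i]` addresses the whole i-th matrix, whereas `corr[i]` addresses a single element, which is why the original assignment produced the length warning.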

On 31-08-2012 17:24, Max Frisch wrote: