Identifying and Removing NA Columns and factor Columns with more than x Levels
6 messages · Bert Gunter, arun, Lopez, Dan +2 more
If d is your data frame:
i1 <- sapply(d, function(x) is.factor(x) && length(levels(x)) > 31)
## a logical vector of length ncol(d) that is TRUE only for factor columns with >31 levels
i2 <- sapply(d, function(x) any(is.na(x)))
## You can figure it out.
-- Bert
On Thu, Aug 30, 2012 at 8:38 AM, Lopez, Dan <lopez235 at llnl.gov> wrote:
Hi,
How do you subset a dataframe so that you only have columns:
1. that contain one or more NAs?
2. that contain factors with greater than or equal to 32 levels?
How do you remove from a dataframe columns**
3. with one or more NA's?
4. that contain factors with greater than or equal to 32 levels?
** I know how to remove columns at a basic level but I am trying to figure out a more efficient way of performing these particular tasks (my data set has 60 columns).
For NA's I essentially used summary(mtcars) and manually made a note of where NA's appeared, then used:
mtcars1<-mtcars1[,!(names(mtcars1)%in% c("hp","wt","vs"))]
I did something similar for factors with greater than x levels only I used str(mtcars) to help me identify them.
BTW I know mtcars doesn't have any of these issues. I just used it as a quick reference.
Dan
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
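The two logical vectors in the reply above can be used directly as column indices. The following is only a sketch of how the four requested subsets might be built; the toy data frame `d`, its column names, and the `max_levels` variable are made up for illustration and are not part of the original thread.

```r
# Toy data frame (hypothetical); 'id' is a factor with 40 levels
d <- data.frame(
  x  = c(1, NA, 3, rep(4, 37)),
  y  = 1:40,
  id = factor(sprintf("id%02d", 1:40)),
  g  = factor(rep(c("a", "b"), 20))
)

max_levels <- 31  # assumed threshold: factors with more levels are flagged

# TRUE only for factor columns with more than max_levels levels
i1 <- sapply(d, function(x) is.factor(x) && nlevels(x) > max_levels)
# TRUE only for columns that contain at least one NA
i2 <- sapply(d, function(x) any(is.na(x)))

with_na     <- d[, i2, drop = FALSE]          # 1. only columns with NAs
big_factors <- d[, i1, drop = FALSE]          # 2. only factors with >= 32 levels
without_na  <- d[, !i2, drop = FALSE]         # 3. drop columns with NAs
small_only  <- d[, !i1, drop = FALSE]         # 4. drop factors with >= 32 levels
cleaned     <- d[, !(i1 | i2), drop = FALSE]  # 3 and 4 combined
```

Combining the two vectors with `|` before negating removes both kinds of column in one step, which may be convenient for a 60-column data set.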
Hi,
For the first part in the two questions, do this:
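dat1<-data.frame(Temp=c(5,10,9,15,NA,14,25,21,24,23,21,24,35,35,36,34,32,33),
  Temp2=c(5,10,9,15,15,14,25,21,24,23,21,24,35,35,36,34,32,33),
  Month=rep(c("January","February","March","April","May","June"),each=3),
  Roof=as.factor(rep(1:6,times=3)),
  stringsAsFactors=TRUE) # since R 4.0.0 strings are no longer converted to factors by default; this keeps Month a factor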
dat1[,colMeans(is.na(dat1))!=0]   # keep only the columns that contain NAs
dat1[,colMeans(is.na(dat1))==0]   # drop the columns that contain NAs
#or
dat1[,complete.cases(t(dat1))]    # alternative way to drop the columns that contain NAs
#Second part of the two questions (this example uses 4 as the threshold; in your case, it is 32):
dat1[unlist(lapply(dat1,function(x) length(levels(x))>=4))]
or,
dat1[sapply(dat1,function(x) length(levels(x))>=4)]
#and
dat1[sapply(dat1,function(x) length(levels(x))<4)]
I guess you wanted these as separate solutions.
A.K.
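One detail worth keeping in mind with column selections like those above: when exactly one column matches, `df[, index]` returns a plain vector rather than a data frame. A small sketch of the difference, using a made-up data frame `df` (not from the thread):

```r
# Small hypothetical data frame with exactly one NA-containing column
df <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6), f = factor(c("u", "v", "w")))

has_na <- colSums(is.na(df)) > 0

v <- df[, has_na]                # single match: result is dropped to a plain vector
k <- df[, has_na, drop = FALSE]  # stays a one-column data frame
l <- df[has_na]                  # list-style indexing also keeps a data frame

is.data.frame(v)
is.data.frame(k)
is.data.frame(l)
```

If later code expects a data frame, either `drop = FALSE` or the single-bracket list-style form is probably the safer choice.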
Hi Bert,
Thanks! These worked perfectly.
mydata<-mydata[,!(i1)]
Dan
Hello everyone,
a hopefully easy to solve problem from an R novice...
I am trying to calculate a number of correlation matrices that should finally be combined into a three-dimensional array.
Here is my code, with a built-in R dataset as an example.
-----------------------------------
## Creating an array of correlation matrices from a rolling window application
TS <- EuStockMarkets   # Load internal dataset
n <- 30                # Choose size of rolling time window
T <- c(1:nrow(TS))     # Define number of steps
X <- array(data = NA, dim = c(ncol(TS), ncol(TS), nrow(TS)))   # Create data array
for (t in T[1:(length(T)-n)]){   # Calculate correlation matrices
X[t] = cor(TS[t:(t+n), 1:ncol(TS)], use = "pairwise.complete.obs")
}
---------------------------------
Unfortunately, I only get a warning that the dimensions do not fit... Where is the mistake?
THANKS A LOT!
Nico
Hello,
You create a 3d array X and then index it as if it were 1d.
Correction:
TS <- EuStockMarkets
[...etc...]
for (t in T[1:(length(T)-n)]){
X[ , , t] <- cor(TS[t:(t+n), 1:ncol(TS)], use = "pairwise.complete.obs")
}
# Calculate correlation matrices
Also, 't' and 'T' are not good names: the first is R's matrix transpose
function and the second is another name for TRUE.
Hope this helps,
Rui Barradas
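For reference, the correction and the naming advice can be put together into one runnable sketch. A single index such as `X[t]` addresses one cell of the array, so assigning a whole matrix to it triggers the length warning; `X[ , , t]` addresses a full slice. The names `prices`, `window`, `n_win` and `cors` below are illustrative replacements for `TS`, `n`, `T` and `X`, and sizing the third dimension to the number of windows (instead of `nrow(TS)`) is an assumption made here to avoid trailing all-NA slices.

```r
prices <- EuStockMarkets         # built-in multivariate time series
window <- 30                     # each window spans rows start:(start + window), as in the original loop
n_win  <- nrow(prices) - window  # number of complete windows
n_col  <- ncol(prices)

# One correlation matrix per window; the third dimension indexes the window
cors <- array(NA_real_, dim = c(n_col, n_col, n_win),
              dimnames = list(colnames(prices), colnames(prices), NULL))

for (start in seq_len(n_win)) {
  rows <- start:(start + window)
  # Assign to a full slice, not to a single cell
  cors[, , start] <- cor(prices[rows, ], use = "pairwise.complete.obs")
}

dim(cors)
```

`cors[, , 1]` is then the correlation matrix of the first window, `cors[, , 2]` of the second, and so on.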
On 31-08-2012 17:24, Max Frisch wrote: