
Removing columns that are NA or constant

Hello,

Comments inline.
On 20-11-2012 22:03, Brian Feeny wrote:
Suppose they do. If you now remove those columns from one of train_data 
or test_data, and not from the other, then their structures are no 
longer the same.
Or write a function. I would have the function return the indices of the 
good columns (those that are neither all NA nor constant) and then 
intersect the results for train_data and test_data.

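notSame <- function(dataset){
     # TRUE for columns that are all NA or have a single distinct non-NA value.
     # NAs are dropped before comparing, so a column that merely starts with
     # NA is not mistaken for a constant one.
     same <- vapply(dataset, function(.col){
         length(unique(.col[!is.na(.col)])) <= 1L
     }, logical(1))
     # indices of the columns worth keeping
     which(!same)
}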

good1 <- notSame(train_data)
good2 <- notSame(test_data)
dataset <- dataset[intersect(good1, good2)]


Now you can sample from a "safe" subset of your dataset.
Only you can tell whether it's sound to eliminate variables from your 
analysis, and which ones.
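
To make the intersect step concrete, here is a minimal sketch on made-up data. The helper name varyingCols and every value in the data frame are hypothetical, chosen only for illustration. The helper follows the same idea as notSame() but drops NAs before checking for a constant column, and the split into train_data and test_data is just the first and last three rows.

```r
# Hypothetical helper, same idea as notSame(): indices of the columns
# that have at least two distinct non-NA values.
varyingCols <- function(dataset) {
    same <- vapply(dataset, function(.col) {
        length(unique(.col[!is.na(.col)])) <= 1L
    }, logical(1))
    which(!same)
}

# Made-up data, for illustration only
dataset <- data.frame(
    id    = 1:6,
    allna = NA,                                 # all NA everywhere
    const = 7,                                  # constant everywhere
    x     = c(NA, 2.5, 3.1, 0.4, 1.8, 2.2),     # varies, starts with NA
    grp   = c("a", "a", "a", "b", "b", "a")     # constant in the first 3 rows only
)

train_data <- dataset[1:3, ]
test_data  <- dataset[4:6, ]

good1 <- varyingCols(train_data)
good2 <- varyingCols(test_data)
keep  <- intersect(good1, good2)

# Keep only the columns that vary in both subsets
safe <- dataset[keep]
names(safe)
```

Here grp varies in test_data but is constant in train_data, so the intersection drops it along with allna and const, leaving id and x.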

Hope this helps,

Rui Barradas