imputing the numerical columns of a dataframe, returning the rest unchanged
Hi,
?sapply will tell you
....
'sapply' is a user-friendly version of 'lapply' by default
returning a vector or matrix if appropriate.
....
so each column 'x' loses its class when sapply() simplifies the result; e.g.
## iris is a data.frame
str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
## but sapply() will coerce it into a numeric matrix
str(sapply(iris, function(x)x))
 num [1:150, 1:5] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" ...

I'd suggest you get the class of each column first, then apply impute() to these columns (i.e. DF[, sapply(DF, class) == "numeric"]) and assign the new values to the original columns.

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China
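
P.S. The suggestion above (pick out the numeric columns, impute them, assign the results back) could be sketched roughly as follows. This is only a minimal sketch under assumptions: the thread does not say which package impute() comes from, so median_impute() below is a hypothetical stand-in that replaces NAs by the column median, and DF is made-up example data. It also uses is.numeric() rather than comparing class() to "numeric", so that integer columns are picked up as well.

```r
## Made-up example data (not the original poster's DF)
DF <- data.frame(
  day     = factor(c(4, 4, 4)),
  version = factor(c("A", "A", "B")),
  ID      = c("1a", "2a", "3a"),
  V1      = c(1, NA, 3),
  V2      = c(5, 3, NA),
  stringsAsFactors = FALSE
)

## Hypothetical stand-in for impute(): replace NAs by the median
median_impute <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

## Logical index of the numeric columns; vapply() always returns
## a logical vector here, so nothing gets simplified unexpectedly
num <- vapply(DF, is.numeric, logical(1))

## lapply() returns a list, so every column keeps its own class
## when the list is assigned back into the data frame
DF[num] <- lapply(DF[num], median_impute)

str(DF)
```

Because only the columns selected by num are touched, the factors and the character column come back unchanged.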
On Mon, Dec 22, 2008 at 11:38 PM, Mark Heckmann <mark.heckmann at gmx.de> wrote:
Hi R-experts,
how can I apply a function to each numeric column of a data frame and return
the whole data frame with changes in numeric columns only?
In my case I want to do a median imputation of the numeric columns and
retain the other columns. My dataframe (DF) contains factors, characters and
numerics.
I tried the following but that does not work:
foo <- function(x){
if(is.numeric(x)==TRUE) return(impute(x))
else(return(x))
}
sapply(DF, foo)
day version ID V1 V2 V3
[1,] "4" "A" "1a" "1" "5" "5"
[2,] "4" "A" "2a" "2" "3" "5"
[3,] "4" "B" "3a" "3" "5" "5"
All the variables are coerced to character now ("day" and "version" were
factors, "ID" a character vector). I only want imputations on the numerics,
with the rest returned unchanged.
Is there a function available? If not, how can I do it?
TIA and merry x-mas,
Mark