Reshaping dataframes

4 messages · Ingmar Schuster, Rui Barradas, David Winsemius

#
Hello,

Your function doesn't seem to be very difficult to generalize.

d <- read.table(text="
    trg_type child_type_1
1 Scientists NA
2        of         used
", header=TRUE, stringsAsFactors=TRUE)  # R >= 4.0 no longer makes factors by default
str(d)

subs_na <- function(tok, na_factor_level = "NOT_REALIZED", na_num = 99999) {
     ifac <- which(sapply(tok, is.factor))   # indices of factor columns
     inum <- which(sapply(tok, is.numeric))  # indices of numeric columns
     for(i in ifac) {
         # add the replacement level first; assigning an unknown level gives NA
         levels(tok[, i]) <- c(levels(tok[, i]), na_factor_level)
         tok[is.na(tok[, i]), i] <- as.factor(na_factor_level)
     }
     for(i in inum)
         tok[is.na(tok[, i]), i] <- na_num
     tok
}

r1 <- substitute_na(d)  # Ingmar's original function, not included in this excerpt
r2 <- subs_na(d)
str(r1)
str(r2)
identical(r1, r2)  # TRUE

You could use the same approach for character columns, Dates, etc.
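A minimal sketch of that extension, assuming you want fixed placeholder values for character and Date columns too. The name `subs_na_ext` and the `na_chr` / `na_date` arguments (and their defaults) are hypothetical, chosen only for illustration; the factor and numeric branches follow `subs_na()` above.

```r
# Hypothetical extension of subs_na(): one loop per column type.
subs_na_ext <- function(tok, na_factor_level = "NOT_REALIZED", na_num = 99999,
                        na_chr = "NOT_REALIZED", na_date = as.Date("1900-01-01")) {
  for (i in which(sapply(tok, is.factor))) {
    # add the replacement level first, otherwise the assignment yields NA
    levels(tok[[i]]) <- c(levels(tok[[i]]), na_factor_level)
    tok[[i]][is.na(tok[[i]])] <- na_factor_level
  }
  # is.numeric() is FALSE for factors and Dates, so they are not touched here
  for (i in which(sapply(tok, is.numeric)))
    tok[[i]][is.na(tok[[i]])] <- na_num
  for (i in which(sapply(tok, is.character)))
    tok[[i]][is.na(tok[[i]])] <- na_chr
  for (i in which(sapply(tok, inherits, what = "Date")))
    tok[[i]][is.na(tok[[i]])] <- na_date
  tok
}

# Small made-up data frame with one NA per column type
d2 <- data.frame(
  word  = c("Scientists", NA),
  count = c(NA, 3),
  when  = as.Date(c("2012-08-22", NA)),
  kind  = factor(c(NA, "used")),
  stringsAsFactors = FALSE
)
r <- subs_na_ext(d2)
str(r)
```

Each type gets its own `which(sapply(...))` test, so adding another class means adding one more loop rather than changing the existing ones.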

Hope this helps,

Rui Barradas

On 22-08-2012 20:16, Ingmar Schuster wrote:
#
On Aug 23, 2012, at 2:02 AM, Ingmar Schuster wrote:

Not sure what you mean by "_while_ binding dataframes", but the
original question seems to be answered by this sentence from the help
file for factor:

"For a numeric x, set exclude=NULL to make NA an extra level (prints as <NA>); by default, this is the last level."

fac <- factor(fac, exclude=NULL)  # skips all the is.na() / levels() gymnastics
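A small check of what `exclude = NULL` does, first on a numeric vector as in the quoted help sentence and then on an existing factor as in the one-liner. The vectors `x` and `fac` are made up for illustration. Note that the NA becomes a level in its own right (printed as `<NA>`), not a named level like `"NOT_REALIZED"`.

```r
# Numeric vector, as in the help-page sentence
x <- c(1, NA, 2)
levels(factor(x))                  # "1" "2": NA dropped by default
levels(factor(x, exclude = NULL))  # "1" "2" NA: NA kept as the last level

# The same call on an existing factor
fac  <- factor(c("of", NA, "used"))
fac2 <- factor(fac, exclude = NULL)
levels(fac2)  # "of" "used" NA
is.na(fac2)   # all FALSE: the entry now points at the NA level
```

Because `is.na()` no longer flags those entries, code that later tests for missingness will treat them as ordinary values.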


If you want to loop over the factor columns of a dataframe:

facidx <- sapply(d, is.factor)
d[facidx] <- lapply(d[facidx], factor, exclude=NULL)  # d[facidx] stays a data frame even if only one column matches
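A self-contained sketch of the column loop on the data from the first message. `stringsAsFactors = TRUE` is added because R 4.0 stopped converting strings to factors by default. The `relabel_na()` helper and its `"NOT_REALIZED"` default are hypothetical and not part of the suggestion above; it only shows how the NA level could be given the same label `subs_na()` uses.

```r
d <- read.table(text = "
    trg_type child_type_1
1 Scientists NA
2        of         used
", header = TRUE, stringsAsFactors = TRUE)

facidx <- sapply(d, is.factor)
# list-style indexing: d[facidx] is a data frame, so lapply() walks columns
d[facidx] <- lapply(d[facidx], factor, exclude = NULL)

# Hypothetical helper: rename the NA level to a visible label
relabel_na <- function(f, label = "NOT_REALIZED") {
  levels(f)[is.na(levels(f))] <- label  # no-op when there is no NA level
  f
}
d[facidx] <- lapply(d[facidx], relabel_na)
str(d)
```

Unlike `subs_na()`, this only adds the label to columns that actually contain NA, so the level sets of NA-free columns stay unchanged.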

I see no arguments to data.frame() or read.table() that would let you
change this default behavior of factor().