
R Help

2 messages · Mª Teresa Martinez Soriano, PIKAL Petr

Hi

It would be better if you provided the output of either str(yourdata) or dput(yourdata), or of a part of your data that illustrates those 2 kinds of missing values.

Anyway, I would use NA for missing values and some other identifier for empty ones.

temp
   a      b  c
1  1  empty   
2 NA filled xx
3  2 filled xx
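
A data frame like the one printed above could come from an import step along the following lines. This is only a sketch: the input text `txt` and the name `temp2` are made up for illustration, and the columns are read as character rather than as factors to keep the recoding simple.

```r
# Hypothetical raw input: the literal NA marks a missing value,
# an empty field marks an "empty" entry.
txt <- "a,b,c
1,empty,
NA,filled,xx
2,filled,xx"

# Treat only the string "NA" as missing; keep text columns as character
# so that empty fields come through as "" rather than NA.
temp2 <- read.csv(text = txt, na.strings = "NA", stringsAsFactors = FALSE)

# Optionally give the empty entries an explicit label.
temp2$c[temp2$c %in% ""] <- "empty"
```

With this setup only the second value of column a is NA; the empty field in column c stays an ordinary string and can be relabelled.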

is.na(temp)
         a     b     c
[1,] FALSE FALSE FALSE
[2,]  TRUE FALSE FALSE
[3,] FALSE FALSE FALSE

dput(temp)
structure(list(a = c(1L, NA, 2L), b = structure(c(1L, 2L, 2L), .Label = c("empty", 
"filled"), class = "factor"), c = structure(c(1L, 2L, 2L), .Label = c("", 
"xx"), class = "factor")), .Names = c("a", "b", "c"), class = "data.frame", row.names = c(NA, 
-3L))

str(temp)
'data.frame':   3 obs. of  3 variables:
 $ a: int  1 NA 2
 $ b: Factor w/ 2 levels "empty","filled": 1 2 2
 $ c: Factor w/ 2 levels "","xx": 1 2 2

The only real NA value, i.e. the only one that can be used for imputation, is in the first column.
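
To check this on your own data, something like the following sketch may help. It rebuilds the small example with character columns instead of factors (an assumption made for simplicity), and the names `na_per_column`, `na_cells` and `empty_per_column` are just illustrative.

```r
# Rebuild the small example (character columns instead of factors).
temp <- data.frame(
  a = c(1L, NA, 2L),
  b = c("empty", "filled", "filled"),
  c = c("", "xx", "xx"),
  stringsAsFactors = FALSE
)

# Number of true NA values in each column.
na_per_column <- colSums(is.na(temp))

# Row and column positions of the NA cells.
na_cells <- which(is.na(temp), arr.ind = TRUE)

# Empty strings are not NA, so they have to be counted separately.
empty_per_column <- sapply(temp, function(col) sum(col %in% ""))
```

Here `na_per_column` is 1 for a and 0 for b and c, while `empty_per_column` is 1 only for c, so the two kinds of "missing" stay distinguishable.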

Regards
Petr