help with as.numeric

On May 15, 2009, at 6:57 AM, deanj2k wrote:

            
That 'height' is a factor suggests that you imported the data using  
one of the read.table() family of functions and that there are non- 
numeric characters in at least one of the entries in that column.

Since 'height' is a factor, if you use as.numeric(), you will get  
numeric values returned that are the factor level numeric codes and  
not the expected numeric values. That is why you are getting bad  
values for BMI.

See:

   http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
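As a minimal sketch of what that FAQ entry describes, using made-up height values (not your data), compare the level codes with the two usual conversions:

```r
# Hypothetical heights stored as a factor (illustrative values only)
height <- factor(c("1.80", "1.65", "1.72", "1.65"))

# as.numeric() on a factor returns the internal level codes
codes <- as.numeric(height)
print(codes)

# Converting via the labels recovers the actual values
vals <- as.numeric(as.character(height))
print(vals)

# Equivalent: convert the levels once, then index by the factor
vals2 <- as.numeric(levels(height))[height]
print(vals2)
```

The levels are sorted as "1.65", "1.72", "1.80", so the codes are 3 1 2 1, which is why arithmetic on them gives nonsense.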


If you use something like:

   grep("[^0-9\\.]", height, value = TRUE)

that should show you where you have non-numeric values in the 'height'
column. That is, entries for 'height' that contain characters other
than digits or a decimal point. For example:

height <- factor(c(seq(0, 1, 0.1), "1,10", letters[1:5]))

 > height
  [1] 0    0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1    1,10 a    b    c    d    e
Levels: 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1,10 a b c d e

 > as.numeric(height)
  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17

 > grep("[^0-9\\.]", height, value = TRUE)
[1] "1,10" "a"    "b"    "c"    "d"    "e"
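An alternative to the regex, sketched here on the same example vector, is to attempt the conversion and see which entries come back as NA. The names height_chr, height_num and bad are just illustrative:

```r
height <- factor(c(seq(0, 1, 0.1), "1,10", letters[1:5]))

# Work on the labels, not the level codes
height_chr <- as.character(height)

# Entries that cannot be parsed become NA (the coercion warning is
# expected here, so it is suppressed)
height_num <- suppressWarnings(as.numeric(height_chr))

# TRUE where the conversion failed on a non-missing entry
bad <- is.na(height_num) & !is.na(height_chr)

bad_pos <- which(bad)        # positions of the offending entries
bad_val <- height_chr[bad]   # the offending entries themselves
print(bad_pos)
print(bad_val)
```

This flags the same six entries as the grep() call, and it also gives you their row positions so you can go back and fix the source data.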


I would also check the 'weight' column for the same reasons, to be  
sure that you don't have bad data there. Another approach would be to  
use:

   str(subjects)

which will give you a sense of the data types for each column in your  
data frame. Review each column and take note of any columns that  
should be numeric, but are factors.
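For instance, with a small made-up data frame (the name 'subjects' and the values are assumptions for illustration; stringsAsFactors = TRUE is set explicitly because the default changed to FALSE in R 4.0.0):

```r
# Hypothetical data: one height entry uses a comma as the decimal mark
subjects <- data.frame(
  height = c("1.80", "1.65", "1,72"),
  weight = c(80, 62.5, 70),
  stringsAsFactors = TRUE
)

# Compact overview of each column's type
str(subjects)

# Names of the columns that were stored as factors
factor_cols <- names(subjects)[vapply(subjects, is.factor, logical(1))]
print(factor_cols)
```

Here only 'height' is reported as a factor, which points to the column that needs cleaning.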

See ?str, ?grep and ?regex for more information. You might also want  
to look at ?type.convert, which is the function used by the  
read.table() family of functions to determine the data types for each  
column during import.
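A rough illustration of how type.convert() decides, using made-up character vectors. The as.is argument is given explicitly, since recent versions of R warn when it is left unspecified:

```r
# All entries parse as numbers, so the result is numeric
clean <- type.convert(c("1.5", "2", "3.25"), as.is = TRUE)
print(class(clean))

# One stray non-numeric entry forces the whole vector to a factor ...
dirty_fac <- type.convert(c("1.5", "2", "abc"), as.is = FALSE)
print(class(dirty_fac))

# ... or to character when as.is = TRUE
dirty_chr <- type.convert(c("1.5", "2", "abc"), as.is = TRUE)
print(class(dirty_chr))
```

So a single bad entry in a column is enough to change the type of the whole column on import.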

HTH,

Marc Schwartz