
How to convert a factor column into a numeric one?

8 messages · Robert A. LaBudde, Jorge Ivan Velez, Dennis Murphy +2 more

#
I have a data frame:

 > head(df)
   Time Temp Conc Repl    Log10
1    0  -20    H    1 6.406547
2    2  -20    H    1 5.738683
3    7  -20    H    1 5.796394
4   14  -20    H    1 4.413691
5    0    4    H    1 6.406547
7    7    4    H    1 5.705433
 > str(df)
'data.frame':   177 obs. of  5 variables:
  $ Time : Factor w/ 4 levels "0","2","7","14": 1 2 3 4 1 3 4 1 3 4 ...
  $ Temp : Factor w/ 4 levels "-20","4","25",..: 1 1 1 1 2 2 2 3 3 3 ...
  $ Conc : Factor w/ 3 levels "H","L","M": 1 1 1 1 1 1 1 1 1 1 ...
  $ Repl : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ Log10: num  6.41 5.74 5.8 4.41 6.41 ...
 > levels(df$Temp)
[1] "-20" "4"   "25"  "45"
 > levels(df$Time)
[1] "0"  "2"  "7"  "14"

As you can see, "Time" and "Temp" are currently factors, not numeric.

I would like to change these columns into numerical ones.

df$Time<- as.numeric(df$Time)

doesn't work, as it changes to the factor level indices (1,2,3,4) 
instead of the values (0,2,7,14).
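(Editor's sketch, for readers new to factors.) A factor stores integer codes that index into a character vector of levels, so as.numeric() returns the positions rather than the labels. The vector below is a hypothetical stand-in for the Time column; time_f is an illustrative name only.

```r
# Hypothetical factor mirroring the Time column in the post
time_f <- factor(c(0, 2, 7, 14, 0, 7))

levels(time_f)      # the labels, stored as character: "0" "2" "7" "14"
as.integer(time_f)  # the stored codes: 1 2 3 4 1 3
as.numeric(time_f)  # the same codes as doubles, not 0 2 7 14 0 7
```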

There must be a direct way of doing this in R.

I tried recode() in 'car':

 > df$Temp<- recode(df$Temp, '1=-20;2=25;3=4;4=45',as.factor.result=FALSE)
 > head(df)
   Time Temp Conc Repl     Freq
1    0  -20    H    1 6.406547
2    2  -20    H    1 5.738683
3    7  -20    H    1 5.796394
4   14  -20    H    1 4.413691
5    0   45    H    1 6.406547
7    7   45    H    1 5.705433

but note that the values for 'Temp' in rows 5 and 7 are 45 rather than
the expected 4, although the result is numeric. The same happens if I
use the order given by levels(df$Temp) instead of the sorted order in
the recode() second argument.
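(Editor's note.) Judging from the output, the recode() specification appears to be matched against the factor's labels, not its level positions: '4=45' turned every "4" into 45, while no value is labelled "1", "2" or "3", so the other rules had no effect. A minimal base-R sketch that avoids the ambiguity by keying an explicit lookup on the labels; temp_f and temp_map are hypothetical names, and the vector only mimics the first six rows of Temp.

```r
# Hypothetical stand-in for the first six rows of df$Temp
temp_f <- factor(c("-20", "-20", "-20", "-20", "4", "4"),
                 levels = c("-20", "4", "25", "45"))

# Lookup keyed by the label text, so "4" maps to 4 whatever its level position
temp_map <- c("-20" = -20, "4" = 4, "25" = 25, "45" = 45)
temp_num <- unname(temp_map[as.character(temp_f)])
temp_num  # -20 -20 -20 -20 4 4
```

A lookup like this only pays off when labels and numbers differ; for numeric-looking labels the replies below give a shorter route.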

Any hints?
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
#
Hi:

Try this:
> df <- data.frame(a = factor(rep(1:5, each = 4)),
+                  b = factor(rep(rep(1:2, each = 2), 5)),
+                  y = rnorm(20))
> str(df)
'data.frame':   20 obs. of  3 variables:
 $ a: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ b: Factor w/ 2 levels "1","2": 1 1 2 2 1 1 2 2 1 1 ...
 $ y: num  0.6396 1.467 1.8403 -0.0915 0.2711 ...
> df <- within(df, {
+          a <- as.numeric(as.character(a))
+          b <- as.numeric(as.character(b))
+        } )
> str(df)
'data.frame':   20 obs. of  3 variables:
 $ a: num  1 1 1 1 2 2 2 2 3 3 ...
 $ b: num  1 1 2 2 1 1 2 2 1 1 ...
 $ y: num  0.6396 1.467 1.8403 -0.0915 0.2711 ...


HTH,
Dennis
On Sat, Jun 4, 2011 at 9:31 PM, Robert A. LaBudde <ral at lcfltd.com> wrote:
#
Hi Robert,

Try this:

## Example data converting mtcars to factors
testdf <- as.data.frame(lapply(mtcars, factor))
str(testdf)

## taking advantage of assignment methods to avoid an explicit call to
## as.data.frame
## convert factor to numeric using the technique recommended in ?factor
testdf[] <- lapply(testdf, function(x)
  as.numeric(levels(x))[x])
str(testdf)


If you do not want to convert all columns, just use a subset.  Here is one way:

testdf[, c("mpg", "cyl", "disp")] <-
  lapply(testdf[, c("mpg", "cyl", "disp")],
  function(x) as.numeric(levels(x))[x])
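(Editor's sketch.) A quick check that the ?factor idiom gives the same result as the as.character() route used elsewhere in this thread, and of why only a subset of columns should be converted: non-numeric labels such as Conc's "H", "L", "M" turn into NA, and R issues a coercion warning. The vectors are hypothetical stand-ins.

```r
time_f <- factor(c(0, 2, 7, 14, 0, 7))

# ?factor idiom: convert the few levels once, then index them by the codes
via_levels <- as.numeric(levels(time_f))[time_f]
# Equivalent, but converts every element through character
via_char <- as.numeric(as.character(time_f))

# A non-numeric column gives all NA (warning suppressed here)
conc_f <- factor(c("H", "H", "L", "M"))
conc_num <- suppressWarnings(as.numeric(levels(conc_f))[conc_f])
```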

I would also look into *why* those numeric columns are being stored as
factors in the first place.  If you are reading the data in with
read.table() or one of its wrapper functions (like read.csv), then it
would be better to preempt the storage as a factor altogether rather
than converting back to numeric.  For example, perhaps something is
being used to indicate missing data that R does not recognize (e.g.,
SAS uses ".").  Specifying na.strings = "." would fix this.  See
?read.table for some of the options available.
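(Editor's sketch.) The na.strings point on a hypothetical three-row file in which "." marks a missing temperature, read through read.csv()'s text argument. Note that since R 4.0.0 the unconverted column comes back as character rather than factor.

```r
csv_text <- "Time,Temp\n0,-20\n2,.\n7,4"

# Default na.strings = "NA": "." is just text, so Temp cannot be numeric
as_read <- read.csv(text = csv_text)
# Declaring "." as missing lets Temp be read as numbers
cleaned <- read.csv(text = csv_text, na.strings = ".")

str(as_read$Temp)  # factor in R < 4.0, character in R >= 4.0
str(cleaned$Temp)  # an integer vector: -20 NA 4
```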

Hope this helps,

Josh
On Sat, Jun 4, 2011 at 9:31 PM, Robert A. LaBudde <ral at lcfltd.com> wrote:
#
Exactly! Thanks.
At 12:49 AM 6/5/2011, Jorge Ivan Velez wrote:
#
Thanks for your help.

As far as your question below is concerned, the data frame arose as a 
result of some data cleaning on an original data frame, which was 
changed into a table, modified, and changed back to a data frame:

ttcrmean<- as.table(by(ngbe[,'Log10'], 
list(Time=ngbe$Time,Temp=ngbe$Temp,Conc=ngbe$Conc,Repl=ngbe$Replicate),
   mean))
for (k in 1:3) {  #fix-up time zeroes
   for (l in 1:5) { #replicates
     t0val<- ttcrmean[1,3,k,l]
     for (j in 1:4) {  #temps
       ttcrmean[1,j,k,l]<- t0val
     } #j
   } #l
} #k
df<- na.omit(as.data.frame(ttcrmean))
colnames(df)[5]<- 'Log10'
At 12:51 AM 6/5/2011, Joshua Wiley wrote:
#
Thanks! Exactly what I wanted, and the same as what Jorge suggested.
At 12:49 AM 6/5/2011, Dennis Murphy wrote:
#
Hmm, that is a bit tricky.  The conversion from a table to a data
frame uses the dimension names, which are always character.  To bypass
this, you would need to save the dimension names, convert the ones you
want numeric to numeric (I am assuming everything except Conc, so the
indices would be c(1, 2, 4)), and then manually convert from table to
data frame (but that is not too difficult).
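(Editor's sketch.) The behaviour described above on a hypothetical two-way table built from numeric vectors: the dimension names are character, and as.data.frame() on a table turns them into factor columns by default.

```r
tab <- table(Time = c(0, 2, 7, 0), Temp = c(-20, -20, 4, 4))

# Dimension names are character, even though the inputs were numeric
str(dimnames(tab))
# Converting the table gives factor columns by default
tab_df <- as.data.frame(tab)
str(tab_df)
```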

In your case I am not sure there is a big benefit one way or the
other, but if you keep doing it the way you have been and convert the
data back to numeric afterwards, use:

df<- na.omit(as.data.frame(ttcrmean, stringsAsFactors = FALSE))

then what you tried here will work again:

df$Time<- as.numeric(df$Time)

plus be slightly more computationally efficient (although you are not
dealing with that much data so it is probably not a big deal).  Below
is an example of the manual conversion I mentioned.  It only takes
three lines of code, the data should be numeric, and your column is
named "Log10", so it's basically equivalent to what you had, but the
logic behind the code is a little less straightforward, which could
hurt readability in the future.

###########################
ttcrmean<- as.table(by(ngbe[,'Log10'],
list(Time=ngbe$Time,Temp=ngbe$Temp,Conc=ngbe$Conc,Repl=ngbe$Replicate),
  mean))
for (k in 1:3) {  #fix-up time zeroes
  for (l in 1:5) { #replicates
    t0val<- ttcrmean[1,3,k,l]
    for (j in 1:4) {  #temps
      ttcrmean[1,j,k,l]<- t0val
    } #j
  } #l
} #k

## Convert dimnames of your table that you want
## to be numeric to numeric and skip over Conc
xn <- dimnames(ttcrmean)
xn[c(1, 2, 4)] <- lapply(xn[c(1, 2, 4)], as.numeric)

## convert the table to a data frame manually
df <- na.omit(data.frame(expand.grid(xn), Log10 = c(ttcrmean)))
######################
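(Editor's sketch.) A toy check of the manual route, assuming a small count table in place of ttcrmean (which needs the original ngbe data). It relies on c() of a table and expand.grid() both enumerating cells with the first dimension varying fastest, and compares the result with the stringsAsFactors = FALSE route suggested earlier.

```r
tab <- table(Time = c(0, 2, 7, 0, 2), Temp = c(-20, -20, 4, 4, 4))

# Manual route: make the dimnames numeric, then pair the grid with the cells
xn <- dimnames(tab)
xn[] <- lapply(xn, as.numeric)
manual <- data.frame(expand.grid(xn), Freq = c(tab))

# Alternative route: keep labels as character, then convert
via_df <- as.data.frame(tab, stringsAsFactors = FALSE)
via_df$Time <- as.numeric(via_df$Time)
via_df$Temp <- as.numeric(via_df$Temp)
```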

Cheers,

Josh
On Sat, Jun 4, 2011 at 10:22 PM, Robert A LaBudde <ral at lcfltd.com> wrote: