I have a data frame:

> head(df)
  Time Temp Conc Repl    Log10
1    0  -20    H    1 6.406547
2    2  -20    H    1 5.738683
3    7  -20    H    1 5.796394
4   14  -20    H    1 4.413691
5    0    4    H    1 6.406547
7    7    4    H    1 5.705433
> str(df)
'data.frame':   177 obs. of  5 variables:
 $ Time : Factor w/ 4 levels "0","2","7","14": 1 2 3 4 1 3 4 1 3 4 ...
 $ Temp : Factor w/ 4 levels "-20","4","25",..: 1 1 1 1 2 2 2 3 3 3 ...
 $ Conc : Factor w/ 3 levels "H","L","M": 1 1 1 1 1 1 1 1 1 1 ...
 $ Repl : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Log10: num  6.41 5.74 5.8 4.41 6.41 ...
> levels(df$Temp)
[1] "-20" "4"   "25"  "45"
> levels(df$Time)
[1] "0"  "2"  "7"  "14"

As you can see, "Time" and "Temp" are currently factors, not numeric. I would like to change these columns into numerical ones.

df$Time<- as.numeric(df$Time)

doesn't work, as it changes to the factor level indices (1,2,3,4) instead of the values (0,2,7,14).

There must be a direct way of doing this in R.

I tried recode() in 'car':

> df$Temp<- recode(df$Temp, '1=-20;2=25;3=4;4=45',as.factor.result=FALSE)
> head(df)
  Time Temp Conc Repl     Freq
1    0  -20    H    1 6.406547
2    2  -20    H    1 5.738683
3    7  -20    H    1 5.796394
4   14  -20    H    1 4.413691
5    0   45    H    1 6.406547
7    7   45    H    1 5.705433

but note that the values for 'Temp' in rows 5 and 7 are 45 and not 4, as expected, although the result is numeric. The same happens if I use the order given by levels(df$Temp) instead of the sort order in the recode() 2nd argument.

Any hints?

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947
"Vere scire est per causas scire"
How to convert a factor column into a numeric one?
8 messages · Robert A. LaBudde, Jorge Ivan Velez, Dennis Murphy +2 more
Hi:

Try this:

> dd <- data.frame(a = factor(rep(1:5, each = 4)),
+                  b = factor(rep(rep(1:2, each = 2), 5)),
+                  y = rnorm(20))
> str(dd)
'data.frame':   20 obs. of  3 variables:
 $ a: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ b: Factor w/ 2 levels "1","2": 1 1 2 2 1 1 2 2 1 1 ...
 $ y: num  0.6396 1.467 1.8403 -0.0915 0.2711 ...
> de <- within(dd, {
+     a <- as.numeric(as.character(a))
+     b <- as.numeric(as.character(b))
+ } )
> str(de)
'data.frame':   20 obs. of  3 variables:
 $ a: num  1 1 1 1 2 2 2 2 3 3 ...
 $ b: num  1 1 2 2 1 1 2 2 1 1 ...
 $ y: num  0.6396 1.467 1.8403 -0.0915 0.2711 ...

HTH,
Dennis
Hi Robert,
Try this:
## Example data converting mtcars to factors
testdf <- as.data.frame(lapply(mtcars, factor))
str(testdf)
## taking advantage of assignment methods to avoid an explicit call to
## as.data.frame
## convert factor to numeric using the technique recommended in ?factor
testdf[] <- lapply(testdf, function(x)
as.numeric(levels(x))[x])
str(testdf)
If you do not want to convert all columns, just use a subset. Here is one way:
testdf[, c("mpg", "cyl", "disp")] <-
lapply(testdf[, c("mpg", "cyl", "disp")],
function(x) as.numeric(levels(x))[x])
I would also look into *why* those numeric columns are being stored as
factors in the first place. If you are reading the data in with
read.table() or one of its wrapper functions (like read.csv), then it
would be better to preempt the storage as a factor altogether rather
than converting back to numeric. For example, perhaps something is
being used to indicate missing data that R does not recognize (e.g.,
SAS uses "."). Specifying na.strings = "." would fix this. See
?read.table for some of the options available.
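
A minimal sketch of the na.strings point above. The CSV text is made up for illustration and is not Robert's data; only read.csv() and its na.strings argument come from the advice itself. Without na.strings, the "." forces the whole Temp column to be read as text (character, or a factor in R versions before 4.0.0); with it, the column is read as a number with one NA.

```r
# Hypothetical CSV text in which "." marks a missing value (illustration only)
csv_text <- "Time,Temp,Log10
0,-20,6.41
2,.,5.74
7,4,5.80"

# Without na.strings, the "." makes Temp non-numeric
raw <- read.csv(text = csv_text)

# With na.strings = ".", Temp is read as a number and the "." becomes NA
clean <- read.csv(text = csv_text, na.strings = ".")

str(raw$Temp)
str(clean$Temp)
```

Comparing the two str() calls shows whether a stray missing-value marker is the reason a column was not read as numeric.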
Hope this helps,
Josh
Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
Exactly! Thanks.
At 12:49 AM 6/5/2011, Jorge Ivan Velez wrote:
Dr. LaBudde,

Perhaps as.numeric(as.character(x)) is what you are looking for.

HTH,
Jorge
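
For reference, a small sketch of the difference between the conversions discussed in this thread. The factor below is made up to mimic the Temp column from the original post; the variable names are hypothetical.

```r
# Made-up factor mimicking the Temp column (illustration only)
temp <- factor(c("-20", "4", "25", "45", "4"),
               levels = c("-20", "4", "25", "45"))

# Direct conversion returns the internal level indices
idx <- as.numeric(temp)

# Going through character returns the labels as numbers
via_char <- as.numeric(as.character(temp))

# Technique from ?factor: convert each level once, then index by the factor
via_levels <- as.numeric(levels(temp))[temp]

print(idx)         # 1 2 3 4 2
print(via_char)    # -20 4 25 45 4
print(via_levels)  # -20 4 25 45 4
```

The last two give the same values; the levels() form converts each distinct label only once, which can matter for long vectors with few levels.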
Thanks for your help.
As far as your question below is concerned, the data frame arose as a
result of some data cleaning on an original data frame, which was
changed into a table, modified, and changed back to a data frame:
ttcrmean <- as.table(by(ngbe[, 'Log10'],
  list(Time = ngbe$Time, Temp = ngbe$Temp, Conc = ngbe$Conc, Repl = ngbe$Replicate),
  mean))
for (k in 1:3) {  # fix-up time zeroes (k indexes Conc)
  for (l in 1:5) {  # replicates
    t0val <- ttcrmean[1, 3, k, l]
    for (j in 1:4) {  # temps
      ttcrmean[1, j, k, l] <- t0val
    }  # j
  }  # l
}  # k
df <- na.omit(as.data.frame(ttcrmean))
colnames(df)[5] <- 'Log10'
At 12:51 AM 6/5/2011, Joshua Wiley wrote:
Hi Robert, <snip> I would also look into *why* those numeric columns are being stored as factors in the first place. If you are reading the data in with read.table() or one of its wrapper functions (like read.csv), then it would be better to preempt the storage as a factor altogether rather than converting back to numeric. For example, perhaps something is being used to indicate missing data that R does not recognize (e.g., SAS uses "."). Specifying na.strings = ".", would fix this. See ?read.table for some of the options available. <snip>
Thanks! Exactly what I wanted, and the same as what Jorge suggested.
At 12:49 AM 6/5/2011, Dennis Murphy wrote:
<snip>
Hmm, that is a bit tricky. The conversion from a table to a data
frame uses the dimension names, which are always character. To bypass
this, you would need to save the dimension names, convert the ones you
want numeric to numeric (I am assuming everything except Conc, so the
indices would be c(1, 2, 4)), and then manually convert from table to
data frame (but that is not too difficult).
In your case I am not sure there is a big benefit one way or the
other, but if you do it the way you have been and then convert the
data back to numeric, if you use:
df<- na.omit(as.data.frame(ttcrmean, stringsAsFactors = FALSE))
then what you tried here will work again:
df$Time<- as.numeric(df$Time)
plus be slightly more computationally efficient (although you are not
dealing with that much data, so it is probably not a big deal). Below
is an example of the manual conversion I mentioned. It takes only
three lines of code, the columns should come out numeric, and the value
column is named "Log10", so it's basically equivalent to what you had.
The logic behind the code is a little less straightforward, though,
which could hurt readability in the future.
###########################
ttcrmean<- as.table(by(ngbe[,'Log10'],
list(Time=ngbe$Time,Temp=ngbe$Temp,Conc=ngbe$Conc,Repl=ngbe$Replicate),
mean))
for (k in 1:3) {  # fix-up time zeroes (k indexes Conc)
  for (l in 1:5) {  # replicates
    t0val <- ttcrmean[1, 3, k, l]
    for (j in 1:4) {  # temps
      ttcrmean[1, j, k, l] <- t0val
    }  # j
  }  # l
}  # k
## Convert dimnames of your table that you want
## to be numeric to numeric and skip over Conc
xn <- dimnames(ttcrmean)
xn[c(1, 2, 4)] <- lapply(xn[c(1, 2, 4)], as.numeric)
## convert the table to a data frame manually
df <- na.omit(data.frame(expand.grid(xn), Log10 = c(ttcrmean)))
######################
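
Since ngbe is not shown in the thread, the two routes above can be tried on a toy table instead. This is only a sketch: the 2 x 2 table and its values are made up, and responseName = "Log10" is an extra argument of as.data.frame() for tables that was not used in the thread; it stands in for the later colnames() fix.

```r
# Made-up 2 x 2 table standing in for ttcrmean (illustration only)
tab <- as.table(matrix(c(6.4, 5.7, 6.4, 5.8), nrow = 2,
                       dimnames = list(Time = c("0", "2"),
                                       Temp = c("-20", "4"))))

# Route 1: keep the dimnames as character, then convert with as.numeric()
d1 <- as.data.frame(tab, stringsAsFactors = FALSE, responseName = "Log10")
d1$Time <- as.numeric(d1$Time)
d1$Temp <- as.numeric(d1$Temp)

# Route 2: convert the dimnames first, then build the data frame by hand
xn <- lapply(dimnames(tab), as.numeric)
d2 <- data.frame(expand.grid(xn), Log10 = c(tab))

str(d1)
str(d2)
```

Both routes rely on the same ordering: expand.grid() varies its first variable fastest, which matches the column-major order in which c() unrolls the table.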
Cheers,
Josh