Dear expeRts,
I have two questions concerning data frames:
(1) How can I apply the class function to each component in a data.frame? As you can see below, applying class to each column is not the right approach; applying it to each component seems bulky.
(2) After transforming the data frame a bit, the classes of certain components change to factor. How can I remove the factor structure?
Cheers,
Marius
x <- c(2004:2010, 2002:2011, 2000:2011)
df <- data.frame(x=x, group=c(rep("low",7), rep("middle",10), rep("high",12)),
y=x+100*runif(length(x)))
## Question (1): why do the following lines not give the same "class"?
apply(df, 2, class)
class(df$x)
class(df$group)
class(df$y)
df. <- as.data.frame(xtabs(y ~ x + group, data=df))
class(df.$x)
class(df.$group)
class(df.$Freq)
## Question (2): how can I remove the factor structure from x?
df.$x <- as.numeric(as.character(df.$x)) # seems bulky; note that as.numeric(df.$x) is not correct
class(df.$x)
data.frame: How to get the classes of all components and how to remove their factor structure?
4 messages · Marius Hofert, PIKAL Petr
Hi
## Question (1): why do the following lines not give the same "class"?
apply(df, 2, class)

From the help page ?apply, under Arguments:
X    an array, including a matrix.
A data frame is not an array. Use sapply instead:
sapply(df, class)
        x     group         y
"integer"  "factor" "numeric"
## Question (2): how can I remove the factor structure from x?
df.$x <- as.numeric(as.character(df.$x)) # seems bulky; note that as.numeric(df.$x) is not correct

Actually, it is correct in the sense that it behaves as documented. From ?factor, section Warning:

The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

Regards
Petr
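Both points in this reply can be checked on a toy example. The following is only a sketch: the data frame `d` and the factor `f` are made up for illustration, and it assumes that apply() coerces a data frame with as.matrix() before doing anything else, which is what current R does as far as I can tell. `stringsAsFactors = TRUE` is set explicitly because the default of data.frame() changed to FALSE in R 4.0.0.

```r
# Toy data frame (hypothetical, not the poster's data).
d <- data.frame(x = 1:3,
                group = c("low", "middle", "high"),
                y = c(0.5, 1.5, 2.5),
                stringsAsFactors = TRUE)  # explicit: the default is FALSE since R 4.0.0

## Question (1): apply() versus sapply()
# apply() expects an array, so the data frame is coerced to a matrix first.
# With a non-numeric column present, that matrix is of type character.
m <- as.matrix(d)
typeof(m)                               # "character"
apply_classes <- apply(d, 2, class)     # every column reports "character"

# A data frame is a list of columns, so iterate over that list instead.
classes <- sapply(d, class)             # x: "integer", group: "factor", y: "numeric"

## Question (2): factor codes versus factor levels
f <- factor(c("2004", "2002", "2011", "2002"))
codes      <- as.numeric(f)               # internal codes: 2 1 3 1
via_char   <- as.numeric(as.character(f)) # 2004 2002 2011 2002
via_levels <- as.numeric(levels(f))[f]    # same values, each level converted once
```

If a column can carry more than one class (a POSIXct date-time, for example), lapply(d, class) may be the safer choice, since it always returns a list instead of letting sapply() change the shape of the result.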
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
## Question (2): how can I remove the factor structure from x?
df.$x <- as.numeric(as.character(df.$x)) # seems bulky; note that as.numeric(df.$x) is not correct

If you do it often, you can define a helper:
unfactor <- function(x) as.numeric(as.character(x))
df.$x <- unfactor(df.$x)
or you can use
df. <- as.data.frame(xtabs(y ~ x + group, data=df), stringsAsFactors=FALSE)
df.$x <- as.numeric(df.$x)
But it seems to me that this is not much less bulky.

Regards
Petr
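The two routes suggested here can be compared side by side. This is a minimal sketch on invented numbers: the data frame `d` and the names `tab1` and `tab2` are hypothetical, and only `unfactor` is taken from the reply. It relies on stringsAsFactors being an argument of the as.data.frame method for tables, which is what an xtabs result is dispatched to.

```r
# Invented data in the shape of the poster's example.
d <- data.frame(x = c(2004, 2005, 2004, 2005),
                group = c("low", "low", "high", "high"),
                y = c(10, 20, 30, 40))

# Option 1: convert after the fact with a small helper.
unfactor <- function(f) as.numeric(as.character(f))
tab1 <- as.data.frame(xtabs(y ~ x + group, data = d))
class(tab1$x)               # "factor"
tab1$x <- unfactor(tab1$x)

# Option 2: avoid factors when converting the table.
tab2 <- as.data.frame(xtabs(y ~ x + group, data = d),
                      stringsAsFactors = FALSE)
class(tab2$x)               # "character"
tab2$x <- as.numeric(tab2$x)

# Both routes end with the same numeric column.
same <- identical(tab1$x, tab2$x)
```

Option 2 also turns group into a character column, so it fits best when no factor columns are wanted in the result at all.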
Dear Petr, thanks for your posts, they perfectly answered my questions. Cheers, Marius
On 2011-06-28, at 09:49 , Petr PIKAL wrote:
If you do it often, you can define a helper:
unfactor <- function(x) as.numeric(as.character(x))
df.$x <- unfactor(df.$x)
or you can use
df. <- as.data.frame(xtabs(y ~ x + group, data=df), stringsAsFactors=FALSE)
df.$x <- as.numeric(df.$x)
But it seems to me that this is not much less bulky.

Regards
Petr