I would like to specify that certain columns in a data frame should be
treated as ordered factors. I know what numbers these columns are, but
not their names.
How do I do this?
For example, if I know columns 1:4 are to be treated as factors, I can
write
dat <- matrix(c(2,1,1,1, 1,1,1,1), 2, 4)
D <- as.data.frame(dat)
# force all variables to be treated as binary
# regardless of the small data set
D$V1 <- factor(D$V1, 1:2)
D$V2 <- factor(D$V2, 1:2)
D$V3 <- factor(D$V3, 1:2)
D$V4 <- factor(D$V4, 1:2)
But how do I do this in general? What I would like to say is something
like
for (i in my.factor.columns) {
  D$Vi <- factor(D$Vi, 1:my.nlevels[i])
}
Presumably I could do something tricky using eval, but I don't know how.
Besides, I'd prefer to avoid eval, since it is slow.
(Also, I don't want to rely on the fact that the columns are named "Vnn"
by default.)
Kevin
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
variable number of variables in data frames
4 messages · Kevin Murphy, Henrik Bengtsson, Laurent Gautier +1 more
Kevin, you can treat data frames as matrices, i.e. you can do
dat <- matrix(c(2,1,1,1, 1,1,1,1), 2, 4)
D <- as.data.frame(dat)
for (i in 1:ncol(D))
  D[,i] <- factor(D[,i], 1:2)
Henrik
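The loop above converts every column to a two-level factor. A minimal sketch of how it might be restricted to chosen column positions, with a different number of levels per column, as the original question asks. The names my.factor.columns and my.nlevels are the question's placeholders; the values given to them here are made up for illustration, and my.nlevels is assumed to be indexed by column number.

```r
dat <- matrix(c(2,1,1,1, 1,1,1,1), 2, 4)
D <- as.data.frame(dat)

# Hypothetical inputs: positions of the columns to convert, and the number
# of levels for each column of D (indexed by column number, as in the question)
my.factor.columns <- c(1, 2)
my.nlevels <- c(2, 3, 2, 2)

for (i in my.factor.columns) {
  # D[[i]] selects column i by position, so the column names are never used
  D[[i]] <- factor(D[[i]], levels = 1:my.nlevels[i])
}

str(D)
```

Indexing with D[[i]] avoids both eval and any reliance on the default "Vnn" names; columns not listed in my.factor.columns are left numeric.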
-----Original Message-----
From: murphyk at relay.eecs.berkeley.edu
[mailto:murphyk at relay.eecs.berkeley.edu]On Behalf Of Kevin Murphy
Sent: Monday, July 16, 2001 6:19 PM
To: r-help at hypatia.math.ethz.ch; Henrik Bengtsson
Cc: murphyk at cs.berkeley.edu
Subject: variable number of variables in data frames
What about:
dat <- matrix(c(2,1,1,1, 1,1,1,1), 2, 4)
D <- as.data.frame(dat)
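makefactor <- function(d, index) {
  # lapply keeps each column as a factor; sapply would simplify the
  # result to a character matrix, leaving character columns instead
  d[index] <- lapply(d[index], factor)
  return(d)
}
# say one wants the first two columns to be factors
index <- c(1, 2)
str(makefactor(D, index))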
Regards, Laurent
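The choice between sapply and lapply matters when the results are assigned back into a data frame. A small check, as a sketch under current R behaviour (the 2001 release has not been verified), of what each call returns for the first two columns:

```r
dat <- matrix(c(2,1,1,1, 1,1,1,1), 2, 4)
D <- as.data.frame(dat)

# sapply() tries to simplify its result; for equal-length factors this
# gives a character matrix rather than a list of factors
s <- sapply(D[1:2], factor)
is.matrix(s)
is.character(s)

# lapply() always returns a list, so each element is still a factor
l <- lapply(D[1:2], factor)
sapply(l, is.factor)
```

Assigning the list from lapply to D[1:2] therefore keeps the columns as factors, whereas assigning the simplified matrix would not.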
--
Laurent Gautier            CBS, Building 208, DTU
PhD. Student               D-2800 Lyngby, Denmark
tel: +45 45 25 24 85       http://www.cbs.dtu.dk/laurent
"Henrik Bengtsson" <henrikb at braju.com> writes:
Kevin, you can treat data frames as matrices, i.e. you can do
dat <- matrix(c(2,1,1,1, 1,1,1,1), 2, 4)
D <- as.data.frame(dat)
for (i in 1:ncol(D))
D[,i] <- factor(D[,i], 1:2)
Also,
D[1:2] <- lapply(D[1:2], factor, levels=1:2)
summary(D)
 V1    V2          V3          V4
 1:1   1:2   Min.   :1   Min.   :1
 2:1   2:0   1st Qu.:1   1st Qu.:1
             Median :1   Median :1
             Mean   :1   Mean   :1
             3rd Qu.:1   3rd Qu.:1
             Max.   :1   Max.   :1
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
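The original question asks for ordered factors with a per-column number of levels, which the replies above do not show directly. A sketch of one way the lapply idea might be extended, using Map to pair each selected column with its own level count. The names cols and nlev and their values are assumptions made for illustration, not part of the thread.

```r
dat <- matrix(c(2,1,1,1, 1,1,1,1), 2, 4)
D <- as.data.frame(dat)

# Hypothetical inputs: positions of the columns to convert and the number
# of levels wanted for each of them (same order as cols)
cols <- c(1, 2)
nlev <- c(2, 3)

# Map() calls the function once per (column, level count) pair and returns
# a list, which is assigned back to the selected columns
D[cols] <- Map(function(x, k) factor(x, levels = seq_len(k), ordered = TRUE),
               D[cols], nlev)

str(D)
```

Setting ordered = TRUE produces ordered factors; dropping that argument gives the unordered factors used in the rest of the thread.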