coded to categorical variables in a large dataset
5 messages · sj, Chuck Cleland, jim holtman +2 more
sj wrote:
I am working with a dataset where there are 5 possible outcomes (coded 1:5), and I would like to create 5 categorical variables (event1...event5). I am using a for loop and if statements, but with a large dataset (approx. 100,000 rows) it takes quite a bit of time. Is there a way to speed this up? Here is some sample code of what I am currently doing.
Here is one way you might do it:

X <- sample(1:5, 100, replace=TRUE)
# Your 5 event variables in a matrix
model.matrix(lm(rnorm(length(X)) ~ as.factor(X) - 1))

Also, along the lines of your approach below, the following using ifelse() might be better:

event3 <- ifelse(test2 == 3, 1, 0)

I'm sure other people will post different solutions, probably more elegant than these.
test2 <- rep(1:5, 2000)
# test2 is a plain vector, so its size comes from length(); nrow() returns NULL here
event1 <- rep(0, length(test2))
event2 <- rep(0, length(test2))
event3 <- rep(0, length(test2))
event4 <- rep(0, length(test2))
event5 <- rep(0, length(test2))
for (i in 1:length(event1))
{
  if (test2[i] == 1)
  {
    event1[i] = 1
  }
  if (test2[i] == 2)
  {
    event2[i] = 1
  }
  if (test2[i] == 3)
  {
    event3[i] = 1
  }
  if (test2[i] == 4)
  {
    event4[i] = 1
  }
  if (test2[i] == 5)
  {
    event5[i] = 1
  }
}
thanks,
Spencer
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Chuck Cleland, Ph.D. NDRI, Inc. 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
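As a sanity check on the two suggestions in the reply above, here is a minimal sketch that builds the indicators once with ifelse() and once with model.matrix(), then confirms they agree. It applies model.matrix() to a one-sided formula instead of going through lm(), on the assumption that the fitted model itself is not needed. The seed and the names fX, ind_ifelse and ind_mm are illustrative and do not come from the thread.

```r
set.seed(1)                                # illustrative seed, for reproducibility only
X <- sample(1:5, 100, replace = TRUE)
fX <- factor(X, levels = 1:5)              # fix the levels so all 5 columns always appear

# ifelse() approach: one 0/1 column per outcome code
ind_ifelse <- sapply(1:5, function(k) ifelse(X == k, 1, 0))

# model.matrix() on a one-sided formula; "- 1" drops the intercept so
# every level gets its own indicator column
ind_mm <- model.matrix(~ fX - 1)

# Both are 100 x 5 matrices holding the same 0/1 entries
stopifnot(all(dim(ind_mm) == c(100, 5)))
stopifnot(all(ind_ifelse == unname(ind_mm)))
```

Each row contains exactly one 1, in the column matching that observation's outcome code.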
As Richard has already pointed out, you may only need to convert your
numeric vector to a factor, but just in case, here are a few direct answers.
Using X from Chuck's post, here are two ways of creating a 100x5
matrix of indicator variables:
model.matrix(~ X-1, list(X = factor(X)))
outer(X, 1:5, "==")+0
# To create the individual event1, ..., event5 variables,
# here is one way of creating them
event1 <- (X == 1) + 0 # and similarly for 2, 3, 4, 5
# or do it in a loop
for(i in 1:5) assign(paste("event", i, sep = ""), (X == i) + 0)
# or create as columns of a data frame
f <- function(i, j) (X == j) + 0
as.data.frame(mapply(f, paste("event", 1:5, sep = ""), 1:5))
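The approaches in this message should all produce the same 0/1 values. The sketch below checks that for the two matrix versions, and also illustrates the point attributed to Richard: a factor placed in a model formula is expanded into indicator columns automatically, so separate event variables may not be needed at all. The response y, the seed, and the names fX, m_outer and m_mm are made up for illustration.

```r
set.seed(2)                                # illustrative seed
X <- sample(1:5, 100, replace = TRUE)
fX <- factor(X, levels = 1:5)

# Two ways of building the 100 x 5 indicator matrix
m_outer <- outer(X, 1:5, "==") + 0         # logical comparison table, coerced to 0/1
m_mm <- model.matrix(~ X - 1, list(X = fX))
stopifnot(all(m_outer == unname(m_mm)))

# A factor in a model formula is dummy-coded by R itself
y <- rnorm(length(X))                      # hypothetical response, for illustration only
fit <- lm(y ~ fX)

# With the default treatment contrasts the design matrix has an
# intercept plus one indicator column for each of levels 2 to 5
stopifnot(ncol(model.matrix(fit)) == 5)
```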
On Fri, 29 Dec 2006, sj wrote:
I am working with a dataset where there are 5 possible outcomes (coded 1:5), and I would like to create 5 categorical variables (event1...event5). I am using a for loop and if statements, but with a large dataset (approx. 100,000 rows) it takes quite a bit of time. Is there a way to speed this up? Here is some sample code of what I am currently doing.
test2 <-rep(seq(1:5),2000)
[...]
As Richard suggested, you may not want to do this at all, but ...
If you want these as a matrix, this is fast and direct:
mat <- diag(5)[ test2, ]
If you do not want a matrix,
event1 <- as.numeric( test2 == 1 )
is concise and
for (i in 1:5) assign(paste("event",i,sep=""), as.numeric( test2==i ))
is about as fast as you can get.
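The matrix trick works because row k of the 5 x 5 identity matrix is exactly the indicator vector for outcome k, so indexing the rows by test2 looks up one indicator row per observation. Here is a small sketch that checks this against the comparison-based version and times it against an element-by-element loop on 100,000 values. Timings depend on the machine, so none are quoted. The helper loop_version is a simplified stand-in for the loop in the question, not the poster's exact code.

```r
test2 <- rep(1:5, 20000)                   # 100,000 values, the size mentioned in the question

mat <- diag(5)[test2, ]                    # row i is the indicator vector for test2[i]
colnames(mat) <- paste("event", 1:5, sep = "")

# The matrix columns match the vectorised comparisons
stopifnot(all(mat[, "event3"] == as.numeric(test2 == 3)))
stopifnot(all(rowSums(mat) == 1))

# Hypothetical loop-based version, for timing comparison only
loop_version <- function(x) {
  out <- matrix(0, nrow = length(x), ncol = 5)
  for (i in seq_along(x)) out[i, x[i]] <- 1
  out
}

system.time(loop_version(test2))           # explicit loop
system.time(diag(5)[test2, ])              # vectorised row lookup
```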
HTH,
Chuck
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0717