coded to categorical variables in a large dataset

5 messages · sj, Chuck Cleland, jim holtman +2 more

#
sj wrote:
Here is one way you might do it:

X <- sample(1:5, 100, replace=TRUE)

# Your 5 event variables in a matrix
model.matrix(lm(rnorm(length(X)) ~ as.factor(X) - 1))

  Also, along the lines of your own approach, using ifelse() might be
better:

event3 <- ifelse(test2 == 3, 1, 0)
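For what it's worth, here is a small sketch comparing the ifelse() form with the plain logical-comparison forms that come up later in the thread. The test2 vector is simulated here as a stand-in (an assumption; the original data were not posted), and the names event3_ifelse, event3_numeric and event3_plus are made up for the illustration.

```r
# Hypothetical stand-in for the original poster's test2 (codes 1 to 5).
set.seed(1)
test2 <- sample(1:5, 100, replace = TRUE)

# Three ways to build the 0/1 indicator for code 3.
event3_ifelse  <- ifelse(test2 == 3, 1, 0)
event3_numeric <- as.numeric(test2 == 3)
event3_plus    <- (test2 == 3) + 0

# Check that all three produce the same numeric vector.
same <- identical(event3_ifelse, event3_numeric) &&
  identical(event3_numeric, event3_plus)
print(same)

# A missing code stays missing in the indicator; it is not turned into 0.
na_demo <- ifelse(c(3, NA, 1) == 3, 1, 0)
print(na_demo)
```

The NA behaviour is worth knowing about if the coded variable has missing values, since every approach in this thread propagates them the same way.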

  I'm sure other people will post different solutions, probably more
elegant than these.

  
    
#
As Richard has already pointed out, you may only need to convert your
numeric vector to a factor, but just in case, here are a few direct answers:


Using X from Chuck's post, here are two ways of creating a 100x5
matrix of indicator variables:

model.matrix(~ X-1, list(X = factor(X)))
outer(X, 1:5, "==")+0
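A quick check that the two constructions above give the same 0/1 entries might look like the following. This is only a sketch: the seed and the names m1, m2 and all_equal are arbitrary, and levels = 1:5 is added to factor() as a precaution (an assumption on my part) so that a code that happens to be absent from the sample still gets a column.

```r
set.seed(42)
X <- sample(1:5, 100, replace = TRUE)

# Indicator matrix via model.matrix(); fixing the levels keeps 5 columns
# even if some code never occurs in X.
m1 <- model.matrix(~ X - 1, list(X = factor(X, levels = 1:5)))

# Indicator matrix via outer(); "+ 0" turns TRUE/FALSE into 1/0.
m2 <- outer(X, 1:5, "==") + 0

# Entries agree; only the dimnames and attributes differ.
all_equal <- all(m1 == m2)
print(all_equal)
print(dim(m1))

# Each row has exactly one 1, in the column matching its code.
print(all(rowSums(m2) == 1))
```

The model.matrix() result carries column names and a few extra attributes, while the outer() result is a bare matrix, so which one is more convenient depends on what is done with it next.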

# To create the individual event1, ..., event5 variables,
# here is one way:

event1 <- (X == 1) + 0 # and similarly for 2, 3, 4, 5

# or do it in a loop
for(i in 1:5) assign(paste("event", i, sep = ""), (X == i) + 0)

# or create as columns of a data frame
# (i is not used in the body of f; mapply takes the column names from it)
f <- function(i, j) (X == j) + 0
as.data.frame(mapply(f, paste("event", 1:5, sep = ""), 1:5))
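If the data frame form is what you are after, one possible alternative (a sketch, not necessarily better) is to build the matrix with outer() first and then name the columns. The names events and events2 are made up here so the two results can be compared.

```r
set.seed(42)
X <- sample(1:5, 100, replace = TRUE)

# Build the indicator matrix, convert it, then name the columns.
events <- as.data.frame(outer(X, 1:5, "==") + 0)
names(events) <- paste("event", 1:5, sep = "")

# The mapply() version from above, kept for comparison.
f <- function(i, j) (X == j) + 0
events2 <- as.data.frame(mapply(f, paste("event", 1:5, sep = ""), 1:5))

# Same column names and same entries.
print(identical(names(events), names(events2)))
print(all(events == events2))
print(head(events))
```

Keeping the indicators together in one data frame avoids having five loose variables in the workspace, which can be easier to manage than the assign() loop.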
#
On Fri, 29 Dec 2006, sj wrote:

            
[...]

As Richard suggested, you may not want to do this at all, but ...

If you want these as a matrix, this is fast and direct:

 	mat <- diag(5)[ test2, ]
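In case the indexing trick above is not obvious: row k of the 5x5 identity matrix is exactly the indicator vector for code k, so indexing the rows by test2 picks out the right indicator row for each observation. A small sketch, assuming test2 holds only the integer codes 1 to 5 (it is simulated here, since the original data were not posted):

```r
# Hypothetical stand-in for test2; codes must be integers in 1..5.
set.seed(7)
test2 <- sample(1:5, 100, replace = TRUE)

I5  <- diag(5)       # 5x5 identity matrix
mat <- I5[test2, ]   # one row per observation, selected by its code

print(dim(mat))

# Each row contains a single 1.
print(all(rowSums(mat) == 1))

# Multiplying by 1:5 recovers the original codes, so nothing was lost.
recovered <- as.vector(mat %*% 1:5)
print(all(recovered == test2))
```

Note that this relies on the codes being usable as row indices; a code of 0 or a code larger than 5 would need to be recoded first.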

If not as a matrix,

 	event1 <- as.numeric( test2 == 1 )

is concise and

 	for (i in 1:5) assign(paste("event",i,sep=""), as.numeric( test2==i ))

is about as fast as you can get.
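If you want to check the speed on your own machine, a rough timing sketch along these lines may help. The vector length of one million and the variable names are arbitrary choices for the illustration, and the timings will vary by machine and R version, so no numbers are quoted here.

```r
# Hypothetical large stand-in for test2.
set.seed(123)
n <- 1e6
test2 <- sample(1:5, n, replace = TRUE)

# Time three ways of producing the indicator for code 3.
t_ifelse  <- system.time(a <- ifelse(test2 == 3, 1, 0))
t_numeric <- system.time(b <- as.numeric(test2 == 3))
t_diag    <- system.time(m <- diag(5)[test2, ])

# Elapsed seconds for each approach.
print(c(ifelse     = t_ifelse[["elapsed"]],
        as.numeric = t_numeric[["elapsed"]],
        diag       = t_diag[["elapsed"]]))

# Sanity check: all approaches agree on the event3 indicator.
print(identical(a, b))
print(all(m[, 3] == b))
```

Keep in mind that the diag() line builds all five indicators at once, so its timing is not directly comparable to the single-indicator lines.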

HTH,

Chuck


Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717