Multinomial Logit Model with lots of Dummy Variables

3 messages · ghpow1, Jeremy Hetzel

#
Hi All,

I am attempting to build a Multinomial Logit model with dummy variables of
the following form:

Dependent Variable : 0-8 Discrete Choices

Dummy Variable 1: 965 dummy vars
Dummy Variable 2: 805 dummy vars

The data set I am using has the dummy columns pre-created, so it's a table
of 72,381 rows and 1770 columns.

The first 965 columns represent the dummy columns for Variable 1
The next 805 columns represent the dummy columns for Variable 2

My code to build the mlogit model looks like the following. I want to
know...is there a better way of doing this without these huge equations? (I
probably also need a more powerful PC to do all of this).

I'll also want to perform a joint test of significance on the first 805
coefficients...

Is this possible?

Thanks

GP

[code]

# load the mlogit package (install once with install.packages("mlogit"))
library(mlogit)

# load mydata
mydata <- read.csv(file = "G:\\data.csv", header = TRUE)

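num.rows <- nrow(mydata)
num.cols <- 965 + 805 + 1

# one row per observation: 965 + 805 dummy columns, then the choice in column 1771
my_data <- matrix(0, nrow = num.rows, ncol = num.cols)

for (i in 1:num.rows) {
	nb <- mydata[i, 2]  # level index of variable 1
	np <- mydata[i, 3]  # level index of variable 2

	my_data[i, nb] <- 1
	my_data[i, 965 + np] <- 1
	my_data[i, 1 + 1770] <- mydata[i, 1]  # dependent variable
}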

# convert matrix to data frame
my_data_frame<-as.data.frame(my_data)

#check data frame headers
head(my_data_frame)

#load dataframe into mldata with choice variable
mldata<-mlogit.data(my_data_frame, varying=NULL, choice="V1771",
shape="wide")

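# V1771 = dependent var
# V1-V965 = variable 1 dummies
# V966-V1770 = variable 2 dummies

# regress V1771 against all 1770 dummies; build the formula from the
# column names instead of typing V1 + V2 + ... by hand
rhs <- paste0("V", 1:1770, collapse = " + ")
mlogit.formula <- as.formula(paste("V1771 ~ 0 |", rhs))
mlogit.model <- mlogit(mlogit.formula, data = mldata, reflevel = "0")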


[/code]
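
The row-by-row loop above can also be written with matrix indexing, which sets all cells in one step instead of looping over 72,381 rows. This is only a sketch on toy data. It assumes, as the loop implies, that column 1 holds the choice and columns 2 and 3 hold integer level indices; the names toy, k1, k2 and dummies are made up for illustration.

```r
set.seed(1)
n  <- 6   # toy number of rows (hypothetical)
k1 <- 4   # toy number of levels for variable 1 (965 in the post)
k2 <- 3   # toy number of levels for variable 2 (805 in the post)
toy <- data.frame(choice = sample(0:8, n, replace = TRUE),
                  v1 = sample(k1, n, replace = TRUE),
                  v2 = sample(k2, n, replace = TRUE))

dummies <- matrix(0, nrow = n, ncol = k1 + k2 + 1)
rows <- seq_len(n)

# A two-column (row, column) index matrix sets one cell per row at once
dummies[cbind(rows, toy$v1)] <- 1
dummies[cbind(rows, k1 + toy$v2)] <- 1

# Last column holds the dependent variable
dummies[, k1 + k2 + 1] <- toy$choice
```

Each row ends up with exactly two 1s among the dummy columns, one per variable, which matches what the loop produces.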



--
View this message in context: http://r.789695.n4.nabble.com/Multinomial-Logit-Model-with-lots-of-Dummy-Variables-tp3439492p3439492.html
Sent from the R help mailing list archive at Nabble.com.
#
If you are just looking to collapse the dummy variables into two factor 
variables, the following will work.

## Generate some example data
set.seed(1234)
n <- 100
# Generate outcome
outcome <- rbinom(n, 3, 0.5)

# Generate dummy variables for A and B: each row has exactly one 1
A <- t(replicate(n, sample(c(1, 0, 0, 0, 0))))
B <- t(replicate(n, sample(c(1, 0, 0, 0, 0))))

# Combine into data frame
dat <- data.frame(outcome, A, B)
names(dat) <- c("outcome", paste0("A", 1:5), paste0("B", 1:5))
head(dat)


## Collapse dummies to factor variables
# For each row, keep the name of the dummy column that equals 1
A <- apply(dat[, 2:6], 1, function(x) names(x)[x == 1])
B <- apply(dat[, 7:11], 1, function(x) names(x)[x == 1])

# Combine into new data frame, storing A and B as factors
dat.new <- data.frame(outcome = dat$outcome, A = factor(A), B = factor(B))

head(dat.new)
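
For larger tables, the row-wise apply() calls above can be slow. A possible vectorized variant uses base R's max.col() to find the column holding the 1 in each row. This is only a sketch on the same kind of toy data, and it assumes every row has exactly one 1 within each block of dummies; the names A.dummies, B.dummies, A.factor and B.factor are made up for illustration.

```r
set.seed(1234)
n <- 100

# Toy dummy blocks: each row has exactly one 1 (same layout as the example data)
A.dummies <- t(replicate(n, sample(c(1, 0, 0, 0, 0))))
B.dummies <- t(replicate(n, sample(c(1, 0, 0, 0, 0))))
colnames(A.dummies) <- paste0("A", 1:5)
colnames(B.dummies) <- paste0("B", 1:5)

# max.col() returns, for each row, the index of the largest entry,
# which here is the position of the single 1
A.factor <- factor(colnames(A.dummies)[max.col(A.dummies, ties.method = "first")],
                   levels = colnames(A.dummies))
B.factor <- factor(colnames(B.dummies)[max.col(B.dummies, ties.method = "first")],
                   levels = colnames(B.dummies))

table(A.factor)
```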



Jeremy
6 days later
#
Hi 

Thanks to Jeremy for his response...

I have been able to generate the factors using his code, and then build the
mlogit data:

mldata<-mlogit.data(mydata, varying=NULL, choice="pitch_type_1",
shape="wide")

My mlogit data looks like:

"dependent_var","A variable","B Var","chid","alt"
FALSE,"110","19",1,"0"
FALSE,"110","19",1,"1"
FALSE,"110","19",1,"2"
FALSE,"110","19",1,"3"
FALSE,"110","19",1,"4"
TRUE,"110","19",1,"5"
FALSE,"110","19",1,"6"
FALSE,"110","19",1,"7"
FALSE,"110","19",1,"8"
FALSE,"110","19",2,"0"
FALSE,"110","19",2,"1"
FALSE,"110","19",2,"2"
FALSE,"110","19",2,"3"
FALSE,"110","19",2,"4"
FALSE,"110","19",2,"5"
TRUE,"110","19",2,"6"
FALSE,"110","19",2,"7"
FALSE,"110","19",2,"8"
TRUE,"110","561",3,"0"
FALSE,"110","561",3,"1"
FALSE,"110","561",3,"2"
FALSE,"110","561",3,"3"
FALSE,"110","561",3,"4"
FALSE,"110","561",3,"5"
FALSE,"110","561",3,"6"
FALSE,"110","561",3,"7"
FALSE,"110","561",3,"8"
FALSE,"110","149",4,"0"
FALSE,"110","149",4,"1"
TRUE,"110","149",4,"2"

...

The mldata contains 651431 rows.  

If I try to run this full data set I get the following error:
Error in model.matrix.default(formula, data) :
  allocMatrix: too many elements specified
Calls: mlogit ... model.matrix.mFormula -> model.matrix ->
model.matrix.default
Execution halted

With smaller datasets (595 mldata rows), mlogit works fine and generates
regression output.

Is there a problem with mlogit and huge datasets?  

I suppose this is perhaps not the best way to assess this kind of data, but
I am trying to replicate a previous analysis that was completed on a similar
amount of similar data.
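
The "allocMatrix: too many elements specified" message is raised when a matrix would need more cells than a 32-bit integer can index (2^31 - 1). A rough back-of-the-envelope check suggests the dense model matrix here is well past that limit. The column count below is an assumption, not something confirmed by the error output: it supposes treatment coding for both factors and one coefficient per non-reference alternative for each covariate.

```r
# Rough size estimate for the dense model matrix (assumptions noted inline)
n_rows <- 651431                       # rows of mldata reported in the post
n_alt  <- 9                            # alternatives 0-8
n_coef <- 1 + (965 - 1) + (805 - 1)    # intercept plus non-reference levels (assumed coding)
n_cols <- (n_alt - 1) * n_coef         # assumed: one column per non-reference alternative
n_elem <- n_rows * n_cols              # double precision, so no integer overflow

n_elem > .Machine$integer.max          # TRUE means too many cells for a 32-bit index
n_elem * 8 / 2^30                      # approximate size in GiB at 8 bytes per double
```

If this estimate is roughly right, the limit comes from the size of the dense design matrix rather than from mlogit itself, so a more powerful PC alone would not remove it.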




--
View this message in context: http://r.789695.n4.nabble.com/Multinomial-Logit-Model-with-lots-of-Dummy-Variables-tp3439492p3455345.html