
length of variable in mlogit

10 messages · Ingmar Visser, Lee van Cleef, R. Michael Weylandt

#
Ingmar, many thanks. This is the error I get from R:

Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data),  : 
  variable lengths differ (found for 'X')

X is the variable I used.

Any comment would be much appreciated.

Best regards!




--
View this message in context: http://r.789695.n4.nabble.com/length-of-variable-in-mlogit-tp4638323p4638667.html
Sent from the R help mailing list archive at Nabble.com.
#
Ingmar,

many thanks for your answer.

Here is a smaller version of my program, with the "strange" variable
isolated, which I used when trying to narrow down the problem.

[Start of R-Editor quote]
library(foreign)
library(gdata)
library(gtools)
library(gmodels)
library(gplots)
library(xtable)
library(mlogit)
library(survival)

# First I import the data, a survey in wide format.
masterdata <- read.dta("E:/Masterdata.dta")

# Then I define the number of cases, i.e. the number of rows of the
# imported dataset.
cases <- dim(masterdata)[1]

# The conditional logit model must have the form Y = aX, with Y as the
# choice variable and X as the explanatory one.

# These are the headings for the explanatory variable.
Numbers <- c("2", "3", "4", "5", "6", "7")

# I define the matrix for the explanatory variable. ncol corresponds to the
# number of choices; cases is the number of people surveyed.
X <- matrix(0, ncol = 6, nrow = cases)

# I define the choice variable. First, the answers become numeric values,
# then I define the NAs.
Y <- as.numeric(masterdata[, "V3D"])
Y[Y == 1 | Y == 8 | Y == 9 | Y == 10 | Y == 11] <- NA
Y <- as.factor(Y - 1)

# I import the answers from the survey into my matrix for the explanatory
# variable.
for (i in 1:5) {X[, i][as.numeric(factor(masterdata$VS)) == i] <- 1}
colnames(X) <- paste("X", Numbers, sep = ".")

# I put my data set together and delete the NA options.
masterdata.wide <- cbind(Y, X)
masterdata.wide <- na.omit(masterdata.wide)
head(masterdata.wide)

masterdata.long <- mlogit.data(masterdata.wide, varying = c(2:7),
                               shape = "wide", choice = "Y")

[End of R-Editor quote]

The first problem arises before the conditional logit (Condlog) command: R
does not transform the data from wide into long format and says: Error in
data[[choice]] : subscript out of bounds

In the extended version, transforming from wide to long works without
problems, and I have no idea why. But then, after Condlog, R says: Error in
model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data),  :
variable lengths differ (found for 'X').

Apart from X, other variables (alternative-specific ones, or ones with a
generic regression coefficient) work. I checked the variable lengths: the
complete logit model runs without problems for other variables of the same
length.

What options do I have for tracking down the problem with X? At first
sight, I did not find anything using str().

Any comment on where and how to look would really be appreciated.

Best regards!
#
On Fri, Aug 3, 2012 at 4:27 AM, Lee van Cleef <l.van.cleef at gmx.net> wrote:
I'm afraid this line (the read.dta() call on your local file) makes it hard
for us to reproduce this otherwise quite helpful example: can you give us
dput(head(masterdata, 20)), which will give a plain-text representation of
the same data? Don't worry if you can't understand what shows up -- it's
super helpful for us.
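For readers new to dput(): it prints an object as R source text, which can be pasted into a reply and turned back into the same object with parse() and eval(). A minimal sketch, with a small hypothetical data frame standing in for masterdata (the column values are made up):

```r
# Hypothetical stand-in for the Stata import (masterdata)
toy <- data.frame(
  pid = 1:3,
  V4  = factor(c("Choice 1", "Choice 2", "Choice 1")),
  VS  = c(1, 0, 1)
)

# dput() writes the object as plain R code; capture that text
txt <- capture.output(dput(toy))
cat(txt, sep = "\n")

# Whoever receives the text can rebuild the same object from it
rebuilt <- eval(parse(text = txt))
print(all.equal(rebuilt, toy))
```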

Best,
Michael
#
Hi Michael,

many thanks for your comment.

Below are the original data as imported from Stata format. I deleted some
columns of explanatory variables because they are not needed here and
already work; the model has several explanatory variables with generic
regression coefficients.

pid	Choice	V7.Ch 1	V7.Ch 2	V7.Ch 3	V7.Ch 4	V7.Ch 5	V7.Ch 6	V4	VS
1	Choice 1	4	4	3	1	3	2	Choice 1	Choice 1
2	Choice 1	3	4	-3	-5	3	-5	Choice 1	Choice 1
3	n.a.	n.a.	n.a.	5	4	-5	-5	n.a.	0
4	Choice 2	-3	-4	1	2	-3	-2	Choice 2	Choice 2
5	Choice 1	3	3	2	-1	1	1	Choice 1	Choice 1
6	Choice 4	1	-3	-3	3	1	0	Choice 4	Choice 4
7	n.a.	-3	-3	0	-2	-5	-5	n.a.	0
8	n.a.	n.a.	n.a.	n.a.	n.a.	n.a.	n.a.	Choice 6	0
9	Choice 1	3	2	2	-1	2	3	Choice 3	0
10	n.a.	-5	-5	-5	-5	-5	-5	Choice 6	0
11	Choice 2	2	-2	4	2	-5	n.a.	Choice 2	Choice 2
12	n.a.	2	2	2	0	3	2	Choice 3	0
13	Choice 1	5	3	-5	-5	3	3	Choice 1	Choice 1
14	n.a.	1	1	-2	-2	3	3	n.a.	0
15	n.a.	1	-1	0	-1	2	3	Choice 3	0
16	Choice 3	3	2	2	1	3	3	n.a.	Choice 3
17	Choice 3	2	2	2	2	4	5	n.a.	0
18	Choice 1	5	3	0	0	2	2	Choice 1	Choice 1
19	Choice 2	-3	-4	0	2	-5	-5	Choice 4	Choice 2
20	Choice 4	3	2	2	1	1	0	Choice 4	Choice 5

V7 and the other variables with generic coefficients work perfectly. So
does V4, which can be used with either a generic or an alternative-specific
regression coefficient. I only have problems with VS. There, R tells me
that the length of the variable causes problems ("Error in
model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data),  :
variable lengths differ (found for 'P_id')"). However, comparing the lengths
and dimensions of V4 and VS, I see no difference! Both original variables
are transformed into numeric values with as.numeric() (all values become
either zero or one, depending on the alternative) and then put into
previously defined matrices of the same dimensions. Checking the
transformed variables with str() or class() showed no visible differences.
What else can I check?
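One further check, sketched here under the assumption that the formula mentions objects that live outside the data frame: model.frame() evaluates each formula variable in data first and then in the formula's environment, so a term that is not a column of the data silently picks up a workspace object, whose row count may differ. The helper vars_not_in_data() and the toy objects below are hypothetical.

```r
# Hypothetical helper: formula variables that are NOT columns of `data`;
# model.frame() would look these up in the formula's environment instead
vars_not_in_data <- function(formula, data) {
  setdiff(all.vars(formula), names(data))
}

# Toy long-format data (4 rows) and a workspace matrix with only 2 rows
long_data <- data.frame(Y = c(TRUE, FALSE, FALSE, TRUE), V4 = c(1, 0, 0, 1))
X <- matrix(0, nrow = 2, ncol = 6)

outside <- vars_not_in_data(Y ~ V4 + X, long_data)
print(outside)  # "X" is not a column of long_data

# Row counts of those outside objects versus the data
print(sapply(outside, function(v) NROW(get(v))))
print(nrow(long_data))
```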

As stated above, I tried to isolate the problem-laden variable VS to set up
a single conditional logit model, but I did not manage to get it into long
format (R's message was: Error in data[[choice]] : subscript out of
bounds). I have no idea whether the two problems are connected.

Any comment would be much appreciated.

Best regards and a nice weekend!
#
On Fri, Aug 3, 2012 at 11:49 AM, Lee van Cleef <l.van.cleef at gmx.net> wrote:
Hi Lee,

I apologize for being intransigent (well, no -- I actually don't), but
could you provide your data using dput() as asked?

Best,
Michael
#
Hi Lee,

I've finally had time to look at this:

If you look at

?mlogit.data

you'll see that choice must be "the variable indicating the choice
made: it can be either a logical vector, a numerical vector with 0
where the alternative is not chosen, a factor with level 'yes' when
the alternative is chosen."

For the data and script you provided me, we have a few problems:
firstly, masterdata.wide should be a data.frame, not a matrix -- this
can be rectified by wrapping it in as.data.frame. Once you do that,
take a look at the column "Y" which you supply for the choice
variable. It is a numerical vector, but it has no 0's so it doesn't
fit the requested input format.
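A minimal base-R illustration of the first point, using toy values rather than the posted data: [[ with a column name works on a data.frame but fails on a matrix with the same "subscript out of bounds" message seen earlier. The link to mlogit.data() is an assumption based on the reported error text (Error in data[[choice]]).

```r
# A matrix with column names, like the one cbind(Y, X) returns
m <- cbind(Y = c(1, 2, 4), X.2 = c(0, 1, 0))

# [[ with a name looks in names(), which a matrix does not have
err <- tryCatch(m[["Y"]], error = function(e) conditionMessage(e))
print(err)

# After as.data.frame(), the same lookup returns the column
wide_df <- as.data.frame(m)
print(wide_df[["Y"]])
```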

For anyone else who wants to look at this:

dput(masterdata.wide)

structure(c(1, 2, 1, 4, 1, 5, 4, 1, 4, 3, 1, 2, 4, 0, 1, 0, 0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(13L, 7L), .Dimnames = list(
    NULL, c("Y", "X.2", "X.3", "X.4", "X.5", "X.6", "X.7")),
    na.action = structure(c(1, 3, 7, 8, 12, 14, 17), class = "omit"))
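A sketch for anyone who wants to experiment: it rebuilds the posted values as a data.frame, confirms that Y contains no 0, and lays the data out in long form with a logical choice column, one of the forms the help page lists. It assumes, purely for illustration, that Y gives the position (1 to 6) of the chosen alternative among X.2 to X.7; in the posted matrix Y holds the integer codes of a factor, so the real mapping depends on that factor's levels. Base R only; mlogit.data() is not called.

```r
# Values as posted with dput(masterdata.wide): a 13 x 7 numeric matrix
vals <- c(1, 2, 1, 4, 1, 5, 4, 1, 4, 3, 1, 2, 4, 0, 1, 0, 0,
  0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
wide <- as.data.frame(matrix(vals, nrow = 13, ncol = 7,
  dimnames = list(NULL, c("Y", "X.2", "X.3", "X.4", "X.5", "X.6", "X.7"))))

print(dim(wide))
print(any(wide$Y == 0))  # no 0 marks an unchosen alternative
print(table(wide$Y))

# Long layout: one row per person x alternative, logical choice column.
# ASSUMPTION (illustration only): Y is the position of the chosen alternative.
alts <- c("2", "3", "4", "5", "6", "7")
long <- data.frame(
  id     = rep(seq_len(nrow(wide)), each = length(alts)),
  alt    = rep(alts, times = nrow(wide)),
  X      = as.vector(t(as.matrix(wide[, -1]))),
  chosen = as.vector(t(outer(wide$Y, seq_along(alts), "==")))
)
head(long, 12)
```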

Hope that helps,
Michael
On Fri, Aug 10, 2012 at 2:31 AM, Lee van Cleef <l.van.cleef at gmx.net> wrote:
#
Hi Michael,

many thanks for your time and the hint. This is much appreciated. 

Indeed, I managed to get part of the dataset from which I want to develop
my final logit model into the correct long format, with TRUE/FALSE in the
column for the choice variable.

But the problem with the variable I want to define with an alternative
specific coefficient (=P_id) still persists.

When I run the mlogit procedure and define the table for this variable as

1) a data frame, I get the message

[Start of R quote]
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data =
data),  : 
  invalid type (list) for variable 'P_id'
[End of R quote]

By the way, the length of the variable (data frame) P_id is 6, the length of
the choice variable is 1,406, and the length of a variable with a generic
coefficient (which works fine) is 8,436.

2) a matrix, I get the message

[Start of R quote]
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data =
data),  : 
  variable lengths differ (found for 'P_id')
[End of R quote]

By the way, the length of the variable (matrix) P_id is 8,436, the length of
the choice variable is 1,406, and the length of a variable with a generic
coefficient (which works fine) is 8,436.
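Both messages come from stats::model.frame(), as the "Error in model.frame.default(...)" prefix shows, and can be reproduced without mlogit. A base-R sketch with hypothetical toy data (only the name P_id is taken from the thread): model.frame() rejects a variable stored as a data frame, requires every variable found outside data to have as many rows as the data, and accepts the same information once it is stored as columns of the data itself.

```r
# Toy long-format data: 2 people x 3 alternatives = 6 rows (hypothetical)
long_data <- data.frame(
  chosen = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE),
  V7     = c(4, 4, 3, 3, 4, -3)
)

# Returns "ok" or the error message raised by model.frame()
try_frame <- function(f) {
  tryCatch({ model.frame(f, data = long_data); "ok" },
           error = function(e) conditionMessage(e))
}

# 1) P_id as a separate data frame: list-type variables are rejected
P_id <- data.frame(party1 = c(1, 0), party2 = c(0, 1))
msg_df <- try_frame(chosen ~ V7 + P_id)
print(msg_df)

# 2) P_id as a separate matrix with 2 rows, while the data has 6
P_id <- as.matrix(P_id)
msg_mat <- try_frame(chosen ~ V7 + P_id)
print(msg_mat)

# 3) The same information stored as columns of the data works
long_data$party1 <- rep(c(1, 0), each = 3)
long_data$party2 <- rep(c(0, 1), each = 3)
msg_ok <- try_frame(chosen ~ V7 + party1 + party2)
print(msg_ok)
```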

Any comment or hint (also on literature, in case my mistakes are too simple
and obvious) would be helpful.

Best regards,

Lee
#
Hi Michael,

many thanks for the useful hints, which gave me a deeper knowledge of R.
It is definitely much appreciated.

I think I have found the mistake: the problems did not arise from variable
definitions etc. It was a conceptual mistake.

The cause of the problems was that I had to split a variable into its
single alternatives. The problematic variable referred to party identity.
The whole matrix with the individual parties as columns was, of course, not
suitable as a variable with an alternative-specific coefficient. Instead,
each column (party) was a single individual-specific variable with an
alternative-specific coefficient.
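To illustrate the fix described above, a sketch with hypothetical data: one party-identity code per respondent is split into separate 0/1 columns of the data frame, each of which can then enter the model as its own individual-specific variable. The formula at the end assumes mlogit's multi-part syntax (choice ~ variables with generic coefficients | individual-specific variables); check it against ?mFormula. The block itself does not call mlogit.

```r
# Hypothetical wide data: one party-identity code per respondent (NA = none)
resp <- data.frame(pid = 1:5, party = c(1, 3, NA, 2, 1))

# One 0/1 column per party, each a separate individual-specific variable
for (p in 1:6) {
  resp[[paste0("PID.", p)]] <- as.integer(!is.na(resp$party) & resp$party == p)
}

party_vars <- paste0("PID.", 1:6)
print(resp[, c("pid", party_vars)])

# Each column enters as its own term after the first "|" (assumed mlogit
# layout); in a real model one party would usually be left out as the
# reference category to avoid perfect collinearity
f <- as.formula(paste("Y ~ V7 |", paste(party_vars, collapse = " + ")))
print(f)
```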

Now it works.

Best wishes!