Ingmar, many thanks. I get this error from R: Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data), : variable lengths differ (found for 'X'). X is the variable I have used. Any comment would be much appreciated. Best regards! -- View this message in context: http://r.789695.n4.nabble.com/length-of-variable-in-mlogit-tp4638323p4638667.html Sent from the R help mailing list archive at Nabble.com.
length of variable in mlogit
10 messages · Ingmar Visser, Lee van Cleef, R. Michael Weylandt
Ingmar,
many thanks for your answer.
Here is a smaller version of my program with the isolated "strange"
variable, which I used when trying to narrow down the problem.
[Start of R-Editor quote]
library(foreign)
library(gdata)
library(gtools)
library(gmodels)
library(gplots)
library(xtable)
library(mlogit)
library(survival)
#First I import the data, a survey in the wide format.
masterdata <- read.dta("E:/Masterdata.dta")
# Then I define the number of cases, i.e. the number of rows of the
# imported dataset.
cases <- dim(masterdata)[1]
# The conditional logit model must have the form Y = aX, with Y as the
# choice variable
# and X as the explaining one.
# These are the headings for the explaining variable.
Numbers <- c("2", "3", "4", "5", "6", "7")
# I define the matrix for the explaining variable. ncol corresponds to the
# number of choices; cases is the number of people surveyed.
X <- matrix(0, ncol=6, nrow=cases)
# I define the choice variable. First, the answers become numeric values,
# then I define the NAs.
Y <- as.numeric(masterdata[,"V3D"])
Y[Y == 1 | Y ==8 | Y == 9 | Y == 10 | Y == 11] <- NA
Y <- as.factor(Y - 1)
# I import the answers from the survey into my matrix for the explaining
# variable.
for (i in 1:5){X[,i][as.numeric(factor(masterdata$VS))==i] <- 1}
colnames (X) <- paste("X", Numbers, sep=".")
# I put my data set together and delete the NA options.
masterdata.wide <- cbind (Y, X)
masterdata.wide <- na.omit(masterdata.wide)
head(masterdata.wide)
masterdata.long <- mlogit.data(masterdata.wide, varying = c(2:7), shape =
"wide", choice = "Y")
[End of R-Editor quote]
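The loop above runs over `1:5` although `X` was created with six columns, so the sixth column (`X.7`) is never filled and stays all zero; the `dput()` output posted later in the thread shows exactly that. A minimal base-R sketch of the same dummy coding that takes the number of columns from the factor itself, assuming `VS` is (or can be made into) a factor whose levels are the alternatives in column order. The `VS` values here are made up:

```r
# Hypothetical stand-in for masterdata$VS; levels are the six alternatives
VS <- factor(c("2", "4", "3", "7", "2"),
             levels = c("2", "3", "4", "5", "6", "7"))

# outer() compares each row's level code with every column index 1..6,
# giving a logical matrix with exactly one TRUE per non-NA row;
# multiplying by 1 turns it into 0/1. Rows with NA in VS stay NA.
X <- outer(as.integer(VS), seq_len(nlevels(VS)), "==") * 1
colnames(X) <- paste("X", levels(VS), sep = ".")
X
```

Sizing the matrix with `seq_len(nlevels(VS))` means a hard-coded loop bound cannot silently leave a column empty.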
The first problem arises before the Condlog command. R does not transform wide
into long and says: Error in data[[choice]] : subscript out of bounds
In the extended version, transforming from wide to long works, and I have no
idea why. But then, after Condlog, R says: Error in
model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data), :
variable lengths differ (found for 'X').
All other variables (alternative-specific or with a generic regression
coefficient) work; only X fails. I checked the variable lengths, but the
complete logit model works without problems for other variables with
the same lengths.
What options do I have to address the problems with X? At first sight I did
not find anything with str().
Any comment on where and how I should look would really be appreciated.
Best regards!
On Fri, Aug 3, 2012 at 4:27 AM, Lee van Cleef <l.van.cleef at gmx.net> wrote:
masterdata <- read.dta("E:/Masterdata.dta")
I'm afraid this line makes it hard for us to reproduce this otherwise quite helpful example: can you give us dput(head(masterdata, 20)) which will give a plain text representation of the same? Don't worry if you can't understand what shows up -- it's super helpful for us. Best, Michael
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi Michael, many thanks for your comment. Below are the original data as imported from Stata format. I deleted some columns with explanatory variables because they were unnecessary and working; the model has several explanatory variables with generic regression coefficients.

pid  Choice    V7.Ch 1  V7.Ch 2  V7.Ch 3  V7.Ch 4  V7.Ch 5  V7.Ch 6  V4        VS
1    Choice 1  4        4        3        1        3        2        Choice 1  Choice 1
2    Choice 1  3        4        -3       -5       3        -5       Choice 1  Choice 1
3    n.a.      n.a.     n.a.     5        4        -5       -5       n.a.      0
4    Choice 2  -3       -4       1        2        -3       -2       Choice 2  Choice 2
5    Choice 1  3        3        2        -1       1        1        Choice 1  Choice 1
6    Choice 4  1        -3       -3       3        1        0        Choice 4  Choice 4
7    n.a.      -3       -3       0        -2       -5       -5       n.a.      0
8    n.a.      n.a.     n.a.     n.a.     n.a.     n.a.     n.a.     Choice 6  0
9    Choice 1  3        2        2        -1       2        3        Choice 3  0
10   n.a.      -5       -5       -5       -5       -5       -5       Choice 6  0
11   Choice 2  2        -2       4        2        -5       n.a.     Choice 2  Choice 2
12   n.a.      2        2        2        0        3        2        Choice 3  0
13   Choice 1  5        3        -5       -5       3        3        Choice 1  Choice 1
14   n.a.      1        1        -2       -2       3        3        n.a.      0
15   n.a.      1        -1       0        -1       2        3        Choice 3  0
16   Choice 3  3        2        2        1        3        3        n.a.      Choice 3
17   Choice 3  2        2        2        2        4        5        n.a.      0
18   Choice 1  5        3        0        0        2        2        Choice 1  Choice 1
19   Choice 2  -3       -4       0        2        -5       -5       Choice 4  Choice 2
20   Choice 4  3        2        2        1        1        0        Choice 4  Choice 5

V7 and the other variables with generic coefficients work perfectly. So does V4, which can be used with both a generic and an alternative-specific regression coefficient. I only have problems with VS. There, R tells me that the length of the variable causes problems ("Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data), : variable lengths differ (found for 'P_id')"). However, comparing lengths and dimensions of V4 and VS, I see no difference! Both original variables are transformed into numeric values (as.numeric), so that all values become either zero or one depending on the alternative, and are then put into (previously defined) matrices of the same dimensions. There were no differences visible when checking the transformed variables with str() or class(). What else can I check?
As stated above, I tried to isolate the problematic variable VS to set up a single conditional logit model, but I did not manage to get it into the long format (R's message was: Error in data[[choice]] : subscript out of bounds). I have no idea whether the two problems are connected. Any comment would be much appreciated. Best regards and a nice weekend!
On Fri, Aug 3, 2012 at 11:49 AM, Lee van Cleef <l.van.cleef at gmx.net> wrote:
Hi Michael, many thanks for your comment. Below the original data as imported from Stata format.
Hi Lee, I apologize for being intransigent (well, no -- I actually don't) but could you provide your data using dput() as asked? Best, Michael
6 days later
Hi Michael, I have sent you the data. Best, Lee
2 days later
Hi Lee,
I've finally had time to look at this:
If you look at
?mlogit.data
you'll see that choice must be "the variable indicating the choice
made: it can be either a logical vector, a numerical vector with 0
where the alternative is not chosen, a factor with level 'yes' when
the alternative is chosen."
For the data and script you provided me, we have a few problems:
firstly, masterdata.wide should be a data.frame, not a matrix -- this
can be rectified by wrapping it in as.data.frame. Once you do that,
take a look at the column "Y" which you supply for the choice
variable. It is a numerical vector, but it has no 0's so it doesn't
fit the requested input format.
For anyone else who wants to look at this:
dput(masterdata.wide)
structure(c(1, 2, 1, 4, 1, 5, 4, 1, 4, 3, 1, 2, 4, 0, 1, 0, 0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(13L, 7L), .Dimnames = list(NULL,
c("Y", "X.2", "X.3", "X.4", "X.5", "X.6", "X.7")),
na.action = structure(c(1, 3, 7, 8, 12, 14, 17), class = "omit"))
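Loading this `dput()` output back into R makes the diagnosis checkable. A short base-R sketch: the object is the one from the thread (line breaks adjusted), and the checks are illustrative, not part of the original exchange.

```r
# dput() output from the thread
masterdata.wide <- structure(c(1, 2, 1, 4, 1, 5, 4, 1, 4, 3, 1, 2, 4, 0, 1, 0, 0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(13L, 7L), .Dimnames = list(NULL,
c("Y", "X.2", "X.3", "X.4", "X.5", "X.6", "X.7")),
na.action = structure(c(1, 3, 7, 8, 12, 14, 17), class = "omit"))

dim(masterdata.wide)                           # 13 rows left, 7 columns
sort(unique(masterdata.wide[, "Y"]))           # 1 2 3 4 5: no 0 in the choice column
colSums(masterdata.wide[, -1])                 # X.7 sums to 0
as.vector(attr(masterdata.wide, "na.action"))  # rows dropped by na.omit()
```

The `Y` column only takes the values 1 to 5, and `na.omit()` dropped 7 of the 20 rows, which leaves the 13 rows in `.Dim`.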
Hope that helps,
Michael
1 day later
Hi Michael, many thanks for your time and the hint. This is much appreciated. Indeed, I managed to get part of the dataset from which I want to develop my final logit model into the correct long format, with TRUE/FALSE in the column for the choice variable. But the problem with the variable I want to define with an alternative-specific coefficient (P_id) still persists. When running the mlogit procedure and defining the table for this variable

1) as a data frame, I get the message
[Start of R quote] Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data), : invalid type (list) for variable 'P_id' [End of R quote]
Btw, the length of the variable/data frame P_id is 6, the length of the choice variable is 1,406 and the length of a variable with a generic coefficient (which works fine) is 8,436.

2) as a matrix, I get the message
[Start of R quote] Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data), : variable lengths differ (found for 'P_id') [End of R quote]
Btw, the length of the variable/matrix P_id is 8,436, the length of the choice variable is 1,406 and the length of a variable with a generic coefficient (which works fine) is 8,436.

Any comment or hint (also on literature, in case my mistakes are too simple and obvious) would be helpful. Best regards, Lee
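Both messages come from `model.frame()`, as the `model.frame.default(...)` in the error text shows. A toy base-R sketch with made-up variables (`y`, `gen`, `pid_df`, `pid_short`, `pid_mat`) reproduces them: a data frame used as a single term is a list and is rejected outright, and every term is checked by its number of rows, not by its `length()`.

```r
# Toy long-format data: 4 choice situations x 3 alternatives = 12 rows
y   <- rep(c(TRUE, FALSE, FALSE), 4)        # choice indicator, 12 values
gen <- as.numeric(1:12)                     # generic variable, 12 values

pid_df    <- data.frame(p1 = 1:3, p2 = 4:6) # a data frame used as one term
pid_short <- 1:4                            # one value per choice situation
pid_mat   <- matrix(1:12, nrow = 4)         # length() is 12, but only 4 rows

# Evaluate an expression and return "ok" or the error message
msg <- function(expr) tryCatch({ expr; "ok" }, error = conditionMessage)

msg(model.frame(y ~ gen))        # "ok"
msg(model.frame(y ~ pid_df))     # "invalid type (list) for variable 'pid_df'"
msg(model.frame(y ~ pid_short))  # "variable lengths differ (found for 'pid_short')"
msg(model.frame(y ~ pid_mat))    # "variable lengths differ (found for 'pid_mat')"
```

`length()` of a data frame is its number of columns and `length()` of a matrix is rows times columns, so neither tells how many rows `model.frame()` will see. That could fit the numbers above (6 for the data frame, and 8,436 = 1,406 × 6 for the matrix), but the thread does not give the dimensions of P_id, so this is a guess rather than a diagnosis.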
Hi Michael, many thanks for the useful hints, which gave me some deeper knowledge of R. It is definitely much appreciated. I think I have found the mistake: the problems did not arise from variable definitions etc. It was a conceptual mistake. The cause of the problems was that I had to split a variable into single alternatives. The problematic variable referred to party identity. The whole matrix with the single parties as columns was of course not suitable as one variable with an alternative-specific coefficient. Instead, each column/party is a single individual-specific variable with an alternative-specific coefficient. Now it works. Best wishes!
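A rough sketch of this fix, under stated assumptions. mlogit formulas have three parts, `y ~ generic | individual-specific | alternative-specific`, and variables in the second part get one coefficient per alternative. The sketch splits a hypothetical party-identity matrix (`pid_mat`, with made-up columns `P_id.1` to `P_id.3`) into ordinary data-frame columns and builds such a formula; `Choice` and `V7` are borrowed from Lee's data, and no model is fitted here.

```r
# Hypothetical party-identity indicators: one row per respondent,
# one column per party (values made up for illustration)
pid_mat <- matrix(c(1, 0, 0,
                    0, 1, 0,
                    0, 0, 1,
                    1, 0, 0), nrow = 4, byrow = TRUE,
                  dimnames = list(NULL, c("P_id.1", "P_id.2", "P_id.3")))

# Each column becomes its own numeric variable in the data frame,
# instead of the whole matrix being passed as a single term
respondents <- data.frame(id = 1:4, pid_mat)
str(respondents)

# Build "Choice ~ V7 | P_id.1 + P_id.2 + P_id.3": V7 in the first part
# (generic coefficient), party columns in the second part
# (individual-specific, one coefficient per alternative)
pid_terms <- paste(colnames(pid_mat), collapse = " + ")
f <- as.formula(paste("Choice ~ V7 |", pid_terms))
f
```

If every respondent has exactly one party, the indicator columns sum to one and are collinear with the alternative-specific constants, so one party would normally be left out as the reference category.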