error: in catg (xi, name=nam, label=lab): "LO2" has <2 category levels
Dear all, As David suggested, I used his R code to look at the distribution of values in the 'LO2' variable. These are the results:
lapply (subsets, function (x) {table(x$LO2)})
[[1]]
nee geen atrofie ja atrofie aanweizg
             173                   0

[[2]]
nee geen atrofie ja atrofie aanweizg
             169                   3

[[3]]
nee geen atrofie ja atrofie aanweizg
             174                   0

[[4]]
nee geen atrofie ja atrofie aanweizg
             172                   2

[[5]]
nee geen atrofie ja atrofie aanweizg
             173                   2

[[6]]
nee geen atrofie ja atrofie aanweizg
             171                   3

[[7]]
nee geen atrofie ja atrofie aanweizg
             167                   5

[[8]]
nee geen atrofie ja atrofie aanweizg
             174                   1

[[9]]
nee geen atrofie ja atrofie aanweizg
             173                   1

[[10]]
nee geen atrofie ja atrofie aanweizg
             175                   0

I guess the lrm model fails because in some bootstrap samples one of the two categories contains no persons: when I modelled each subset separately, it failed in subsets 1, 3 and 10, which are exactly the samples with zero cases of atrophy. This LO2 variable therefore seems unable to be a predictor, let alone a strong one. Regardless of this, it seems strange that across many simulations, where there is always a chance that a specific variable will by chance alone contain only one category, this causes problems with estimating the prediction models. Does anyone have a suggestion for how to deal with that?
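
One possible way to handle this, sketched under assumptions rather than offered as the recommended method: flag the bootstrap samples in which LO2 is constant and fit only the others. The data frame below is simulated and the column names y and age are hypothetical; glm() from base R stands in for lrm() so that the sketch runs without the rms package.

```r
set.seed(1)

# Simulated stand-in for the real data (assumption): a rare binary predictor
n <- 175
dat <- data.frame(
  y   = rbinom(n, 1, 0.4),
  age = rnorm(n, 50, 10),
  LO2 = factor(ifelse(seq_len(n) <= 2, "ja", "nee"), levels = c("nee", "ja"))
)

# TRUE when at least two levels of a factor actually occur in the sample
has_two_observed <- function(f) sum(table(f) > 0) >= 2

bootstraps <- 10
subsets <- lapply(seq_len(bootstraps), function(i) {
  dat[sample(nrow(dat), replace = TRUE), ]
})

# Flag the bootstrap samples in which LO2 varies
usable <- vapply(subsets, function(d) has_two_observed(d$LO2), logical(1))

# Fit only where LO2 varies; store NULL for the others instead of stopping
fits <- lapply(seq_along(subsets), function(i) {
  if (!usable[i]) return(NULL)
  glm(y ~ age + LO2, family = binomial, data = subsets[[i]])
})
```

Skipping samples is only one design choice; whether a predictor this sparse belongs in the model at all is a separate question that the sketch does not answer.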
Kind regards and thanks for all the help so far,
Tobias
________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Friday, 7 September 2012 18:17
To: Berg, Tobias van den
CC: PIKAL Petr; r-help
Subject: Re: [R] error: in catg (xi, name=nam, label=lab): "LO2" has <2 category levels
On Sep 7, 2012, at 8:03 AM, Berg, Tobias van den wrote:
Dear all,
I probably made a beginner's mistake. While importing an SPSS file I didn't specify that missing values should become NA (use.missings = TRUE). Thanks to Petr Pikal and Bert Gunter I now know how to check how many values are known within a variable.
Although I can fit my logistic model on this dataset, unfortunately, I experience the same problem after bootstrapping the original dataset at hand.
The R-code so far:
library(rms)  # provides lrm()

bootstraps <- 10

# Draw each bootstrap sample by resampling the rows of dat with replacement
subsets <- lapply(seq_len(bootstraps), function(i) dat[sample(nrow(dat), replace = TRUE), ])

# Fit the logistic model on every bootstrap sample
fit.subsets <- lapply(subsets, function(x) {
  lrm(MRI_Diag_RC ~ factor(O4_1r) + N6_1r + leeftijd + LO1 + LO2,
      model = TRUE, x = TRUE, y = TRUE, data = x)
})
Everything is fine till I run the last line. The following result shows in R: Error in catg(xi, name = nam, label = lab) : LO2 has <2 category levels
I checked how many values of LO2 are known in each simulated dataset, using:
lapply (subsets, function (x) {str(x$LO2)})
Instead do:
lapply (subsets, function (x) {table(x$LO2)})
You cannot tell what distribution of values you are getting with str(). Just because a factor has 2 levels does NOT mean it has two unique values populating those levels.
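
A minimal illustration of that point, using a made-up factor rather than the real LO2 data: the number of declared levels and the number of levels that actually occur can differ.

```r
# A factor can declare two levels while only one of them actually occurs
x <- factor(c("nee", "nee", "nee", NA), levels = c("nee", "ja"))

nlevels(x)              # 2: number of declared levels
table(x)                # counts per level; "ja" has a count of 0
sum(table(x) > 0)       # 1: number of levels that actually occur
nlevels(droplevels(x))  # 1: declared levels after dropping the unused one
```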
--
David.
The result:
Factor w/ 2 levels "nee geen atrofie",..: 1 1 1 1 1 NA 1 1 1 1 ...
Factor w/ 2 levels "nee geen atrofie",..: 1 1 1 1 1 1 1 1 1 1 ...
Factor w/ 2 levels "nee geen atrofie",..: 1 1 1 1 1 NA 1 1 1 1 ...
Factor w/ 2 levels "nee geen atrofie",..: 1 1 1 1 1 1 1 1 1 1 ...
Factor w/ 2 levels "nee geen atrofie",..: 1 1 1 1 1 1 1 1 1 1 ...
Factor w/ 2 levels "nee geen atrofie",..: 1 1 1 1 1 1 1 1 1 1 ...
Factor w/ 2 levels "nee geen atrofie",..: 1 1 1 1 1 1 1 1 1 1 ...
Factor w/ 2 levels "nee geen atrofie",..: 1 1 1 1 1 1 1 1 1 1 ...
Factor w/ 2 levels "nee geen atrofie",..: 1 1 1 1 1 1 1 1 1 1 ...
Factor w/ 2 levels "nee geen atrofie",..: 1 1 1 1 1 1 1 1 1 1 ...
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
[[7]]
NULL
[[8]]
NULL
[[9]]
NULL
[[10]]
NULL

It would be great to receive ideas, comments or questions about my challenge.
Kind regards,
Tobias

-----Original message-----
From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
Sent: Friday, 7 September 2012 16:22
To: Berg, Tobias van den
CC: r-help
Subject: RE: [R] error: in catg (xi, name=nam, label=lab): "LO2" has <2 category levels

Hi
It is good to cc to list. Somebody could have better insight.
Dear Petr, Thank you for responding. It seems right what you say. The funny thing however is that the 'LO2' variable in SPSS has 2 answer categories. If I look at the same variable in R, again I see 2 different values.
How do you know? Any command? You shall provide at least str(LO2) result as we do not have access to your PC.
I used your "sapply" code and guess that I retrieved (per variable) the amount of answer categories/possible values. LO2 scores a 3 in the accompanying results. Do you know how I can change that?
Hm. The result depends on what LO2 is. If it is numeric, you have 3 unique values. If it is a factor, you can have either 3 levels, or 2 levels plus NA values (again, the str result would be helpful, so that we need not guess what your data look like).

Well, let me guess: levels(dat$LO2) says you have 3 levels, 2 meaningful ones and one that probably comes out as an empty string "". It should be the first level, so
levels(dat$LO2)[1] <- NA
should drop this unwanted level. Or maybe you can get rid of such unwanted levels by setting na.string to the empty string during import; however, my knowledge of SPSS is approaching zero, so I could be completely wrong.

If your values are factors, you can change the code to
sapply(sapply(ff, levels), length)
and you will get 0 for numeric variables and the number of levels for factor variables. More complete insight into your data can also be had with
summary(dat)

Regards
Petr
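
The suggestions above can be tried on a small made-up data frame. The values are hypothetical; only the column names leeftijd and LO2 are borrowed from the thread. This is a sketch assuming the import left "" as a spurious first level, which the thread does not confirm.

```r
# Hypothetical data frame: an import that left "" as a spurious first level
dat <- data.frame(
  leeftijd = c(34, 51, 47, 62),
  LO2 = factor(c("", "nee", "ja", "nee"), levels = c("", "nee", "ja"))
)

levels(dat$LO2)           # "" "nee" "ja"
levels(dat$LO2)[1] <- NA  # the "" level is dropped; its values become NA
levels(dat$LO2)           # "nee" "ja"

# Declared levels per column: 0 for numeric columns
n_levels <- sapply(dat, nlevels)

# Distinct observed values per column (NA counts as a value here)
n_unique <- sapply(lapply(dat, unique), length)
```

The inner step uses lapply() rather than sapply() because sapply() would simplify the result to a matrix whenever every column happens to have the same number of unique values, and the outer length count would then be wrong.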
Kind regards,
Tobias

-----Original message-----
From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
Sent: Friday, 7 September 2012 15:02
To: Berg, Tobias van den; r-help at r-project.org
Subject: RE: [R] error: in catg (xi, name=nam, label=lab): "LO2" has <2 category levels

Hi
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Tvandenberg
Sent: Friday, September 07, 2012 1:05 PM
To: r-help at r-project.org
Subject: [R] error: in catg (xi, name=nam, label=lab): "LO2" has <2 category levels

Dear R-users,
During a fit procedure in a logistic prediction model I encounter the following problem:
error: in catg (xi, name=nam, label=lab): X has <2 category levels
I do not know lrm, but the error seems self-explanatory: some variable has only one level and should have 2.
sapply(sapply(dat, unique), length)
should give you a value of 2 or more for each variable used.
Regards
Petr
The following code is used:
fit <- lrm(MRI_Diag_RC ~ factor(O4_1r) + N6_1r + leeftijd + LO1 + LO2 +
           LO3 + LO4 + LO5 + LO6 + LO7 + LO8 + LO9 + LO10 + LO11 + LO12 + LO13 +
           LO14 + LO15 + LO16 + LO17 + LO18 + LO19 + LO20 + LO21 + LO22 + LO23 +
           LO24 + LO26 + LO27 + LO29,
           model=T, x=T, y=T, data=dat)
Most predictors are (dichotomous) nominal variables, as is the problematic "LO2". Does anyone know what the problem is and how I can correct it?
Kind regards,
Tobias
--
View this message in context: http://r.789695.n4.nabble.com/error-in-catg-xi-name-nam-label-lab-LO2-has-2-category-levels-tp4642495.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD Alameda, CA, USA