test logistic regression model
9 messages · Gábor Malomsoki, Rui Barradas, Bert Gunter +2 more
Dear all,
I have created a logistic regression model on the train df:
mymodel1 <- glm(book_state ~ TG_KraftF5, data = train, family = "binomial")
Then I try to predict with the test df:
Predict <- predict(mymodel1, newdata = test, type = "response")
Then I get this error message:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)
Factor "TG_KraftF5" has new levels
I have tried different proposals from Stack Overflow, but unfortunately they did not solve the problem.
Do you have any idea how to test a logistic regression model when the train and test dfs have different levels?
Thank you in advance.
Regards,
Gabor
You can't predict results for categories that you've not seen before (think about it). You will need to remove those cases from your test set (or convert them to NA and predict them as NA).
-- Bert
On Sun, Nov 20, 2022 at 7:02 AM Gábor Malomsoki <gmalomsoki1980 at gmail.com> wrote:
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
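Bert's first suggestion, removing the test rows whose factor level never occurs in the training data, might look like the following minimal sketch. The toy train and test data frames (levels "a" to "d", a made-up 0/1 outcome) are hypothetical stand-ins for the poster's data. The sketch relies on mymodel1$xlevels, the list of factor levels that glm() stores in the fitted object and that predict() compares newdata against (the xlev = object$xlevels in the error message).

```r
# Hypothetical toy data standing in for the poster's train/test data frames
train <- data.frame(
  TG_KraftF5 = factor(rep(c("a", "b", "c"), each = 10)),
  book_state = rep(c(1, 0, 0), times = 10)
)
# The test set contains a level ("d") that never occurs in train
test <- data.frame(
  TG_KraftF5 = factor(c("a", "b", "c", "d", "d")),
  book_state = c(1, 0, 1, 0, 1)
)

mymodel1 <- glm(book_state ~ TG_KraftF5, data = train, family = "binomial")

# Levels recorded by glm() at fit time; predict() checks newdata against these
seen <- mymodel1$xlevels$TG_KraftF5

# Which test levels are new?
new_levels <- setdiff(levels(test$TG_KraftF5), seen)

# Keep only the test rows whose level was seen in training,
# then drop the now-unused factor levels
keep <- as.character(test$TG_KraftF5) %in% seen
test_seen <- droplevels(test[keep, , drop = FALSE])

Predict <- predict(mymodel1, newdata = test_seen, type = "response")
```

sum(!keep) gives the number of test rows the model cannot score; any accuracy measure computed on test_seen describes only the levels the model has seen.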
Dear Bert,
Yes, I was trying to fill the non-existing categories with NAs, but the solutions suggested on stackoverflow.com unfortunately did not work.
Best regards,
Gabor
Bert Gunter <bgunter.4567 at gmail.com> wrote on Sun, 20 Nov 2022, 16:20:
At 15:29 on 20/11/2022, Gábor Malomsoki wrote:
Hello,
What exactly didn't work? You say you have tried the solutions found on Stack Overflow, but without a link we don't know which answers to which questions you are talking about. As Bert said, if you assign NA to the new levels (those present only in test), it should work. Can you post links to what you have tried?
Hope this helps,
Rui Barradas
Small reprex:
set.seed(5)
dat <- data.frame(f = rep(c('r', 'g'), 4), y = runif(8))
newdat <- data.frame(f = rep(c('r', 'g', 'b'), 2))
## convert values in newdat not seen in dat to NA
is.na(newdat$f) <- !(newdat$f %in% dat$f)
lmfit <- lm(y ~ f, data = dat)
## Result:
predict(lmfit, newdat)
        1         2         3         4         5         6
0.4374251 0.6196527        NA 0.4374251 0.6196527        NA
If this does not suffice, as Rui said, we need details of what you did. (predict.glm works like predict.lm.)
-- Bert
On Sun, Nov 20, 2022 at 7:46 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
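Since predict.glm works like predict.lm, the same NA approach should carry over to a logistic model with type = "response", as in the original question. In this hedged sketch the 0/1 outcome y is made-up data, and the NA conversion checks against fit$xlevels$f (the levels recorded at fit time) rather than the raw training column.

```r
# Toy data modelled on Bert's reprex, but with a hypothetical 0/1 outcome
dat <- data.frame(f = rep(c("r", "g"), 4),
                  y = c(1, 0, 1, 1, 0, 0, 1, 0))
newdat <- data.frame(f = rep(c("r", "g", "b"), 2))

fit <- glm(y ~ f, data = dat, family = "binomial")

# Convert levels not seen at fit time ("b") to NA, as in the lm() example
is.na(newdat$f) <- !(newdat$f %in% fit$xlevels$f)

# predict.glm() uses na.action = na.pass by default, so the NA rows
# come back as NA instead of triggering the "new levels" error
p <- predict(fit, newdata = newdat, type = "response")
p
```

Rows 3 and 6 (level "b") should come back as NA. Because a logistic model with a single factor reproduces the per-level proportions, the other rows should equal the observed proportions for "r" (0.75) and "g" (0.25).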
Two possible fixes occur to me:
1) Redo the test/training split, but within levels of the factor, so you have the same split within each level and each level is accounted for in training and testing.
2) If you have a lot of levels, and perhaps sparse representation in a few, consider recoding levels to pool the rare ones into an "other" category.
On Sun, Nov 20, 2022 at 11:41 AM Bert Gunter <bgunter.4567 at gmail.com> wrote:
Sent from Gmail Mobile
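Mitchell's option (1), a split made separately within each level of the factor, can be sketched in base R as follows. The data frame dat, the level sizes and the 70/30 proportion are hypothetical. A level with a single row is placed in training, since a level that appears only in the test set is exactly what triggers the error.

```r
# Hypothetical data: a factor with unequal level sizes and a 0/1 outcome
set.seed(42)
dat <- data.frame(
  TG_KraftF5 = factor(rep(c("a", "b", "c", "d"), times = c(40, 20, 10, 4))),
  book_state = rbinom(74, size = 1, prob = 0.4)
)

train_frac <- 0.7  # hypothetical training proportion

# Row indices grouped by factor level (empty levels dropped)
idx_by_level <- split(seq_len(nrow(dat)), dat$TG_KraftF5, drop = TRUE)

# Sample the training rows separately within each level
train_idx <- unlist(lapply(idx_by_level, function(idx) {
  n_train <- floor(train_frac * length(idx))
  n_train <- min(n_train, length(idx) - 1)  # leave at least one row for testing when possible
  n_train <- max(n_train, 1)                # but always put at least one row in training
  idx[sample.int(length(idx), n_train)]
}), use.names = FALSE)

train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Every level present in test now also occurs in train,
# so predict() should not complain about new levels
mymodel1 <- glm(book_state ~ TG_KraftF5, data = train, family = "binomial")
Predict <- predict(mymodel1, newdata = test, type = "response")
```

With very small levels (here "d" has only two training rows) glm() may warn about fitted probabilities of 0 or 1, which is one reason the thread also discusses pooling or dropping rare levels.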
I think (2) might be a bad idea if one of the "sparse" categories has high predictive power. You'll lose it when you pool, will you not? Also, there is the problem of subjectively defining "sparse." However, (1) seems quite sensible to me. But IANAE.
-- Bert
On Sun, Nov 20, 2022 at 9:49 AM Mitchell Maltenfort <mmalten at gmail.com> wrote:
I like option 1. Option 2 may cause problems if you are pooling groups that do not go together. This is especially a problem if you know that the data are missing some groups. I would consider dropping rare groups, or comparing results between the pooling and dropping options. If the answer is the same in both cases, then use the approach that makes your life easier with reviewers/clients. If the answer is different, then I would go with dropping rare categories, or present both and highlight the difference in outcome. A third option is to gather more data.
Tim
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Bert Gunter
Sent: Sunday, November 20, 2022 1:06 PM
To: Mitchell Maltenfort <mmalten at gmail.com>
Cc: R-help <R-help at r-project.org>
Subject: Re: [R] test logistic regression model
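Mitchell's option (2), pooling rare levels into an "other" category, and Tim's alternative of dropping them can be built side by side so the two fits can be compared, as Tim suggests. In this minimal sketch the data and the cut-off min_n = 5 are hypothetical (Bert's point that "sparse" must be defined subjectively applies to that number), and for brevity both models are fitted on the full data rather than on a train/test split.

```r
# Hypothetical data with two rare levels ("d" and "e")
dat <- data.frame(
  TG_KraftF5 = factor(rep(c("a", "b", "c", "d", "e"), times = c(30, 20, 15, 3, 2))),
  book_state = rep(c(1, 0, 0, 1, 0), times = 14)
)

min_n <- 5  # hypothetical cut-off below which a level counts as "rare"
counts <- table(dat$TG_KraftF5)
rare <- names(counts)[counts < min_n]

# Option (2): pool the rare levels into a single "other" level.
# Assigning the same label to several levels merges them.
pooled <- dat
levels(pooled$TG_KraftF5)[levels(pooled$TG_KraftF5) %in% rare] <- "other"

# Alternative: drop the rows of the rare levels altogether
dropped <- droplevels(dat[!(dat$TG_KraftF5 %in% rare), ])

fit_pooled  <- glm(book_state ~ TG_KraftF5, data = pooled,  family = "binomial")
fit_dropped <- glm(book_state ~ TG_KraftF5, data = dropped, family = "binomial")

# Set the coefficients of the levels kept by both fits side by side
shared <- intersect(names(coef(fit_pooled)), names(coef(fit_dropped)))
cbind(pooled = coef(fit_pooled)[shared], dropped = coef(fit_dropped)[shared])
```

With a single factor in the model, the shared coefficients come out the same in both fits, because each level's estimate depends only on its own rows. Differences of the kind Tim describes would appear once other covariates are in the model, or in how the rare-level rows are handled at prediction time.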
Agreed on the ranking of (1) vs (2).
On Sun, Nov 20, 2022 at 1:30 PM Ebert,Timothy Aaron <tebert at ufl.edu> wrote: