For anyone looking for an answer to this in the future: I went with imputation, which is a way of filling in missing values based on what you see elsewhere in the data. In my case, I simply replaced the unseen value by sampling that categorical variable from the rest of the test set. Some may argue that this is wrong, since I know nothing about the new level in the test set and should throw those rows away instead. However, my results are going to be aggregated later, and this lets me do some central limit theorem hand-waving.

--
View this message in context: http://r.789695.n4.nabble.com/GLM-What-is-a-good-way-for-dealing-with-new-factor-levels-in-the-test-set-tp4706621p4706772.html
Sent from the R help mailing list archive at Nabble.com.
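
In case it helps a later reader, here is a rough sketch in R of the kind of imputation described in the message above. It is not the code from the original post: the data frames `train` and `test`, the column `colour`, and the model `fit` are all made up for illustration, and it assumes "the rest of the test set" means the test rows whose level was already seen in training.

```r
# Hypothetical toy data: the test set contains a level ("purple")
# that the training set never saw.
set.seed(1)
train <- data.frame(
  y      = c(0, 1, 0, 1, 1, 0, 1, 0),
  colour = factor(c("red", "blue", "red", "blue",
                    "green", "green", "red", "blue"))
)
test <- data.frame(
  colour = c("red", "purple", "blue", "green", "purple", "red"),
  stringsAsFactors = FALSE
)

fit <- glm(y ~ colour, data = train, family = binomial)

# Levels the model knows about, and the test rows that fall outside them.
known  <- levels(train$colour)
unseen <- !(test$colour %in% known)

# Impute: replace each unseen value with a random draw from the known
# values in the rest of the test set, so the draws follow the test
# set's own empirical distribution.
donors <- test$colour[!unseen]
test$colour[unseen] <- sample(donors, size = sum(unseen), replace = TRUE)

# Rebuild the factor with the training levels so predict() accepts it.
test$colour <- factor(test$colour, levels = known)

pred <- predict(fit, newdata = test, type = "response")
```

The imputed rows get a prediction for a level they did not actually have, so each one is noisy on its own; this only seems defensible when, as in the post, the predictions are aggregated afterwards.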
GLM: What is a good way for dealing with new factor levels in the test set?
1 message · thuksu