
Problem with a regression - Dataset Workinghours

On Jul 28, 2012, at 17:37 , Giorgio Monti wrote:

            
Yes: don't do that. You are not going to "build a predictive model that express the probability that a wife works more than 8 hours per day" from data where everyone works more than 8 hours per day!

You can either fit the model to all data and work out the probabilistic consequences, or if you don't quite believe the normality assumption of linear models, perhaps reduce the outcome to 0/1 and turn to logit or probit regression.
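
To make the first option concrete, here is a minimal sketch, assuming a single covariate and a normal linear model. The data are simulated stand-ins, not the Workinghours dataset, and the names `fit_ols` and `prob_exceeds` are made up for illustration. It fits the line to all observations and then reads off P(Y > 8 | x) from the fitted mean and residual SD. Note that this is a plug-in estimate that ignores the uncertainty in the fitted coefficients.

```python
import random
from statistics import NormalDist, mean

def fit_ols(x, y):
    """Least-squares intercept, slope and residual SD for y = a + b*x + e."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sigma = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return a, b, sigma

def prob_exceeds(a, b, sigma, x0, threshold=8.0):
    """Plug-in P(Y > threshold | x = x0) under the fitted normal model."""
    return 1.0 - NormalDist(mu=a + b * x0, sigma=sigma).cdf(threshold)

# Simulated stand-in data (hypothetical): daily hours vs. one covariate.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(2000)]
y = [5.0 + 0.4 * xi + rng.gauss(0, 1.5) for xi in x]

# Fit on ALL observations, not only those above the threshold.
a, b, sigma = fit_ols(x, y)
p_low = prob_exceeds(a, b, sigma, x0=2.0)
p_high = prob_exceeds(a, b, sigma, x0=9.0)
print(f"P(Y > 8 | x=2) = {p_low:.3f}, P(Y > 8 | x=9) = {p_high:.3f}")
```

If the normality assumption looks doubtful, the 0/1 route would instead recode the outcome as an indicator of working more than 8 hours and fit a logit or probit model to that, again using all observations.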

It is not technically hard to fit a model to a subset of the data, but it is a big no-no to subset on the dependent variable. Well, you can, and people actually do subsample on the response variable, but then the standard methods of analysis do not apply.
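
A small simulation shows why subsetting on the response is a problem. This is only a sketch on made-up data (the true slope of 0.4, the noise SD of 1.5 and the helper name `ols_slope` are all assumptions for illustration): the same least-squares fit is run once on all observations and once on only those with a response above 8, and the subset fit gives a clearly flatter slope.

```python
import random
from statistics import mean

def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Simulated data (hypothetical): true slope is 0.4.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(5000)]
y = [5.0 + 0.4 * xi + rng.gauss(0, 1.5) for xi in x]

# Fit on everything.
slope_all = ols_slope(x, y)

# Fit only on cases where the response exceeds 8 (subsetting on y).
kept = [(xi, yi) for xi, yi in zip(x, y) if yi > 8.0]
slope_subset = ols_slope([k[0] for k in kept], [k[1] for k in kept])

print(f"slope, all data:    {slope_all:.3f}")
print(f"slope, y > 8 only:  {slope_subset:.3f}  (n = {len(kept)})")
```

The attenuation arises because, at low x, only cases with unusually large positive errors survive the cut, so the errors in the subset are no longer centred on zero or independent of x. Methods designed for truncated responses exist, but ordinary least squares is not one of them.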