predicted values
4 messages · Felipe Carrillo, Joshua Wiley, Bert Gunter
Dear Felipe,

That is normal behavior. The prediction from that simple model decreases over time and ends up negative. If the outcome cannot take on negative values, treating it as continuous Gaussian may not be optimal; perhaps some transformation, such as using a log link so that the exponentiated values are always positive, would be better?

Alternatively, the predictions may be going negative not because the data decrease overall, but because the values drop quickly in the first part of the time span and the decline then slows; an overly simplistic time model may just keep decreasing. Using a smoother with a higher basis dimension may help model the function more accurately over the span of time in your dataset, and so avoid negative predicted values.

I do not think there is any straightforward way to 'force' the model to be positive only.

Best,
Joshua

On Sat, Feb 1, 2014 at 5:05 PM, Felipe Carrillo <mazatlanmexico at yahoo.com> wrote:
Consider this dummy dataset. My real dataset, with over 1000 records, has scattered large and small values. I want to predict the values that are NA, but I get negative predictions. Is this normal behaviour, or am I missing a gam argument to force the model to predict positive values?
library(mgcv)
test <- data.frame(
  iddate = seq(as.Date("2014-01-01"), as.Date("2014-01-12"), by = "days"),
  value  = c(300, 29, 22, NA, 128, 24, 15, 1, 3, 30, NA, 2))
test
str(test)
mod <- gam(value ~ s(as.numeric(iddate)), data = test)
# Predict for values with NA's
test$pred <- with(test, ifelse(is.na(value), predict(mod, test), value))
test
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com
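Joshua's log-link suggestion, applied to the dummy dataset from the question, might look like the sketch below. The Gamma family, the helper column `t`, the basis dimension `k = 5`, and `method = "REML"` are assumptions chosen for illustration and were not specified in the thread; other positive-valued families with a log link could be used instead.

```r
library(mgcv)

# Dummy dataset from the original question
test <- data.frame(
  iddate = seq(as.Date("2014-01-01"), as.Date("2014-01-12"), by = "days"),
  value  = c(300, 29, 22, NA, 128, 24, 15, 1, 3, 30, NA, 2)
)

# Numeric time covariate (hypothetical helper column)
test$t <- as.numeric(test$iddate)

# Log link: the model is linear/smooth on the log scale, so
# back-transformed predictions are exp(something) and hence positive.
# Gamma and k = 5 are illustrative assumptions for this tiny dataset.
mod_log <- gam(value ~ s(t, k = 5),
               family = Gamma(link = "log"),
               data = test, method = "REML")

# type = "response" returns predictions on the original (positive) scale
fit <- as.numeric(predict(mod_log, newdata = test, type = "response"))

# Fill in only the missing days, keeping observed values untouched
test$pred <- ifelse(is.na(test$value), fit, test$value)
test
```

With real data of 1000+ records, the basis dimension `k` could be raised and checked with `gam.check(mod_log)`, in line with the suggestion to use a more flexible smoother.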
1 day later
... but do note that doing what you describe (using predicted values for missings) can mess up inference: it obviously results in underestimating error variability. If you're not doing inference, then probably no harm, no foul. If you are, then here's to irreproducibility!

If you want to handle missings and still get meaningful inference (an oxymoron?), then find someone expert in such matters to consult. R has several packages devoted to this (but I'm not the person to advise about them).

Also note that scientists often treat censoring as missing. That's another booboo. And my humble apology if this is not you.

Finally, note that graphics often handles missings sensibly, gracefully ignoring them. So if graphs are what you seek, maybe you don't need to worry about it.

And, it should go without saying that given my complete ignorance of what you're up to, all the above should be taken with the appropriate dose of salt.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
H. Gilbert Welch

On Mon, Feb 3, 2014 at 2:23 PM, Felipe Carrillo <mazatlanmexico at yahoo.com> wrote:
Hi Joshua,

Thanks for the suggestion; I will look into the log link. I basically just want to fill in missing values for days where data is not available. Negative values definitely won't work for the kind of data that I am collecting.
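Bert's caution that filled-in values understate error variability can be made visible by asking `predict()` for standard errors alongside the predictions. The sketch below is only an illustration under assumptions not stated in the thread: a Gamma family with a log link, a helper column `t`, `k = 5`, and a rough interval of plus or minus two standard errors on the link scale. The interval reflects uncertainty in the fitted mean only, not the day-to-day scatter of the observations, so it is still optimistic.

```r
library(mgcv)

# Dummy dataset from the original question
test <- data.frame(
  iddate = seq(as.Date("2014-01-01"), as.Date("2014-01-12"), by = "days"),
  value  = c(300, 29, 22, NA, 128, 24, 15, 1, 3, 30, NA, 2)
)
test$t <- as.numeric(test$iddate)  # hypothetical helper column

# Illustrative model choice (an assumption, not from the thread)
mod_log <- gam(value ~ s(t, k = 5),
               family = Gamma(link = "log"),
               data = test, method = "REML")

# Predict only for the days with missing values, on the link (log) scale,
# and request standard errors of the fitted mean
miss <- test[is.na(test$value), ]
p <- predict(mod_log, newdata = miss, type = "link", se.fit = TRUE)

# Back-transform the estimate and a rough +/- 2 SE band to the data scale
imp <- data.frame(
  iddate = miss$iddate,
  fit    = exp(as.numeric(p$fit)),
  lower  = exp(as.numeric(p$fit - 2 * p$se.fit)),
  upper  = exp(as.numeric(p$fit + 2 * p$se.fit))
)
imp
```

Treating `imp$fit` as if it were observed data discards the spread shown in `lower` and `upper`, which is one way to see why downstream inference on filled-in values tends to be overconfident.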