Dear R users,

I'd like to know whether there is a function that performs repeated k-fold cross-validation for an lm object and returns the predicted values for every observation.

The situation is as follows. I had a data set of 174 observations, from which I randomly sampled a subset of 150 observations. With this subset (n = 150) I fitted the model y = a + bx. The model has to be validated using repeated k-fold cross-validation on the complete data set (n = 174), with 10 folds and 100 repetitions. At the end of the procedure I need access to the predicted values for each observation, that is, to the 100 predicted values for each observation.

The function lmCV() in the chemometrics package provides the predicted values. However, it works only with multiple linear regression models.

I hope there is a way of doing it.

Best regards,
-----
Bc.Sc.Agri. Alessandro Samuel-Rosa
Postgraduate Program in Soil Science
Federal University of Santa Maria
Av. Roraima, nº 1000, Bairro Camobi, CEP 97105-970
Santa Maria, Rio Grande do Sul, Brazil
--
View this message in context: http://r.789695.n4.nabble.com/Repeated-cross-validation-for-a-lm-object-tp4394833p4394833.html
Sent from the R help mailing list archive at Nabble.com.
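For reference, the procedure described in the question can also be written directly in base R. This is a minimal sketch, not an existing package function: `repeated_cv_lm()` is a hypothetical helper, and `my.data` is simulated data standing in for the real 174 observations. As in any k-fold cross-validation, y = a + bx is refitted on each training fold, so the coefficients estimated from the n = 150 subset are not reused.

```r
# Hypothetical helper: repeated k-fold cross-validation for an lm formula.
# Returns an n x repeats matrix; column r holds the out-of-fold prediction
# for every observation in repetition r.
repeated_cv_lm <- function(formula, data, k = 10, repeats = 100) {
  n <- nrow(data)
  preds <- matrix(NA_real_, nrow = n, ncol = repeats)
  for (r in seq_len(repeats)) {
    # Randomly assign each row to one of k folds of (nearly) equal size
    folds <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      test <- folds == f
      # Fit on the other k - 1 folds, predict the held-out fold
      fit <- lm(formula, data = data[!test, , drop = FALSE])
      preds[test, r] <- predict(fit, newdata = data[test, , drop = FALSE])
    }
  }
  preds
}

# Simulated data standing in for the 174 observations (hypothetical)
set.seed(123)
my.data <- data.frame(x = runif(174))
my.data$y <- 2 + 3 * my.data$x + rnorm(174, sd = 0.5)

set.seed(123)  # known seed, so the fold assignment is reproducible
cv.pred <- repeated_cv_lm(y ~ x, my.data, k = 10, repeats = 100)
dim(cv.pred)  # [1] 174 100
```

Each row of `cv.pred` is one observation and each column one repetition, so `cv.pred[i, ]` gives the 100 predicted values for observation i.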
Repeated cross-validation for a lm object
5 messages · Greg Snow, mxkuhn, samuel-rosa
2 days later
The validate function in the rms package can do cross-validation of ols objects (ols is similar to lm, but stores additional information). The default is bootstrap validation, but you can specify cross-validation instead.

(Replying to samuel-rosa <alessandrosamuel at yahoo.com.br>, Thu, Feb 16, 2012 at 10:44 AM.)
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Gregory (Greg) L. Snow Ph.D. 538280 at gmail.com
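Greg's suggestion might look like the following sketch, assuming a current version of rms (the 2012 version may differ) and simulated data in place of the real data set. `ols()` has to be called with `x = TRUE, y = TRUE` for `validate()` to work, and with `method = "crossvalidation"` the argument `B` is the number of folds. As far as I know, `validate()` reports accuracy indices (R-square, MSE, g, Intercept, Slope) rather than per-observation predictions, which may be why it did not give the 100 predictions per observation asked for here.

```r
library(rms)

# Simulated data standing in for the 174 observations (hypothetical)
set.seed(123)
my.data <- data.frame(x = runif(174))
my.data$y <- 2 + 3 * my.data$x + rnorm(174, sd = 0.5)

# x = TRUE, y = TRUE store the design matrix and response for validate()
fit <- ols(y ~ x, data = my.data, x = TRUE, y = TRUE)

# One 10-fold cross-validation (B = number of groups for this method)
v <- validate(fit, method = "crossvalidation", B = 10)
print(v)

# Repeating it 100 times with new random splits, keeping the
# cross-validated MSE from the "test" column of each run
mse <- replicate(100,
  unclass(validate(fit, method = "crossvalidation", B = 10))["MSE", "test"])
```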
The train function in the caret package will do this. The trainControl function would use method = "repeatedcv" and repeats = 100.
(Replying to Greg Snow <538280 at gmail.com>, Feb 18, 2012, at 2:15 PM.)
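Max's suggestion can be sketched as follows, assuming a recent version of caret and simulated data in place of the real data set. `savePredictions = "final"` keeps the held-out predictions in `fit$pred`, one row per prediction with columns including `pred`, `obs`, `rowIndex` and `Resample`. The last step reshapes them into a 174 x 100 matrix (observations x repetitions); it assumes caret's resample labels have the form `Fold01.Rep001`.

```r
library(caret)

# Simulated data standing in for the 174 observations (hypothetical)
set.seed(123)
my.data <- data.frame(x = runif(174))
my.data$y <- 2 + 3 * my.data$x + rnorm(174, sd = 0.5)

# 10-fold cross-validation repeated 100 times, keeping held-out predictions
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 100,
                     savePredictions = "final")

set.seed(123)  # known seed, so the folds are reproducible
fit <- train(y ~ x, data = my.data, method = "lm", trControl = ctrl)

# Strip the "FoldXX." prefix to get the repetition label, then tabulate
# one prediction per (observation, repetition) cell
rep.id <- sub("^Fold[0-9]+\\.", "", fit$pred$Resample)
pred.mat <- tapply(fit$pred$pred, list(fit$pred$rowIndex, rep.id), mean)
dim(pred.mat)  # [1] 174 100
```

Since every observation is held out exactly once per repetition, each cell of `pred.mat` contains a single prediction; `mean` only serves to collapse it.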
3 days later
Dear Max and Greg,

Thank you for your help. Unfortunately I was not able to get what I need using the functions you suggested. I believe this may be due to my inexperience with the caret and rms packages, so I am providing more information about my problem in the hope that you can help me again.

I already have a simple linear regression model fitted to my data (n = 150). The coefficients have been estimated by ordinary least squares. In the next step I want to use another data set (n = 174) to obtain the validation statistics. For other multiple linear regression models I have been using the function lmCV() as follows:
library(chemometrics)
set.seed(123)  # known seed, so the random segments are reproducible
CV <- lmCV(a ~ b + c, my.data, segments = 10, repl = 100, segment.type = "random")
where the k segments are selected randomly (I need to use a known seed). CV$predicted gives me the predicted values of "a" as a function of "b" and "c" for all n = 174 observations in each of the 100 replications. However, lmCV() does not work for simple linear regression models. I may have made a mistake, but I could not find any component like $predicted that gives all 100 predicted values for each of the 174 observations.

I hope you can help me once again.

Best regards,
Alessandro Samuel-Rosa
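Once an n x repl matrix of out-of-fold predictions is available (such as lmCV()'s CV$predicted described above), it can be summarized per repetition and per observation. A minimal sketch; `cv_summary()` is a hypothetical helper, and RMSEP here is simply the root mean squared prediction error of each repetition.

```r
# Hypothetical helper. obs: observed responses (length n);
# pred: n x repeats matrix of out-of-fold predictions
cv_summary <- function(obs, pred) {
  stopifnot(length(obs) == nrow(pred))
  res <- obs - pred                       # obs is recycled down each column
  list(
    rmsep     = sqrt(colMeans(res^2)),    # one RMSEP per repetition
    pred.mean = rowMeans(pred),           # average prediction per observation
    pred.sd   = apply(pred, 1, sd)        # spread of predictions per observation
  )
}

# Tiny hand-made example: 3 observations, 2 repetitions
obs  <- c(1, 2, 3)
pred <- cbind(c(1, 2, 3), c(2, 3, 4))
s <- cv_summary(obs, pred)
s$rmsep      # [1] 0 1
s$pred.mean  # [1] 1.5 2.5 3.5
```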
Dear Max,

Thank you for your attention. The train function in the caret package really does what I need.

Best regards,
Alessandro Samuel-Rosa