Repeated cross-validation for a lm object

5 messages · Greg Snow, mxkuhn, samuel-rosa

#
Dear R users

I'd like to know whether there is a function that performs repeated k-fold
cross-validation for an lm object and returns the predicted values for every
observation. The situation is as follows:
I had a data set of 174 observations, from which I randomly sampled
a subset of 150 observations. With the subset (n = 150) I fitted
the model y = a + bx. The model validation has to be done using repeated
k-fold cross-validation on the complete data set (n = 174). I need to use 10
folds and repeat the cross-validation 100 times. At the end of the
procedure, I need access to the predicted values for each
observation, that is, the 100 predicted values for each observation. The
function lmCV() in the package chemometrics provides the predicted values.
However, it works only with multiple linear regression models.
I hope there is a way of doing it.
Best regards,

-----
Bc.Sc.Agri. Alessandro Samuel-Rosa
Postgraduate Program in Soil Science
Federal University of Santa Maria
Av. Roraima, nº 1000, Bairro Camobi, CEP 97105-970
Santa Maria, Rio Grande do Sul, Brazil
--
View this message in context: http://r.789695.n4.nabble.com/Repeated-cross-validation-for-a-lm-object-tp4394833p4394833.html
Sent from the R help mailing list archive at Nabble.com.
2 days later
#
The validate function in the rms package can do cross-validation of
ols objects (ols is similar to lm, but stores additional information).
The default is bootstrap validation, but you can specify
cross-validation instead.
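
A minimal sketch of that suggestion, assuming simulated data and the illustrative names dat, x, and y (none of which come from the thread). Note that validate() reports optimism-corrected summary indexes such as R-square and MSE; it does not return the per-observation predictions asked for above.

```r
library(rms)

# Simulated placeholder data; replace with the real data set
set.seed(2012)
n <- 174
dat <- data.frame(x = runif(n))
dat$y <- 1 + 2 * dat$x + rnorm(n, sd = 0.5)

# x = TRUE and y = TRUE store the design matrix and response,
# which validate() needs in order to refit the model
fit <- ols(y ~ x, data = dat, x = TRUE, y = TRUE)

# With method = "crossvalidation", B is the number of folds
val <- validate(fit, method = "crossvalidation", B = 10)
print(val)
```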

On Thu, Feb 16, 2012 at 10:44 AM, samuel-rosa
<alessandrosamuel at yahoo.com.br> wrote:

  
    
#
The train function in the caret package will do this. The trainControl function would use method = "repeatedcv" and repeats = 100.
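
A sketch of how this might look, assuming simulated data and the illustrative names dat, x, and y. The arguments number = 10 and savePredictions = TRUE are additions made here so that the 10 folds are explicit and the held-out predictions are kept in fit$pred.

```r
library(caret)

# Simulated placeholder data; replace with the real data set
set.seed(2012)                      # known seed, so the folds are reproducible
n <- 174
dat <- data.frame(x = runif(n))
dat$y <- 1 + 2 * dat$x + rnorm(n, sd = 0.5)

# 10-fold cross-validation repeated 100 times, keeping held-out predictions
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 100,
                     savePredictions = TRUE)
fit <- train(y ~ x, data = dat, method = "lm", trControl = ctrl)

# fit$pred has one row per held-out prediction; rowIndex identifies the
# observation and Resample identifies the fold and repetition
by_obs <- split(fit$pred$pred, fit$pred$rowIndex)
```

Each element of by_obs then holds the predictions made for one observation, one per repetition.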
On Feb 18, 2012, at 2:15 PM, Greg Snow <538280 at gmail.com> wrote:

            
3 days later
#
Dear Max and Greg

Thank you for your help. Unfortunately I was not able to get what I need
using the functions you suggested. I believe this is a result of my
inexperience with the packages caret and rms. Therefore, I am providing more
information about my problem and hope you can help me again.

I already have a simple linear regression model fitted to my data (n = 150).
The coefficients were estimated by ordinary least
squares. In the next step I want to use another data set (n = 174) to obtain
the validation statistics. For other multiple linear regression models I
have been using the function lmCV() as follows:
where the k segments are selected randomly (I need to use a known seed).

CV$predicted gives me the predicted values of "a" as a function of "b" and
"c" for all n = 174 observations in each of the 100 replications.
However, it does not work for simple linear regression models.

I may have made a mistake, but I could not find anything such as
$predicted to get all the 100 predicted values for each of the 174
observations.
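
A minimal base-R sketch of the procedure described here, for reference: it builds a 174 x 100 matrix of held-out predictions without any add-on package. The data are simulated, and the names dat, x, y, and pred are illustrative assumptions. Note that cross-validation refits the model on each training fold; it does not reuse coefficients from an earlier fit.

```r
# Repeated k-fold cross-validation for a simple linear model, base R only
set.seed(2012)                      # known seed, so the folds are reproducible
n <- 174; k <- 10; reps <- 100

# Simulated placeholder data; replace with the real data set
dat <- data.frame(x = runif(n))
dat$y <- 1 + 2 * dat$x + rnorm(n, sd = 0.5)

# One row per observation, one column per repetition
pred <- matrix(NA_real_, nrow = n, ncol = reps)

for (r in seq_len(reps)) {
  # Randomly assign each observation to one of k folds of near-equal size
  fold <- sample(rep(seq_len(k), length.out = n))
  for (f in seq_len(k)) {
    test <- fold == f
    fit <- lm(y ~ x, data = dat[!test, ])            # fit on the other folds
    pred[test, r] <- predict(fit, newdata = dat[test, ])
  }
}

# Root mean squared error of prediction, one value per repetition
rmsep <- sqrt(colMeans((dat$y - pred)^2))
```

Row i of pred then holds the 100 predictions for observation i, each made by a model that did not see that observation.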

Hope you can help me once again. 

Best regards,


-----
Bc.Sc.Agri. Alessandro Samuel-Rosa
#
Dear Max

Thank you for your attention. The train function in the caret package really
does what I need.

Best regards,

-----
Bc.Sc.Agri. Alessandro Samuel-Rosa