
bootstrapping - number of items to replace is not a multiple of replacement length

3 messages · Gabriela Bucini, Steven McKinney

#
Hello,

I'm new to bootstrapping and I need some help understanding the error
messages that pop up when I run my script.

I have a data.frame with 73 rows and 21 columns.
I am running a stepwise regression to find the best model using the R
function "step".
I use bootstrapping to obtain the model coefficients.
This is my script:

# "datare80" is the name of the data.frame and "woodycover" is the response variable
library(boot)

theta <- function(datare80, indices) {
             d <- datare80[indices, ]    # allows boot to select subsample datasets
             full <- lm(woodycover ~ ., data = d)
             lmbroadst <- step(full, direction = "both", k = 2, trace = 0)
             coefficients(lmbroadst)  # return coef. vector
          }
resb <- boot(data = datare80, statistic = theta, R = 1000)



When I run it, I get one of two error messages, depending on the last line
of the function.
If I omit the last line "coefficients(lmbroadst)" from the function
definition, I get:
"Error in t.star[r, ] <- statistic(data, i[r, ], ...) :
        incorrect number of subscripts on matrix"
If I keep the last line "coefficients(lmbroadst)", I get:
"Error in t.star[r, ] <- statistic(data, i[r, ], ...) :
        number of items to replace is not a multiple of replacement length"


Thank you very much for any help!

Gabriela
#
Hello,

Your theta() function returns coefficient vectors of
different lengths, depending on which variables
step() keeps in each bootstrap sample.
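To see why a varying length breaks boot(): judging from the error message, boot() fills a result matrix one row per resample (`t.star[r, ] <- statistic(...)`), and that matrix's width is presumably fixed by the statistic computed on the original data. This sketch reproduces the failure with a plain matrix; `fill_row()` and the widths 6, 3, and 4 are hypothetical.

```r
# Hypothetical stand-in for boot()'s result matrix, assumed to be
# 6 columns wide (the length of the statistic on the original data).
t.star <- matrix(NA_real_, nrow = 3, ncol = 6)

# Try to store one resample's statistic in row r; return "ok" or the error text.
fill_row <- function(m, r, value) {
  tryCatch({
    m[r, ] <- value
    "ok"
  }, error = function(e) conditionMessage(e))
}

fill_row(t.star, 1, c(1, 2, 3, 4, 5, 6))  # same width: "ok"
fill_row(t.star, 2, c(1, 2, 3))           # 6 is a multiple of 3: recycled, "ok"
fill_row(t.star, 3, c(1, 2, 3, 4))        # 6 is not a multiple of 4: error text
```

The middle case is worth noting: when the widths happen to divide evenly, R recycles the values without complaint, so a row can end up holding misaligned numbers instead of raising an error.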

You'll need to add code to theta() to figure
out which variables were selected and store
their coefficients in the right positions of a
fixed-length vector (one slot for the intercept
plus one for each of the 20 covariates you appear
to have), so that your theta() function always
returns output of the same size.
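A minimal sketch of that idea, assuming simulated data in place of datare80 (the data frame `dat`, the covariates `x1`..`x5`, and the helper vector `all_terms` are all hypothetical). Every coefficient name the full model can produce is recorded once, and each resample's selected coefficients are written into those slots by name. Terms that step() drops are recorded as 0 here, which is one possible convention (NA would be another).

```r
library(boot)

# Simulated stand-in for datare80 (hypothetical): 73 rows, response
# woodycover, and five covariates x1..x5 instead of the real 20.
set.seed(1)
n <- 73
dat <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                            dimnames = list(NULL, paste0("x", 1:5))))
dat$woodycover <- 2 + 1.5 * dat$x1 - dat$x3 + rnorm(n)

# Every coefficient name the full model can produce: intercept + covariates.
all_terms <- names(coef(lm(woodycover ~ ., data = dat)))

theta <- function(data, indices, terms) {
  d <- data[indices, ]                            # bootstrap resample
  full <- lm(woodycover ~ ., data = d)
  best <- step(full, direction = "both", k = 2, trace = 0)
  out <- setNames(numeric(length(terms)), terms)  # 0 means "not selected"
  cf <- coef(best)
  out[names(cf)] <- cf                            # place kept terms by name
  out                                             # always length(terms)
}

# Extra arguments to boot() (here `terms`) are passed on to the statistic.
resb <- boot(data = dat, statistic = theta, R = 200, terms = all_terms)

# Fraction of resamples in which step() kept each term.
sel_freq <- setNames(colMeans(resb$t != 0), all_terms)
```

Passing `all_terms` through boot() rather than reading it as a global keeps theta() self-contained; each row of `resb$t` then lines up with the same column order.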

(Google

   stepwise regression random forest

and you'll get a number of hits
about using random forests instead
of stepwise regression, and pointers
about bootstrapping the random forest.)
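For the random-forest route, a minimal sketch with the randomForest package (assumed installed; the data frame `dat` is simulated and hypothetical). A random forest already grows each tree on a bootstrap sample, and with `importance = TRUE` it reports a permutation importance (`%IncMSE`, computed on the out-of-bag rows) alongside the node-purity importance (`IncNodePurity`) for every covariate, so there is no selection step whose output changes length.

```r
# Assumes the randomForest package is installed: install.packages("randomForest")
library(randomForest)

# Simulated, hypothetical data in the shape of the question.
set.seed(2)
n <- 73
dat <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                            dimnames = list(NULL, paste0("x", 1:5))))
dat$woodycover <- 2 + 1.5 * dat$x1 - dat$x3 + rnorm(n)

# importance = TRUE adds the permutation measure (%IncMSE) to IncNodePurity.
rf <- randomForest(woodycover ~ ., data = dat, ntree = 500, importance = TRUE)

imp <- importance(rf)                               # one row per covariate
imp[order(imp[, "%IncMSE"], decreasing = TRUE), ]   # most important first
```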

HTH

Steve McKinney
#
Hi Steve,

I've checked the randomForest package. Its variable importance output does exactly
what I need.
Thank you so much!

Gabriela