Bootstrapping help

12 messages · Ben Ridenhour, Kenneth Cabrera, Jacques VESLOT +5 more

#
The first thing you are doing wrong is that you are not including a
copy of cs for us to see ;).

Based on what you have written, I speculate that cs does not use the
index correctly.  If so, then a simple, although inefficient,
workaround is to rewrite cs:

cs <- function(dataframe, index) {
	dataframe <- dataframe[index,]
	...
}
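
As a rough sketch of that workaround: cs itself was not posted, so the statistic below (a correlation between two made-up columns x and y) and the names toy and cs_sketch are assumptions for illustration only. The function can be exercised by hand with the kind of index vector boot would pass:

```r
# Hypothetical stand-in for cs: the real function was not shown in the thread.
set.seed(42)
toy <- data.frame(x = rnorm(20), y = rnorm(20))

cs_sketch <- function(dataframe, index) {
  dataframe <- dataframe[index, ]      # keep only the resampled rows
  cor(dataframe$x, dataframe$y)        # assumed statistic, for illustration
}

# The identity index reproduces the statistic on the original data ...
original <- cs_sketch(toy, seq_len(nrow(toy)))

# ... while an index drawn with replacement gives one bootstrap replicate.
index <- sample(nrow(toy), replace = TRUE)
replicate_value <- cs_sketch(toy, index)
```

If the function ignores index, both calls return the same value, which is the symptom to look for.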

Good luck, 

Andrew
On Tue, Jan 17, 2006 at 10:02:49PM -0800, Ben Ridenhour wrote:

  
    
#
Hi, R users:

I have a data.frame (not a matrix) and a vector with the same length as
the number of records (rows) of the data frame. Each element of that
vector is the column number (in a specific range of columns) of the
corresponding record that I must set to zero.

How can I do this without a "for" loop?

Thank you for your help.

Kenneth
#
try:

DF2 <- as.data.frame(matrix(vec, nrow = nrow(DF), ncol = ncol(DF)) ==
            matrix(1:ncol(DF), nrow = nrow(DF), ncol = ncol(DF), byrow = TRUE))

DF3 <- data.frame(mapply(function(z, x, y) { x[y] <- 0 ; x },
   names(DF), DF, DF2, SIMPLIFY = FALSE))

but there must be an easier way...


Kenneth Cabrera wrote:
#
Hi

E.g., suppose your data frame has 35 rows and 6 columns:

a<-sample(1:6, 35, replace=T)
b<-1:35
vec<-rep(0,35*6)
vec[a+6*(b-1)]<-1

This should do the replacement:
your.d.f[matrix(vec, 35, 6, byrow=TRUE)==1] <- 0

But I am not sure if it is quicker than a loop.
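
One way to check the mask is to compare it with an explicit loop on made-up data. The sketch below assumes an all-numeric 35 x 6 data frame filled with ones; n_row, n_col, masked, and looped are names introduced here for illustration, and no timing claim is made.

```r
set.seed(1)
n_row <- 35
n_col <- 6
your.d.f <- data.frame(matrix(1, n_row, n_col))   # made-up all-numeric data
a <- sample(1:n_col, n_row, replace = TRUE)       # column to zero in each row
b <- 1:n_row

# Mask approach: position of (row b, column a) in a row-major vector
vec <- rep(0, n_row * n_col)
vec[a + n_col * (b - 1)] <- 1
masked <- your.d.f
masked[matrix(vec, n_row, n_col, byrow = TRUE) == 1] <- 0

# Plain loop over rows, for comparison
looped <- your.d.f
for (i in b) looped[i, a[i]] <- 0

# TRUE when both approaches zero exactly the same cells
same <- all(as.matrix(masked) == as.matrix(looped))
```

Wrapping each approach in system.time() on data of realistic size would answer the speed question for a particular case.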

HTH
Petr
On 18 Jan 2006 at 2:35, Kenneth Cabrera wrote:
Date sent:      	Wed, 18 Jan 2006 02:35:35 -0500
From:           	Kenneth Cabrera <krcabrer at epm.net.co>
To:             	r-help at stat.math.ethz.ch
Subject:        	[R] Data frame index?
Petr Pikal
petr.pikal at precheza.cz
#
you could try something like the following:

dat <- data.frame(matrix(rnorm(200), 20, 10))
index <- sample(10, 20, TRUE)
###############
mat.ind <- matrix(FALSE, nrow(dat), length(dat))
mat.ind[cbind(seq(along = index), index)] <- TRUE
dat[mat.ind] <- 0

index
dat


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm


#
On 1/18/2006 2:35 AM, Kenneth Cabrera wrote:
It sounds as though you've found that you can use two-column matrix 
indexing on a data frame for reading but not assigning.  You create a 
matrix where the first column is the row number, and the second column 
is the column number.  Then indexing by that selects those particular 
elements in order.

For instance, if you have named your vector of columns "cols", you'd do

my.data.frame[ cbind(1:rows, cols) ] <- 0

Here's an example:

 > df
   x y
1  1 a
2  1 a
3  1 a
4  1 a
5  1 a
6  1 a
7  1 a
8  1 a
9  1 a
10 1 a
 > df[cbind(1:4,c(1,2,1,2))]
[1] "1" "a" "1" "a"

But

 > df[cbind(1:4,c(1,2,1,2))] <- 0
Error in "[<-.data.frame"(`*tmp*`, cbind(1:4, c(1, 2, 1, 2)), value = 0) :
         only logical matrix subscripts are allowed in replacement

To get around this, construct the logical matrix using this method, then 
  use it as an index:

 > mat <- matrix(FALSE, 10, 2)
 > mat[cbind(1:4,c(1,2,1,2))] <- TRUE
 > df[mat] <- 0
Warning message:
invalid factor level, NAs generated in: "[<-.factor"(`*tmp*`, thisvar, 
value = 0)
 > df
   x    y
1  0    a
2  1 <NA>
3  0    a
4  1 <NA>
5  1    a
6  1    a
7  1    a
8  1    a
9  1    a
10 1    a

If your columns are all numeric, you won't get the warning I got.
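
For comparison, here is the same logical-matrix workaround on an all-numeric data frame. This is only a sketch: num.df, rows, and cols are made-up names and values, not data from the original question.

```r
# Made-up all-numeric data frame (column names are illustrative only)
num.df <- data.frame(x = rep(1, 10), y = rep(2, 10))
rows <- 1:4
cols <- c(1, 2, 1, 2)

mat <- matrix(FALSE, nrow(num.df), ncol(num.df))
mat[cbind(rows, cols)] <- TRUE   # two-column matrix indexing on a plain matrix
num.df[mat] <- 0                 # logical-matrix replacement on the data frame
num.df
```

Because every column can hold the value 0, no factor level is involved in the replacement.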

Duncan Murdoch
#
It's worth noting that there are quite a few for loops inside the code 
used by matrix indexing of data frames.

I think a single for-loop over the columns is as good as any, something 
like

DF <- data.frame(x=1, y=rep("a", 4), z = 3)
ind <- c(1,3,3,1) # only numeric cols
for(i in unique(ind)) DF[ind==i, i] <- 0
DF
  x y z
1 0 a 3
2 1 a 0
3 1 a 0
4 0 a 3
On Wed, 18 Jan 2006, Duncan Murdoch wrote:

            

  
    
#
Ben,

Although I appended a smiley to my first note, the message was
serious.  If you don't show us what you're doing, we can't help you.
Please provide an example in which you:

1) generate a small dataframe similar in structure to yours
2) provide cs
3) show the boot statement that applies cs to the example dataframe.

Also, it seems that you are unfamiliar with the use of indexing and
dataframes.  Please read An Introduction to R carefully; it is freely
available on CRAN.  You have asked R to provide you with all the rows
that are numbered 1/sample size.R; since the row numbers are integers
there aren't any.
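
The effect can be seen on a small made-up data frame. This sketch assumes the subscript being used was a vector of equal weights 1/n; toy, w, and i are illustrative names. R truncates fractional subscripts toward zero, and a subscript of 0 selects nothing.

```r
toy <- data.frame(x = 1:5, y = letters[1:5])   # made-up example data
n <- nrow(toy)

w <- rep(1 / n, n)          # equal weights, each 1/n = 0.2
weighted_rows <- toy[w, ]   # fractional subscripts truncate to 0: no rows
nrow(weighted_rows)

i <- sample(n, replace = TRUE)   # integer row numbers drawn with replacement
indexed_rows <- toy[i, ]
nrow(indexed_rows)
```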

And, please say hello to Andrew Storfer and Melanie Murphy from me.

Andrew
On Wed, Jan 18, 2006 at 11:39:36AM -0800, Ben Ridenhour wrote:

  
    
#
Ben,

Ok, it's clear now, thanks.  Note that your boot call 

boot(mydata,cs,R=999)

does not specify an "stype" argument.  The boot help file notes that
the default value for stype is "i", which means that boot will pass an
index to the function, not a weight, regardless of whether you call it
w, i, or whatever. 

The index that boot sends to the function is then used to index the
dataframe, thus selecting rows randomly with replacement.  Previously
you passed the dataframe to the function, which did not alter it, so
it passed through undisturbed.  In this incarnation the data<-data[w,]
command provides you with the (pseudo-)random sample with replacement
of the data.
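
A small sketch of this behaviour, assuming the boot package is installed (it ships with standard R distributions). The body of cs is elided in this thread, so the statistic below (a difference of column means) and the contents of mydata are made up for illustration.

```r
library(boot)

set.seed(123)
mydata <- data.frame(x = rnorm(25), y = rnorm(25))   # made-up data

# Hypothetical cs: the real body was not shown, so a mean difference stands in.
cs <- function(data, w) {
  data <- data[w, ]            # w holds row indices because stype is "i"
  mean(data$x) - mean(data$y)
}

set.seed(1)
implicit <- boot(mydata, cs, R = 999)                # stype left at its default

set.seed(1)
explicit <- boot(mydata, cs, R = 999, stype = "i")   # same thing, spelled out

# With the same seed, the two calls should give matching replicates
all.equal(implicit$t, explicit$t)
```

The argument name w has no influence here; only stype decides what boot passes as the second argument.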

I hope that this clears up the confusion.

Cheers,

Andrew

P.S. It's always good to provide a brief bit of sample code when you ask
a question.  Also, let me recommend that you omit semi-colons and
space the code to make it easier to read.  Thus

cs <- function(data, w) { 
     data<-data[w, ] 
     ...
On Wed, Jan 18, 2006 at 04:35:47PM -0800, Ben Ridenhour wrote: