Bootstrapping help

The first thing you are doing wrong is that you are not including a
copy of cs for us to see ;).

Based on what you have written, I speculate that cs does not use the
index correctly.  If so, then a simple, although inefficient,
workaround is to rewrite cs:

cs <- function(dataframe, index) {
	dataframe <- dataframe[index,]
	...
}
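Here is a minimal sketch of the symptom and the workaround. It assumes the boot package (a recommended package shipped with R) and its default stype = "i", under which the second argument of the statistic receives resampled row indices. The toy data and the function names stat_ignores_index and stat_uses_index are hypothetical stand-ins for the real data and cs.

```r
library(boot)  # recommended package distributed with R

set.seed(1)
# Hypothetical toy data standing in for the real data set
dat <- data.frame(a = rnorm(30), b = rnorm(30))

# Broken: the second argument (the indices supplied by boot) is never used,
# so every replicate is computed on the original rows
stat_ignores_index <- function(dataframe, index) {
  colMeans(dataframe)
}

# Workaround: subset the rows with the indices before computing anything
stat_uses_index <- function(dataframe, index) {
  dataframe <- dataframe[index, ]
  colMeans(dataframe)
}

bad  <- boot(dat, stat_ignores_index, R = 199)
good <- boot(dat, stat_uses_index, R = 199)

apply(bad$t, 2, sd)   # essentially zero: every replicate is identical
apply(good$t, 2, sd)  # clearly positive: rows really are resampled
```

The first call reproduces the output in the question: a bias that is only rounding noise and a standard error of 0.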

Good luck, 

Andrew
Hello,
 I am new to using R and I am having problems getting boot() to work properly.  Here is what I am trying to do:

 I have a statistic called "cs".  cs takes a data matrix (154 x 5) and calculates 12 different scores for me.  cs outputs the data as a vector (12 x 1).  cs doesn't really use weights per se; however, I have included them as one of the 2 arguments cs can take.

 I try performing a bootstrap by issuing: 
 myout<-boot(data, cs,R=999) 
 I have tried other versions where I specify stype="w", etc...

 The problem I get is that the dataset does not seem to be resampled.  I end up with 999 replicates that have the exact same value of the output of cs.

 In the end I have something like

 Bootstrap Statistics :
          original        bias    std. error
 t1*   0.865122275  1.698641e-14           0
 t2*  -0.005248414 -9.627715e-17           0
 t3*  -0.052833740 -8.812395e-16           0
 t4*   0.807040121  1.287859e-14           0
 t5*   0.542082588 -9.103829e-15           0
 t6*  -0.018617838 -7.285839e-17           0
 t7*   0.006409704  1.422473e-16           0
 t8*   0.529874453  8.104628e-15           0
 t9*   0.074804390  2.359224e-16           0
 t10* -0.007153634  1.301043e-16           0
 t11* -0.018241243 -2.359224e-16           0
 t12*  0.049409513 -1.200429e-15           0

 Clearly the bootstrap is not working.  What am I doing wrong?

 Thanks,
 Ben

-------------------------------
Benjamin Ridenhour
School of Biological Sciences
Washington State University
P.O. Box 644236
Pullman, WA 99164-4236
Phone (509)335-7218
-------------------------------- 
"Nothing in biology makes sense except in the light of evolution."
-T. Dobzhansky

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
Email: a.robinson at ms.unimelb.edu.au         http://www.ms.unimelb.edu.au
Hi, R users:

I have a data.frame (not a matrix) and a vector with the same length as
the number of records (rows) of the data frame.  Each element of that
vector is the column number (within a specific range of columns) of the
corresponding record that I must set to zero.

How can I do this without a "for" loop?

Thank you for your help.

Kenneth
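Try:

# Logical mask: TRUE where the column number equals vec for that row
DF2 <- as.data.frame(matrix(vec, nrow = nrow(DF), ncol = ncol(DF)) ==
                     matrix(1:ncol(DF), nrow = nrow(DF), ncol = ncol(DF), byrow = TRUE))

# Zero the flagged cell(s) in each column
DF3 <- data.frame(mapply(function(x, y) { x[y] <- 0; x },
                         DF, DF2, SIMPLIFY = FALSE))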

but there must be an easier way...

Hi

E.g., suppose your data frame has 35 rows and 6 columns:

a <- sample(1:6, 35, replace = TRUE)   # target column for each row
b <- 1:35
vec <- rep(0, 35 * 6)
vec[a + 6 * (b - 1)] <- 1              # row-major position of cell (b, a)

This does the replacement:
your.d.f[matrix(vec, 35, 6, byrow = TRUE) == 1] <- 0

But I am not sure if it is quicker than a loop.

HTH
Petr
Date sent:      	Wed, 18 Jan 2006 02:35:35 -0500
From:           	Kenneth Cabrera <krcabrer at epm.net.co>
To:             	r-help at stat.math.ethz.ch
Subject:        	[R] Data frame index?

Petr Pikal
petr.pikal at precheza.cz
You could try something like the following:

dat <- data.frame(matrix(rnorm(200), 20, 10))
index <- sample(10, 20, TRUE)
###############
mat.ind <- matrix(FALSE, nrow(dat), length(dat))
mat.ind[cbind(seq(along = index), index)] <- TRUE
dat[mat.ind] <- 0

index
dat

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

It sounds as though you've found that you can use two-column matrix 
indexing on a data frame for reading but not assigning.  You create a 
matrix where the first column is the row number, and the second column 
is the column number.  Then indexing by that selects those particular 
elements in order.

For instance, if you have named your vector of columns "cols", you'd want
to do

my.data.frame[cbind(1:nrow(my.data.frame), cols)] <- 0

Here's an example:

 > df
    x y
1  1 a
2  1 a
3  1 a
4  1 a
5  1 a
6  1 a
7  1 a
8  1 a
9  1 a
10 1 a
 > df[cbind(1:4,c(1,2,1,2))]
[1] "1" "a" "1" "a"

But

 > df[cbind(1:4,c(1,2,1,2))] <- 0
Error in "[<-.data.frame"(`*tmp*`, cbind(1:4, c(1, 2, 1, 2)), value = 0) :
         only logical matrix subscripts are allowed in replacement

To get around this, construct the logical matrix using this method, then
use it as an index:

 > mat <- matrix(FALSE, 10, 2)
 > mat[cbind(1:4,c(1,2,1,2))] <- TRUE
 > df[mat] <- 0
Warning message:
invalid factor level, NAs generated in: "[<-.factor"(`*tmp*`, thisvar, value = 0)
 > df
    x    y
1  0    a
2  1 <NA>
3  0    a
4  1 <NA>
5  1    a
6  1    a
7  1    a
8  1    a
9  1    a
10 1    a

If your columns are all numeric, you won't get the warning I got.

Duncan Murdoch
It's worth noting that there are quite a few for loops inside the code 
used by matrix indexing of data frames.

I think a single for-loop over the columns is as good as any, something 
like

DF <- data.frame(x=1, y=rep("a", 4), z = 3)
ind <- c(1,3,3,1) # only numeric cols
for(i in unique(ind)) DF[ind==i, i] <- 0
DF
   x y z
1 0 a 3
2 1 a 0
3 1 a 0
4 0 a 3
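Whichever approach is chosen, it is easy to check that the loop-free logical-matrix index and a single loop over the target columns give the same result. This is a sketch on hypothetical all-numeric data (with a factor column the logical-matrix assignment produces NAs, as in Duncan's example); the names DF, ind, mask, DF_mask and DF_loop are illustrative only.

```r
set.seed(3)
# Hypothetical all-numeric data frame and one target column per row
DF <- data.frame(matrix(rnorm(40), nrow = 10, ncol = 4))
ind <- sample(4, 10, replace = TRUE)

# Loop-free: a logical mask with exactly one TRUE per row, then assign
mask <- matrix(FALSE, nrow(DF), ncol(DF))
mask[cbind(seq_len(nrow(DF)), ind)] <- TRUE
DF_mask <- DF
DF_mask[mask] <- 0

# One loop over the distinct target columns
DF_loop <- DF
for (i in unique(ind)) DF_loop[ind == i, i] <- 0

all.equal(DF_mask, DF_loop)   # TRUE
```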

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Ben,

although I appended a smiley to my first note, the message was
serious.  If you don't show us what you're doing, we can't help you.
Please provide an example in which you:

1) generate a small dataframe similar in structure to yours
2) provide cs
3) show the boot statement that applies cs to the example dataframe.

Also, it seems that you are unfamiliar with the use of indexing and
data frames.  Please read An Introduction to R carefully; it is freely
available on CRAN.  You have asked R to provide you with all the rows
that are numbered 1/(sample size).  Since row numbers are integers,
there aren't any: R truncates each fractional index toward zero, and
an index of 0 selects no rows.
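A small base-R sketch of why subsetting by a vector of 1/n values returns an empty data frame. The 5-row data frame d is hypothetical, and idx plays the role of a weights vector whose elements are all 1/(sample size), like the one Ben describes.

```r
# Hypothetical 5-row data frame
d <- data.frame(v = 1:5, w = letters[1:5])
n <- nrow(d)

idx <- rep(1 / n, n)     # every element is 1/(sample size)
as.integer(idx)          # 0 0 0 0 0: fractions are truncated toward zero
nrow(d[idx, ])           # 0 rows, because index 0 selects nothing

# A genuine bootstrap index vector holds integer row numbers instead
set.seed(2)
nrow(d[sample(n, n, replace = TRUE), ])   # 5 rows, possibly with repeats
```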

And, please say hello to Andrew Storfer and Melanie Murphy from me.

Andrew
   Andrew,
   Thanks for the suggestion!  This seems to have fixed things.  I was
   wondering if you could explain why this works and what was wrong.  If
   I issue the command
   >my.boot<-boot(dataframe,cs,R=999)
   and in order to see what effect the command you told me to use has, I
   then do something like
   >dataframe[my.boot$weights,]
   my.boot$weights looks to be a vector where every element is
   1/(sample size).  R reports that
   [1] var1    var2   var3   var4 var5
   <0 rows> (or 0-length row.names)
    Which indicates to me that I then have a dataframe with no data in
   it! (Am I wrong about this?)  What is going on here?  Why did this
   work?  Sorry for the basic questions.
   Ben

Ben,

Ok, it's clear now, thanks.  Note that your boot call 

boot(mydata,cs,R=999)

does not specify an "stype" argument.  The boot help file notes that
the default value for stype is "i", which means that boot will pass an
index to the function, not a weight, regardless of whether you call it
w, i, or whatever. 
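A sketch of the two conventions side by side, assuming the boot package; mean_by_index and mean_by_weight are hypothetical functions. With the default stype the second argument is used as an index, while with stype = "w" the same slot carries weights, so the statistic has to be written to match whichever is requested.

```r
library(boot)

set.seed(42)
x <- data.frame(v = rnorm(25))

# Default stype = "i": the second argument holds resampled row indices,
# whatever name the function gives it (here 'w')
mean_by_index <- function(d, w) mean(d$v[w])

# stype = "w": the second argument holds bootstrap weights instead
mean_by_weight <- function(d, w) weighted.mean(d$v, w)

b_i <- boot(x, mean_by_index, R = 499)
b_w <- boot(x, mean_by_weight, R = 499, stype = "w")

c(b_i$t0, b_w$t0, mean(x$v))   # agree (up to rounding) on the original data
c(sd(b_i$t), sd(b_w$t))        # both give a nonzero bootstrap std. error
```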

The index that boot sends to the function is then used to index the
data frame, thus selecting rows randomly with replacement.  Previously
cs ignored its second argument and worked on the data frame exactly as
it was passed in, so every replicate was computed on the original data;
that is why all 999 replicates were identical and the standard errors
were 0.  In this incarnation the data<-data[w,] command gives you a
(pseudo-)random sample, with replacement, of the rows of the data.
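The ordinary resampling that boot performs for the default stype can be imitated by hand with sample.int(), which makes the role of data <- data[w, ] explicit. This is a simplified sketch, not boot's own code; the data and slope_stat are hypothetical, and the final line mimics the original / bias / std. error columns that boot prints.

```r
set.seed(7)
# Hypothetical data; slope_stat is a stand-in for cs
dat <- data.frame(y = rnorm(20), x = rnorm(20))

slope_stat <- function(data, w) {
  data <- data[w, ]                    # same step as data <- data[w, ] in cs
  coef(lm(y ~ x, data = data))[[2]]    # slope of y on x
}

n <- nrow(dat)
R <- 500
# Each replicate draws n row numbers with replacement
reps <- replicate(R, slope_stat(dat, sample.int(n, n, replace = TRUE)))

t0 <- slope_stat(dat, seq_len(n))      # statistic on the original rows
c(original = t0, bias = mean(reps) - t0, std.error = sd(reps))
```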

I hope that this clears up the confusion.

Cheers,

Andrew

ps it's always good to provide a brief bit of sample code when you ask
a question.  Also, let me recommend that you omit semi-colons and
space the code to make it easier to read.  Thus

cs <- function(data, w) {
     data <- data[w, ]
     ...
}
   Thanks for responding :) Again...
   I understand how indexing works (basically as in any other programming
   language), that is why I am so confused as to why that statement made
   my bootstrapping work!  It seems like, if anything, it would completely
   screw up everything.
   Here is the (now working) cs function after I amended it to what you
   said to do (i.e. I added the _very confusing_ statement
   data<-data[w,]):
   >cs<-function(data, w){
   data<-data[w,];
   y<-data[1];
   x1<-data[2];
   x2<-data[3];
   x3<-data[4];
   z<-data[5];
   c1<-x1*z;
   c2<-x2*z;
   c3<-x3*z;
   X<-cbind(x1,x2,x3,z,c1,c2,c3);
   regcoef<-lsfit(X,y)$coefficients;
   bx1<-regcoef[[2]];
   bx2<-regcoef[[3]];
   bx3<-regcoef[[4]];
   bz<-regcoef[[5]];
   bc1<-regcoef[[6]];
   bc2<-regcoef[[7]];
   bc3<-regcoef[[8]];
   fx<-bx1*x1+bx2*x2+bx3*x3;
   gy<-bz*z;
   hxy<-bc1*c1+bc2*c2+bc3*c3;
   sfx1<-cov(fx,x1);
   sfx2<-cov(fx,x2);
   sfx3<-cov(fx,x3);
   sgx1<-cov(gy,x1);
   sgx2<-cov(gy,x2);
   sgx3<-cov(gy,x3);
   shx1<-cov(hxy,x1);
   shx2<-cov(hxy,x2);
   shx3<-cov(hxy,x3);
   sTx1<-cov(y,x1);
   sTx2<-cov(y,x2);
   sTx3<-cov(y,x3);
   dataout<-c(sfx1,sgx1,shx1,sTx1,sfx2,sgx2,shx2,sTx2,sfx3,sgx3,shx3,sTx3);
   dataout
   }
   An example data frame would be
    >mydata<-data.frame(Y=rnorm(20,0,1),X1=rnorm(20,0,1),X2=rnorm(20,0,1),X3=rnorm(20,0,1),Z=rnorm(20,0,1))
   The  boot statement is
   >boot(mydata,cs,R=999)
   Why does this rather mysterious indexing statement "data<-data[w,]"
   make the bootstrap work when it didn't beforehand?
   Thanks,
   Ben
   ps. I'll tell Melanie and Storfer hello.
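Following Andrew's suggestion to space the code and omit the semicolons, here is a more compact sketch of the same statistic. It assumes the column order Y, X1, X2, X3, Z of mydata and swaps lsfit() for lm(), which fits the same least-squares model with an intercept; cs2 is a hypothetical name, and the function should be checked against the original cs rather than taken as a drop-in replacement.

```r
library(boot)

cs2 <- function(data, w) {
  data <- data[w, ]                  # resample rows (default stype = "i")
  y  <- data[[1]]                    # response
  x  <- as.matrix(data[, 2:4])       # x1, x2, x3
  z  <- data[[5]]
  cz <- x * z                        # c1, c2, c3: each x_k multiplied by z
  b  <- coef(lm(y ~ x + z + cz))     # intercept, bx1..bx3, bz, bc1..bc3
  fx  <- drop(x %*% b[2:4])
  gy  <- b[[5]] * z
  hxy <- drop(cz %*% b[6:8])
  # Rows: f, g, h, total; columns: x1, x2, x3. Reading column by column
  # gives the same order as cs: sfx1, sgx1, shx1, sTx1, sfx2, ...
  as.vector(rbind(cov(fx, x), cov(gy, x), cov(hxy, x), cov(y, x)))
}

set.seed(10)
mydata <- data.frame(Y = rnorm(20), X1 = rnorm(20), X2 = rnorm(20),
                     X3 = rnorm(20), Z = rnorm(20))
out <- boot(mydata, cs2, R = 199)
```

Because the regression includes an intercept, its residuals are uncorrelated with every regressor, so each total covariance sTxk equals sfxk + sgxk + shxk on any given resample, which is a quick sanity check on the output.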