Bootstrapping help
12 messages · Ben Ridenhour, Kenneth Cabrera, Jacques VESLOT +5 more
The first thing you are doing wrong is that you are not including a
copy of cs for us to see ;).
Based on what you have written, I speculate that cs does not use the
index correctly. If so, then a simple, although inefficient,
workaround is to rewrite cs:
cs <- function(dataframe, index) {
  dataframe <- dataframe[index,]
  ...
}
Good luck,
Andrew
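A minimal sketch of the symptom and the workaround described above, assuming the boot package is installed. The data frame `toy` and both statistic functions are hypothetical stand-ins for illustration; they are not Ben's actual cs.

```r
library(boot)

set.seed(1)
# Hypothetical toy data, standing in for the real 154 x 5 data set.
toy <- data.frame(y = rnorm(30), x = rnorm(30))

# Accepts the index argument but never uses it, so every replicate
# is computed from the full, unresampled data.
stat_ignores_index <- function(d, i) {
  mean(d$y)
}

# Subsets the rows with the index first, as in the suggested rewrite.
stat_uses_index <- function(d, i) {
  d <- d[i, ]
  mean(d$y)
}

b_bad  <- boot(toy, stat_ignores_index, R = 199)
b_good <- boot(toy, stat_uses_index, R = 199)

# Number of distinct replicate values: one when the index is ignored,
# many when it is used.
length(unique(b_bad$t[, 1]))
length(unique(b_good$t[, 1]))
```

When the index is ignored, all replicates coincide with the original statistic, which is what a reported standard error of 0 indicates.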
On Tue, Jan 17, 2006 at 10:02:49PM -0800, Ben Ridenhour wrote:
Hello,
I am new to using R and I am having problems getting boot() to work properly. Here is what I am trying to do:
I have a statistic called "cs". cs takes a data matrix (154 x 5) and calculates 12 different scores for me. cs outputs the data as a vector (12 x 1). cs doesn't really use weights, per se; however, I have included this as one of the 2 arguments cs can take.
I try performing a bootstrap by issuing:
myout<-boot(data, cs,R=999)
I have tried other versions where I specify stype="w", etc...
The problem I get is that the dataset does not seem to be resampled. I end up with 999 replicates that have the exact same value of the output of cs.
In the end I have something like
Bootstrap Statistics :
          original           bias  std. error
t1*    0.865122275   1.698641e-14           0
t2*   -0.005248414  -9.627715e-17           0
t3*   -0.052833740  -8.812395e-16           0
t4*    0.807040121   1.287859e-14           0
t5*    0.542082588  -9.103829e-15           0
t6*   -0.018617838  -7.285839e-17           0
t7*    0.006409704   1.422473e-16           0
t8*    0.529874453   8.104628e-15           0
t9*    0.074804390   2.359224e-16           0
t10*  -0.007153634   1.301043e-16           0
t11*  -0.018241243  -2.359224e-16           0
t12*   0.049409513  -1.200429e-15           0
Clearly the bootstrap is not working. What am I doing wrong?
Thanks,
Ben
-------------------------------
Benjamin Ridenhour
School of Biological Sciences
Washington State University
P.O. Box 644236
Pullman, WA 99164-4236
Phone (509)335-7218
--------------------------------
"Nothing in biology makes sense except in the light of evolution."
-T. Dobzhansky
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Andrew Robinson Department of Mathematics and Statistics Tel: +61-3-8344-9763 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 Email: a.robinson at ms.unimelb.edu.au http://www.ms.unimelb.edu.au
Hi, R users:
I have a data.frame (not a matrix) and a vector with the same length as the number of records (rows) of the data frame. Each element of that vector is the column number (within a specific range of columns) of the corresponding record that I must set to zero. How can I do this without a "for" loop?
Thank you for your help.
Kenneth
try:
DF2 <- as.data.frame(matrix(vec, nrow=nrow(DF), ncol=ncol(DF)) ==
                     matrix(1:ncol(DF), nrow=nrow(DF), ncol=ncol(DF), byrow=TRUE))
DF3 <- data.frame(mapply(function(z, x, y) { x[y] <- 0 ; x },
                         names(DF), DF, DF2, SIMPLIFY=FALSE))
but there must be an easier way...
Hi

e.g. your data frame has 35 rows and 6 columns

a<-sample(1:6, 35, replace=T)
b<-1:35
vec<-rep(0,35*6)
vec[a+6*(b-1)]<-1

This should do the replacement

your.d.f[matrix(vec,35,6, byrow=T)==1] <- 0

But I am not sure if it is quicker than a loop.

HTH
Petr
On 18 Jan 2006 at 2:35, Kenneth Cabrera wrote:
Date sent: Wed, 18 Jan 2006 02:35:35 -0500
From: Kenneth Cabrera <krcabrer at epm.net.co>
To: r-help at stat.math.ethz.ch
Subject: [R] Data frame index?
Petr Pikal petr.pikal at precheza.cz
you could try something like the following:

dat <- data.frame(matrix(rnorm(200), 20, 10))
index <- sample(10, 20, TRUE)
###############
mat.ind <- matrix(FALSE, nrow(dat), length(dat))
mat.ind[cbind(seq(along = index), index)] <- TRUE
dat[mat.ind] <- 0
index
dat

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm
It sounds as though you've found that you can use two-column matrix
indexing on a data frame for reading but not assigning. You create a
matrix where the first column is the row number, and the second column
is the column number. Then indexing by that selects those particular
elements in order.
For instance, if you have named your vector of columns "cols", you'd do
my.data.frame[ cbind(1:rows, cols) ] <- 0
Here's an example:
> df
   x y
1  1 a
2  1 a
3  1 a
4  1 a
5  1 a
6  1 a
7  1 a
8  1 a
9  1 a
10 1 a
> df[cbind(1:4,c(1,2,1,2))]
[1] "1" "a" "1" "a"
But
> df[cbind(1:4,c(1,2,1,2))] <- 0
Error in "[<-.data.frame"(`*tmp*`, cbind(1:4, c(1, 2, 1, 2)), value = 0) :
only logical matrix subscripts are allowed in replacement
To get around this, construct the logical matrix using this method, then
use it as an index:
> mat <- matrix(FALSE, 10, 2)
> mat[cbind(1:4,c(1,2,1,2))] <- TRUE
> df[mat] <- 0
Warning message:
invalid factor level, NAs generated in: "[<-.factor"(`*tmp*`, thisvar,
value = 0)
> df
   x    y
1  0    a
2  1 <NA>
3  0    a
4  1 <NA>
5  1    a
6  1    a
7  1    a
8  1    a
9  1    a
10 1    a
If your columns are all numeric, you won't get the warning I got.
Duncan Murdoch
It's worth noting that there are quite a few for loops inside the code
used by matrix indexing of data frames.
I think a single for-loop over the columns is as good as any, something
like
DF <- data.frame(x=1, y=rep("a", 4), z = 3)
ind <- c(1,3,3,1) # only numeric cols
for(i in unique(ind)) DF[ind==i, i] <- 0
DF
  x y z
1 0 a 3
2 1 a 0
3 1 a 0
4 0 a 3
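As a check on the two approaches discussed in this thread (a logical index matrix, and a single loop over the columns), here is a small sketch on an all-numeric data frame. The names `dat`, `cols`, `mask`, `by_mask` and `by_loop` are made up for illustration.

```r
set.seed(7)
# Hypothetical all-numeric data frame: 5 rows, 4 columns.
dat  <- data.frame(matrix(rnorm(20), nrow = 5, ncol = 4))
# For each row, the column whose entry should be set to zero.
cols <- c(2, 4, 1, 2, 3)

# Approach 1: build a logical matrix with TRUE at (row, cols[row]),
# then use it as the replacement index.
mask <- matrix(FALSE, nrow(dat), ncol(dat))
mask[cbind(seq_along(cols), cols)] <- TRUE
by_mask <- dat
by_mask[mask] <- 0

# Approach 2: one pass per distinct column, zeroing the rows that
# point at that column.
by_loop <- dat
for (j in unique(cols)) by_loop[cols == j, j] <- 0

# Both approaches should zero exactly the same cells.
all(as.matrix(by_mask) == as.matrix(by_loop))
```

The loop runs once per distinct column rather than once per row, so it stays short even for a data frame with many records.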
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Ben,

although I appended a smiley to my first note, the message was serious. If you don't show us what you're doing, we can't help you. Please provide an example in which you:

1) generate a small dataframe similar in structure to yours
2) provide cs
3) show the boot statement that applies cs to the example dataframe.

Also, it seems that you are unfamiliar with the use of indexing and dataframes. Please read the Introduction to R carefully; it is freely available on CRAN. You have asked R to provide you with all the rows that are numbered 1/sample size; since the row numbers are integers, there aren't any.

And, please say hello to Andrew Storfer and Melanie Murphy from me.

Andrew
On Wed, Jan 18, 2006 at 11:39:36AM -0800, Ben Ridenhour wrote:
Andrew,

Thanks for the suggestion! This seems to have fixed things. I was wondering if you could explain why this works and what was wrong. If I issue the command

>my.boot<-boot(dataframe,cs,R=999)

and, in order to see what effect the command you told me to use has, I then do something like

>dataframe[my.boot$weights,]

my.boot$weights looks to be a vector where each element is 1/sample size. R reports that

[1] var1 var2 var3 var4 var5
<0 rows> (or 0-length row.names)

Which indicates to me that I then have a dataframe with no data in it! (Am I wrong about this?) What is going on here? Why did this work? Sorry for the basic questions.
Ben
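The empty result can be reproduced without boot at all. This is a small sketch on a made-up data frame `toy`; `w` mimics a vector of equal weights 1/n, and `i` is a hypothetical resampled index. R truncates fractional numeric subscripts toward zero, and a subscript of 0 selects nothing.

```r
# Hypothetical four-row data frame.
toy <- data.frame(var1 = 1:4, var2 = c(10, 20, 30, 40))

# Equal weights, 1/n each: here 0.25, 0.25, 0.25, 0.25.
w <- rep(1 / nrow(toy), nrow(toy))

# Each 0.25 is truncated to 0 when used as a row subscript,
# so no rows are selected.
nrow(toy[w, ])

# An integer index with repeats, like the one boot passes by default,
# selects rows with replacement.
i <- c(2, 2, 4, 1)
toy[i, ]
```

So subsetting by the weights says nothing about what cs received during the bootstrap; the second argument it was given was an index vector like `i`.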
Ben,
Ok, it's clear now, thanks. Note that your boot call
boot(mydata,cs,R=999)
does not specify an "stype" argument. The boot help file notes that
the default value for stype is "i", which means that boot will pass an
index to the function, not a weight, regardless of whether you call it
w, i, or whatever.
The index that boot sends to the function is then used to index the
dataframe, thus selecting rows randomly with replacement. Previously
you passed the dataframe to the function, which did not alter it, so
it passed through undisturbed. In this incarnation the data<-data[w,]
command provides you with the (pseudo-)random sample with replacement
of the data.
I hope that this clears up the confusion.
Cheers,
Andrew
ps it's always good to provide a brief bit of sample code when you ask
a question. Also, let me recommend that you omit semi-colons and
space the code to make it easier to read. Thus
cs <- function(data, w) {
  data <- data[w, ]
  ...
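To illustrate the stype point, here is a hedged sketch contrasting the default stype = "i" with stype = "w", assuming the boot package is installed. The data and both statistic functions are hypothetical; the weighted version divides by sum(w) so that it does not depend on how the weights are scaled.

```r
library(boot)

set.seed(42)
# Hypothetical one-column data set.
toy <- data.frame(y = rnorm(25))

# With the default stype = "i", the second argument is a vector of
# row indices, whatever name the function gives it.
mean_by_index <- function(d, i) {
  mean(d$y[i])
}

# With stype = "w", the second argument is a vector of weights,
# one per row; this is a weighted mean.
mean_by_weight <- function(d, w) {
  sum(d$y * w) / sum(w)
}

b_i <- boot(toy, mean_by_index, R = 99)
b_w <- boot(toy, mean_by_weight, R = 99, stype = "w")

# Both give the ordinary sample mean on the original data.
c(b_i$t0, b_w$t0, mean(toy$y))
```

A function written for weights but called with the default stype would receive indices instead, which is why the stype and the function body need to agree.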
On Wed, Jan 18, 2006 at 04:35:47PM -0800, Ben Ridenhour wrote:
Thanks for responding :) Again... I understand how indexing works (basically as in any other programming language), which is why I am so confused as to why that statement made my bootstrapping work! It seems like, if anything, it would completely screw up everything. Here is the (now working) cs function after I amended it as you suggested (i.e. I added the _very confusing_ statement data<-data[w,]):

>cs<-function(data, w){
  data<-data[w,]
  y<-data[1]
  x1<-data[2]
  x2<-data[3]
  x3<-data[4]
  z<-data[5]
  c1<-x1*z
  c2<-x2*z
  c3<-x3*z
  X<-cbind(x1,x2,x3,z,c1,c2,c3)
  regcoef<-lsfit(X,y)$coefficients
  bx1<-regcoef[[2]]
  bx2<-regcoef[[3]]
  bx3<-regcoef[[4]]
  bz<-regcoef[[5]]
  bc1<-regcoef[[6]]
  bc2<-regcoef[[7]]
  bc3<-regcoef[[8]]
  fx<-bx1*x1+bx2*x2+bx3*x3
  gy<-bz*z
  hxy<-bc1*c1+bc2*c2+bc3*c3
  sfx1<-cov(fx,x1)
  sfx2<-cov(fx,x2)
  sfx3<-cov(fx,x3)
  sgx1<-cov(gy,x1)
  sgx2<-cov(gy,x2)
  sgx3<-cov(gy,x3)
  shx1<-cov(hxy,x1)
  shx2<-cov(hxy,x2)
  shx3<-cov(hxy,x3)
  sTx1<-cov(y,x1)
  sTx2<-cov(y,x2)
  sTx3<-cov(y,x3)
  dataout<-c(sfx1,sgx1,shx1,sTx1,sfx2,sgx2,shx2,sTx2,sfx3,sgx3,shx3,sTx3)
  dataout
}

An example data frame would be

>mydata<-data.frame(Y=rnorm(20,0,1),X1=rnorm(20,0,1),X2=rnorm(20,0,1),X3=rnorm(20,0,1),Z=rnorm(20,0,1))

The boot statement is

>boot(mydata,cs,R=999)

Why does this rather mysterious indexing statement "data<-data[w,]" make the bootstrap work when it didn't beforehand?

Thanks,
Ben

ps. I'll tell Melanie and Storfer hello.