restricted bootstrap
5 messages · Grant Gillis, Brian Ripley, Thierry Onkelinx
I see nothing here to do with the 'bootstrap', which is sampling with replacement. Do you know what you mean exactly by 'randomly sample'? In general the way to do this is to sample randomly (uniformly, whatever) and reject samples that do not meet your restriction. For some restrictions there are more efficient algorithms, but I don't understand yours. (What are the 'rows'? Do you want to sample rows in space or xy locations? How come 'dist' is not symmetric?) For some restrictions, an MCMC sampling scheme is needed, the hard-core spatial point process being a related example.
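Ripley's general recipe (draw a subset uniformly at random, reject it if it breaks the restriction, try again) can be sketched in R as below. This is a minimal illustration rather than code from the thread: the helper `sample_far_apart()`, its `max_tries` argument and the simulated coordinates are hypothetical, and it assumes a symmetric distance matrix such as `as.matrix(dist(xy))` built from xy coordinates in metres.

```r
# Rejection sampling: draw k distinct rows uniformly at random and keep the
# draw only if every pair is more than `threshold` apart.
# Hypothetical helper; D must be a symmetric matrix of pairwise distances.
sample_far_apart <- function(D, k, threshold, max_tries = 10000) {
  n <- nrow(D)
  for (i in seq_len(max_tries)) {
    idx <- sample.int(n, k)                 # uniform draw without replacement
    d <- D[idx, idx]
    if (all(d[upper.tri(d)] > threshold)) {
      return(sort(idx))                     # accepted: all pairs far enough apart
    }
  }
  stop("no acceptable sample found; the restriction may be too strict")
}

# Usage with simulated xy coordinates in metres (made-up data)
set.seed(1)
xy <- cbind(x = runif(200, 0, 5000), y = runif(200, 0, 5000))
D <- as.matrix(dist(xy))                    # Euclidean distances, symmetric
rows <- sample_far_apart(D, k = 10, threshold = 100)
```

Accepted draws are uniformly distributed over all valid k-subsets, which is the main attraction of rejection. The cost is that the acceptance rate falls quickly as k grows or as the threshold approaches the typical spacing between locations, which is where the more efficient algorithms and MCMC schemes Ripley mentions come in.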
On Wed, 3 Sep 2008, Grant Gillis wrote:
Hello List,
I am not sure that I have the correct terminology here (restricted
bootstrap) which may be hampering my archive searches. I have quite a large
spatially autocorrelated data set. I have xy coordinates and the
corresponding pairwise distance matrix (metres) for each row. I would like
to randomly sample some number of rows but restricting samples such that the
distance between them is larger than the threshold of autocorrelation. I
have been unsuccessfully trying to link the 'sample' function to values
in the distance matrix.
My end goal is to randomly sample M thousand rows of data N thousand times
calculating linear regression coefficients for each sample but am stuck on
taking the initial sample. I believe I can figure out the rest.
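The end goal described here (draw a restricted subsample, fit a linear regression, repeat N times and collect the coefficients) might be organised as in this sketch. Everything in it is hypothetical: the simulated data frame and its columns `xc`, `yc`, `x`, `y`, the sizes `N` and `k`, and the helper `greedy_far_apart()`. The helper uses greedy random sequential selection (visit rows in random order and keep a row if it is more than the threshold from every row already kept), which is fast but does not give every valid subset the same probability. Each subsample is drawn without replacement, which Ripley later points out is unavoidable under this restriction.

```r
# Greedy random sequential selection: rows are visited in random order and a
# row is kept if it lies more than `threshold` from every row already kept.
# Not uniform over valid subsets; hypothetical helper for illustration.
greedy_far_apart <- function(D, k, threshold) {
  kept <- integer(0)
  for (i in sample.int(nrow(D))) {
    if (all(D[i, kept] > threshold)) kept <- c(kept, i)
    if (length(kept) == k) return(kept)
  }
  stop("ran out of rows before reaching k; lower k or the threshold")
}

# Simulated data: locations in metres plus a simple linear relationship
set.seed(42)
n <- 500
df <- data.frame(xc = runif(n, 0, 10000), yc = runif(n, 0, 10000))
df$x <- rnorm(n)
df$y <- 1 + 2 * df$x + rnorm(n)
D <- as.matrix(dist(df[, c("xc", "yc")]))

N <- 200   # number of subsamples (hypothetical)
k <- 50    # rows per subsample (hypothetical)
# One row of coefficients per subsample: columns "(Intercept)" and "x"
coefs <- t(replicate(N, {
  idx <- greedy_far_apart(D, k, threshold = 100)
  coef(lm(y ~ x, data = df[idx, ]))
}))
colMeans(coefs)   # average intercept and slope across subsamples
```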
Example Question
I would like to randomly sample 3 rows, with the restriction that
they are more than 100 m apart
example data:
main data:
y <- c(1, 2, 9, 5, 6)
x <- c(1, 3, 5, 7, 9)
z <- c(2, 4, 6, 8, 10)
a <- c(3, 9, 6, 4, 4)
maindata <- cbind(y, x, z, a)
     y x  z a
[1,] 1 1  2 3
[2,] 2 3  4 9
[3,] 9 5  6 6
[4,] 5 7  8 4
[5,] 6 9 10 4
distance matrix:
row1 <- c(0, 123, 567, 89)
row2 <- c(98, 0, 345, 543)
row3 <- c(765, 90, 0, 987)
row4 <- c(654, 8, 99, 0)
dist <- rbind(row1, row2, row3, row4)
     [,1] [,2] [,3] [,4]
row1    0  123  567   89
row2   98    0  345  543
row3  765   90    0  987
row4  654    8   99    0
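For an example this small, every candidate triple can simply be enumerated with `combn()` and one valid triple drawn uniformly at random. The sketch assumes that the upper triangle of the toy matrix holds the intended distances (the matrix is not symmetric, and the poster later says it should have been) and, like the matrix itself, covers only four rows; the name `D` is used in place of `dist` purely for readability.

```r
# Toy distances from the example (metres)
D <- rbind(c(0, 123, 567, 89),
           c(98, 0, 345, 543),
           c(765, 90, 0, 987),
           c(654, 8, 99, 0))
# Assumption: the upper triangle is correct; mirror it to make D symmetric
D[lower.tri(D)] <- t(D)[lower.tri(D)]

threshold <- 100
triples <- combn(nrow(D), 3)          # each column is one candidate triple
ok <- apply(triples, 2, function(idx) {
  d <- D[idx, idx]
  all(d[upper.tri(d)] > threshold)    # every pair more than 100 m apart
})
valid <- triples[, ok, drop = FALSE]
valid
#      [,1] [,2]
# [1,]    1    2
# [2,]    2    3
# [3,]    3    4
chosen <- valid[, sample(ncol(valid), 1)]   # one valid triple, uniformly
```

Under that assumption only rows {1, 2, 3} and {2, 3, 4} satisfy the 100 m restriction. Enumeration is exact, but the number of candidates grows as choose(n, k), so it is only practical for very small problems.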
Thanks for all of the help in the past and now
Cheers
Grant
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Grant,
Have you considered a gls model instead of an lm model? In a gls model one can model the correlation between the measures, so you won't need to select a subset of your data. You can find gls in the nlme package.
HTH,
Thierry
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On behalf of Grant Gillis
Sent: Thursday 4 September 2008 14:57
To: r-help at r-project.org
Subject: Re: [R] restricted bootstrap
Hello Professor Ripley,
Sorry for not being clear. I posted after a long day of struggling. Also, my toy distance matrix should have been symmetrical.
Simply put, I have spatially autocorrelated data collected from many points. I would like to do a linear regression on these data. To deal with the autocorrelation I want to resample a subset of my data with replacement, but I need to restrict subsets such that no two locations where data were collected are closer than X m apart (further apart than the range of autocorrelation in the data).
Thanks for having a look at this for me. I will look up the hard-core spatial point process.
Grant
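Thierry's suggestion can be sketched with `nlme::gls()` and an exponential spatial correlation structure, `corExp()`, keyed on the xy coordinates. This is a hedged sketch on simulated data: the column names, the simulated values and the 500 m range are assumptions, and the range is held fixed (`fixed = TRUE`) only to keep the example quick and stable; in a real analysis one would normally let `gls()` estimate it and perhaps compare other structures such as `corSpher()` or `corGaus()`. The simulated errors here are independent, so the example only demonstrates the syntax, not a realistic fit.

```r
library(nlme)

# Simulated data: locations in metres plus a simple linear relationship
set.seed(7)
n <- 150
df <- data.frame(xc = runif(n, 0, 5000), yc = runif(n, 0, 5000))
df$x <- rnorm(n)
df$y <- 1 + 2 * df$x + rnorm(n)

# GLS with exponential spatial correlation; range value assumed and held fixed
fit <- gls(y ~ x, data = df,
           correlation = corExp(value = 500, form = ~ xc + yc, fixed = TRUE))
summary(fit)$tTable   # coefficients with correlation-aware standard errors
```

All rows are used, so no subsampling is needed; dropping `fixed = TRUE` lets the correlation range be estimated alongside the regression coefficients.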
On Thu, 4 Sep 2008, Grant Gillis wrote:
Hello Professor Ripley,
Sorry for not being clear. I posted after a long day of struggling. Also, my toy distance matrix should have been symmetrical. Simply put, I have spatially autocorrelated data collected from many points. I would like to do a linear regression on these data. To deal with the autocorrelation I want to resample a subset of my data with replacement, but I need to restrict subsets such that no two locations where data were collected are closer than X m apart (further apart than the range of autocorrelation in the data).
That is impossible. Resampling with replacement will give duplicated locations (with a very high probability) and those have distance zero. If you want a subsample (necessarily without replacement) you have a hard-core point process on a discrete set. It's possible that the MCMC methods we used for Strauss processes can be made to work in that case, but it is also possible that the state space is reducible and so more elaborate algorithms are needed. I do think it would be much easier to take autocorrelation into account in your linear model fit. There are many ways to do that, e.g. MASS::lm.gls, and in fact, unless the correlations are very high, OLS is likely to be quite efficient (but you need to use e.g. a sandwich estimator to get reliable standard errors).
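An MCMC scheme for the hard-core restriction on a discrete set could be prototyped with a simple swap sampler. This is an illustrative sketch, not the Strauss-process code Ripley refers to: the function `hardcore_swap()`, its arguments, the simulated coordinates and the greedy starting subset are all hypothetical. Starting from any valid subset of k rows, each step proposes replacing one chosen row with one unchosen row and accepts the move if the restriction still holds. Because the proposal is symmetric, the chain targets the uniform distribution over valid k-subsets, but only if the valid subsets are connected by such swaps; the caveat that the state space may be reducible applies directly.

```r
# Swap Metropolis sampler for a hard-core restriction on a discrete set.
# Target: uniform over k-subsets whose pairwise distances all exceed threshold.
hardcore_swap <- function(D, start, threshold, n_iter = 5000) {
  n <- nrow(D)
  state <- start
  for (it in seq_len(n_iter)) {
    out_pos <- sample.int(length(state), 1)        # which chosen row to drop
    candidates <- setdiff(seq_len(n), state)       # rows not currently chosen
    new_row <- candidates[sample.int(length(candidates), 1)]
    others <- state[-out_pos]
    # Symmetric proposal + uniform target: accept iff the restriction holds
    if (all(D[new_row, others] > threshold)) state[out_pos] <- new_row
  }
  sort(state)
}

# Simulated locations in metres (made-up data)
set.seed(3)
xy <- cbind(runif(300, 0, 3000), runif(300, 0, 3000))
D <- as.matrix(dist(xy))
threshold <- 100

# Valid starting subset of 40 rows from one greedy pass (hypothetical choice)
start <- integer(0)
for (i in sample.int(nrow(D))) {
  if (all(D[i, start] > threshold)) start <- c(start, i)
  if (length(start) == 40) break
}
draw <- hardcore_swap(D, start, threshold, n_iter = 5000)
```

Successive states are correlated, so for repeated subsamples one would typically keep every m-th state after a burn-in rather than rerun the chain from scratch each time.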
Thanks for having a look at this for me. I will look up the hard-core spatial point process.
Grant
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
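Ripley's alternative, fitting the regression to all the data while allowing for the correlation, can be sketched with `MASS::lm.gls()`, which takes a user-supplied matrix `W`; with `inverse = TRUE`, `W` is interpreted as a variance matrix rather than a weight matrix. The exponential correlation `exp(-d / 500)` with a 500 m range and the simulated data are assumptions made up for illustration; in practice the correlation model would have to be estimated, for example from a variogram.

```r
library(MASS)

# Simulated data: locations in metres plus a simple linear relationship
set.seed(11)
n <- 150
df <- data.frame(xc = runif(n, 0, 5000), yc = runif(n, 0, 5000))
df$x <- rnorm(n)
df$y <- 1 + 2 * df$x + rnorm(n)

# Assumed exponential correlation with a 500 m range (illustrative only)
D <- as.matrix(dist(df[, c("xc", "yc")]))
V <- exp(-D / 500)

# inverse = TRUE: W is treated as a variance matrix rather than a weight matrix
fit_gls <- lm.gls(y ~ x, data = df, W = V, inverse = TRUE)
coef(fit_gls)

# For comparison: ordinary least squares point estimates on the same data
# (its default standard errors ignore the spatial correlation)
coef(lm(y ~ x, data = df))
```

Ripley's remark that OLS is often quite efficient unless the correlations are very high refers to the point estimates; the standard errors from a plain `lm()` fit would still need correcting, for example with a sandwich estimator.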