Hi,
I am trying to compute bootstrap confidence intervals for weighted means of
paired differences with the boot package. Unfortunately, the weighted mean
estimate lies out of the confidence bounds and hence I am obviously doing
something wrong.
Appreciate any help. Thanks. Here is a reproducible example:
library(boot)
set.seed(1111)
x <- rnorm(50)
y <- rnorm(50)
weights <- runif(50)
weights <- weights / sum(weights)
dataset <- cbind(x,y,weights)
vw_m_diff <- function(dataset,w, d) {
differences <- dataset[d,1]-dataset[d,2]
weights <- w[d]
return(weighted.mean(x=differences, w=weights))
}
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3])
boot.ci(res_boot)
*BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS*
*Based on 1000 bootstrap replicates*
*CALL : *
*boot.ci <http://boot.ci>(boot.out = res_boot)*
*Intervals : *
*Level Normal Basic *
*95% (-0.8365, -0.3463 ) (-0.8311, -0.3441 ) *
*Level Percentile BCa *
*95% (-0.3276, 0.1594 ) (-0.4781, -0.3477 ) *
weighted.mean(x=dataset[,1]-dataset[,2], w=dataset[,3])
*[1] -0.07321734*
Bootstrap CIs for weighted means of paired differences
11 messages · Ivan, David Winsemius
On Nov 14, 2014, at 12:15 PM, ivan wrote:
Hi,
I am trying to compute bootstrap confidence intervals for weighted means of
paired differences with the boot package. Unfortunately, the weighted mean
estimate lies out of the confidence bounds and hence I am obviously doing
something wrong.
Appreciate any help. Thanks. Here is a reproducible example:
library(boot)
set.seed(1111)
x <- rnorm(50)
y <- rnorm(50)
weights <- runif(50)
weights <- weights / sum(weights)
dataset <- cbind(x,y,weights)
vw_m_diff <- function(dataset,w, d) {
My understanding of the principle underlying the design of the bootstrapped function was that the data was the first argument and the index vector was the second. (I admit to not knowing what it would do with a third argument. So I would have guessed that you wanted:
vw_m_diff <- function(dataset,w) {
differences <- dataset[d,1]-dataset[d,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
I get what appears to me as a sensible set of estimates (since they seem centered on zero) although I further admit I do not know what the theoretic CI _should_ be for this problem:
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3]) boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 1000 bootstrap replicates CALL : boot.ci(boot.out = res_boot) Intervals : Level Normal Basic 95% (-0.5657, 0.4962 ) (-0.5713, 0.5062 ) Level Percentile BCa 95% (-0.6527, 0.4249 ) (-0.5579, 0.5023 ) Calculations and Intervals on Original Scale
differences <- dataset[d,1]-dataset[d,2] weights <- w[d] return(weighted.mean(x=differences, w=weights)) } res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3]) boot.ci(res_boot) *BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS* *Based on 1000 bootstrap replicates* *CALL : * *boot.ci <http://boot.ci>(boot.out = res_boot)* *Intervals : * *Level Normal Basic * *95% (-0.8365, -0.3463 ) (-0.8311, -0.3441 ) * *Level Percentile BCa * *95% (-0.3276, 0.1594 ) (-0.4781, -0.3477 ) * weighted.mean(x=dataset[,1]-dataset[,2], w=dataset[,3]) *[1] -0.07321734* [[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA
On Nov 14, 2014, at 3:18 PM, David Winsemius wrote:
On Nov 14, 2014, at 12:15 PM, ivan wrote:
Hi,
I am trying to compute bootstrap confidence intervals for weighted means of
paired differences with the boot package. Unfortunately, the weighted mean
estimate lies out of the confidence bounds and hence I am obviously doing
something wrong.
Appreciate any help. Thanks. Here is a reproducible example:
library(boot)
set.seed(1111)
x <- rnorm(50)
y <- rnorm(50)
weights <- runif(50)
weights <- weights / sum(weights)
dataset <- cbind(x,y,weights)
vw_m_diff <- function(dataset,w, d) {
My understanding of the principle underlying the design of the bootstrapped function was that the data was the first argument and the index vector was the second. (I admit to not knowing what it would do with a third argument. So I would have guessed that you wanted:
vw_m_diff <- function(dataset,w) {
differences <- dataset[d,1]-dataset[d,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
I'm sorry. That was the code I first editted. This is the code that produced that output:
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
I get what appears to me as a sensible set of estimates (since they seem centered on zero) although I further admit I do not know what the theoretic CI _should_ be for this problem:
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3]) boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 1000 bootstrap replicates CALL : boot.ci(boot.out = res_boot) Intervals : Level Normal Basic 95% (-0.5657, 0.4962 ) (-0.5713, 0.5062 ) Level Percentile BCa 95% (-0.6527, 0.4249 ) (-0.5579, 0.5023 ) Calculations and Intervals on Original Scale
differences <- dataset[d,1]-dataset[d,2] weights <- w[d] return(weighted.mean(x=differences, w=weights)) } res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3]) boot.ci(res_boot) *BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS* *Based on 1000 bootstrap replicates* *CALL : * *boot.ci <http://boot.ci>(boot.out = res_boot)* *Intervals : * *Level Normal Basic * *95% (-0.8365, -0.3463 ) (-0.8311, -0.3441 ) * *Level Percentile BCa * *95% (-0.3276, 0.1594 ) (-0.4781, -0.3477 ) * weighted.mean(x=dataset[,1]-dataset[,2], w=dataset[,3]) *[1] -0.07321734* [[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA
3 days later
Hi David,
thanks a lot for the response. I see that this works. I am not sure, however, what the appropriate way to do this is. It also works if you do not define weights in the boot() function (weighted bootstrap) but rather in the vw_m_diff function (ordinary bootstrap), i.e.,
vw_m_diff <- function(dataset,d) {
differences <- dataset[d,1]-dataset[d,2]
weights <- dataset[d, "weights"]
return(weighted.mean(x=differences, w=weights))
}
with
boot(dataset, statistic=vw_m_diff, R = 1000)
I guess this is rather a statistical question and hence I will have to look further into it.
In any case, thanks a lot for your help.
Best
On 15 Nov 2014, at 17:27, David Winsemius <dwinsemius at comcast.net> wrote:
On Nov 14, 2014, at 3:18 PM, David Winsemius wrote:
On Nov 14, 2014, at 12:15 PM, ivan wrote:
Hi,
I am trying to compute bootstrap confidence intervals for weighted means of
paired differences with the boot package. Unfortunately, the weighted mean
estimate lies out of the confidence bounds and hence I am obviously doing
something wrong.
Appreciate any help. Thanks. Here is a reproducible example:
library(boot)
set.seed(1111)
x <- rnorm(50)
y <- rnorm(50)
weights <- runif(50)
weights <- weights / sum(weights)
dataset <- cbind(x,y,weights)
vw_m_diff <- function(dataset,w, d) {
My understanding of the principle underlying the design of the bootstrapped function was that the data was the first argument and the index vector was the second. (I admit to not knowing what it would do with a third argument. So I would have guessed that you wanted:
vw_m_diff <- function(dataset,w) {
differences <- dataset[d,1]-dataset[d,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
I'm sorry. That was the code I first editted. This is the code that produced that output:
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
I get what appears to me as a sensible set of estimates (since they seem centered on zero) although I further admit I do not know what the theoretic CI _should_ be for this problem:
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3]) boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 1000 bootstrap replicates CALL : boot.ci(boot.out = res_boot) Intervals : Level Normal Basic 95% (-0.5657, 0.4962 ) (-0.5713, 0.5062 ) Level Percentile BCa 95% (-0.6527, 0.4249 ) (-0.5579, 0.5023 ) Calculations and Intervals on Original Scale
differences <- dataset[d,1]-dataset[d,2] weights <- w[d] return(weighted.mean(x=differences, w=weights)) } res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3]) boot.ci(res_boot) *BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS* *Based on 1000 bootstrap replicates* *CALL : * *boot.ci <http://boot.ci>(boot.out = res_boot)* *Intervals : * *Level Normal Basic * *95% (-0.8365, -0.3463 ) (-0.8311, -0.3441 ) * *Level Percentile BCa * *95% (-0.3276, 0.1594 ) (-0.4781, -0.3477 ) * weighted.mean(x=dataset[,1]-dataset[,2], w=dataset[,3]) *[1] -0.07321734* [[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA
On Nov 19, 2014, at 6:08 AM, i.petzev wrote:
Hi David,
thanks a lot for the response. I see that this works. I am not sure, however, what the appropriate way to do this is. It also works if you do not define weights in the boot() function (weighted bootstrap) but rather in the vw_m_diff function (ordinary bootstrap), i.e.,
vw_m_diff <- function(dataset,d) {
differences <- dataset[d,1]-dataset[d,2]
weights <- dataset[d, "weights"]
return(weighted.mean(x=differences, w=weights))
}
with
boot(dataset, statistic=vw_m_diff, R = 1000)
I don't undertand how that is in the slightest way different than what I suggested. You've only substituted 'd' for 'w' as a formal parameter in the function. That should not change the processed results at all.
David.
>
> I guess this is rather a statistical question and hence I will have to look further into it.
>
> In any case, thanks a lot for your help.
>
> Best
>
>
>
> On 15 Nov 2014, at 17:27, David Winsemius <dwinsemius at comcast.net> wrote:
>
>>
>> On Nov 14, 2014, at 3:18 PM, David Winsemius wrote:
>>
>>>
>>> On Nov 14, 2014, at 12:15 PM, ivan wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to compute bootstrap confidence intervals for weighted means of
>>>> paired differences with the boot package. Unfortunately, the weighted mean
>>>> estimate lies out of the confidence bounds and hence I am obviously doing
>>>> something wrong.
>>>>
>>>> Appreciate any help. Thanks. Here is a reproducible example:
>>>>
>>>>
>>>> library(boot)
>>>> set.seed(1111)
>>>> x <- rnorm(50)
>>>> y <- rnorm(50)
>>>> weights <- runif(50)
>>>> weights <- weights / sum(weights)
>>>> dataset <- cbind(x,y,weights)
>>>> vw_m_diff <- function(dataset,w, d) {
>>>
>>> My understanding of the principle underlying the design of the bootstrapped function was that the data was the first argument and the index vector was the second. (I admit to not knowing what it would do with a third argument. So I would have guessed that you wanted:
>>>
>>> vw_m_diff <- function(dataset,w) {
>>> differences <- dataset[d,1]-dataset[d,2]
>>> weights <- dataset[w, "weights"]
>>> return(weighted.mean(x=differences, w=weights))
>>> }
>>
>> I'm sorry. That was the code I first editted. This is the code that produced that output:
>>
>> vw_m_diff <- function(dataset,w) {
>> differences <- dataset[w,1]-dataset[w,2]
>> weights <- dataset[w, "weights"]
>> return(weighted.mean(x=differences, w=weights))
>> }
>>
>>>
>>> I get what appears to me as a sensible set of estimates (since they seem centered on zero) although I further admit I do not know what the theoretic CI _should_ be for this problem:
>>>
>>>> res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3])
>>>> boot.ci(res_boot)
>>> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
>>> Based on 1000 bootstrap replicates
>>>
>>> CALL :
>>> boot.ci(boot.out = res_boot)
>>>
>>> Intervals :
>>> Level Normal Basic
>>> 95% (-0.5657, 0.4962 ) (-0.5713, 0.5062 )
>>>
>>> Level Percentile BCa
>>> 95% (-0.6527, 0.4249 ) (-0.5579, 0.5023 )
>>> Calculations and Intervals on Original Scale
>>>
>>>
>>>> differences <- dataset[d,1]-dataset[d,2]
>>>> weights <- w[d]
>>>> return(weighted.mean(x=differences, w=weights))
>>>> }
>>>> res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3])
>>>> boot.ci(res_boot)
>>>>
>>>> *BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS*
>>>> *Based on 1000 bootstrap replicates*
>>>>
>>>> *CALL : *
>>>> *boot.ci <http://boot.ci>(boot.out = res_boot)*
>>>>
>>>> *Intervals : *
>>>> *Level Normal Basic *
>>>> *95% (-0.8365, -0.3463 ) (-0.8311, -0.3441 ) *
>>>>
>>>> *Level Percentile BCa *
>>>> *95% (-0.3276, 0.1594 ) (-0.4781, -0.3477 ) *
>>>>
>>>> weighted.mean(x=dataset[,1]-dataset[,2], w=dataset[,3])
>>>>
>>>> *[1] -0.07321734*
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius
>> Alameda, CA, USA
>
David Winsemius
Alameda, CA, USA
Hi David,
sorry, I was not clear. The difference comes from defining or not defining ?w? in the boot() function. The results with your function and your approach are thus:
set.seed(1111)
x <- rnorm(50)
y <- rnorm(50)
weights <- runif(50)
weights <- weights / sum(weights)
dataset <- cbind(x,y,weights)
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3])
boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = res_boot)
Intervals :
Level Normal Basic
95% (-0.5657, 0.4962 ) (-0.5713, 0.5062 )
Level Percentile BCa
95% (-0.6527, 0.4249 ) (-0.5579, 0.5023 )
Calculations and Intervals on Original Scale
********************************************************************************************************************
However, without defining ?w? in the bootstrap function, i.e., running an ordinary and not a weighted bootstrap, the results are:
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000)
boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = res_boot)
Intervals :
Level Normal Basic
95% (-0.6265, 0.4966 ) (-0.6125, 0.5249 )
Level Percentile BCa
95% (-0.6714, 0.4661 ) (-0.6747, 0.4559 )
Calculations and Intervals on Original Scale
On 19 Nov 2014, at 17:49, David Winsemius <dwinsemius at comcast.net> wrote:
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
On Nov 20, 2014, at 2:23 AM, i.petzev wrote:
Hi David, sorry, I was not clear.
Right. You never were clear about what you wanted and your examples was so statistically symmetric that it is still hard to see what is needed. The examples below show CI's that are arguably equivalent. I can be faulted for attempting to provide code that produced a sensible answer to a vague question to which I was only guessing at the intent.
The difference comes from defining or not defining ?w? in the boot() function. The results with your function and your approach are thus:
set.seed(1111)
x <- rnorm(50)
y <- rnorm(50)
weights <- runif(50)
weights <- weights / sum(weights)
dataset <- cbind(x,y,weights)
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3])
boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = res_boot)
Intervals :
Level Normal Basic
95% (-0.5657, 0.4962 ) (-0.5713, 0.5062 )
Level Percentile BCa
95% (-0.6527, 0.4249 ) (-0.5579, 0.5023 )
Calculations and Intervals on Original Scale
********************************************************************************************************************
However, without defining ?w? in the bootstrap function, i.e., running an ordinary and not a weighted bootstrap, the results are:
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000)
boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = res_boot)
Intervals :
Level Normal Basic
95% (-0.6265, 0.4966 ) (-0.6125, 0.5249 )
I hope you are not saying that because those CI's are different that there is some meaning in that difference. Bootstrap runs will always be "different" than each other unless you use set.seed(.) before the runs.
Level Percentile BCa 95% (-0.6714, 0.4661 ) (-0.6747, 0.4559 ) Calculations and Intervals on Original Scale On 19 Nov 2014, at 17:49, David Winsemius <dwinsemius at comcast.net> wrote:
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
David Winsemius Alameda, CA, USA
I am aware of the fact that bootstrapping produces different CIs with every run. I still believe that there is a difference between both types of procedures. My understanding is that setting "w" in the boot() function influences the "importance" of observations or how the bootstrap selects the observations. I.e, observation i does not have the same probability of being chosen as observation j when "w" is defined in the boot() function. If you return res_boot you will notice that with "w" being set in the boot() function, the function call states "weighted bootstrap". If not, it states "ordinary nonparametric bootstrap". But maybe I am wrong. On Thu, Nov 20, 2014 at 8:19 PM, David Winsemius <dwinsemius at comcast.net> wrote:
On Nov 20, 2014, at 2:23 AM, i.petzev wrote:
Hi David, sorry, I was not clear.
Right. You never were clear about what you wanted and your examples was so statistically symmetric that it is still hard to see what is needed. The examples below show CI's that are arguably equivalent. I can be faulted for attempting to provide code that produced a sensible answer to a vague question to which I was only guessing at the intent.
The difference comes from defining or not defining ?w? in the boot()
function. The results with your function and your approach are thus:
set.seed(1111)
x <- rnorm(50)
y <- rnorm(50)
weights <- runif(50)
weights <- weights / sum(weights)
dataset <- cbind(x,y,weights)
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3])
boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = res_boot)
Intervals :
Level Normal Basic
95% (-0.5657, 0.4962 ) (-0.5713, 0.5062 )
Level Percentile BCa
95% (-0.6527, 0.4249 ) (-0.5579, 0.5023 )
Calculations and Intervals on Original Scale
********************************************************************************************************************
However, without defining ?w? in the bootstrap function, i.e., running
an ordinary and not a weighted bootstrap, the results are:
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000) boot.ci(res_boot) BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 1000 bootstrap replicates CALL : boot.ci(boot.out = res_boot) Intervals : Level Normal Basic 95% (-0.6265, 0.4966 ) (-0.6125, 0.5249 )
I hope you are not saying that because those CI's are different that there is some meaning in that difference. Bootstrap runs will always be "different" than each other unless you use set.seed(.) before the runs.
Level Percentile BCa 95% (-0.6714, 0.4661 ) (-0.6747, 0.4559 ) Calculations and Intervals on Original Scale On 19 Nov 2014, at 17:49, David Winsemius <dwinsemius at comcast.net>
wrote:
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
David Winsemius Alameda, CA, USA
On Nov 21, 2014, at 6:52 AM, ivan wrote:
I am aware of the fact that bootstrapping produces different CIs with every run. I still believe that there is a difference between both types of procedures. My understanding is that setting "w" in the boot() function influences the "importance" of observations or how the bootstrap selects the observations. I.e, observation i does not have the same probability of being chosen as observation j when "w" is defined in the boot() function. If you return res_boot you will notice that with "w" being set in the boot() function, the function call states "weighted bootstrap". If not, it states "ordinary nonparametric bootstrap". But maybe I am wrong.
OK. So in the the second call w affects the probability of a case being sent to the boot-function as well as being used in the boot-function; while with the "non-weighted call" the w's are only affecting the individual mean estimates. So the second one is different. And as I suggested earlier you never described the goals of the investigation or the meaning of the variables. I can tell you that when Davison and Hinkley offered examples of using a bootstrap for a weighted bootstrap mean, they compared a stratified analysis with an example where the weighting was only used on the inner function (example 3.2, practical 3.14 pp 72, 131 of their book) with one where the strata parameter was used. But so far I don't think you have ever described what sort of weights these actually are. In that example the weights were the inverse variances of the sample groups. They didn't use a 'weights' parameter in the boot call. I'm do not know if it was part of the S package that was being used at the time. I tried to find an example of a weighted bootstrap in V&R 4e but did not see one. Prof Ripley is the maintainer of the boot package. In the V&R book, Angelo Canty is given the credit for writing the boot package for S. I think you should consult the code, first. And you should also look at the `stype` parameter where "w" is one option.
David.
>
> On Thu, Nov 20, 2014 at 8:19 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Nov 20, 2014, at 2:23 AM, i.petzev wrote:
>
> > Hi David,
> >
> > sorry, I was not clear.
>
> Right. You never were clear about what you wanted and your examples was so statistically symmetric that it is still hard to see what is needed. The examples below show CI's that are arguably equivalent. I can be faulted for attempting to provide code that produced a sensible answer to a vague question to which I was only guessing at the intent.
>
>
> > The difference comes from defining or not defining ?w? in the boot() function. The results with your function and your approach are thus:
> >
> > set.seed(1111)
> > x <- rnorm(50)
> > y <- rnorm(50)
> > weights <- runif(50)
> > weights <- weights / sum(weights)
> > dataset <- cbind(x,y,weights)
> >
> > vw_m_diff <- function(dataset,w) {
> > differences <- dataset[w,1]-dataset[w,2]
> > weights <- dataset[w, "weights"]
> > return(weighted.mean(x=differences, w=weights))
> > }
> > res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3])
> > boot.ci(res_boot)
> >
> > BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
> > Based on 1000 bootstrap replicates
> >
> > CALL :
> > boot.ci(boot.out = res_boot)
> >
> > Intervals :
> > Level Normal Basic
> > 95% (-0.5657, 0.4962 ) (-0.5713, 0.5062 )
> >
> > Level Percentile BCa
> > 95% (-0.6527, 0.4249 ) (-0.5579, 0.5023 )
> > Calculations and Intervals on Original Scale
> >
> > ********************************************************************************************************************
> >
> > However, without defining ?w? in the bootstrap function, i.e., running an ordinary and not a weighted bootstrap, the results are:
> >
> > res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000)
> > boot.ci(res_boot)
> >
> > BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
> > Based on 1000 bootstrap replicates
> >
> > CALL :
> > boot.ci(boot.out = res_boot)
> >
> > Intervals :
> > Level Normal Basic
> > 95% (-0.6265, 0.4966 ) (-0.6125, 0.5249 )
>
> I hope you are not saying that because those CI's are different that there is some meaning in that difference. Bootstrap runs will always be "different" than each other unless you use set.seed(.) before the runs.
>
> >
> > Level Percentile BCa
> > 95% (-0.6714, 0.4661 ) (-0.6747, 0.4559 )
> > Calculations and Intervals on Original Scale
> >
> > On 19 Nov 2014, at 17:49, David Winsemius <dwinsemius at comcast.net> wrote:
> >
> >>>> vw_m_diff <- function(dataset,w) {
> >>>> differences <- dataset[w,1]-dataset[w,2]
> >>>> weights <- dataset[w, "weights"]
> >>>> return(weighted.mean(x=differences, w=weights))
> >>>> }
> >
>
> David Winsemius
> Alameda, CA, USA
>
>
David Winsemius
Alameda, CA, USA
On 21 Nov 2014, at 19:18, David Winsemius <dwinsemius at comcast.net> wrote:
On Nov 21, 2014, at 6:52 AM, ivan wrote:
I am aware of the fact that bootstrapping produces different CIs with every run. I still believe that there is a difference between both types of procedures. My understanding is that setting "w" in the boot() function influences the "importance" of observations or how the bootstrap selects the observations. I.e, observation i does not have the same probability of being chosen as observation j when "w" is defined in the boot() function. If you return res_boot you will notice that with "w" being set in the boot() function, the function call states "weighted bootstrap". If not, it states "ordinary nonparametric bootstrap". But maybe I am wrong.
OK. So in the the second call w affects the probability of a case being sent to the boot-function as well as being used in the boot-function; while with the "non-weighted call" the w's are only affecting the individual mean estimates. So the second one is different. And as I suggested earlier you never described the goals of the investigation or the meaning of the variables. I can tell you that when Davison and Hinkley offered examples of using a bootstrap for a weighted bootstrap mean, they compared a stratified analysis with an example where the weighting was only used on the inner function (example 3.2, practical 3.14 pp 72, 131 of their book) with one where the strata parameter was used. But so far I don't think you have ever described what sort of weights these actually are. In that example the weights were the inverse variances of the sample groups. They didn't use a 'weights' parameter in the boot call. I'm do not know if it was part of the S package that was being used at the time. I tried to find an example of a weighted bootstrap in V&R 4e but did not see one. Prof Ripley is the maintainer of the boot package. In the V&R book, Angelo Canty is given the credit for writing the boot package for S. I think you should consult the code, first. And you should also look at the `stype` parameter where "w" is one option. -- David.
On Thu, Nov 20, 2014 at 8:19 PM, David Winsemius <dwinsemius at comcast.net> wrote: On Nov 20, 2014, at 2:23 AM, i.petzev wrote:
Hi David, sorry, I was not clear.
Right. You never were clear about what you wanted and your examples was so statistically symmetric that it is still hard to see what is needed. The examples below show CI's that are arguably equivalent. I can be faulted for attempting to provide code that produced a sensible answer to a vague question to which I was only guessing at the intent.
The difference comes from defining or not defining ?w? in the boot() function. The results with your function and your approach are thus:
set.seed(1111)
x <- rnorm(50)
y <- rnorm(50)
weights <- runif(50)
weights <- weights / sum(weights)
dataset <- cbind(x,y,weights)
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3])
boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = res_boot)
Intervals :
Level Normal Basic
95% (-0.5657, 0.4962 ) (-0.5713, 0.5062 )
Level Percentile BCa
95% (-0.6527, 0.4249 ) (-0.5579, 0.5023 )
Calculations and Intervals on Original Scale
********************************************************************************************************************
However, without defining ?w? in the bootstrap function, i.e., running an ordinary and not a weighted bootstrap, the results are:
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000)
boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = res_boot)
Intervals :
Level Normal Basic
95% (-0.6265, 0.4966 ) (-0.6125, 0.5249 )
I hope you are not saying that because those CI's are different that there is some meaning in that difference. Bootstrap runs will always be "different" than each other unless you use set.seed(.) before the runs.
Level Percentile BCa 95% (-0.6714, 0.4661 ) (-0.6747, 0.4559 ) Calculations and Intervals on Original Scale On 19 Nov 2014, at 17:49, David Winsemius <dwinsemius at comcast.net> wrote:
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
David Winsemius Alameda, CA, USA
David Winsemius Alameda, CA, USA
Ok, thanks for the suggestions. I will look into that. And you are absolutely right that I should have been more clear about what type of weighting I want. So to clarify: I run time series regressions of returns of company i on two different sets of explanatory variables. Then I extract the respective intercepts of the two regressions and take the difference between both. I repeat this for the whole sample of companies and then compute the market value weighted average of those differences.
On 21 Nov 2014, at 19:18, David Winsemius <dwinsemius at comcast.net> wrote:
On Nov 21, 2014, at 6:52 AM, ivan wrote:
I am aware of the fact that bootstrapping produces different CIs with every run. I still believe that there is a difference between both types of procedures. My understanding is that setting "w" in the boot() function influences the "importance" of observations or how the bootstrap selects the observations. I.e, observation i does not have the same probability of being chosen as observation j when "w" is defined in the boot() function. If you return res_boot you will notice that with "w" being set in the boot() function, the function call states "weighted bootstrap". If not, it states "ordinary nonparametric bootstrap". But maybe I am wrong.
OK. So in the the second call w affects the probability of a case being sent to the boot-function as well as being used in the boot-function; while with the "non-weighted call" the w's are only affecting the individual mean estimates. So the second one is different. And as I suggested earlier you never described the goals of the investigation or the meaning of the variables. I can tell you that when Davison and Hinkley offered examples of using a bootstrap for a weighted bootstrap mean, they compared a stratified analysis with an example where the weighting was only used on the inner function (example 3.2, practical 3.14 pp 72, 131 of their book) with one where the strata parameter was used. But so far I don't think you have ever described what sort of weights these actually are. In that example the weights were the inverse variances of the sample groups. They didn't use a 'weights' parameter in the boot call. I'm do not know if it was part of the S package that was being used at the time. I tried to find an example of a weighted bootstrap in V&R 4e but did not see one. Prof Ripley is the maintainer of the boot package. In the V&R book, Angelo Canty is given the credit for writing the boot package for S. I think you should consult the code, first. And you should also look at the `stype` parameter where "w" is one option. -- David.
On Thu, Nov 20, 2014 at 8:19 PM, David Winsemius <dwinsemius at comcast.net> wrote: On Nov 20, 2014, at 2:23 AM, i.petzev wrote:
Hi David, sorry, I was not clear.
Right. You never were clear about what you wanted and your examples was so statistically symmetric that it is still hard to see what is needed. The examples below show CI's that are arguably equivalent. I can be faulted for attempting to provide code that produced a sensible answer to a vague question to which I was only guessing at the intent.
The difference comes from defining or not defining ?w? in the boot() function. The results with your function and your approach are thus:
set.seed(1111)
x <- rnorm(50)
y <- rnorm(50)
weights <- runif(50)
weights <- weights / sum(weights)
dataset <- cbind(x,y,weights)
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3])
boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = res_boot)
Intervals :
Level Normal Basic
95% (-0.5657, 0.4962 ) (-0.5713, 0.5062 )
Level Percentile BCa
95% (-0.6527, 0.4249 ) (-0.5579, 0.5023 )
Calculations and Intervals on Original Scale
********************************************************************************************************************
However, without defining ?w? in the bootstrap function, i.e., running an ordinary and not a weighted bootstrap, the results are:
res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000)
boot.ci(res_boot)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = res_boot)
Intervals :
Level Normal Basic
95% (-0.6265, 0.4966 ) (-0.6125, 0.5249 )
I hope you are not saying that because those CI's are different that there is some meaning in that difference. Bootstrap runs will always be "different" than each other unless you use set.seed(.) before the runs.
Level Percentile BCa 95% (-0.6714, 0.4661 ) (-0.6747, 0.4559 ) Calculations and Intervals on Original Scale On 19 Nov 2014, at 17:49, David Winsemius <dwinsemius at comcast.net> wrote:
vw_m_diff <- function(dataset,w) {
differences <- dataset[w,1]-dataset[w,2]
weights <- dataset[w, "weights"]
return(weighted.mean(x=differences, w=weights))
}
David Winsemius Alameda, CA, USA
David Winsemius Alameda, CA, USA