CI for the median difference

? stato filtrato un testo allegato il cui set di caratteri non era
indicato...
Nome: non disponibile
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120218/c759ac35/attachment.pl>
It seems to me that your two approaches are calculating CIs for 
different quantities.  The bootstrap methods are calculating a CI for 
the difference in medians, while the Wilcoxon approach is calculating a 
CI for the median of the differences.  If this were the mean, those 
would be the same, but not for the median:

A =c(619, 600, 490, 1076, 654, 955, 563, 955, 827, 873, 1253)
B =c(346, 507, 598, 228, 576, 338, 1153, 354, 560, 517, 381)

 > median(A)-median(B)
[1] 320
 > median(A-B)
[1] 273
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky

On 02/18/2012 05:05 AM, Vittorio Colagrande wrote:
> Dear R-group,
>
>
>
> I have run into a problem in estimating confidence intervals for the median difference.
>
> I want to establish a confidence interval at (1- alpha) level for the difference between the
>
> medians of two indipendent samples (size n and m), by using the Wilcoxon distribution
>
> or with bootstrap methods.
>
> First method, we consider the z matrix of d=n???m differences of the first and second sample
>
> data and we order these differences in a y vector. By the Wilcoxon distribuition (W) we
>
> determine the q quantile such that Prob(W<q)= alpha/2, inf and sup of confidence interval are
>
> respectively the q-th and (d-q+1)-th elements of the y vector, while we obtain the difference
>
> median by the y distribution. This method is also used to establish the CI by wilcox.test.
>
> Example. Two indipendent sample A and B (n=11, m=13)  of CD4 count cells (T-helper cells):
>
> A =c(619, 600, 490, 1076, 654, 955, 563, 955, 827, 873, 1253)
>
> B =c(346, 507, 598, 228, 576, 338, 1153, 354, 560, 517, 381, 415, 626)
>
>
>
> 1)      CI 95% by matrix z and y vector:
>
> n=length(A)
>
> m=length(B)
>
> d=n*m
>
> z=matrix(0,m,n)
>
> for(j in seq_len(n))
>
> z[, j]=A[j] - B
>
> y=sort(as.vector(z))
>
> q=qwilcox(0.05/2,n,m,lower.tail = TRUE, log.p = FALSE)
>
> inf=y[q]
>
> sup=y[d-q+1]
>
> med=median(y)
>
> results:  inf = 100, sup = 516 and med = 300.
>
>
>
> 2)      CI 95% by wilcox.test:
>
>        I=wilcox.test(A,B,conf.lev=0.95,conf.int=TRUE,exact=F,correct=T)
>
>       inf=I$conf.int[1]
>
>       sup=I$conf.int[2].
>
> results:  inf = 99.9, sup = 516.
>
>
>
> Second method, bootstrap each sample separately, creating the sampling distribution for
>
> each median. Then calculate the difference between the two medians, and create the
>
> sampling distribution of those differences. This is the sampling distribution we care about.
>
> Once we have that distribution we can establish a confidence interval. Some CI 95%,
>
> with reference to the CD4 example, one given below.
>
>
>
> 1)     First procedure (package boot):
>
> library(boot)
>
>        n=length(A)
>
>        m=length(B)
>
>        y=c(A,B)
>
>       camp=data.frame(group=rep(c(1,2),c(n,m)),y)
>
>       dif.median=function(data,i) {
>
>       d=data[i,]
>
>       n1=n+1
>
>       m1=n+m
>
>       median(d$y[1:n])-median(d$y[n1:m1])  }
>
>       dif.boot=boot(camp,dif.median,R=10000, strata=camp$group)
>
>       boot.ci(dif.boot, conf =0.95, type="bca")
>
> results:  inf = 59, sup = 574.
>
>
>
> 2)      Second procedure (package pairwiseCI):
>
> library(pairwiseCI)
>
> MedDiff=Median.diff(A, B, conf.level=0.95, alternative="two.sided",R=10000)
>
> MedDiff$conf.int
>
> MedDiff$estimate
>
> results:  inf = 56, sup = 574, median=320
>
>
>
> 3)      Third procedure (stratified bootstrap):
>
>        dif<- numeric(10000)
>
>        for(i in seq_len(10000))
>
>        dif[i]<- median(sample(A, replace=TRUE)) - median(sample(B, replace=TRUE))
>
>        quantile(dif,prob=c(0.5,(1-0.95)/2,(1-(1-0.95)/2)))
>
> results: inf = 56, sup = 574, median = 313.
>
>
>
> 4)      Fourth procedure (package simpleboot)
>
>   library(simpleboot)
>
>   boot_diff<- two.boot(A, B, median, R = 10000)
>
>   boot.ci(boot_diff,conf=0.95,type="bca")
>
> results: inf = 59, sup = 574.
>
>
>
> The bootstrap procedures do get the same results, but the confidence intervals are
>
> significantly different from those obtained using the method that refers to the Wilcoxon
>
> distribution.
> Problem: does this difference depend on really "different" methods
>
> or on incorrect implementation of the bootstrap technique?
>
>
>
> I will greatly appreciate any clarification you could provide.
>
> Best regards.
>
> Vittorio Colagrande
>
> 	[[alternative HTML version deleted]]
>
I am trying to use matrix algebra to get the beta coefficients from a simple bivariate linear regression, y=f(x).
The coefficients should be computable using the following matrix algebra: t(X)Y / t(x)X

I have pasted the code I wrote below. I clearly odes not work both because it returns a matrix rather than a vector containing two elements the beta for the intercept and the beta for x, and because the values produced by the matrix algebra are not the same as those returned by the linear regression. Can someone tell we where I have gone wrong, either in my use of matrix algebra in R, or perhaps at a more fundamental theoretical level?
Thanks,
John

# Define intercept, x and y.
int <- rep(1,100)
x   <- 1:100
y   <- 2*x + rnorm(100)

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=3)
dimnames(data) <- list(NULL,c("int","x","y"))
data[,"int"] <- int
data[,"x"]   <- x
data[,"y"]   <- y
data

# Compute numerator.
num <-  cov(data)
num

# Compute denominator
denom <- solve(t(data) %*% data)
denom

# Compute betas, [t(X)Y]/[t(X)Y]
betaRon <-    num %*% denom
betaRon

# Get betas from regression so we can check
# values obtaned by matrix algebra.
fit0 <- lm(y~x)

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120218/897d04ff/attachment.pl>
Mark
Thank you!
John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
Mark Leeds <markleeds2 at gmail.com> 2/18/2012 8:55 AM >>>
Hi John: I don't understand what you're doing ( not saying that it's wrong.
I just don't
follow it ). Below is code for computing the coefficients using the matrix
way I follow.
Others may understand what you're doing and be able to fix it so I wouldn't
just
use below immediately.

xprimex <- solve(t(data[,1:2]) %*% data[,1:2])
xprimey <- t(data[,1:2]) %*% data[,3]

betas <- xprimex %*% xprimey
print(betas)

I am trying to use matrix algebra to get the beta coefficients from a
simple bivariate linear regression, y=f(x).
The coefficients should be computable using the following matrix algebra:
t(X)Y / t(x)X

I have pasted the code I wrote below. I clearly odes not work both because
it returns a matrix rather than a vector containing two elements the beta
for the intercept and the beta for x, and because the values produced by
the matrix algebra are not the same as those returned by the linear
regression. Can someone tell we where I have gone wrong, either in my use
of matrix algebra in R, or perhaps at a more fundamental theoretical level?
Thanks,
John

# Define intercept, x and y.
int <- rep(1,100)
x   <- 1:100
y   <- 2*x + rnorm(100)

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=3)
dimnames(data) <- list(NULL,c("int","x","y"))
data[,"int"] <- int
data[,"x"]   <- x
data[,"y"]   <- y
data

# Compute numerator.
num <-  cov(data)
num

# Compute denominator
denom <- solve(t(data) %*% data)
denom

# Compute betas, [t(X)Y]/[t(X)Y]
betaRon <-    num %*% denom
betaRon

# Get betas from regression so we can check
# values obtaned by matrix algebra.
fit0 <- lm(y~x)

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement:
This email message, including any attachments, is for ...{{dropped:17}}

I am trying to use matrix algebra to get the beta coefficients from a simple bivariate linear regression, y=f(x).
The coefficients should be computable using the following matrix algebra: t(X)Y / t(x)X

I have pasted the code I wrote below. I clearly odes not work both because it returns a matrix rather than a vector containing two elements the beta for the intercept and the beta for x, and because the values produced by the matrix algebra are not the same as those returned by the linear regression. Can someone tell we where I have gone wrong, either in my use of matrix algebra in R, or perhaps at a more fundamental theoretical level?
Thanks,
John

# Define intercept, x and y.
int <- rep(1,100)
x   <- 1:100
y   <- 2*x + rnorm(100)

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=3)
dimnames(data) <- list(NULL,c("int","x","y"))
data[,"int"] <- int
data[,"x"]   <- x
data[,"y"]   <- y
data

# Compute numerator.
num <-  cov(data)
num

# Compute denominator
denom <- solve(t(data) %*% data)
denom

# Compute betas, [t(X)Y]/[t(X)Y]
betaRon <-    num %*% denom
betaRon

# Get betas from regression so we can check
# values obtaned by matrix algebra.
fit0 <- lm(y~x)

data should only contain the independent (righthand side) variables.

So

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=2)
dimnames(data) <- list(NULL,c("int","x"))#,"y"))
data[,"int"] <- int
data[,"x"]   <- x

You should get correct results.
A better way to calculate the beta's is to calculate them directly without explicitly calculating an invers

# Better
# Compute betas, from t(X)X beta = t(Z) y
# without computing inverse explicitly
betaAlt <- solve(t(data) %*% data, num)
betaAlt

Even better is the method lm uses without forming t(data) %*%  data explicitly but using a (pivoted) QR decomposition of data.

Berend
Thank you,
John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
Berend Hasselman <bhh at xs4all.nl> 2/18/2012 9:05 AM >>>

I am trying to use matrix algebra to get the beta coefficients from a simple bivariate linear regression, y=f(x).
The coefficients should be computable using the following matrix algebra: t(X)Y / t(x)X

I have pasted the code I wrote below. I clearly odes not work both because it returns a matrix rather than a vector containing two elements the beta for the intercept and the beta for x, and because the values produced by the matrix algebra are not the same as those returned by the linear regression. Can someone tell we where I have gone wrong, either in my use of matrix algebra in R, or perhaps at a more fundamental theoretical level?
Thanks,
John

# Define intercept, x and y.
int <- rep(1,100)
x   <- 1:100
y   <- 2*x + rnorm(100)

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=3)
dimnames(data) <- list(NULL,c("int","x","y"))
data[,"int"] <- int
data[,"x"]   <- x
data[,"y"]   <- y
data

# Compute numerator.
num <-  cov(data)
num

# Compute denominator
denom <- solve(t(data) %*% data)
denom

# Compute betas, [t(X)Y]/[t(X)Y]
betaRon <-    num %*% denom
betaRon

# Get betas from regression so we can check
# values obtaned by matrix algebra.
fit0 <- lm(y~x)

data should only contain the independent (righthand side) variables.

So

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=2)
dimnames(data) <- list(NULL,c("int","x"))#,"y"))
data[,"int"] <- int
data[,"x"]   <- x

You should get correct results.
A better way to calculate the beta's is to calculate them directly without explicitly calculating an invers

# Better
# Compute betas, from t(X)X beta = t(Z) y
# without computing inverse explicitly
betaAlt <- solve(t(data) %*% data, num)
betaAlt

Even better is the method lm uses without forming t(data) %*%  data explicitly but using a (pivoted) QR decomposition of data.

Berend

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}
The Wilcoxon test is not related to the difference in medians but rather to
the Hodges-Lehmann estimator, which is the median of all possible
differences of observations between sample 1 and sample 2.  So what's needed
is a confidence interval for this estimate, obtainable from inverting the
Wilcoxon test.
Frank

John Sorkin wrote
Thank you,
John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Berend Hasselman &lt;bhh@&gt; 2/18/2012 9:05 AM >>>
On 18-02-2012, at 14:36, John Sorkin wrote:

I am trying to use matrix algebra to get the beta coefficients from a
simple bivariate linear regression, y=f(x).
The coefficients should be computable using the following matrix algebra:
t(X)Y / t(x)X

I have pasted the code I wrote below. I clearly odes not work both
because it returns a matrix rather than a vector containing two elements
the beta for the intercept and the beta for x, and because the values
produced by the matrix algebra are not the same as those returned by the
linear regression. Can someone tell we where I have gone wrong, either in
my use of matrix algebra in R, or perhaps at a more fundamental
theoretical level?
Thanks,
John

# Define intercept, x and y.
int <- rep(1,100)
x   <- 1:100
y   <- 2*x + rnorm(100)

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=3)
dimnames(data) <- list(NULL,c("int","x","y"))
data[,"int"] <- int
data[,"x"]   <- x
data[,"y"]   <- y
data

# Compute numerator.
num <-  cov(data)
num

# Compute denominator
denom <- solve(t(data) %*% data)
denom

# Compute betas, [t(X)Y]/[t(X)Y]
betaRon <-    num %*% denom
betaRon

# Get betas from regression so we can check
# values obtaned by matrix algebra.
fit0 <- lm(y~x)

data should only contain the independent (righthand side) variables.

So

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=2)
dimnames(data) <- list(NULL,c("int","x"))#,"y"))
data[,"int"] <- int
data[,"x"]   <- x

You should get correct results.
A better way to calculate the beta's is to calculate them directly without
explicitly calculating an invers

# Better
# Compute betas, from t(X)X beta = t(Z) y
# without computing inverse explicitly
betaAlt <- solve(t(data) %*% data, num)
betaAlt

Even better is the method lm uses without forming t(data) %*%  data
explicitly but using a (pivoted) QR decomposition of data.

Berend

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}

______________________________________________
R-help@ mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/CI-for-the-median-difference-tp4399508p4399897.html
Sent from the R help mailing list archive at Nabble.com.