Skip to content

CI for the median difference

8 messages · Vittorio Colagrande, Patrick Breheny, Mark Leeds +3 more

#
It seems to me that your two approaches are calculating CIs for 
different quantities.  The bootstrap methods are calculating a CI for 
the difference in medians, while the Wilcoxon approach is calculating a 
CI for the median of the differences.  If this were the mean, those 
would be the same, but not for the median:

A =c(619, 600, 490, 1076, 654, 955, 563, 955, 827, 873, 1253)
B =c(346, 507, 598, 228, 576, 338, 1153, 354, 560, 517, 381)

 > median(A)-median(B)
[1] 320
 > median(A-B)
[1] 273
#
I am trying to use matrix algebra to get the beta coefficients from a simple bivariate linear regression, y=f(x).
The coefficients should be computable using the following matrix algebra: t(X)Y / t(x)X

I have pasted the code I wrote below. I clearly odes not work both because it returns a matrix rather than a vector containing two elements the beta for the intercept and the beta for x, and because the values produced by the matrix algebra are not the same as those returned by the linear regression. Can someone tell we where I have gone wrong, either in my use of matrix algebra in R, or perhaps at a more fundamental theoretical level?
Thanks,
John

# Define intercept, x and y.
int <- rep(1,100)
x   <- 1:100
y   <- 2*x + rnorm(100)

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=3)
dimnames(data) <- list(NULL,c("int","x","y"))
data[,"int"] <- int
data[,"x"]   <- x
data[,"y"]   <- y
data

# Compute numerator.
num <-  cov(data)
num

# Compute denominator
denom <- solve(t(data) %*% data)
denom

# Compute betas, [t(X)Y]/[t(X)Y]
betaRon <-    num %*% denom
betaRon

# Get betas from regression so we can check
# values obtaned by matrix algebra.
fit0 <- lm(y~x)


John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}
#
Mark
Thank you!
John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
Hi John: I don't understand what you're doing ( not saying that it's wrong.
I just don't
follow it ). Below is code for computing the coefficients using the matrix
way I follow.
Others may understand what you're doing and be able to fix it so I wouldn't
just
use below immediately.

xprimex <- solve(t(data[,1:2]) %*% data[,1:2])
xprimey <- t(data[,1:2]) %*% data[,3]

betas <- xprimex %*% xprimey
print(betas)
On Sat, Feb 18, 2012 at 8:36 AM, John Sorkin <JSorkin at grecc.umaryland.edu>wrote:

            
#
On 18-02-2012, at 14:36, John Sorkin wrote:

            
data should only contain the independent (righthand side) variables.

So

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=2)
dimnames(data) <- list(NULL,c("int","x"))#,"y"))
data[,"int"] <- int
data[,"x"]   <- x

You should get correct results.
A better way to calculate the beta's is to calculate them directly without explicitly calculating an invers

# Better
# Compute betas, from t(X)X beta = t(Z) y
# without computing inverse explicitly
betaAlt <- solve(t(data) %*% data, num)
betaAlt

Even better is the method lm uses without forming t(data) %*%  data explicitly but using a (pivoted) QR decomposition of data.

Berend
#
Thank you,
John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

        
On 18-02-2012, at 14:36, John Sorkin wrote:

            
data should only contain the independent (righthand side) variables.

So

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=2)
dimnames(data) <- list(NULL,c("int","x"))#,"y"))
data[,"int"] <- int
data[,"x"]   <- x

You should get correct results.
A better way to calculate the beta's is to calculate them directly without explicitly calculating an invers

# Better
# Compute betas, from t(X)X beta = t(Z) y
# without computing inverse explicitly
betaAlt <- solve(t(data) %*% data, num)
betaAlt

Even better is the method lm uses without forming t(data) %*%  data explicitly but using a (pivoted) QR decomposition of data.

Berend





Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}
#
The Wilcoxon test is not related to the difference in medians but rather to
the Hodges-Lehmann estimator, which is the median of all possible
differences of observations between sample 1 and sample 2.  So what's needed
is a confidence interval for this estimate, obtainable from inverting the
Wilcoxon test.
Frank
 
John Sorkin wrote
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/CI-for-the-median-difference-tp4399508p4399897.html
Sent from the R help mailing list archive at Nabble.com.