Dear list,
I have a problem on calculating the standard error of
Goodman-Kurskal's gamma using delta method. I exactly follow the
method and forumla described in Problem 3.27 of Alan Agresti's
Categorical Data Analysis (2nd edition). The data I used is also from
the job satisfaction vs. income example from that book.
job <- matrix(c(1, 3, 10, 6, 2, 3, 10, 7, 1, 6, 14, 12, 0, 1, 9, 11),
nrow = 4, ncol = 4, byrow = TRUE, dimnames = list(c("< 15,000",
"15,000 - 25,000", "25,000 - 40,000", "> 40,000"), c("VD", "LD", "MS",
"VS")))
The following code is for calculating gamma value, which is consistent
with the result presented in section 2.4.5 of that book.
C <- 0
D <- 0
for (i in 1:nrow(job)){
for (j in 1:ncol(job)){
pi_c <- 0
pi_d <- 0
for (h in 1:nrow(job)){
for (k in 1:ncol(job)){
if ((h > i & k > j) | (h < i & k < j)){
pi_c <- pi_c + job[h, k]/sum(job)
}
if ((h > i & k < j) | (h < i & k > j)){
pi_d <- pi_d + job[h, k]/sum(job)
}
}
}
C <- C + job[i, j] * pi_c
D <- D + job[i, j] * pi_d
}
}
gamma <- (C - D) / (C + D) # gamma = 0.221 for this example.
The following code is for calculating stardard error of gamma.
sigma.squared <- 0
for (i in 1:nrow(job)){
for (j in 1:ncol(job)){
pi_c <- 0
pi_d <- 0
for (h in 1:nrow(job)){
for (k in 1:ncol(job)){
if ((h > i & k > j) | (h < i & k < j)){
pi_c <- pi_c + job[h, k]/sum(job)
}
if ((h > i & k < j) | (h < i & k > j)){
pi_d <- pi_d + job[h, k]/sum(job)
}
}
}
phi <- 4 * (pi_c * D - pi_d * C) / (C + D)^2
sigma.squared <- sigma.squared + phi^2
}
}
se <- (sigma.squared/sum(job))^.5 # 0.00748, which is different from
the SE 0.117 given in section 3.4.3 of that book.
I am not able to figure out what is the problem with my code... Could
anyone point out what the problem is?
Thanks.
Wuming
Calculating Goodman-Kurskal's gamma using delta method
2 messages · Wuming Gong, Frank E Harrell Jr
Wuming Gong wrote:
Dear list,
I have a problem on calculating the standard error of
Goodman-Kurskal's gamma using delta method. I exactly follow the
method and forumla described in Problem 3.27 of Alan Agresti's
Categorical Data Analysis (2nd edition). The data I used is also from
the job satisfaction vs. income example from that book.
job <- matrix(c(1, 3, 10, 6, 2, 3, 10, 7, 1, 6, 14, 12, 0, 1, 9, 11),
nrow = 4, ncol = 4, byrow = TRUE, dimnames = list(c("< 15,000",
"15,000 - 25,000", "25,000 - 40,000", "> 40,000"), c("VD", "LD", "MS",
"VS")))
The following code is for calculating gamma value, which is consistent
with the result presented in section 2.4.5 of that book.
C <- 0
D <- 0
for (i in 1:nrow(job)){
for (j in 1:ncol(job)){
pi_c <- 0
pi_d <- 0
for (h in 1:nrow(job)){
for (k in 1:ncol(job)){
if ((h > i & k > j) | (h < i & k < j)){
pi_c <- pi_c + job[h, k]/sum(job)
}
if ((h > i & k < j) | (h < i & k > j)){
pi_d <- pi_d + job[h, k]/sum(job)
}
}
}
C <- C + job[i, j] * pi_c
D <- D + job[i, j] * pi_d
}
}
gamma <- (C - D) / (C + D) # gamma = 0.221 for this example.
The following code is for calculating stardard error of gamma.
sigma.squared <- 0
for (i in 1:nrow(job)){
for (j in 1:ncol(job)){
pi_c <- 0
pi_d <- 0
for (h in 1:nrow(job)){
for (k in 1:ncol(job)){
if ((h > i & k > j) | (h < i & k < j)){
pi_c <- pi_c + job[h, k]/sum(job)
}
if ((h > i & k < j) | (h < i & k > j)){
pi_d <- pi_d + job[h, k]/sum(job)
}
}
}
phi <- 4 * (pi_c * D - pi_d * C) / (C + D)^2
sigma.squared <- sigma.squared + phi^2
}
}
se <- (sigma.squared/sum(job))^.5 # 0.00748, which is different from
the SE 0.117 given in section 3.4.3 of that book.
I am not able to figure out what is the problem with my code... Could
anyone point out what the problem is?
Thanks.
Wuming
Save your time (and much execution time) by using the Hmisc package's rcorr.cens function with the argument outx=TRUE. rcorr.cens using a standard U-statistic variance estimator.
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University