Hello,
My goal is to calculate the weighted kappa measure of agreement between
two factors using the R survey package. I am getting the following error
message (the console is appended below; sorry no data provided).
# calculate survey Kappa
svykappa(~xbpchek53+xcholck53, design)
Error in names(probs) <- nms :
'names' attribute [15] must be the same length as the vector [8]
I have followed the following major steps:
1) Used the "haven" package to read the sas data set into R.
2) Used the dplyr mutate() to create 2 new variables and converted to
factors [required for the svykappa()?].
3) Created an object (named design) using the survey design variables and
the data file.
4) Used the svykappa() to compute the kappa measure of agreement.
I will appreciate if someone could give me hints on how to resolve the
issue.
Thanks,
Pradip Muhuri
############### The detailed console is appended below
####################
setwd ("U:/A_PSAQ")
library(haven)
library(dplyr)
library(survey)
library(srvyr)
library(Hmisc)
my_hc2013_data <- read_sas("pc2013.sas7bdat")
# Function to convert var names in upper cases to var names in lower
+ names(df) <- tolower(names(df))
+ df
+ }
my_hc2013_data <- lower(my_hc2013_data)
# Check the contents - Hmisc package (as above) required
# contents(my_hc2013_data)
# create two new variables
my_hc2013_data <- mutate(my_hc2013_data,
+ xbpchek53 = ifelse(bpchek53 ==1, 1,
+ ifelse(bpchek53 %in% 2:6, 2,NA)),
+ xcholck53 = ifelse(cholck53 ==1, 1,
+ ifelse(cholck53 %in% 2:6, 2,NA)))
# convert the numeric variables to factors for the kappa measure
my_hc2013_data$xbpchek53 <- as.factor(my_hc2013_data$xbpchek53)
my_hc2013_data$xcholck53 <- as.factor(my_hc2013_data$xcholck53)
# check whether the variables are factors
is.factor(my_hc2013_data$xbpchek53)
is.factor(my_hc2013_data$xcholck53)
# check the data from the cross table
addmargins(with(my_hc2013_data, table(bpchek53,xbpchek53 )))
xbpchek53
bpchek53 1 2 Sum
-9 0 0 0
-8 0 0 0
-7 0 0 0
-1 0 0 0
1 19778 0 19778
2 0 2652 2652
3 0 1014 1014
4 0 538 538
5 0 737 737
6 0 623 623
Sum 19778 5564 25342
addmargins(with(my_hc2013_data, table(cholck53,xcholck53 )))
xcholck53
cholck53 1 2 Sum
-9 0 0 0
-8 0 0 0
-7 0 0 0
-1 0 0 0
1 14850 0 14850
2 0 3153 3153
3 0 1170 1170
4 0 696 696
5 0 909 909
6 0 3764 3764
Sum 14850 9692 24542
addmargins(with(my_hc2013_data, table(xbpchek53,xcholck53 )))
xcholck53
xbpchek53 1 2 Sum
1 14667 4379 19046
2 163 5225 5388
Sum 14830 9604 24434
# create an object with design variables and data
design<-svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f,
data=my_hc2013_data, nest=TRUE)
# calculate survey Kappa
svykappa(~xbpchek53+xcholck53, design)
Error in names(probs) <- nms :
'names' attribute [15] must be the same length as the vector [8]
#################################################################
Pradip K. Muhuri, AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Muhuri,
Pradip (AHRQ/CFACT)
Sent: Thursday, June 16, 2016 2:06 PM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New
Question
Hello David,
Your revisions to the earlier code have given me desired results.
library("gtools")
mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c("indicator",
"prevalence_c") ]
Thanks,
Pradip
Pradip K. Muhuri, AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564
-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Thursday, June 16, 2016 12:54 PM
To: Muhuri, Pradip (AHRQ/CFACT)
Cc: r-help at r-project.org
Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New
Question
On Jun 16, 2016, at 6:12 AM, Muhuri, Pradip (AHRQ/CFACT) <
Pradip.Muhuri at ahrq.hhs.gov> wrote:
Hello,
I got 3 solutions to my earlier code. Thanks to the contributors. May
I bring your attention to a new question below (with respect to David's
solution)?
1) Thanks to Daniel Nordlund for the tips - replacing leading space
2) Thanks to David Winsemius for his solution with the
gtools::mixedorder function. I have added an argument to his.
mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), ]
3) Thanks to Jim Lemon's for his solution. I have prepended a minus
sign to reverse the order.
numprev<-as.numeric(sapply(strsplit(trimws(mydata$prevalence_c),"
"),"[",1)) mydata[order(-numprev), ]
(New)Question for solution 2:
I want to keep only 2 variables (say, indicator and prevalence_c) in
the output. Where to insert the additional code? Why does the following
code fail?
mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE),
c(mydata$indicator, mydata$prevalence_c) ]
Try instead just a vector of names for the second argument to "["
mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE),
c("indicator", "prevalence_c") ]
Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c,
undefined columns selected
********************
Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of 10 variables:
$ indicator : chr "1. Health check-up" "2. Blood cholesterol checked
" "3. Recieved flu vaccine" "4. Blood pressure checked" ...
$ subgroup : chr "Both sexes, ages =35 yrs""| __truncated__ "Both
sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""|
__truncated__ "Both sexes, ages =35 yrs""| __truncated__ ...
$ n : num 2117 2127 2124 2135 1027 ...
$ prevalence_c: chr "74.7 (1.20)" "90.3 (0.89)" "51.7 (1.35)" "93.2
$ prevalence_p: chr "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7
$ sensitivity : chr "87.4 (1.10)" "99.2 (0.27)" "97.0 (0.62)" "99.0
$ specificity : chr "68.3 (2.80)" "58.2 (3.72)" "93.5 (0.90)" "52.7
$ ppv : chr "90.4 (0.94)" "92.8 (0.85)" "93.7 (0.87)" "94.3
$ npv : chr "61.5 (3.00)" "92.8 (2.27)" "96.9 (0.63)" "87.5
$ kappa : chr "0.536 (0.029)" "0.676 (0.032)" "0.905 (0.011)"
Pradip K. Muhuri, AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel
Nordlund
Sent: Wednesday, June 15, 2016 6:37 PM
To: r-help at r-project.org
Subject: Re: [R] dplyr's arrange function
On 6/15/2016 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) wrote:
Hello,
I am using the dplyr's arrange() function to sort one of the many
data frames on a character variable (named "prevalence").
Issue: I am not getting the desired output (line 7 is the problem,
which should be the very last line in the sorted data frame) because the
sorted field is character, not numeric.
The reproducible example and the output are appended below.
Is there any work-around to convert/treat this character variable
(named "prevalence" in the data frame below) as numeric before using the
arrange() function within the dplyr package?
Any hints will be appreciated.
Thanks,
Pradip Muhuri
# Reproducible Example
library("readr")
testdata <- read_csv(
"indicator, prevalence
1. Health check-up, 77.2 (1.19)
2. Blood cholesterol checked, 84.5 (1.14) 3. Recieved flu vaccine,
50.0 (1.33) 4. Blood pressure checked, 88.7 (0.88) 5. Aspirin
use-problems, 11.7 (1.02) 6.Colonoscopy, 60.2 (1.41) 7.
Sigmoidoscopy,
6.1 (0.61) 8. Blood stool test, 14.6 (1.00) 9.Mammogram, 72.6 (1.82)
10. Pap Smear test, 73.3 (2.37)")
# Sort on the character variable in descending order
arrange(testdata,
desc(prevalence))
# Results from Console
indicator prevalence
(chr) (chr)
1 4. Blood pressure checked 88.7 (0.88)
2 2. Blood cholesterol checked 84.5 (1.14)
3 1. Health check-up 77.2 (1.19)
4 10. Pap Smear test 73.3 (2.37)
5 9.Mammogram 72.6 (1.82)
6 6.Colonoscopy 60.2 (1.41)
7 7. Sigmoidoscopy 6.1 (0.61)
8 3. Recieved flu vaccine 50.0 (1.33)
9 8. Blood stool test 14.6 (1.00)
10 5. Aspirin use-problems 11.7 (1.02)
Pradip K. Muhuri, AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564
The problem is that you are sorting a character variable.
[1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7
[6] "60.2 (1.41)" "6.1 (0.61)" "14.6 (1.00)" "72.6 (1.82)" "73.3
Notice that the 7th element is "6.1 (0.61)". The first CHARACTER is a
"6", so it is going to sort BEFORE the "50.0 (1.33)" (in descending
order). If you want the character value of line 7 to sort last, it would
need to be "06.1 (0.61)" or " 6.1 (0.61)" (notice the leading space).
Hope this is helpful,
Dan
Daniel Nordlund
Port Townsend, WA USA