
svykappa using the survey package

Muhuri, Pradip (AHRQ/CFACT)

Mon, Jun 20, 2016 4:49 PM

Hello,

My goal is to calculate the weighted kappa measure of agreement between two factors using the R survey package. I am getting the following error message (the console output is appended below; sorry, no data are provided).

Error in names(probs) <- nms : 
  'names' attribute [15] must be the same length as the vector [8]

I followed these major steps:

1) Used the "haven" package to read the SAS data set into R.
2) Used dplyr's mutate() to create two new variables and converted them to factors [required for svykappa()?].
3) Created an object (named design) using the survey design variables and the data file.
4) Used svykappa() to compute the kappa measure of agreement.
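
The steps above can be sketched as follows. This is a minimal sketch on made-up data, not the actual code behind the console output: the data frame `toy`, its weight column `w`, the helper `names_needed()`, and the design objects are all hypothetical, and the survey calls run only if the survey package is installed. It also illustrates one possible reading of the error, offered as an assumption rather than a confirmed diagnosis: 15 = 3 + 3 + 9 and 8 = 2 + 2 + 4, which is what one would expect if NA were counted as a third category in one place while the estimated proportions cover only the two factor levels.

```r
# Hypothetical toy data: two yes/no items coded 1/2, with some missing values
toy <- data.frame(
  bp   = c(1, 1, 2, 2, 1, 2, NA, 1, 2, 1),
  chol = c(1, 2, 2, 2, 1, 1, 1, NA, 2, 1),
  w    = c(1.5, 2.0, 1.0, 3.0, 2.5, 1.0, 2.0, 1.5, 1.0, 2.0)  # made-up weights
)

# Step 2: convert both variables to factors with the same levels
toy$bp   <- factor(toy$bp,   levels = 1:2)
toy$chol <- factor(toy$chol, levels = 1:2)

# unique() counts NA as a distinct value; nlevels() does not
n_unique <- length(unique(toy$bp))   # values 1, 2 and NA
n_levels <- nlevels(toy$bp)          # levels 1 and 2

# Row names + column names + cell names for a k-by-k agreement table
names_needed <- function(k) k + k + k * k
names_needed(n_unique)  # 15
names_needed(n_levels)  # 8

# Steps 3 and 4, run only when the survey package is available
if (requireNamespace("survey", quietly = TRUE)) {
  toy_design <- survey::svydesign(ids = ~1, weights = ~w, data = toy)
  # Keep only respondents with both items observed before calling svykappa()
  complete_design <- subset(toy_design, !is.na(bp) & !is.na(chol))
  print(survey::svykappa(~bp + chol, complete_design))
}
```

If this reading is right, restricting the design to cases with both items observed, as in the last step, would be one thing to try with the real design object.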

I would appreciate it if someone could give me hints on how to resolve the issue.

Thanks,

Pradip Muhuri

###############  The detailed console is appended below  ####################

+   names(df) <- tolower(names(df))
+   df
+ }

+                          xbpchek53 = ifelse(bpchek53 ==1, 1,
+                             ifelse(bpchek53 %in% 2:6, 2,NA)), 
+                          xcholck53 = ifelse(cholck53 ==1, 1,
+                            ifelse(cholck53 %in% 2:6, 2,NA)))

[1] TRUE

[1] TRUE

xbpchek53
bpchek53     1     2   Sum
     -9      0     0     0
     -8      0     0     0
     -7      0     0     0
     -1      0     0     0
     1   19778     0 19778
     2       0  2652  2652
     3       0  1014  1014
     4       0   538   538
     5       0   737   737
     6       0   623   623
     Sum 19778  5564 25342

xcholck53
cholck53     1     2   Sum
     -9      0     0     0
     -8      0     0     0
     -7      0     0     0
     -1      0     0     0
     1   14850     0 14850
     2       0  3153  3153
     3       0  1170  1170
     4       0   696   696
     5       0   909   909
     6       0  3764  3764
     Sum 14850  9692 24542

xcholck53
xbpchek53     1     2   Sum
      1   14667  4379 19046
      2     163  5225  5388
      Sum 14830  9604 24434

Error in names(probs) <- nms : 
  'names' attribute [15] must be the same length as the vector [8]

#################################################################
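
For reference, the unweighted kappa implied by the last cross-tabulation in the console can be worked out in base R. This is only a sketch: the helper name `cohen_kappa` is hypothetical, the counts are copied from the xbpchek53 by xcholck53 table, and the calculation ignores the survey weights and design, so a design-based estimate from svykappa() would generally differ.

```r
# Unweighted cell counts copied from the xbpchek53 by xcholck53 table above
counts <- matrix(c(14667, 4379,
                     163, 5225),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(xbpchek53 = c("1", "2"),
                                 xcholck53 = c("1", "2")))

# Hypothetical helper: Cohen's kappa from a square table of counts
cohen_kappa <- function(tab) {
  p <- tab / sum(tab)                     # cell proportions
  p_obs <- sum(diag(p))                   # observed agreement
  p_exp <- sum(rowSums(p) * colSums(p))   # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

kappa_unweighted <- cohen_kappa(counts)
round(kappa_unweighted, 3)
```

The same formula applies to weighted cell proportions; what the survey package adds is the weighting and a design-based standard error.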

Pradip K. Muhuri,  AHRQ/CFACT
 5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564




-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Muhuri, Pradip (AHRQ/CFACT)
Sent: Thursday, June 16, 2016 2:06 PM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New Question

Hello David,

Your revisions to the earlier code have given me the desired results.

library("gtools")
mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c("indicator", "prevalence_c")  ]

Thanks,

Pradip


Pradip K. Muhuri,  AHRQ/CFACT
 5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564





-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Thursday, June 16, 2016 12:54 PM
To: Muhuri, Pradip (AHRQ/CFACT)
Cc: r-help at r-project.org
Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New Question

Try instead just a vector of names for the second argument to "["

 mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), 
         c("indicator", "prevalence_c") ]

Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, decreasing = TRUE),  : 
 undefined columns selected

********************

str(mydata)

Classes 'tbl_df', 'tbl' and 'data.frame':	10 obs. of  10 variables:
$ indicator   : chr  "1. Health check-up" "2. Blood cholesterol checked " "3. Recieved flu vaccine" "4. Blood pressure checked" ...
$ subgroup    : chr  "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ ...
$ n           : num  2117 2127 2124 2135 1027 ...
$ prevalence_c: chr  "74.7 (1.20)" "90.3 (0.89)" "51.7 (1.35)" "93.2 (0.70)" ...
$ prevalence_p: chr  "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" ...
$ sensitivity : chr  "87.4 (1.10)" "99.2 (0.27)" "97.0 (0.62)" "99.0 (0.27)" ...
$ specificity : chr  "68.3 (2.80)" "58.2 (3.72)" "93.5 (0.90)" "52.7 (3.90)" ...
$ ppv         : chr  "90.4 (0.94)" "92.8 (0.85)" "93.7 (0.87)" "94.3 (0.63)" ...
$ npv         : chr  "61.5 (3.00)" "92.8 (2.27)" "96.9 (0.63)" "87.5 (3.27)" ...
$ kappa       : chr  "0.536 (0.029)" "0.676 (0.032)" "0.905 (0.011)" "0.626 (0.035)" ...

Pradip K. Muhuri,  AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564




-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel Nordlund
Sent: Wednesday, June 15, 2016 6:37 PM
To: r-help at r-project.org
Subject: Re: [R] dplyr's arrange function

On 6/15/2016 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) wrote:

Hello,

I am using dplyr's arrange() function to sort one of many data frames on a character variable (named "prevalence").

Issue: I am not getting the desired output (line 7 is the problem; it should be the very last line in the sorted data frame) because the sorted field is character, not numeric.

The reproducible example and the output are appended below.

Is there any workaround to convert or treat this character variable (named "prevalence" in the data frame below) as numeric before using the arrange() function in the dplyr package?

Any hints will be appreciated.

Thanks,

Pradip Muhuri

# Reproducible Example

library("readr")
library("dplyr")  # provides arrange() and desc()
testdata <- read_csv(
"indicator,  prevalence
1. Health check-up, 77.2 (1.19)
2. Blood cholesterol checked,  84.5 (1.14)
3. Recieved flu vaccine, 50.0 (1.33)
4. Blood pressure checked, 88.7 (0.88)
5. Aspirin use-problems, 11.7 (1.02)
6.Colonoscopy, 60.2 (1.41)
7. Sigmoidoscopy, 6.1 (0.61)
8. Blood stool test, 14.6 (1.00)
9.Mammogram,  72.6 (1.82)
10. Pap Smear test, 73.3 (2.37)")

# Sort on the character variable in descending order
arrange(testdata, desc(prevalence))

# Results from Console

                     indicator  prevalence
                         (chr)       (chr)
1     4. Blood pressure checked 88.7 (0.88)
2  2. Blood cholesterol checked 84.5 (1.14)
3            1. Health check-up 77.2 (1.19)
4            10. Pap Smear test 73.3 (2.37)
5                   9.Mammogram 72.6 (1.82)
6                 6.Colonoscopy 60.2 (1.41)
7              7. Sigmoidoscopy  6.1 (0.61)
8       3. Recieved flu vaccine 50.0 (1.33)
9           8. Blood stool test 14.6 (1.00)
10      5. Aspirin use-problems 11.7 (1.02)


Pradip K. Muhuri,  AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564

The problem is that you are sorting a character variable.

testdata$prevalence

 [1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7 (1.02)"
 [6] "60.2 (1.41)" "6.1 (0.61)"  "14.6 (1.00)" "72.6 (1.82)" "73.3 (2.37)"

Notice that the 7th element is "6.1 (0.61)".  The first CHARACTER is a "6", so it is going to sort BEFORE the "50.0 (1.33)" (in descending order).  If you want the character value of line 7 to sort last, it would need to be "06.1 (0.61)" or " 6.1 (0.61)" (notice the leading space).
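
An alternative to padding the strings is to sort on the numeric part only. The following is a minimal sketch in base R, assuming every value starts with the point estimate followed by a space; the variable names `estimate` and `sorted` are illustrative and not from the original code.

```r
# Values as printed by testdata$prevalence above
prevalence <- c("77.2 (1.19)", "84.5 (1.14)", "50.0 (1.33)", "88.7 (0.88)",
                "11.7 (1.02)", "60.2 (1.41)", "6.1 (0.61)", "14.6 (1.00)",
                "72.6 (1.82)", "73.3 (2.37)")

# Drop everything from the first space onward, leaving the point estimate
estimate <- as.numeric(sub(" .*$", "", prevalence))

# Order by the numeric estimate, largest first
sorted <- prevalence[order(estimate, decreasing = TRUE)]
sorted
```

The same expression should also work inside dplyr, for example arrange(testdata, desc(as.numeric(sub(" .*$", "", prevalence)))), although that variant has not been run against the original data.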

Hope this is helpful,

Dan

Daniel Nordlund
Port Townsend, WA USA

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius
Alameda, CA, USA

