Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, decreasing = TRUE), :
undefined columns selected
********************
Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of 10 variables:
$ indicator : chr "1. Health check-up" "2. Blood cholesterol checked " "3. Recieved flu vaccine" "4. Blood pressure checked" ...
$ subgroup : chr "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ ...
$ n : num 2117 2127 2124 2135 1027 ...
$ prevalence_c: chr "74.7 (1.20)" "90.3 (0.89)" "51.7 (1.35)" "93.2 (0.70)" ...
$ prevalence_p: chr "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" ...
$ sensitivity : chr "87.4 (1.10)" "99.2 (0.27)" "97.0 (0.62)" "99.0 (0.27)" ...
$ specificity : chr "68.3 (2.80)" "58.2 (3.72)" "93.5 (0.90)" "52.7 (3.90)" ...
$ ppv : chr "90.4 (0.94)" "92.8 (0.85)" "93.7 (0.87)" "94.3 (0.63)" ...
$ npv : chr "61.5 (3.00)" "92.8 (2.27)" "96.9 (0.63)" "87.5 (3.27)" ...
$ kappa : chr "0.536 (0.029)" "0.676 (0.032)" "0.905 (0.011)" "0.626 (0.035)" ...
Pradip K. Muhuri, AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel Nordlund
Sent: Wednesday, June 15, 2016 6:37 PM
To: r-help at r-project.org
Subject: Re: [R] dplyr's arrange function
On 6/15/2016 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) wrote:
Hello,
I am using dplyr's arrange() function to sort one of many data frames on a character variable (named "prevalence").
Issue: I am not getting the desired output (line 7 is the problem, which should be the very last line in the sorted data frame) because the sorted field is character, not numeric.
The reproducible example and the output are appended below.
Is there any work-around to convert/treat this character variable (named "prevalence" in the data frame below) as numeric before using the arrange() function within the dplyr package?
Any hints will be appreciated.
Thanks,
Pradip Muhuri
# Reproducible Example
library("readr")
library("dplyr")  # provides arrange() and desc()
testdata <- read_csv(
"indicator, prevalence
1. Health check-up, 77.2 (1.19)
2. Blood cholesterol checked, 84.5 (1.14)
3. Recieved flu vaccine, 50.0 (1.33)
4. Blood pressure checked, 88.7 (0.88)
5. Aspirin use-problems, 11.7 (1.02)
6.Colonoscopy, 60.2 (1.41)
7. Sigmoidoscopy, 6.1 (0.61)
8. Blood stool test, 14.6 (1.00)
9.Mammogram, 72.6 (1.82)
10. Pap Smear test, 73.3 (2.37)")
# Sort on the character variable in descending order
arrange(testdata, desc(prevalence))
# Results from Console
   indicator                     prevalence
   (chr)                         (chr)
1  4. Blood pressure checked     88.7 (0.88)
2  2. Blood cholesterol checked  84.5 (1.14)
3  1. Health check-up            77.2 (1.19)
4  10. Pap Smear test            73.3 (2.37)
5  9.Mammogram                   72.6 (1.82)
6  6.Colonoscopy                 60.2 (1.41)
7  7. Sigmoidoscopy              6.1 (0.61)
8  3. Recieved flu vaccine       50.0 (1.33)
9  8. Blood stool test           14.6 (1.00)
10 5. Aspirin use-problems       11.7 (1.02)
The problem is that you are sorting a character variable.
[1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7 (1.02)"
[6] "60.2 (1.41)" "6.1 (0.61)" "14.6 (1.00)" "72.6 (1.82)" "73.3 (2.37)"
Notice that the 7th element is "6.1 (0.61)". The first CHARACTER is a "6", so it is going to sort BEFORE the "50.0 (1.33)" (in descending order). If you want the character value of line 7 to sort last, it would need to be "06.1 (0.61)" or " 6.1 (0.61)" (notice the leading space).
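An alternative to padding the strings is to sort on the numeric part of each value. The following is only a minimal sketch in base R, assuming every value has the form "estimate (standard error)" as in the example; the names prev_num and sorted are hypothetical and do not come from the original post.

```r
# Values copied from the prevalence column of the example data
testdata <- data.frame(
  indicator = c("1. Health check-up", "2. Blood cholesterol checked",
                "3. Recieved flu vaccine", "4. Blood pressure checked",
                "5. Aspirin use-problems", "6.Colonoscopy",
                "7. Sigmoidoscopy", "8. Blood stool test",
                "9.Mammogram", "10. Pap Smear test"),
  prevalence = c("77.2 (1.19)", "84.5 (1.14)", "50.0 (1.33)", "88.7 (0.88)",
                 "11.7 (1.02)", "60.2 (1.41)", "6.1 (0.61)", "14.6 (1.00)",
                 "72.6 (1.82)", "73.3 (2.37)"),
  stringsAsFactors = FALSE
)

# Trim surrounding whitespace, drop everything from the first space onward
# (the parenthesised standard error), and convert what is left to numeric.
prev_num <- as.numeric(sub(" .*$", "", trimws(testdata$prevalence)))

# Reorder the rows on the numeric values, largest first.
sorted <- testdata[order(prev_num, decreasing = TRUE), ]
sorted
```

With dplyr loaded, the same expression should also work inside arrange(), for example arrange(testdata, desc(as.numeric(sub(" .*$", "", prevalence)))), since arrange() accepts expressions of the columns. Sorting on a derived numeric value leaves the original character column unchanged for display.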
Hope this is helpful,
Dan
Daniel Nordlund
Port Townsend, WA USA