cut ()

At Mon, 31 Dec 2012 22:25:25 +0000, Pradip Muhuri wrote:
> The issue is that, for Utah, I am getting an <NA> instead of (42,48.7] in the ob_mrj_cat column.

The problem is likely due to comparisons of floating point numbers.
Try moving your lower and upper bounds out a tiny bit.  When I add

  c(-1e-8, 0, 0, 0, 0, 1e-8)

to the result of quantile, I don't get any NAs.

Neal
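
A minimal sketch of the workaround described above, using a short made-up vector rather than the full state data, and assuming an offset of 1e-8 on both outer breaks. It only shows that widening the outer breaks brings the minimum value inside the first interval; it does not settle whether floating point comparison is the actual cause.

```r
# Illustrative values only (not the full state data set)
x <- c(42.0, 43.2, 44.3, 48.3, 49.6, 50.5, 52.5, 53.6, 55.0, 58.7)

# Quintile breaks; the first break equals min(x)
breaks <- quantile(x, 0:5/5)

# Default cut() intervals are (lower, upper], so min(x) is left out
na_default <- sum(is.na(cut(x, breaks)))

# Nudge the two outer breaks outward by a tiny amount
widened <- breaks + c(-1e-8, 0, 0, 0, 0, 1e-8)
na_widened <- sum(is.na(cut(x, widened)))

na_default   # 1
na_widened   # 0
```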
A misplaced right parenthesis caused the problem:

p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile
(p1_st_data$obt_mrj_p, (0:5/5), include.lowest=TRUE)) 

Should be

p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile
(p1_st_data$obt_mrj_p, (0:5/5)), include.lowest=TRUE)
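
The difference between the two calls can be checked on a small made-up vector (illustrative values, not the state data). In the first form include.lowest is handed to quantile(), which appears to accept it silently through its ... argument, so cut() never sees it; in the second form cut() receives it and closes the first interval on the left.

```r
# Illustrative values only
x <- c(42.0, 43.2, 44.3, 48.3, 49.6, 50.5, 52.5, 53.6, 55.0, 58.7)

# Misplaced parenthesis: include.lowest goes to quantile(), not cut()
wrong <- cut(x, quantile(x, 0:5/5, include.lowest = TRUE))

# Corrected: include.lowest is an argument of cut()
fixed <- cut(x, quantile(x, 0:5/5), include.lowest = TRUE)

sum(is.na(wrong))   # 1: the minimum is dropped
sum(is.na(fixed))   # 0
levels(fixed)[1]    # first interval now starts with "[" instead of "("
```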

---------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
Sent: Monday, December 31, 2012 4:25 PM
To: R help
Subject: [R] cut ()

Hello List,

My goal is to create a 5 category variable (p1_st_data$ob_mrj_cat),
based on the p1_st_data$obt_mrj_p variable, using the following code
for 50 States and District of Columbia (N=51).

p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile
(p1_st_data$obt_mrj_p, (0:5/5), include.lowest=TRUE))

The issue is that, for Utah, I am getting an <NA> instead of (42,48.7]
in the ob_mrj_cat column.

Is there a way to tweak the code (i.e., programmatically) to resolve
the issue?

I would appreciate receiving your help.

Happy New Year and Best Wishes to R Expert-members, who have been so
kind and helpful to beginner R users like me.

Thanks and regards,

Pradip Muhuri

##########  Console output, followed by the reproducible example  ##########
table(p1_st_data$ob_mrj_cat)
  (42,48.7] (48.7,50.9] (50.9,52.8] (52.8,54.2] (54.2,58.7]
         10          10          10          10          10

p1_st_data [p1_st_data$state =="Utah",] [, 1:4]
   state obt_mrj_p obt_mrj_se ob_mrj_cat
45  Utah        42       1.49       <NA>    # I expected this to be (42,48.7] instead of <NA>.

### The Reproducible Example (data and code) is shown below:

# Read estimates of risk factors for substance use (ages 12-17) by
# State, obtained from SUDAAN output
p1_st_data <-read.table (text="
Alabama,  49.60,               1.37
Alaska,  55.00,    1.41
Arizona,  52.50, 1.56
Arkansas,            50.50,    1.22
California,            51.10,    0.65
Colorado,            55.10,    1.26
Connecticut,      56.30,    1.28
Delaware,           53.60,    1.30
District of Columbia,  53.50,         1.22
Florida,  52.70,   0.67
Georgia,               52.50,    1.15
Hawaii, 49.40,    1.33
Idaho,   48.30,    1.23
Illinois,  52.70,    0.63
Indiana,                49.60,    1.16
Iowa,     46.30,    1.37
Kansas, 44.30,    1.43
Kentucky,            52.90,    1.37
Louisiana,            49.70,    1.23
Maine,  55.60,    1.44
Maryland,           53.90,    1.46
Massachusetts,                55.40,    1.41
Michigan,            52.40,    0.62
Minnesota,         51.50,    1.20
Mississippi,         43.20,    1.14
Missouri,             48.70,    1.20
Montana,            56.40,    1.16
Nebraska,           45.70,    1.51
Nevada,               54.20,    1.17
New Hampshire,              56.10,    1.30
New Jersey,       53.20,    1.45
New Mexico,     57.60,    1.34
New York,           53.70,    0.67
North Carolina, 52.20,    1.26
North Dakota,   48.60,    1.34
Ohio,     50.90,    0.61
Oklahoma,          47.20,    1.42
Oregon,               54.00,    1.35
Pennsylvania,    53.00,    0.63
Rhode Island,    57.20,    1.20
South Carolina, 50.50,    1.21
South Dakota,   43.40,    1.30
Tennessee,        48.90,    1.35
Texas,   48.70,    0.62
Utah,     42.00,    1.49
Vermont,            58.70,    1.24
Virginia,                51.80,    1.18
Washington,      53.50,    1.39
West Virginia,    52.80,    1.07
Wisconsin,          49.90,    1.50
Wyoming,           49.20,    1.29",
sep=  "," , col.names = c("state" ,   "Obt_mrj_p" ,  "Obt_mrj_se" ),
colClasses = c( "character" ,  "numeric" , "numeric" )
)

# Change the names to lower case
names(p1_st_data) <- tolower (names(p1_st_data))

# Create five equal-sized groups for the perceived ease of obtaining
# marijuana variable
p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile
(p1_st_data$obt_mrj_p, (0:5/5), include.lowest=TRUE))

p1_st_data
dim (p1_st_data)
table(p1_st_data$ob_mrj_cat)
p1_st_data [p1_st_data$state =="Utah",] [, 1:4]
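
One way to spot an unassigned observation like this is to ask table() to report missing values, since it omits them by default. A small sketch on made-up values (not the state data); the names x and grp are only for illustration.

```r
# Illustrative values only
x <- c(42.0, 43.2, 44.3, 48.3, 49.6, 50.5, 52.5, 53.6, 55.0, 58.7)

# Quintile groups with default cut() settings; the minimum is not assigned
grp <- cut(x, quantile(x, 0:5/5))

table(grp)                    # NA count is omitted by default
table(grp, useNA = "ifany")   # adds an <NA> column when any value is missing
x[is.na(grp)]                 # the value that was left out of every interval
```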

Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: Pradip.Muhuri at samhsa.hhs.gov
