svyhist
The line isn't a theoretical distribution, it's a kernel density estimator and so should match your histogram. It looks as though you use exactly the same call lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2) for each plot, which gives a smooth curve estimating the age at death for people with No SPD. To get, eg, age at interview for the SPD group use something like: lines (svysmooth(~age_p, bandwidth=5,subset(nhis, xspd2=='SPD')), lwd=2) -thomas On Sun, Oct 7, 2012 at 2:19 PM, Muhuri, Pradip (SAMHSA/CBHSQ)
<Pradip.Muhuri at samhsa.hhs.gov> wrote:
Hi Anthony,
The ylim () has been added to the code (please see below), and I got 4 plots that have the same y -dimension.
Each plot displays 2 distributions - one as histogram from the data and another one as line (i.e., idealized theoretical normal distribution?).
My question is, "Is there way to change the distribution in the line () function and try other theoretical distribution to approximate the observed distribution?"
Thanks,
Pradip Muhuri
########################################
MyBreaks <- c(18,35,45,55,65,75,85,95)
png("svyhist_no_spd_age_at_inteview.png")
options( survey.lonely.psu = "adjust" )
svyhist (~age_p,
subset (nhis, xspd2=='No SPD'), breaks=MyBreaks,
ylim = c(0,0.035),
main= " ",
col="grey80", xlab="Age at Interview among those Who had no SPD"
)
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)
dev.off ()
________________________________________
From: Anthony Damico [ajdamico at gmail.com]
Sent: Saturday, October 06, 2012 6:56 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: David Winsemius; R help
Subject: Re: [R] svyhist
?ylim says "numeric vectors of length 2" - so just the beginning and end.
?svyhist doesn't specifically mention the ylim parameter, meaning you should look for a "..." in the arguments list and click through to the page for ?hist
?hist has an example that shows the ylim parameter only containing the beginning and end values.
try using
ylim = c( 0 , 0.030 )
if you're looking to set the tick marks, look at ?axis ;)
On Fri, Oct 5, 2012 at 11:18 PM, Muhuri, Pradip (SAMHSA/CBHSQ) <Pradip.Muhuri at samhsa.hhs.gov<mailto:Pradip.Muhuri at samhsa.hhs.gov>> wrote:
Dear Anthony and David,
Sorry- the earlier-sent plots were mislabeled, which I have corrected and attached. But, the y-lim issue is yet to be resolved.
Thanks,
Pradip Muhuri
________________________________________
From: Anthony Damico [ajdamico at gmail.com<mailto:ajdamico at gmail.com>]
Sent: Friday, October 05, 2012 7:29 PM
To: David Winsemius
Cc: Muhuri, Pradip (SAMHSA/CBHSQ); R help
Subject: Re: [R] svyhist
this worked for me -- and doesn't require removing the PSUs from the design :)
options( survey.lonely.psu = "adjust" )
svyhist (~dthage,
subset (nhis, xspd2=='No SPD'), breaks=MyBreaks, main= " ",
col="grey80",
xlab="Age at Death Distribution"
)
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)
Dr. Lumley has written quite a bit about single-PSU strata here: http://faculty.washington.edu/tlumley/survey/exmample-lonely.html
On Fri, Oct 5, 2012 at 7:16 PM, David Winsemius <dwinsemius at comcast.net<mailto:dwinsemius at comcast.net><mailto:dwinsemius at comcast.net<mailto:dwinsemius at comcast.net>>> wrote:
On Oct 5, 2012, at 3:33 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
Hello,
I was trying to draw histograms of age at death and got the following 2 error messages:
1) Error in tapply(1:NROW(x), list(factor(strata)), function(index) { :
arguments must have same length
This is the top of the output of str applied to the data argument you offered to svyhist:
str(subset (nhis, xspd2==2) )
List of 9
$ cluster :'data.frame': 0 obs. of 1 variable:
..$ psu: Factor w/ 47 levels "109.1","115.2",..:
..- attr(*, "terms")=Classes 'terms', 'formula' length 2 ~psu
.. .. ..- attr(*, "variables")= language list(psu)
.. .. ..- attr(*, "factors")= int [1, 1] 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr "psu"
.. .. .. .. ..$ : chr "psu"
At least one problem seems pretty clear. No data. That can be corrected by wrapping as.numeric() around the factor on which you are subsetting in two places.
Another problem may arise when you restrict to one class only, namely there won't any "design" to work with. All the clusters .... there would be only one .... no longer have any multiplicity, and svyhist apparently isn't built to handle situation, at least with that design argument.
Error in onestrat(x[index, , drop = FALSE], clusters[index], nPSU[index][1], :
Stratum (2) has only one PSU at stage 1
Taking the 'stratum' argument out of the design() spec allows it to proceed, but I do not know if that is introducing invalidity in the analysis.
--
David.
2) Error in findInterval(mm[, i], gx) : 'vec' contains NAs
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
I would appreciate if someone could help me resolve these issues.
Below is reproducible example.
Thanks,
Pradip Muhuri
setwd ("E:/RDATA")
options(width = 120)
library (survey)
library (KernSmooth)
xd1 <-
"dthage ypll_75 xspd2 psu stratum wt8
56 19 2 2 33 1512.7287
86 0 2 2 129 1830.6400
81 0 2 1 67 536.1400
47 28 2 1 17 519.8350
71 4 1 1 225 254.4087
72 3 1 1 238 424.4787
75 0 2 2 115 407.0987
83 0 2 2 46 622.5137
79 -4 2 1 300 509.1212
78 -3 2 1 133 517.3325
71 4 2 2 328 1179.3063
64 11 2 1 2 301.5250
78 -3 2 1 62 253.9025
65 10 2 2 260 932.6575
75 0 2 1 247 145.5900
63 12 2 2 156 247.0650
71 4 2 1 146 829.4787
76 -1 2 2 234 432.5437
76 0 2 1 109 859.6888
68 7 2 1 228 1236.2975
64 11 2 2 167 347.5788
62 13 2 2 312 354.0500
77 0 2 2 275 882.1938
78 -3 2 1 28 481.5975
81 0 2 1 180 1285.5425
79 0 2 2 205 576.0000
70 5 2 1 173 128.3725
75 0 2 2 189 359.3863
78 0 2 1 332 512.8062
74 1 2 2 14 449.0800
77 0 2 1 242 283.0013
92 0 2 1 152 915.3200
69 6 2 2 217 672.7663
53 22 2 1 290 1430.8812
81 0 2 2 90 699.1075
67 8 2 2 316 607.6500
85 0 2 1 171 312.9850
93 0 2 2 119 936.1275
82 0 2 1 118 186.4450
71 4 2 2 329 729.1213
43 32 2 1 215 887.6313
74 1 2 1 180 569.9338
89 0 2 1 324 1054.0887
81 0 2 2 47 532.0987
70 5 2 1 53 450.8750
75 0 1 1 38 557.9750
56 19 2 1 17 512.6363
90 0 2 2 29 569.7888
70 5 2 1 251 554.2138
56 19 2 2 14 1114.1762"
tor <- read.table (textConnection(xd1), header=TRUE, sep='', as.is<http://as.is><http://as.is>=TRUE)
# Grouping variable (xspd) to be factor
tor <- within(tor, {
xspd2 <- factor(xspd2,levels=c (1,2),
labels=c('SPD', 'No SPD'), ordered=TRUE)
}
)
# object with survey design variables and data
nhis <- svydesign (id=~psu,strat=~stratum, weights=~wt8, data=tor, nest=TRUE)
MyBreaks <- c(18,35,45,55,65,75,85,95)
png("svyhist_age_at_death.png")
svyhist (~dthage,
subset (nhis, xspd2==2), breaks=MyBreaks, main= " ",
col="grey80",
xlab="Age at Death Distribution"
)
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2==2)), lwd=2)
dev.off ()
Pradip K. Muhuri
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: Pradip.Muhuri at samhsa.hhs.gov<mailto:Pradip.Muhuri at samhsa.hhs.gov><mailto:Pradip.Muhuri at samhsa.hhs.gov<mailto:Pradip.Muhuri at samhsa.hhs.gov>><mailto:Pradip.Muhuri at samhsa.hhs.gov<mailto:Pradip.Muhuri at samhsa.hhs.gov><mailto:Pradip.Muhuri at samhsa.hhs.gov<mailto:Pradip.Muhuri at samhsa.hhs.gov>>>
The Center for Behavioral Health Statistics and Quality your feedback. Please click on the following link to complete a brief customer survey: http://cbhsqsurvey.samhsa.gov<http://cbhsqsurvey.samhsa.gov/>
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org<mailto:R-help at r-project.org><mailto:R-help at r-project.org<mailto:R-help at r-project.org>> mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Alameda, CA, USA
______________________________________________
R-help at r-project.org<mailto:R-help at r-project.org><mailto:R-help at r-project.org<mailto:R-help at r-project.org>> mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thomas Lumley Professor of Biostatistics University of Auckland