Skip to content

svyhist

9 messages · Muhuri, Pradip (SAMHSA/CBHSQ), David Winsemius, Anthony Damico +1 more

#
On Oct 5, 2012, at 3:33 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

            
This is the top of the output of str applied to the data argument you offered to svyhist:
List of 9
 $ cluster   :'data.frame':	0 obs. of  1 variable:
  ..$ psu: Factor w/ 47 levels "109.1","115.2",..: 
  ..- attr(*, "terms")=Classes 'terms', 'formula' length 2 ~psu
  .. .. ..- attr(*, "variables")= language list(psu)
  .. .. ..- attr(*, "factors")= int [1, 1] 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr "psu"
  .. .. .. .. ..$ : chr "psu"

At least one problem seems pretty clear. No data. That can be corrected by wrapping as.numeric() around the factor on which you are subsetting in two places.

Another problem may arise when you restrict to one class only, namely there won't any "design" to work with. All the clusters .... there would be only one ....  no longer have any multiplicity,  and svyhist apparently isn't built to handle situation, at least with that design argument.

Error in onestrat(x[index, , drop = FALSE], clusters[index], nPSU[index][1],  : 
  Stratum (2) has only one PSU at stage 1

Taking the 'stratum' argument out of the design() spec allows it to proceed, but I do not know if that is introducing invalidity in the analysis.
#
Dear Anthony and David,

Thank you so much for your comments and suggestions! 

The sample data set I had embedded in the earlier-sent R script was intended to be used for the reproducible example.

Now I have  used Anthony's revised code on the the entire analytic file.  The code has worked fine.  Thanks, again.

Attached are the 2 .png files.

The only problem I see is that the y-lim in these 2 plots  is not exactly the same.  I have tried this: ylim = c(0,0.005, 0.010, 0.015, 0.020, 0.025, 0.030), which 
did not work. Any thoughts?


Pradip Muhuri

#######################################  Revised Code #######################

setwd ("E:/RDATA")
options(width = 120)
library (survey)
library (KernSmooth)
library (Hmisc)
load("tor.rdata")
#contents (tor)


# Grouping variable (xspd) to be  factor 
tor <- within(tor, {
         xspd2 <- factor(xspd2,levels=c (1,2),
                         labels=c('SPD', 'No SPD'), ordered=TRUE)            
                   } 
              )
# object with survey design variables and data 
nhis <- svydesign (id=~psu,strat=~stratum, weights=~wt8, data=tor, nest=TRUE)

MyBreaks <- c(18,35,45,55,65,75,85,95)

png("svyhist_no_spd_age_at_death.png")
options( survey.lonely.psu = "adjust" )
svyhist (~dthage,
         subset (nhis, mortstat==1 & xspd2=='No SPD'), breaks=MyBreaks,
         main= " ",
         col="grey80", xlab="Age at Death among those Who had SPD"
         )
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)


dev.off ()


png("svyhist_spd_age_at_death.png")
options( survey.lonely.psu = "adjust" )
svyhist (~dthage,
         subset (nhis, mortstat==1 & xspd2=='SPD'), breaks=MyBreaks,
          #ylim = c(0,0.005, 0.010, 0.015, 0.020, 0.025, 0.030),
         main= " ",
         col="grey80",
         xlab="Age at Death among those Who had no SPD Distribution"
         )
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)


dev.off ()

##########################################################################################
#
Dear Anthony and David,

Sorry- the earlier-sent plots were mislabeled, which I have corrected and attached.  But, the y-lim issue is yet to be resolved. 

Thanks,

Pradip Muhuri
#
Hi Anthony,

The ylim () has been added to the code (please see below), and I got 4 plots that have the same y -dimension. 

Each plot displays 2 distributions - one as histogram from the data and another one as line (i.e., idealized theoretical normal distribution?).  

My question is, "Is there way to change the distribution in the line () function and try other theoretical distribution to approximate the observed distribution?"


Thanks,

Pradip Muhuri


########################################

MyBreaks <- c(18,35,45,55,65,75,85,95)

png("svyhist_no_spd_age_at_inteview.png")
options( survey.lonely.psu = "adjust" )
svyhist (~age_p,
         subset (nhis, xspd2=='No SPD'), breaks=MyBreaks,
         ylim = c(0,0.035),
         main= " ",
         col="grey80", xlab="Age at Interview among those Who had no SPD"
         )

lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)

dev.off ()
1 day later
#
The line isn't a theoretical distribution, it's a kernel density
estimator and so should match your histogram.

It looks as though you use exactly the same call
   lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)
for each plot, which gives a smooth curve estimating the age at death
for people with No SPD.

To get, eg, age at interview for the SPD group use something like:
 lines (svysmooth(~age_p, bandwidth=5,subset(nhis, xspd2=='SPD')), lwd=2)


   -thomas

On Sun, Oct 7, 2012 at 2:19 PM, Muhuri, Pradip (SAMHSA/CBHSQ)
<Pradip.Muhuri at samhsa.hhs.gov> wrote:

  
    
#
Thomas,

Sorry about my repeat typo in the line () function, which caused the distortion of the line that did not match in the earlier graph.  The revised code gives me the graphs that look lot better (please see the attachment).  Thank you for catching that mistake and also for providing clarification regarding the kernel density estimator.

Pradip Muhuri