Skip to content

svymean using the survey package - strata containing no subpopulation members

2 messages · Muhuri, Pradip (AHRQ/CFACT), Anthony Damico

#
Hi,

Below is a reproducible example that produces the estimate of "totexp13" (total health care expenditure 2013) for the subpopulation that includes "Asians with diabetes diagnosed" in MEPS. The R script below downloads file from the web for processing.

Issue/Question: The R/survey package does not seem to provide a NOTE regarding the number of strata containing NO SUBOPOPULATION MEMBERS (in this case - Asians with diabetes diagnosed in MEPS 2013). Is there a way to get this count or ask R to provide this information?  Any hints will be appreciated.

Acknowledgements:   The current R script is a tweaked-version of the code originally sent (on this forum) by Anthony Damico for another application.  Thanks to Anthony!

Good news: The estimate is almost the same as the estimates obtained from SAS, SUDAAN and STATA runs.

Additional Information:  STATA provides a NOTE that " 84 strata omitted because no subpopulation members".
SAS LOG (proc surveymeans) provides a NOTE that "Only one cluster in a stratum in domain Asian_with_diab for variable(s) TOTEXP13. The estimate of variance for TOTEXP13 will
      omit this stratum".


Thanks,

Pradip Muhuri




library(foreign)
library(survey)
library(dplyr)

tf <- tempfile()

download.file( "https://meps.ahrq.gov/mepsweb/data_files/pufs/h163ssp.zip", tf , mode = 'wb' )

z <- unzip( tf , exdir = tempdir() )

x <- read.xport( z )

names( x ) <- tolower( names( x ) )

mydata <- select(x, varstr, varpsu, perwt13f, diabdx, totexp13, racethx)

mydata[mydata <=0] <- NA

design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, data=x, nest=TRUE)


# include missings as "No" values here
#design <-
#  update(design,
#        xbpchek53 = ifelse(bpchek53 ==1,'yes','no or missing'),
  #       xcholck53 = ifelse(cholck53 ==1, 'yes','no or missing')
  #)

# get the estimate for "totexp13" for the subset that includes Asians with diabetes diagnosed
svymean(~ totexp13, subset(design, racethx==4 & diabdx==1))

Pradip K. Muhuri,  AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564
#
hi pradip, with meps you should be able to match precisely between r+survey
and those other languages[1]

if i had to guess, i would say that your sas and stata code is actually
doing the equivalent of this, which is not correct.  check the journal
article table #1 for syntax comparisons

    design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f,
data=subset(x, racethx==4 & diabdx==1), nest=TRUE)
    svymean(~ totexp13, design)
    #Error in onestrat(x[index, , drop = FALSE], clusters[index],
nPSU[index][1],  :
    #  Stratum (1004) has only one PSU at stage 1


more detailed discussion of lonely psu behavior at [2] including how to
override this error -- but i think it comes from faulty design
specification so should not be necessary?  thanks


[1] https://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf
[2] http://faculty.washington.edu/tlumley/old-survey/exmample-lonely.html





On Wed, Jun 22, 2016 at 9:32 PM, Muhuri, Pradip (AHRQ/CFACT) <
Pradip.Muhuri at ahrq.hhs.gov> wrote: