Calculating median age for a group of US census blocks?
Dr. Snow, thanks so much for your response to my question.

I think I'm going to stick with the lower- and upper-bounds method I described, even though it gives a wider range for the median age than other methods. I read the vignette for 'survival' as well as the chapters on survival from MASS and another book I have, and couldn't make heads or tails of it, much less how to apply it to this question. In the unlikely event of someone asking me to explain or defend my conclusions on median age for my neighborhood population, I would be lost about survival statistics, but could manage, with numerous hand-waves, to explain my method.

I'm an old, retired guy who thinks statistics are fun, not someone with any kind of professional training or credentials.

Thank you, again, for your thoughtful and thorough response. I appreciate your help.

-Kevin
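For readers following along, the lower- and upper-bounds method mentioned above can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the bins and counts are made up (they are not the P12 figures for these blocks), and `median_at` is a hypothetical helper name.

```r
# Hypothetical binned age counts (NOT the real P12 figures from this thread).
bins <- data.frame(
  lower = c(0, 20, 40, 45, 65, 85),
  upper = c(19, 39, 44, 64, 84, Inf),  # the 85+ bin is open-ended
  count = c(20, 25, 10, 30, 10, 5)
)

# Median of the ages obtained by giving every person in a bin the same age.
median_at <- function(age, count) median(rep(age, times = count))

lower_bound <- median_at(bins$lower, bins$count)  # everyone at the bin minimum
upper_bound <- median_at(bins$upper, bins$count)  # everyone at the bin maximum

c(lower = lower_bound, upper = upper_bound)
```

With these made-up counts, the 50th and 51st of the 100 sorted ages both fall in the 40 to 44 bin, so the bounds are 40 and 44. The open-ended bin can stay in, coded as `Inf` for the upper bound, as long as it holds fewer than half of the people.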
On Tue, 2023-09-05 at 11:31 -0600, Greg Snow wrote:
Kevin,

Your idea of substituting the minimum and maximum values of the ranges will work for computing bounds on the median age, and for the median age you should not need to drop the 85+ group (unless more than 50% of people are in that group). The mean is another issue.

Another approach that may give you a smaller interval and a more statistically justified range would be to turn to survival analysis techniques and treat the values from the table as interval-censored data. If the data appear to come from a known distribution, then you can use parametric survival techniques to fit the distribution (see the `survreg` function in the `survival` package). Or, there are packages that fit non-parametric models to interval-censored data (`Icens` and `interval`, for example) that can then be used to estimate a confidence interval on the median age (and possibly the mean age, but with limitations).

For the 85+ group you can treat them as right censored, or interval censored from 85 to infinity, or interval censored from 85 to some value like 100 or 120 (there is a small chance that someone in the table could be over 100, but that is rare; I think the current oldest reported living person is in the hundred and teens, so 120 would be safe).

On Thu, Aug 31, 2023 at 1:48 PM Kevin Zembower via R-sig-Geo <r-sig-geo at r-project.org> wrote:
Sorry to resurrect a long-dead thread, but I'm still struggling with my desire to assign a median age to the population in a group of US census blocks. I'm using the data from the US Census table P12, which bins the ages into ranges. I'm convinced (thank you!) that I can't compute the exact median age.

Can I compute the lower and upper bounds of the median age? Can I assign all the people in a binned age range (say "20 to 29 years") to the lower limit of the range, then compute the median of those ages, and say that the true median age is between this lower limit and the upper one, computed similarly?

If this is valid, how do I deal with the "85 years and older" bin? I have 9 people 85 years and older, out of a total population of 537 people in my group of census blocks. For the lower bounds of the median, I assign all 9 the age of 85. What can I do for the upper bounds?

I've done this, and found that the true median age is between 40 and 44 years old, if I drop all the "85 years and older" population as NA. The true mean is between 39.96 and 43.46, similarly.

One thought: If there are 9 people in the "85 years and older" group, should I drop them and also drop the 9 youngest ages?

I look forward to reading your thoughts. Thank you for any advice and guidance.

-Kevin

On Tue, 2023-08-08 at 12:00 +0200, r-sig-geo-request at r-project.org wrote:
Message: 2
Date: Mon, 7 Aug 2023 18:33:41 +0000
From: Kevin Zembower <kevin at zembower.org>
To: "r-sig-geo at r-project.org" <r-sig-geo at r-project.org>
Subject: [R-sig-Geo] Calculating median age for a group of US census blocks?

Hello, all,

I'd like to obtain the median age for a population in a specific group of US Decennial census blocks. Here's an example of the problem:

## Example of calculating median age of population in census blocks.
library(tidyverse)
library(tidycensus)

counts <- get_decennial(
    geography = "block",
    state = "MD",
    county = "Baltimore city",
    table = "P1",
    year = 2020,
    sumfile = "dhc") %>%
    mutate(NAME = NULL) %>%
    filter(substr(GEOID, 6, 11) == "271101" &
           substr(GEOID, 12, 15) %in% c(3000, 3001, 3002)
           )

ages <- get_decennial(
    geography = "block",
    state = "MD",
    county = "Baltimore city",
    table = "P13",
    year = 2020,
    sumfile = "dhc") %>%
    mutate(NAME = NULL) %>%
    filter(substr(GEOID, 6, 11) == "271101" &
           substr(GEOID, 12, 15) %in% c(3000, 3001, 3002)
           )

I have two questions:

1. Is it mathematically valid to multiply the population of a block by the median age of that block (in other words, assign the median age to each member of a block), then calculate the median of those numbers for a group of blocks?

2. Is raw data on the ages of individuals available anywhere else in the census data? I can find tables such as P12, which breaks down the population by age ranges or bins, but can't find specific data of counts per age in years.

Thanks for your advice and help.

-Kevin

------------------------------

Message: 3
Date: Mon, 7 Aug 2023 14:38:16 -0400
From: Josiah Parry <josiah.parry at gmail.com>
To: Kevin Zembower <kevin at zembower.org>
Cc: "r-sig-geo at r-project.org" <r-sig-geo at r-project.org>
Subject: Re: [R-sig-Geo] Calculating median age for a group of US census blocks?

Hey Kevin,

I don't think you're going to be able to get individual level data from the US Census Bureau. The closest you may be able to get is the current population survey (CPS) which I believe is also available via tidycensus.

Regarding your first question, I'm not sure I follow what your objective is with it. I would use a geography of census block groups as the measure of median for census block groups. Otherwise it is unclear how you are defining what a "group of blocks" is.

------------------------------

Message: 4
Date: Mon, 7 Aug 2023 19:00:38 +0000
From: Kevin Zembower <kevin at zembower.org>
To: Josiah Parry <josiah.parry at gmail.com>
Cc: "r-sig-geo at r-project.org" <r-sig-geo at r-project.org>
Subject: Re: [R-sig-Geo] Calculating median age for a group of US census blocks?

Josiah, thanks for your reply.

Regarding my objective, I'm trying to compile census statistics for the blocks that make up the neighborhood where I live. It consists of ten census blocks, of which I selected three for simplicity in my example. The census block-group which contains these ten blocks also contains some blocks which are outside of my neighborhood and shouldn't be counted or included.

Since I won't be able to calculate the median age from the age and count data, and since the individual data doesn't seem to be available, is it your thought that I can't produce a valid median age for a group of census blocks?

Thanks so much for your advice.

-Kevin

------------------------------

Message: 5
Date: Mon, 7 Aug 2023 18:45:48 +0000
From: Sean Trende <strende at realclearpolitics.com>
To: Josiah Parry <josiah.parry at gmail.com>, Kevin Zembower <kevin at zembower.org>
Cc: "r-sig-geo at r-project.org" <r-sig-geo at r-project.org>
Subject: Re: [R-sig-Geo] Calculating median age for a group of US census blocks?

This is correct on the second question, at least for more recent censuses. On the first question, imagine a block where the ages of three individuals are 60, 50, and 40, and another one where the ages are 20, 20, and 20. Using your approach you would have 50 * 3 = 150 for the first block, and 20 * 3 = 60 for the second block. The median of 60 and 150 is 105. Even dividing that by three you get 35, which is not the correct median age (30).

------------------------------

Message: 6
Date: Mon, 7 Aug 2023 18:52:33 +0000
From: Kevin Zembower <kevin at zembower.org>
To: Sean Trende <strende at realclearpolitics.com>, Josiah Parry <josiah.parry at gmail.com>
Cc: "r-sig-geo at r-project.org" <r-sig-geo at r-project.org>
Subject: Re: [R-sig-Geo] Calculating median age for a group of US census blocks?

Yes, I see what you mean:

> median(c(60, 50, 40, 20, 20, 20))
[1] 30
> median(c(50, 50, 50, 20, 20, 20))
[1] 35
>

Thanks so much for that clear example.

-Kevin

------------------------------

Message: 7
Date: Mon, 7 Aug 2023 18:53:05 +0000
From: Jeff Boggs <jboggs at brocku.ca>
To: "r-sig-geo at r-project.org" <r-sig-geo at r-project.org>, Kevin Zembower <kevin at zembower.org>
Subject: Re: [R-sig-Geo] Calculating median age for a group of US census blocks?

Responses to your questions:

Q1: No. It is not mathematically valid, sadly.

Q2: I do not know, but your intuition that this is a possible solution is correct.
I don't use US Census data anymore, but suspect that the data exists. Whether they are publicly-available is a different question. I suspect, though, that block level age-sex cohort in five-year intervals is available, given this is the usual ingredient for a population pyramid. That data could be used to calculate a less exact median, if you make some simplifying assumptions.

Best regards,
Jeff

------------------------------

Message: 8
Date: Mon, 7 Aug 2023 15:43:50 -0400
From: Dexter Locke <dexter.locke at gmail.com>
To: Kevin Zembower <kevin at zembower.org>
Cc: Josiah Parry <josiah.parry at gmail.com>, "r-sig-geo at r-project.org" <r-sig-geo at r-project.org>
Subject: Re: [R-sig-Geo] Calculating median age for a group of US census blocks?

Hi Kevin and all,

Given the binned data, you could count the number of people per age class for those 10 blocks. You can then express that in a number of different ways, like percent under 25 years old, or by calculating the dependency ratio <https://www.who.int/data/gho/indicator-metadata-registry/imr-details/1119#:~:text=Definition%3A,a%20specific%20point%20in%20time.>.

I do think it is feasible to calculate an estimated mean from the counts within groups representing ranges. See, for example, here: https://stackoverflow.com/questions/18887382/how-to-calculate-the-median-on-grouped-dataset

Since you are working in Baltimore, you may consider looking at The Baltimore Neighborhood Indicators Alliance https://bniajfi.org/vital_signs/. They provide useful data on a range of issues (transportation, crime, education, environment etc.) including summaries from Census-derived demographics. What you are seeking may already exist. BNIA creates neighborhoods or "community statistical areas" (n=55) based on aggregates of Census data.

Although not pertaining to age, Baltimore City Planning has paid Census in the past to aggregate from individual-level Census data to the more colloquially-used definitions of Baltimore shown here (n = 273): https://data.baltimorecity.gov/datasets/neighborhood-1/explore?location=39.284832%2C-76.620516%2C12.91

Best,
Dexter
https://dexterlocke.com/
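A common way to estimate a median from grouped counts like these is linear interpolation within the bin that contains the median. The sketch below is a minimal illustration, not a method endorsed in the thread: it assumes ages are spread evenly within each bin, uses made-up counts (not the P12 figures for these blocks), and `grouped_median` is a hypothetical helper name.

```r
# Hypothetical binned age counts (NOT the real P12 figures from this thread).
# A bin such as "40 to 44 years" covers exact ages from 40 up to, but not
# including, 45, so its width is 5.
bins <- data.frame(
  lower = c(0, 20, 40, 45, 65, 85),
  width = c(20, 20, 5, 20, 20, NA),  # the open-ended 85+ bin has no width
  count = c(20, 25, 10, 30, 10, 5)
)

# Interpolated median: lower edge of the median bin plus the fraction of that
# bin needed to reach half of the total population.
grouped_median <- function(lower, width, count) {
  n <- sum(count)
  cum <- cumsum(count)
  i <- which(cum >= n / 2)[1]             # index of the bin holding the median
  below <- if (i > 1) cum[i - 1] else 0   # people in all younger bins
  # Returns NA if the median falls in the open-ended bin (width is NA).
  lower[i] + (n / 2 - below) / count[i] * width[i]
}

est <- grouped_median(bins$lower, bins$width, bins$count)
est
```

With these made-up counts the estimate is 42.5, inside the 40 to 44 bin that contains the median. The even-spread assumption is what turns a range into a single number, so the result is only as good as that assumption.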
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo