
Calculating median age for a group of US census blocks?

8 messages · Kevin Zembower, Josiah Parry, Sean Trende +2 more

#
Hello, all,

I'd like to obtain the median age for a population in a specific group 
of US Decennial census blocks. Here's an example of the problem:

## Example of calculating median age of population in census blocks.
library(tidyverse)
library(tidycensus)

counts <- get_decennial(
     geography = "block",
     state = "MD",
     county = "Baltimore city",
     table = "P1",
     year = 2020,
     sumfile = "dhc") %>%
     mutate(NAME = NULL) %>%
     ## GEOID is character, so compare the block part against strings
     filter(substr(GEOID, 6, 11) == "271101" &
            substr(GEOID, 12, 15) %in% c("3000", "3001", "3002")
            )

ages <- get_decennial(
     geography = "block",
     state = "MD",
     county = "Baltimore city",
     table = "P13",
     year = 2020,
     sumfile = "dhc") %>%
     mutate(NAME = NULL) %>%
     ## Same three blocks as above, matched as strings
     filter(substr(GEOID, 6, 11) == "271101" &
            substr(GEOID, 12, 15) %in% c("3000", "3001", "3002")
            )

I have two questions:

1. Is it mathematically valid to multiply the population of a block by 
the median age of that block (in other words, assign the median age to 
each member of a block), then calculate the median of those numbers for 
a group of blocks?

2. Are raw data on the ages of individuals available anywhere else in the 
census data? I can find tables such as P12, which break down the 
population by age ranges or bins, but I can't find counts per single 
year of age.

Thanks for your advice and help.

-Kevin
#
Hey Kevin, I don't think you're going to be able to get individual-level
data from the US Census Bureau. The closest you may be able to get is the
Current Population Survey (CPS), which I believe is also available via
tidycensus. Regarding your first question, I'm not sure I follow what your
objective is. I would use census block groups as the geography and take the
median reported for each block group. Otherwise it is unclear how you
are defining a "group of blocks".

On Mon, Aug 7, 2023 at 2:34 PM Kevin Zembower via R-sig-Geo <r-sig-geo at r-project.org> wrote:
#
This is correct on the second question, at least for more recent censuses. On the first question, imagine a block where the ages of three individuals are 60, 50, and 40, and another one where the ages are 20, 20, and 20. Using your approach you would have 50 * 3 = 150 for the first block, and 20 * 3 = 60 for the second block. The median of 60 and 150 is 105. Even if you divide that by three, you get 35, which is not the correct median age of the six people (30).

#
Yes, I see what you mean:

 > median(c(60, 50, 40, 20, 20, 20))
[1] 30
 > median(c(50, 50, 50, 20, 20, 20))
[1] 35

Thanks so much for that clear example.

-Kevin
On 8/7/23 14:45, Sean Trende wrote:
#
Responses to your questions:
Q1: No. It is not mathematically valid, sadly.

Q2: I do not know, but your intuition that this is a possible solution is correct.

I don't use US Census data anymore, but I suspect that the data exist. Whether they are publicly available is a different question. I suspect, though, that block-level age-sex cohort data in five-year intervals are available, given that this is the usual ingredient for a population pyramid. Those data could be used to calculate a less exact median, if you make some simplifying assumptions.
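
As a rough illustration of that idea, here is a minimal sketch in base R. It assumes that ages are spread evenly within each bin, which is the simplifying assumption behind the usual linear-interpolation estimate of a grouped median. The function name grouped_median and the bin counts are made up for illustration; they are not census figures. Real census bins may have unequal widths, so the width is passed as a vector, and an open-ended top bin would need an assumed width if the median fell in it.

```r
## Sketch: estimate a median from binned counts by linear interpolation.
## Assumption: ages are uniformly distributed within each bin.
## grouped_median() is a hypothetical helper, not part of any package.
grouped_median <- function(lower, width, count) {
  n     <- sum(count)
  cum   <- cumsum(count)
  i     <- which(cum >= n / 2)[1]        # first bin whose cumulative count reaches n/2
  below <- if (i > 1) cum[i - 1] else 0  # people in all bins below the median bin
  lower[i] + (n / 2 - below) / count[i] * width[i]
}

## Invented counts for eight five-year bins (0-4, 5-9, ..., 35-39),
## already summed over the blocks of interest.
bins <- data.frame(
  lower = c(0, 5, 10, 15, 20, 25, 30, 35),
  width = 5,
  count = c(10, 12, 8, 15, 20, 25, 18, 12)
)

est <- grouped_median(bins$lower, bins$width, bins$count)
est
```

With these invented counts the median falls in the 20-24 bin. The estimate is only as good as the uniform-within-bin assumption, so narrower bins give a tighter answer.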

Best regards,
Jeff
#
Josiah, thanks for your reply.

Regarding my objective, I'm trying to compile census statistics for the 
blocks that make up the neighborhood where I live. It consists of ten 
census blocks, of which I selected three for simplicity in my example. 
The census block-group which contains these ten blocks also contains 
some blocks which are outside of my neighborhood and shouldn't be 
counted or included.

Since I won't be able to calculate the median age from the age and count 
data, and since the individual data doesn't seem to be available, is it 
your thought that I can't produce a valid median age for a group of 
census blocks?

Thanks so much for your advice.

-Kevin
On 8/7/23 14:38, Josiah Parry wrote:
#
Hi Kevin and all,

Given the binned data, you could count the number of people per age class
for those 10 blocks. You can then express that in a number of
different ways, like percent under 25 years old, or by calculating the
dependency ratio
<https://www.who.int/data/gho/indicator-metadata-registry/imr-details/1119#:~:text=Definition%3A,a%20specific%20point%20in%20time.>.
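
A minimal sketch of that pooling step in base R follows. The block IDs, the four coarse age classes, and all counts are invented for illustration; real tables use finer bins that would first be collapsed into classes like these. The dependency ratio here is taken as people under 15 plus people 65 and over, per 100 people aged 15 to 64.

```r
## Sketch: pool binned counts over several blocks, then summarise.
## Block IDs, age classes, and counts are invented for illustration.
block_bins <- data.frame(
  block = rep(c("3000", "3001", "3002"), each = 4),
  bin   = rep(c("0-14", "15-24", "25-64", "65+"), times = 3),
  count = c(12,  8, 40, 10,
             5,  6, 25,  4,
             9, 10, 30, 11)
)

## Sum each age class over the blocks of interest.
pooled_df <- aggregate(count ~ bin, data = block_bins, FUN = sum)
pooled    <- setNames(pooled_df$count, pooled_df$bin)
total     <- sum(pooled)

## Percent of the pooled population under 25 years old.
pct_under_25 <- 100 * (pooled[["0-14"]] + pooled[["15-24"]]) / total

## Dependency ratio: dependents (0-14 and 65+) per 100 people aged 15-64.
dependency_ratio <- 100 * (pooled[["0-14"]] + pooled[["65+"]]) /
  (pooled[["15-24"]] + pooled[["25-64"]])

round(c(pct_under_25 = pct_under_25, dependency_ratio = dependency_ratio), 2)
```

Because counts add across blocks, these summaries are valid for any set of blocks, unlike a median of block medians.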

I do think it is feasible to calculate an estimated median from the counts
within groups representing ranges. See, for example, here:
https://stackoverflow.com/questions/18887382/how-to-calculate-the-median-on-grouped-dataset

Since you are working in Baltimore, you may consider looking at The
Baltimore Neighborhood Indicators Alliance https://bniajfi.org/vital_signs/.
They provide useful data on a range of issues (transportation, crime,
education, environment etc.) including summaries from Census-derived
demographics. What you are seeking may already exist. BNIA creates
neighborhoods or "community statistical areas" (n=55) based on aggregates
of Census data.

Although not pertaining to age, Baltimore City Planning has paid Census in
the past to aggregate individual-level Census data to the more
colloquially used definitions of Baltimore neighborhoods shown here (n = 273):
https://data.baltimorecity.gov/datasets/neighborhood-1/explore?location=39.284832%2C-76.620516%2C12.91

Best, Dexter
https://dexterlocke.com/





On Mon, Aug 7, 2023 at 3:02 PM Kevin Zembower via R-sig-Geo <r-sig-geo at r-project.org> wrote:
#
Dexter, thanks so much for your reply. I wasn't aware of the two sources 
you cite, and I'll be sure to include them in my work.

The Open Baltimore website, at 
https://data.baltimorecity.gov/datasets/neighborhood-1/explore, has 
statistics for my neighborhood, Radnor-Winston 
(https://radnorwinston.org). The Baltimore Neighborhood Indicators 
Alliance at https://bniajfi.org/vital_signs/ lumps us into North 
Baltimore/Guilford/Homeland, which, as I'm sure you're aware, contains 
many homes (mansions!) with very different characteristics from those 
in Radnor-Winston.

Thanks, again, for your help and expertise. I learned a lot from your note.

-Kevin
On 8/7/23 15:43, Dexter Locke wrote: