Methodology

Data Source: American Community Survey

All district statistics come from the American Community Survey (ACS). The ACS is an extremely large and high-quality nationwide survey conducted by the U.S. Census Bureau. In 2024, the response rate exceeded 80% and the total sample included nearly 2 million households.

ACS 5-Year Estimates

We use the ACS’s 2024 5-Year Estimates, which draw on surveys conducted from 2020 through 2024. The 2024 5-year release is the latest 5-year data available; the 2025 5-year estimates should be available in early 2027. Dollar amounts are adjusted for inflation to reflect mid-2024.

It would be preferable to use more recent data rather than aggregating five years of survey responses, some of which are up to six years old. However, 5-year estimates are necessary to have a large enough sample to compute statistics at the local district level.1 On the city homepages, the four summary cards use the ACS 1-year estimates instead, since citywide sample sizes are large enough to support the more current release.

Geographic Aggregation

Aggregating Across Census Tracts

Local district statistics are not pre-computed by the Census Bureau. To calculate them, we take a weighted average across the census tracts (smaller geographic units) that fall within a district.

As a simple example, consider calculating the percentage of renter-occupied households in a district with four census tracts, each of which is fully within the district boundary:

To find the district-level renter percentage, we take the weighted average of the tracts’ renter share, weighting each tract by the total number of households in the tract:

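$$
\text{District renter \%} = \frac{(\text{pop}_A \times \text{renter\%}_A) + (\text{pop}_B \times \text{renter\%}_B) + (\text{pop}_C \times \text{renter\%}_C) + (\text{pop}_D \times \text{renter\%}_D)}{\text{pop}_A + \text{pop}_B + \text{pop}_C + \text{pop}_D}
$$

$$
= \frac{(1{,}500 \times 40\%) + (1{,}000 \times 60\%) + (2{,}000 \times 30\%) + (1{,}500 \times 80\%)}{1{,}500 + 1{,}000 + 2{,}000 + 1{,}500} = 50.0\%
$$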
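The weighted average above can be sketched in a few lines of Python. This is only an illustration: the tract labels, the dictionary layout, and the function name `weighted_share` are assumptions made for this example, not part of the actual pipeline.

```python
# Hypothetical tract records mirroring the worked example:
# tract label -> (total households, renter share).
tracts = {
    "A": (1500, 0.40),
    "B": (1000, 0.60),
    "C": (2000, 0.30),
    "D": (1500, 0.80),
}

def weighted_share(records):
    """Household-weighted average of tract-level shares."""
    records = list(records)  # allow any iterable, including generators
    total_weight = sum(weight for weight, _ in records)
    weighted_sum = sum(weight * share for weight, share in records)
    return weighted_sum / total_weight

district_renter_share = weighted_share(tracts.values())
print(f"{district_renter_share:.1%}")  # prints 50.0%
```

Tracts with more households pull the district figure toward their own renter share, which is why Tract C (2,000 households at 30%) offsets much of Tract D (1,500 households at 80%).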

Dealing with Imperfect Overlap Between Tracts and Districts

In practice, census tract boundaries do not align perfectly with district boundaries: some tracts include residents from multiple districts. For split tracts, we use the data for the full tract, including residents living both inside and outside the district boundary. We account for the fact that some residents live outside the district by weighting the tract by the portion of its population that falls within the district boundaries, rather than by the tract’s total population.

For example, the district boundary below fully encompasses Tract A but includes only part of Tract B. When calculating the district’s renter share, we know only that 35% of households in Tract A are renters and 55% of households in Tract B are renters; we don’t know the renter share in the portion of Tract B inside the district. However, we do know how much of Tract B’s population is inside vs. outside the district. We account for this by weighting Tract B’s renter share by the number of Tract B households inside the district (1,400 households), rather than by its full count (2,400 households).

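$$
\text{District renter \%} = \frac{(\text{pop}_A \times \text{renter\%}_A) + (\text{pop}_{B,\,\text{in district}} \times \text{renter\%}_B)}{\text{pop}_A + \text{pop}_{B,\,\text{in district}}}
$$

$$
= \frac{(1{,}800 \times 35\%) + (1{,}400 \times 55\%)}{1{,}800 + 1{,}400} = 43.8\%
$$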
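As a rough sketch, the split-tract calculation can be written as a weighted average in which each tract's weight is its total household count multiplied by the fraction inside the district. The record layout and the function name `split_tract_share` are hypothetical and chosen only for this example.

```python
# Hypothetical records: (total households, fraction inside district, tract-wide renter share).
tract_a = (1800, 1.0, 0.35)          # fully inside the district
tract_b = (2400, 1400 / 2400, 0.55)  # 1,400 of 2,400 households inside

def split_tract_share(tracts, use_in_district_weight=True):
    """Weighted average of tract-wide shares.

    With use_in_district_weight=True each tract is weighted by the
    households inside the district; otherwise by its full household count.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for households, fraction_inside, share in tracts:
        weight = households * fraction_inside if use_in_district_weight else households
        weighted_sum += weight * share
        total_weight += weight
    return weighted_sum / total_weight

adjusted = split_tract_share([tract_a, tract_b])
unadjusted = split_tract_share([tract_a, tract_b], use_in_district_weight=False)
print(f"in-district weights: {adjusted:.1%}")   # prints in-district weights: 43.8%
print(f"full-tract weights:  {unadjusted:.1%}")
```

Weighting Tract B by its full 2,400 households would overstate its influence on the district figure, which is what the in-district weight is meant to prevent.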

We calculate the share of each split census tract’s population within the district using block-level data from the 2020 decennial Census. Census blocks are very small and almost always fall entirely within district boundaries.2 The figure below shows how block-level population data can be used to weight split tracts, using the example from Tract B above.
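A minimal sketch of how block-level counts could yield the in-district weight for Tract B is shown here. The individual block counts are invented for illustration (only the totals of 1,400 and 2,400 come from the example above), and the tuple layout and function name are assumptions. The third field is each block's share inside the district; it is 1.0 or 0.0 for the usual case of a block that falls entirely on one side of the boundary.

```python
# Hypothetical blocks in Tract B: (block id, count, share of block inside district).
tract_b_blocks = [
    ("B-1", 500, 1.0),
    ("B-2", 400, 1.0),
    ("B-3", 500, 1.0),
    ("B-4", 600, 0.0),
    ("B-5", 400, 0.0),
]

def in_district_count(blocks):
    """Sum block counts, scaled by each block's share inside the district."""
    return sum(count * share for _, count, share in blocks)

inside = in_district_count(tract_b_blocks)
total = sum(count for _, count, _ in tract_b_blocks)
fraction_inside = inside / total

print(inside, total)             # prints 1400.0 2400
print(f"{fraction_inside:.1%}")  # prints 58.3%
```

The resulting in-district count (or, equivalently, the fraction multiplied by the tract total) is the weight applied to the tract-wide statistic.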

Importantly, only a handful of high-level variables (like population) are available at the block level. While we have total population data at the block level from the 2020 decennial census, renter share (and most other variables appearing on the ACS) is only available at higher levels of aggregation, which necessitates using the tract-wide renter percentage rather than the renter percentage of the part of the tract within the district boundary.

Sources of Error and Imprecision

Sampling Error

The ACS surveys a sample of households rather than every household in the country. Because the ACS captures only a slice of the population, any statistic it produces is an estimate — and like all estimates, it comes with some uncertainty. If the ACS were reconducted with a new sample of households, it would yield slightly different results. This sample-to-sample variation is called sampling error, and it exists for all surveys.

A survey’s margin of error quantifies the uncertainty from sampling error. It defines a range around the published estimate within which the true population value is likely to fall. For example, if a district has an estimated 42% renter-occupied households with a margin of error of ±4 percentage points, the true figure is likely somewhere between 38% and 46%.3 ACS margins of error are reported at the 90% confidence level, meaning that if the survey were repeated many times, the range computed from each sample would contain the true value about 90% of the time. Each district page has a button that toggles the margin of error on or off for each estimate. Technical details on how we compute district-level margins of error from ACS tract and block group data are described on this page.
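To make the arithmetic concrete, the sketch below turns an estimate and its 90% margin of error into a range, and converts the margin of error to a standard error by dividing by 1.645, the z-score the Census Bureau uses for 90% ACS margins of error. The function names are illustrative, and this is not the district-level margin-of-error method referenced above.

```python
Z_90 = 1.645  # z-score corresponding to the 90% confidence level

def confidence_interval(estimate, moe):
    """Lower and upper bounds implied by an estimate and its margin of error."""
    return estimate - moe, estimate + moe

def standard_error(moe_90):
    """Standard error implied by a 90% margin of error."""
    return moe_90 / Z_90

low, high = confidence_interval(42.0, 4.0)
se = standard_error(4.0)

print(low, high)       # prints 38.0 46.0
print(f"{se:.2f}")     # prints 2.43
```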

Geographic Aggregation Error

When a tract overlaps a district boundary, our calculations necessarily incorporate data from people living outside the district. This introduces geographic aggregation error, a source of imprecision distinct from sampling error. Unlike sampling error, geographic aggregation error is not readily quantifiable.4 The degree to which a district is mismeasured by incorporating non-district residents depends on how different those non-district residents are from district residents, which is unknown.5

While we cannot directly test the geographic aggregation error in the ACS estimates, we can test it in the block-level decennial Census data. Since Census blocks are very small, we can use them to get district-level estimates without geographic aggregation error. Using the same data, we can also compute district-level estimates with geographic aggregation error by applying the same methodology we use on the ACS data.6 Comparing district estimates from block-level data aligned perfectly with district boundaries to estimates from the same data purposefully computed with geographic aggregation error allows us to quantify how much the imperfect overlap between census and district boundaries skews estimates.
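The comparison described above can be illustrated with a small sketch. All block counts here are invented, chosen so that the tract-wide renter shares match the earlier Tract A and Tract B example (35% and 55%); the field names and function names are assumptions. The "exact" estimate uses only blocks inside the district, while the "tract-based" estimate applies tract-wide shares weighted by in-district households.

```python
from collections import defaultdict

# Hypothetical decennial block records.
blocks = [
    {"tract": "A", "households": 600,  "renters": 210, "in_district": True},
    {"tract": "A", "households": 1200, "renters": 420, "in_district": True},
    {"tract": "B", "households": 700,  "renters": 350, "in_district": True},
    {"tract": "B", "households": 700,  "renters": 280, "in_district": True},
    {"tract": "B", "households": 1000, "renters": 690, "in_district": False},
]

def exact_share(blocks):
    """Renter share computed only from blocks inside the district."""
    inside = [b for b in blocks if b["in_district"]]
    return sum(b["renters"] for b in inside) / sum(b["households"] for b in inside)

def tract_based_share(blocks):
    """Renter share using tract-wide shares weighted by in-district households."""
    tract_households = defaultdict(int)
    tract_renters = defaultdict(int)
    inside_households = defaultdict(int)
    for b in blocks:
        tract_households[b["tract"]] += b["households"]
        tract_renters[b["tract"]] += b["renters"]
        if b["in_district"]:
            inside_households[b["tract"]] += b["households"]
    weighted_sum = sum(
        inside_households[t] * tract_renters[t] / tract_households[t]
        for t in inside_households
    )
    return weighted_sum / sum(inside_households.values())

exact = exact_share(blocks)
approx = tract_based_share(blocks)
aggregation_error = approx - exact
print(f"exact {exact:.1%}, tract-based {approx:.1%}, error {aggregation_error:+.1%}")
```

In this made-up case the part of Tract B outside the district rents at a much higher rate than the part inside, so the tract-based estimate overstates the district's renter share by several percentage points.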

For each district, we show the geographic mismatch between census boundaries and district boundaries, as well as validation tests using decennial Census data with and without geographic aggregation error.

Importantly, decennial Census data is available for only a small number of measures, including race, age, owner- vs. renter-occupied households, and household types.7 Geographic aggregation error in the measures the decennial Census includes may differ from the error in other ACS measures. For instance, it is possible that the areas inside and outside the district in a split tract have a similar proportion of households renting (a measure in both ACS and decennial Census data), but that rent prices inside the district differ from those outside the district within the tract (a measure only in the ACS). The geographic aggregation error we can measure in the Census data is therefore not wholly indicative of the actual level of error in the ACS data.

Self-Report Errors

All ACS data come from self-reported responses, which are subject to error. The degree to which this matters varies by question. For instance, most people can accurately report their age, but when asked the year their building was constructed, respondents may give their best estimate rather than report precisely. More broadly, the accuracy of survey data depends on the reliability of self-reported responses, and for questions like building age, answers should be understood as rough approximations rather than precise measurements. We try to account for this in how we present the data. For instance, our housing age variable uses wide bins that convey the rough age distribution of the housing stock — relatively new, middle-aged, and older — rather than precise building dates.

Median Calculations Are Approximations

When calculating medians (e.g., Median Rent, Median Household Income), we average tract-level medians within a district rather than computing the median directly across the full district, which is not possible from published ACS data. Averaging tract medians only approximates the true district-wide median. To calculate the true median, we would need either a census unit that lined up exactly with district boundaries or the underlying microdata with address-level geographic identifiers. This simplification increases the influence of extreme values. For instance, if one tract within a district has a very high median income, averaging across tracts gives that high-income tract more influence than it would have if the median were computed directly. As a result, the reported median income would be higher than the district’s true median income.
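A toy example shows the direction of this effect. The household incomes below are invented, and the sketch uses a simple unweighted average of tract medians as an assumption, since the text above does not say whether tract medians are weighted.

```python
from statistics import mean, median

# Hypothetical household incomes (dollars) in three tracts of one district.
tract_incomes = {
    "A": [40_000, 50_000, 60_000],
    "B": [45_000, 55_000, 65_000],
    "C": [150_000, 200_000, 250_000],  # one very high-income tract
}

# Approximation: average the tract-level medians.
tract_medians = {name: median(values) for name, values in tract_incomes.items()}
average_of_medians = mean(tract_medians.values())

# What the true district median would be if household-level data were available.
all_households = [income for values in tract_incomes.values() for income in values]
pooled_median = median(all_households)

print(tract_medians)                 # tract medians: 50,000, 55,000 and 200,000
print(round(average_of_medians))     # prints 101667
print(pooled_median)                 # prints 60000
```

Here the single high-income tract lifts the averaged figure well above the pooled median, because the average is sensitive to the size of that tract's median while the pooled median is not.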

Footnotes

  1. The Census does not release precise geographic information from the American Community Survey to protect the privacy of respondents. To calculate district-level statistics, we have to aggregate up across the geographies that the Census does release. The smallest geographic units typically include only 1,200-8,000 people, and there are not enough survey respondents within those units in a single year. The 5-year estimates combine five years of survey responses to provide more stable and reliable estimates for smaller geographies.↩︎

  2. In the rare cases where a block straddles a district boundary, we weight the block’s population by the share of the block’s area falling within the district.↩︎

  3. Not all values within this range are equally plausible. The published estimate is the single most likely value; plausibility decreases gradually toward the edges of the range, following a bell-curve distribution.↩︎

  4. Arguably, sampling error also has some unquantifiable bias. Traditional margins of error are calculated based on an assumption of a random sample. In practice, not everyone sent the ACS answers the survey, and people who answer will be different from people who do not answer. This breaks the logic of random sampling that margins of error are based on and makes the degree of total sampling error also fundamentally unknowable. This helps explain how political polls can be wrong by margins well exceeding their published margin of error. However, it should be noted that the ACS is an extremely high-quality survey with high response rates as well as methods that correct for sampling bias.↩︎

  5. Population shifts within split tracts since the 2020 Census (which is the data source we use to weight split tracts by the fraction of the population inside the district) will also be a source of error.↩︎

  6. Specifically, we aggregate data from Census blocks up to the tract and block-group level, then use that data the same way we use ACS data: weighting units that are split by district boundaries by the block-level population.↩︎

  7. The ACS is far more comprehensive and surveys people more frequently than once every 10 years, which is why we use it despite the presence of geographic aggregation error.↩︎