Get Ready!

  • Install or update R to 3.6.1, and optionally update RStudio. Instructions are here.

  • Install packages tidyverse, tidycensus, tigris, sf, and censusapi: install.packages(c("tidyverse", "tidycensus", "tigris", "sf", "censusapi"))

  • Download this zipped folder. (Look for the download button.) It contains one R script (tidycensus.R) and one PDF handout (accessing_census_r_handout.pdf). Unzip it and put it somewhere you can locate it on your machine.

What does the Census Bureau provide?

The Census Bureau is one of the best sources of data about the US’s people, places, and economy. It provides timely information at fine levels of geography. The Census Bureau publishes both demographic and geographic data, and you need to understand both in order to use the data effectively.

Please see the Census Basics Workshop materials for details on these fundamentals.

Demographic data

The Census Bureau conducts over 100 censuses, surveys, and programs. The most popular are:

  1. The American Community Survey (ACS). The ACS provides detailed information about people (demographics) and housing, including educational attainment, income, health insurance status, language, and detailed housing characteristics. About 3.5 million addresses are sampled each year. The smallest geography available is the block group. There is a trade-off among the size of the geography (by population), the time span of the estimate (1-year or 5-year; 3-year estimates were discontinued after the 2011-2013 release), and the standard error: smaller areas need more years of pooled data to produce reliable estimates. Only geographies with a population of 65,000 or more receive 1-year estimates. All smaller geographies rely on 5-year estimates.

  2. Decennial Census of Population and Housing. This is the classic Census product used for population counts, which in turn affect legislative redistricting and federal funding. It counts every resident of the country as of April 1 in years ending in zero. It also provides the most basic demographic characteristics: age, sex, and race.

  3. Population Estimates Program (PEP). The Census also provides population estimates using decennial Census population data combined with estimates of births, deaths, and migration. Nation, states, and counties all receive annual population estimates.

The Census provides a number of other important surveys, such as the Current Population Survey (CPS), Survey of Income and Program Participation (SIPP), Survey of Business Owners, Census of Governments, and many more.
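ACS margins of error (MOEs) are published at the 90% confidence level, so the standard error in the ACS trade-off described above can be recovered by dividing an MOE by 1.645. The R sketch below follows the formulas in the Census Bureau's ACS handbook (see Learn more); the helper names moe_to_se, acs_cv, and moe_of_sum are hypothetical, and the numbers are made up rather than real Census values. tidycensus also ships its own moe_sum(), moe_prop(), and related functions for derived estimates.

```r
# Hypothetical helpers for working with ACS margins of error (MOEs).
# Assumes MOEs are published at the 90% confidence level (z = 1.645).

moe_to_se <- function(moe, z = 1.645) {
  # Standard error = MOE / z
  moe / z
}

acs_cv <- function(estimate, moe, z = 1.645) {
  # Coefficient of variation, expressed as a percentage of the estimate
  100 * moe_to_se(moe, z) / estimate
}

moe_of_sum <- function(moes) {
  # Approximate MOE of a sum of estimates: square root of the sum of squared MOEs
  sqrt(sum(moes^2))
}

# Usage with made-up numbers (not real Census values):
moe_to_se(164.5)       # 100
acs_cv(1000, 164.5)    # 10, i.e. the SE is 10% of the estimate
moe_of_sum(c(30, 40))  # 50
```

A large CV signals that a small-area 5-year estimate may be too noisy to use on its own.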

ACS Public Use Microdata Sample (PUMS)

Note that the surveys listed above publish aggregate statistics (tables) about the population. The ACS Public Use Microdata Sample (PUMS), by contrast, provides individual responses to the survey. PUMS files for the 1-year ACS include about 1% of the US population; the 5-year files include about 5%. Some modifications are made to the data to protect confidentiality: for example, very high incomes are top-coded, and unusual birthplaces or ancestries are grouped into broader categories. The most important thing to know about PUMS is that the geographic variables are limited. The lowest levels of geography include state, MSA, and PUMA. Public Use Microdata Areas (PUMAs) are non-overlapping, contiguous geographic units that each contain at least 100,000 people. Sorry, you can’t get census tracts with PUMS files!

The best source of PUMS data is the University of Minnesota’s IPUMS data archive: “IPUMS USA collects, preserves and harmonizes U.S. census microdata and provides easy access to this data with enhanced documentation. Data includes decennial censuses from 1790 to 2010 and American Community Surveys (ACS) from 2000 to the present.”

Geographic Data

Standard Hierarchy of Census Geographic Entities

Requisite image of Census geographies! I think the important thing to highlight here is that:

  • some geographies are nested (e.g., census blocks nest within block groups, which nest within tracts, counties, and states, up through the central spine of the hierarchy),
  • while some are not (e.g., core-based statistical areas aka metro areas, which can cross state lines),
  • and some that you might expect to be nested are not (e.g., ZIP Code Tabulation Areas aka ZCTAs aka the Census’s approximation of ZIP codes).

The other important thing to consider is which datasets provide data for which geographies. For example,

  • the Decennial Census goes down to the block level (smallest geography),
  • while the American Community Survey (ACS) only goes down to the block group level, and
  • Population Estimates only go down to the county…

You can access maps through the Geography Program.

As an example of what is available, see “Census Data API: FIPS Geographies 2018 1-year ACS data”.

Getting the Data

From now on, we will mostly focus on retrieving demographic data via tables, and focus less on geographies and mapping.

  • These web-based, point-and-click options let you download CSV files and import them into the analysis software of your choice. They are also helpful when searching for table IDs, and the learning curve is very low.

    • American FactFinder or data.census.gov: web applications to search and navigate census data. FYI: American FactFinder is no longer being updated and will be replaced by data.census.gov.

    • Social Explorer: Social Explorer, brought to you by UVA Library, is much easier to use than the free Census products if you only need to pull a few tables.

  • APIs for developers:

Table IDs and FIPS Codes Explained

In order to effectively work with Census data, it’s best to get familiar with Table IDs and FIPS Codes. They are the keys to getting to tabular data.

ACS Table IDs

There are 5 elements within table IDs. The first letter(s) refer to the table type:

  • B: Base table (most detail)

  • C: Collapsed table

  • DP: Data profile

  • S: Subject table

There are other, less commonly used types as well. Only the 4 types listed above will work in the tidycensus package.

One of the hard parts of working with Census data in R is figuring out which table (and table ID) you need. The Census publishes about 64,000 tables with each release of 5-year data.

The Census Reporter is very useful, especially when searching for table IDs. However, it only includes B (base) and C (collapsed) tables; it does not include the common S (subject) tables or other less common tables.

Here is an example of a Table ID for Family Poverty: B17010.
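To make the table ID structure concrete, here is a small R sketch that splits an ID such as B17010 into its parts. It assumes the layout described in the Census Bureau's table ID documentation: a table type prefix, a 2-digit subject code, a table number, an optional race/Hispanic-origin letter (A through I), and an optional PR suffix for Puerto Rico tables. The function parse_table_id is hypothetical, not part of tidycensus or any other package.

```r
# Hypothetical helper: split an ACS table ID into its elements.
# Assumed layout: type | 2-digit subject | table number | race letter (A-I) | "PR"
parse_table_id <- function(id) {
  stopifnot(is.character(id), length(id) == 1)
  pattern <- "^(B|C|DP|S)([0-9]{2})([0-9]*)([A-I]?)(PR|)$"
  if (!grepl(pattern, id, perl = TRUE)) {
    stop("Not a recognized ACS table ID: ", id)
  }
  # Extract the n-th captured group via a backreference ("\\1", "\\2", ...)
  part <- function(n) sub(pattern, paste0("\\", n), id, perl = TRUE)
  list(
    type        = part(1),          # B, C, DP, or S
    subject     = part(2),          # e.g. "17" (poverty)
    number      = part(3),          # table number within the subject
    race_code   = part(4),          # "" unless a race/ethnicity iteration
    puerto_rico = part(5) == "PR"   # TRUE for Puerto Rico tables
  )
}

# Family Poverty: type "B", subject "17", number "010"
str(parse_table_id("B17010"))
```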

FIPS Codes

All Census geographies get FIPS codes. Let’s break down the FIPS code for this block group in Charlottesville (within the Locust Grove neighborhood): 515400009001.

  • 51: Virginia (State)
  • 540: Charlottesville City (County)
  • 000900: Census Tract
  • 1: Block Group

You might be able to guess that the FIPS code for Charlottesville City (County) is 51540.
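Because each level has a fixed width (state 2 digits, county 3, tract 6, block group 1), a 12-digit block group GEOID can be split with plain substring operations. The R sketch below uses a hypothetical helper name, split_bg_geoid. It keeps GEOIDs as character strings, since storing them as numbers drops the leading zero of states such as Alabama (01).

```r
# Hypothetical helper: split 12-digit block group GEOIDs into FIPS components.
# Widths: state (2) + county (3) + tract (6) + block group (1) = 12 digits.
split_bg_geoid <- function(geoid) {
  geoid <- as.character(geoid)
  stopifnot(all(nchar(geoid) == 12))
  data.frame(
    geoid       = geoid,
    state       = substr(geoid, 1, 2),
    county      = substr(geoid, 3, 5),
    tract       = substr(geoid, 6, 11),
    block_group = substr(geoid, 12, 12),
    county_fips = substr(geoid, 1, 5),   # state + county, e.g. "51540"
    stringsAsFactors = FALSE
  )
}

# The Locust Grove block group in Charlottesville
split_bg_geoid("515400009001")
```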

You can find the FIPS code for state, county, census tract, and block group using an address look-up tool at the Census Geocoder site. There is also a Geocoder API.

Here is a list of geography examples for the 2017 5-year ACS.

R packages

There are many packages that were designed specifically for Census data. “A Guide to Working with Census Data” lists 22 packages, as of 2018. Packages of greatest interest are:

  1. tigris. Download TIGER/Line shapefiles and load as ‘SpatialDataFrame’/‘sf’ objects.

  2. acs. A general toolkit for downloading, managing, analyzing, and presenting data from the U.S. Census. Includes decennial census and ACS. Standard errors can be bundled with estimates.

  3. choroplethr and choroplethrMaps. Facilitates the creation of choropleth maps for states, counties, and tracts. A choropleth map uses differences in shading to indicate values. Further documentation is available at the developer’s website.

  4. tidycensus. “An integrated R interface to the decennial US Census and American Community Survey APIs and the US Census Bureau’s geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for many geographies.”

  5. censusapi. “A wrapper for the U.S. Census Bureau APIs that returns data frames of Census data and metadata. Available datasets include the Decennial Census, American Community Survey, Small Area Health Insurance Estimates, Small Area Income and Poverty Estimates, Population Estimates and Projections, and more.”

  6. ipumsr. “An easy way to import census, survey and geographic data provided by ‘IPUMS’ into R plus tools to help use the associated metadata to make analysis easier.” You might want to use IPUMS if you need microdata, or data harmonized across space and time. You can go to IPUMS, extract the data you need, and import the data to R with the ipumsr package. An API is currently in development.

Try it out

Let’s take a look at our R script. Along with tidyverse tools, we specifically will look at these packages:

  • tidycensus: package for retrieving tidy census data for ACS and Decennial Census, plus easy mapping
  • censusapi: package for retrieving any of over 200 Census API endpoints
  • tigris and sf (briefly): tidycensus uses tigris to download boundary files for mapping, and sf stores and maps the resulting spatial data
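As a rough preview of how the two retrieval packages differ, here is a sketch that requests the Family Poverty table (B17010) for Virginia counties from the 2017 5-year ACS. It assumes both packages are installed and that a Census API key is stored in the CENSUS_API_KEY environment variable (tidycensus::census_api_key() can save it there). The wrapper function names are hypothetical; the arguments passed to get_acs() and getCensus() are the packages’ own. This is not necessarily how the workshop script does it.

```r
# Hypothetical wrappers; assumes tidycensus and censusapi are installed and a
# Census API key is stored in the CENSUS_API_KEY environment variable.

# tidycensus: long ("tidy") output with GEOID, NAME, variable, estimate, moe
va_family_poverty_tidycensus <- function(year = 2017) {
  tidycensus::get_acs(
    geography = "county",
    table     = "B17010",    # Family Poverty
    state     = "VA",
    year      = year,
    survey    = "acs5"
  )
}

# censusapi: wide output, one column per requested API variable
va_family_poverty_censusapi <- function(year = 2017) {
  censusapi::getCensus(
    name     = "acs/acs5",
    vintage  = year,
    key      = Sys.getenv("CENSUS_API_KEY"),
    vars     = c("NAME", "B17010_001E", "B17010_001M"),  # estimate and MOE
    region   = "county:*",
    regionin = "state:51"    # Virginia's state FIPS code
  )
}

# Usage (requires network access and a valid key):
# acs_tidy <- va_family_poverty_tidycensus()
# acs_wide <- va_family_poverty_censusapi()
```

The design difference is the main point: tidycensus asks for a whole table and returns one row per variable per geography, while censusapi mirrors the raw API, so you name each variable (with its E or M suffix) and the region syntax yourself.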

Even more packages to consider

These packages might be useful to your work with Census data:

  • mapview: quickly and conveniently create interactive visualizations of spatial data. Good for quick (not presentation-grade) mapping.

  • RankingProject: “The package provides functions for plotting ranked tables of data side-by-side with their plots. The available visualizations include shaded columns plots, adjusted confidence intervals, and related plots intended for making correct inferences about one-to-many or many-to-many comparisons.” This package should be helpful for dealing with overlapping confidence intervals. Also see the related Census Academy course.

Learn more

Census, in general:

  • Understanding and Using American Community Survey Data: What All Data Users Need to Know. “This handbook provides an overview of the ACS to help data users understand the basics of the survey, how the data can be used, how to judge the accuracy of ACS estimates, and how to access ACS data. It also includes some recent case studies that show how ACS data are being used to help address important policy and program issues. Links to additional ACS resources, including technical documentation for more advanced users, are included throughout the handbook.”

    Table of Contents:

    1. Understanding the ACS: The Basics
    2. Geographic Areas Covered in the ACS
    3. Understanding and Using ACS Single-Year and Multiyear Estimates
    4. Making Comparisons with ACS Data
    5. Accessing ACS Data
    6. Case Studies Using ACS Data
    7. Understanding Error and Determining Statistical Significance
    8. Calculating Measures of Error for Derived Estimates
    9. Differences Between the ACS and the Decennial Census
    10. Using Dollar-Denominated Data
    11. Measures of Nonsampling Error
    12. Glossary
  • American Community Survey (ACS): Code Lists, Definitions, and Accuracy. In particular, see Instructions for Applying Statistical Testing: “Basic instructions for obtaining the ACS standard errors needed to do manual statistical testing.”
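The basic test in those instructions compares two estimates by converting each MOE to a standard error and computing Z = (X1 - X2) / sqrt(SE1^2 + SE2^2); the difference is treated as significant at the 90% level when |Z| > 1.645. Below is a minimal R sketch of that calculation. The function name acs_diff_test is hypothetical, the inputs are made-up numbers, and it treats the two estimates as independent.

```r
# Hypothetical helper: basic significance test for the difference between two
# ACS estimates. Assumes both MOEs are published at the 90% confidence level
# and that the two estimates are independent.
acs_diff_test <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / 1.645            # convert published 90% MOEs to standard errors
  se2 <- moe2 / 1.645
  z <- (est1 - est2) / sqrt(se1^2 + se2^2)
  # z_crit = 1.645 tests at the 90% level; use e.g. 1.96 for 95%
  list(z = z, significant = abs(z) > z_crit)
}

# Made-up values for illustration, not real Census data
acs_diff_test(est1 = 1200, moe1 = 150, est2 = 1000, moe2 = 100)
```

With these inputs Z is about 1.83, so the difference would count as significant at 90%; shrink the gap to 100 and it no longer does, even though the point estimates still differ.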

Census data with R:

tidycensus:

One more: