Possibly Inaccurate Age and Sex Data in the CPS ASEC PUMS Files
In April 2009, the U.S. Census Bureau acknowledged problems (see below for a summary of the user note) in the techniques it uses to prevent the identification of specific individuals in public-use microdata sample (PUMS) files. These techniques produced inconsistent sex ratios for people ages 65 and older in the PUMS files produced from the 2000 Census, the 2003-2006 American Community Survey, and the 2005-2006 Puerto Rico Community Survey. The 2003-2010 Current Population Survey ASEC PUMS files were also affected.
In December 2009, the Census Bureau released corrected age data for the 2006 ACS/PRCS. However, it will not release corrected CPS data because the error affects income and poverty estimates only slightly. For other variables, ASEC PUMS files for 2003-2010 may not be representative of the 65+ population at individual ages, and analyses of variables that are expected to change by age stand to be particularly affected.
For a full discussion of the problem and its implications for researchers, see:
Alexander, J. Trent, Michael Davern, and Betsey Stevenson. 2010. "The Polls-Review: Inaccurate Age and Sex Data in the Census PUMS Files: Evidence and Implications." National Bureau of Economic Research Working Paper No. 15703.
The U.S. Census Bureau User Note regarding this possible inaccuracy in the data is no longer available. Below is a summary of its contents:
The Bureau of the Census implemented an age perturbation procedure in the Current Population Survey (CPS) Public Use Files in August 2002 to enhance confidentiality. This procedure involved adjusting the ages of selected household members to protect privacy. However, it resulted in inconsistent male/female sex ratios, especially for individuals aged 65 and over in the CPS files from 2003 to 2010.
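Users who want to see whether their own extract shows the irregular sex ratios described above can tabulate weighted males per 100 females at each single year of age. The following is a minimal sketch of that check, not the Census Bureau's procedure. It assumes person records have already been reduced to (age, sex, weight) tuples and that sex is coded 1 for male and 2 for female; the function name, the coding, and the sample values are all illustrative assumptions, so check the codebook for the file being used.

```python
from collections import defaultdict

MALE, FEMALE = 1, 2  # assumed coding; confirm against the codebook for your extract


def sex_ratio_by_age(records, min_age=65):
    """Return {age: weighted males per 100 females} for ages >= min_age.

    `records` is an iterable of (age, sex, weight) tuples.
    Ages with no weighted females are skipped to avoid dividing by zero.
    """
    totals = defaultdict(lambda: {MALE: 0.0, FEMALE: 0.0})
    for age, sex, weight in records:
        if age >= min_age and sex in (MALE, FEMALE):
            totals[age][sex] += weight
    return {
        age: 100.0 * t[MALE] / t[FEMALE]
        for age, t in sorted(totals.items())
        if t[FEMALE] > 0
    }


# Hypothetical records, for illustration only (not real CPS data).
sample = [
    (65, MALE, 1500.0), (65, FEMALE, 2000.0),
    (66, MALE, 900.0), (66, FEMALE, 1000.0), (66, FEMALE, 800.0),
    (40, MALE, 1200.0),  # below min_age, so ignored
]
ratios = sex_ratio_by_age(sample)
print(ratios)
```

In real data the ratio normally declines fairly smoothly with age in the 65+ range, so abrupt jumps between adjacent single years of age are the pattern to look for.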
The age perturbation affected both the internal files used for Census Bureau reports and tables and the public use files. A study described the issue in more detail in a January 2010 National Bureau of Economic Research Working Paper. The Census Bureau is currently exploring alternative age perturbation procedures and expects to implement a new procedure in January 2011.
A careful review of the Annual Social and Economic Supplement (ASEC) data for the most affected age groups showed some statistically significant differences between perturbed and unperturbed estimates. However, these differences registered as significant primarily because the two sets of estimates are highly correlated, which results in small confidence intervals. Most differences in income and poverty estimates fell within a 90-percent confidence interval. In light of the relatively small differences found in the ASEC data, the decision was made not to re-release the 147 CPS files already in the public domain using the new method.
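The link between high correlation and small confidence intervals can be illustrated with the standard formula for the standard error of a difference between two correlated estimates, sqrt(se_a^2 + se_b^2 - 2 * corr * se_a * se_b). The sketch below is illustrative only: the function names, the estimates, the standard errors, and the correlation values are hypothetical, and it is not the Census Bureau's actual variance estimation method.

```python
import math

Z_90 = 1.645  # critical value for a two-sided 90-percent confidence interval


def difference_interval(est_a, se_a, est_b, se_b, corr):
    """Return (difference, lower, upper) for a 90-percent interval on est_a - est_b.

    `corr` is the correlation between the two estimates; a value near 1
    shrinks the standard error of the difference.
    """
    diff = est_a - est_b
    variance = se_a ** 2 + se_b ** 2 - 2.0 * corr * se_a * se_b
    se_diff = math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding
    margin = Z_90 * se_diff
    return diff, diff - margin, diff + margin


def is_significant(est_a, se_a, est_b, se_b, corr):
    """True when the 90-percent interval on the difference excludes zero."""
    _, lower, upper = difference_interval(est_a, se_a, est_b, se_b, corr)
    return lower > 0 or upper < 0


# Hypothetical poverty rates (percent) with equal standard errors.
perturbed, unperturbed, se = 10.5, 10.0, 0.5
for corr in (0.0, 0.95):
    diff, lower, upper = difference_interval(perturbed, se, unperturbed, se, corr)
    print(f"corr={corr}: diff={diff:.2f}, interval=({lower:.2f}, {upper:.2f})")
```

With these made-up numbers, the same half-point difference is not significant when the estimates are treated as independent but is significant when they are highly correlated, which is the situation described for the perturbed and unperturbed ASEC estimates.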
ASEC files from 2003 onwards, containing both the actual and masked ages, are available in the Census Research Data Centers (RDCs) for users who require unmasked age data for specific analyses. The Census Bureau will ensure reasonable access to these files for appropriate analysis and work directly with users who cannot use the RDCs for their research.
Regarding income, a review of median household income and mean earnings by age, race, and Hispanic origin in 2008 revealed three statistically significant differences between perturbed and unperturbed estimates. All significant differences occurred for mean earnings of men, where the perturbed estimates were higher than the unperturbed estimates. The differences were observed in the age groups of 65 to 69 (all races), 65 to 69 (Hispanic origin), and 75 and older (Asian).
For poverty rates by race and Hispanic origin in 2008, six statistically significant differences were found between perturbed and unperturbed estimates. All significant differences occurred in the two narrowest age categories: 65 to 69 and 70 to 74. In three cases, the perturbed poverty rate was lower than the unperturbed poverty rate, while in three other cases, the perturbed poverty rate was higher. The specific age, race, and gender combinations where the differences occurred are detailed in the original document.