What is CPS?
The Current Population Survey (CPS) is a powerful source of data for investigating social and economic trends in the U.S. over the past half century. It is administered monthly by the U.S. Bureau of the Census to over 65,000 households. These surveys gather information on education, labor force status, demographics, and other aspects of the U.S. population. The CPS is widely used by demographers, economists, sociologists, and other population researchers. In addition, the CPS is the basis on which federal unemployment statistics are calculated each month.
Despite their importance to the research community, the CPS files from the U.S. Census Bureau are inconvenient to use, particularly for novice researchers. Problems are especially acute for those attempting to form a time series by piecing together surveys from many different years. Variables change location and length over time, requiring several different program formats to obtain a given set of variables across many years. Old variables are dropped and new ones are added to files over time. Variable coding changes, as do the questions from which the variables are derived, and changes in questionnaire content are often subtle. For example, the values at which monetary variables are top-coded (i.e., the threshold at or above which all values are collapsed into a single open-ended category, for instance 50+) vary over time, often in ways not clearly spelled out in the survey documentation.
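The top-coding problem can be illustrated with a minimal sketch. The thresholds below are hypothetical values chosen for illustration, not figures from CPS documentation, and the function names are assumptions. The idea is that a value flagged as top-coded in one year may be an ordinary value in another, so one simple (and lossy) way to compare years is to censor every year at the lowest threshold in the series.

```python
# Hypothetical top-code thresholds by survey year (illustrative only;
# real CPS thresholds must be looked up in the documentation).
TOPCODES = {1980: 50_000, 1990: 100_000, 2000: 150_000}


def is_topcoded(value, year, topcodes=TOPCODES):
    """Return True if value sits at or above that year's top-code."""
    return value >= topcodes[year]


def harmonize(value, topcodes=TOPCODES):
    """Censor a value at the lowest top-code across all years.

    This makes the variable comparable over time at the cost of
    discarding detail above the common threshold.
    """
    common_cap = min(topcodes.values())
    return min(value, common_cap)


if __name__ == "__main__":
    # The same reported amount is top-coded in 1990 but not in 2000.
    print(is_topcoded(100_000, 1990), is_topcoded(100_000, 2000))
    print(harmonize(120_000))
```

This sketch works in nominal dollars; an actual analysis would likely also need to adjust for inflation before choosing a common threshold.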
There are challenges in working with single samples as well. The Census-supplied documentation is sometimes incomplete and difficult to interpret, particularly for the early surveys. Determining the universe of respondents for questions is frequently not straightforward, requiring researchers to trace through skip patterns on questionnaires. Even finding all variables on a specific topic, determining their coding, and ascertaining the context in which the relevant questions were asked can be a cumbersome process that requires a time-consuming manual search through CPS documentation.
Similar Visions, Independent Work, and an Eventual Collaboration
Both the Unicon Research Corporation and the Minnesota Population Center recognized these limitations of the CPS and worked to simplify and streamline its use. The organizations had similar yet distinct views about how exactly to do this. Unicon created CPS Utilities (described in the section that follows), which enabled researchers to easily access several variables and years of data, largely unchanged from the original data, via one system. With IPUMS-CPS, the Minnesota Population Center took a more interventionist approach, simplifying and harmonizing original variable names and limiting the redundancy in variables offered to users.
CPS Utilities and IPUMS-CPS operated largely independently of one another until 2011, when project staff began collaborating to clean and document CPS data. The Unicon-MPC collaboration is based on our mutual goals of preserving CPS data and documentation and making the data easy to access at no charge to users through the IPUMS online system. Unicon ceased production of CPS Utilities at the end of 2014. Most of its data files and documentation have since been incorporated into the IPUMS-CPS system by MPC staff. The work that Unicon began in 1989 will continue through the Minnesota Population Center.
History of Unicon's CPS Utilities
The CPS Utilities software originated in 1989 as a tool to allow in-house Unicon researchers easy and accurate access to the March Annual Demographic CPS data files. Variable concepts were analyzed over all the available years and each concept was assigned a fixed variable name (later referred to as the 'Unicon name') to be used across all years. Variable column locations and lengths were hard coded into the software. All known variable documentation was gathered in the Utilities dictionary files and appendices to relieve researchers of the need to read through each individual Census March CPS manual. The tool proved so useful and reliable that Unicon expanded the software to include other CPS data series. As of 2014, there were 457 data files spread over 15 unique series.
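The approach of pairing one fixed variable name with year-specific column locations can be sketched as follows. This is not Unicon's actual implementation; the layouts, variable names, and sample records are hypothetical and serve only to show how a per-year lookup table lets the same name retrieve a field whose position and width differ between files.

```python
# Hypothetical fixed-width layouts: for each survey year, a stable
# variable name maps to (1-based start column, field length).
LAYOUTS = {
    1975: {"age": (1, 2), "income": (3, 5)},
    1995: {"age": (4, 2), "income": (10, 6)},
}


def extract(record, year, name, layouts=LAYOUTS):
    """Read one integer field from a fixed-width record.

    The caller uses the same name for every year; the layout table
    supplies the column location and length for that year's file.
    """
    start, length = layouts[year][name]
    begin = start - 1  # convert 1-based column to 0-based index
    return int(record[begin:begin + length])


if __name__ == "__main__":
    rec_1975 = "3512000"          # age in cols 1-2, income in cols 3-7
    rec_1995 = "xxx42xxxx045000"  # age in cols 4-5, income in cols 10-15
    print(extract(rec_1975, 1975, "age"), extract(rec_1975, 1975, "income"))
    print(extract(rec_1995, 1995, "age"), extract(rec_1995, 1995, "income"))
```

Keeping the layouts in a data table rather than in the parsing logic means that adding a new survey year only requires a new table entry.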
The initial funding and most of the ongoing funding for creating and expanding the Utilities was provided by Unicon's president and founder, Dr. Finis Welch. The product was also funded in part by Small Business Innovation Research (SBIR) grants from the National Institute on Aging, the National Library of Medicine, the National Institute of Child Health and Human Development, and the U.S. Census Bureau. The Utilities contents are solely the responsibility of the authors and do not necessarily represent the official views of the funding institutions. Accepting these SBIR grants obligated Unicon to present the product as a commercially viable commodity. For this reason, when Unicon offered the CPS Utilities to researchers outside of the company in 1994, it had to set a minimum charge for the service and the product.
As Unicon expanded its range to the non-March series, it conducted an intense search for the missing files. Early files were collected from several data facilities, including the U.S. Census Bureau, the U.S. Bureau of Labor Statistics (BLS), the U.S. National Archives, the National Bureau of Economic Research (NBER), the Inter-university Consortium for Political and Social Research (ICPSR), and the Center for Demography and Ecology at the University of Wisconsin, Madison (CDE). A list of the provenance of the data and the documentation for the CPS files housed in Unicon's library is provided. The early data were received on 9-track tapes. With the introduction of more modern I/O equipment, the 9-track drives were phased out. The collection of Census tapes was systematically destroyed once the data were copied to DVDs. More recently, the data have been downloaded directly from the online Census Ferret FTP site.
History of IPUMS-CPS
The Minnesota Population Center's effort to increase the accessibility of CPS data began in the early 2000s and built on the successful infrastructure developed to provide web-based access to decennial census data. This infrastructure, IPUMS-USA, revolutionized the research community's access to microdata. IPUMS-USA contained multiple years of decennial census data, allowing users to study long-run change in the United States. IPUMS-CPS was initially conceived as a natural complement to the decennial census data; it provided the best source of data for understanding social and economic patterns between decennial censuses (King & Tertilt, 2003). As such, it was designed to be compatible with IPUMS-USA.
The first phase of IPUMS-CPS, funded by the National Science Foundation and National Institutes of Health, yielded a web-based dissemination system for the delivery of harmonized data and documentation from the March Annual Demographic and Economic Characteristics file (hereafter referred to as the ASEC). By the end of the initial funding period, the database provided researchers with ready-to-use data spanning the period from 1962 to 2015. Each variable delivered via the new system had a description of its contents and information about comparability over time, and the majority of variables also had at-a-glance frequencies. Variables were recoded so they were consistent over time without any loss of information, and universe definitions were empirically verified and documented online.
The second major phase of work, currently ongoing, involves expanding IPUMS-CPS to incorporate non-ASEC CPS monthly and supplement data. Funded by the National Institutes of Health and carried out in collaboration with Unicon Research Corporation, the current iteration of IPUMS-CPS will, on completion, house most publicly available CPS basic and supplement data, allow users to easily link observations over time, and continue to provide the high level of research support that Unicon and IPUMS have always provided.