Frequently Asked Questions (FAQ)
What is IPUMS-CPS?
What's in the future for IPUMS?
Does IPUMS-CPS add value to the data?
Where should a new user start?
How do I get access to IPUMS-CPS data?
What are microdata?
What are "pointer variables"?
What are "weights"?
What does "universe" mean in the variable descriptions?
What are "data quality flags"?
How do I obtain data?
What format are the data in?
How long does a data extract take?
What if the samples are too big for me to handle?
How does "sample selection" work on the IPUMS-CPS web site?
What does "add to cart" mean?
Why can't I open the data file?
Is there a preferred statistical package for using the IPUMS?
Can I analyze IPUMS-CPS data without a statistical package?
Can I get the original data?
How is a record uniquely identified?
Using IPUMS data
Are there tricky aspects of IPUMS data to be particularly aware of?
What are the major limitations of the data?
Can I find particular individuals in the IPUMS data?
How do I cite IPUMS-CPS?
Can I use IPUMS for genealogy?
Using the variables page
Variables page menu
Variables page details
Using the data extract system
Your data cart
Why are some variables in my data cart preselected?
What is "Type"?
Extract request page
Extract definition: Data structure
Extract option: Select cases
Extract option: Attach characteristics
Extract option: Select data quality flags
Extract option: Describe your extract
What does it mean for a variable to have "multiple variables"?
General information about the project
What is IPUMS-CPS?
IPUMS-CPS is an integrated set of data spanning more than 50 years (1962-forward) of the Current Population Survey (CPS). The CPS is a monthly U.S. household survey conducted jointly by the U.S. Census Bureau and the Bureau of Labor Statistics. Initiated in the 1940s in the wake of the Great Depression, the survey was designed to measure unemployment. A battery of labor force and demographic questions, known as the "basic monthly survey," is asked every month. Over time, supplemental inquiries on special topics have been added for particular months. Among these supplemental surveys, the Annual Social and Economic Supplement (hereafter referred to as the ASEC) is the most widely used by social scientists and policymakers. The ASEC along with basic monthly data from the CPS provides the data for IPUMS-CPS.
To make cross-time comparisons using the CPS data more feasible, variables in IPUMS-CPS are coded identically or "harmonized" for 1962 forward. This harmonized dataset is also compatible with the data from the U.S. decennial censuses that are part of the Integrated Public Use Microdata Series (IPUMS-USA). Researchers can take advantage of the relatively large sample size of IPUMS-USA at ten-year intervals and fill in information for the intervening years using IPUMS-CPS.
IPUMS is not a collection of compiled statistics; it is composed of microdata. Each record is a person, with all characteristics numerically coded. In most samples persons are organized into households, making it possible to study the characteristics of people in the context of their families or other co-residents. Because the data are individuals and not tables, researchers must use a statistical package to analyze the millions of records in the database. A data extraction system enables users to select only the samples and variables they require.
IPUMS-International is the world's largest collection of publicly available individual-level census data. IPUMS-International integrates samples from population censuses from around the world taken since 1960. Scholars interested only in the United States are better served using IPUMS-USA, which is optimized for U.S. research.
IPUMS-USA consists of over sixty high-precision samples of the American population drawn from fifteen federal censuses and from the American Community Surveys. The IPUMS assigns uniform codes across all the samples and brings relevant documentation into a coherent form to facilitate analysis of social and economic change.
What's in the future for IPUMS?
IPUMS-CPS is funded through 2015 by several grants from the National Institute of Child Health and Human Development. In addition to improving the ASEC, we are making the basic monthly CPS surveys from 1976 through the present available through IPUMS. We plan annual data releases every fall, and our latest plans can be found on our data release schedule page.
We have every expectation of continuing the project beyond the current funding period, but will have to secure further funding as our current grants expire. To be successful, we need to have a large body of users and published works we can point to. Please inform us if you have any presentations or publications using IPUMS data.
Does IPUMS-CPS add value to the data?
IPUMS data is integrated over time and across samples by assigning uniform codes to variables. This process itself adds value to the data by fully documenting all codes and compiling all variable documentation in a hyperlinked web format. But we do many other things as well:
IPUMS creates a consistent set of constructed variables on family interrelationships for all samples. The "pointer" variables indicate the location within the household of every person's mother, father, and spouse.
IPUMS data also includes harmonized income and occupation variables. Although IPUMS retains the original occupation and industry codes, the variables OCC1950 and IND1950 have been created for long-term analysis.
Where should a new user start?
The natural starting point is the "Select Data" or "Browse and Select Data" links on the left navigation bar and the top banner. These links open the variables page: the primary tool for exploring the contents of IPUMS-CPS. By default, the variables page displays one variable group at a time for all samples in the data series. You can change the view option to show all groups simultaneously, but the page can get very large and slow to load. However, you can filter the information at any point to include only the samples of interest to you ("Select samples"). Initially, the variables screen is set to display the integrated variables.
When you select samples, the page will display only variables present in those samples. An "x" indicates the availability of a variable for a particular sample.
On the variables page, clicking on a variable name brings up its documentation. The information about the variable is contained on a number of tabs. The default tab is the brief description of the variable. More information is usually available on the "comparability" tab. The variables page also has direct links to the codes page for each variable (they are also accessible as a tab in the variable description). The codes page shows the codes and labels for the variable, and the availability of categories across samples. These categories can suggest the types of research possible with a given sample.
Throughout the variable documentation system there are buttons to "Add to cart." Any variables you select in this way are put in your data cart to include in a data extract. Your selections only last for the current web session.
The Data Cart in the upper right keeps track of your variable and sample selections. Once you have made some selections you can click on "View Cart" to review your choices. If you have selected variables and samples you can enter the data extract system. To make a data extract you must be registered to use IPUMS-CPS. If you are already registered to use IPUMS-CPS, you can click on "create an extract" and use the data access system. The instructions for the extraction system are here.
In addition, the Samples page provides links to information on the sizes of the CPS samples, characteristics of the samples, and differences between the sampling methods used for the census and the CPS.
How do I get access to IPUMS-CPS data?
Access to the documentation is freely available without restriction; however, users must register before extracting data from the website.
The IPUMS-CPS data is also available for analysis online through the IPUMS Online Data Analysis System.
What are microdata?
Census microdata are composed of individual records containing information collected on persons and households. The unit of observation is the individual. The responses of each person to the different census questions are recorded in separate variables.
Microdata stand in contrast to more familiar "summary" or "aggregate" data. Aggregate data are compiled statistics, such as a table of marital status by sex for some locality. There are no such tabular or summary statistics in the IPUMS data.
Microdata are inherently flexible. One need not depend on published statistics from a census that compiled the data in a certain way, if at all. Users can generate their own statistics from the data in any manner desired, including individual-level multivariate analyses.
See an image of IPUMS data here. All IPUMS data are in this general format.
What are "pointer variables"?
The IPUMS "pointer" variables indicate the location within the household of every person's mother, father, and spouse. Nearly all samples indicate the relationship of each person to the head of household, but it is much harder to relate individuals to persons other than the head (for example, grandchildren to children, sons-in-law to daughters, or unrelated persons to each other). We have developed a complex core algorithm to make such connections, and we customize it as needed to account for peculiarities of specific samples. The pointer variables are called MOMLOC, POPLOC and SPLOC in the IPUMS system, and accompanying variables indicate the major rules under which a specific link was made.
The pointer variables make it easy to construct individual-level variables representing the characteristics of co-resident persons, such as occupation of spouse, age of mother, or educational attainment of father. You need to include the serial and person ID variables (SERIAL and PERNUM) in your extract, as well as the pointer variables themselves, to perform these data manipulations.
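As a rough illustration of this kind of manipulation, the sketch below attaches the age of each person's spouse to that person's record. It assumes a rectangular extract from a single sample that has already been read into a list of dictionaries, and it assumes that SPLOC holds the PERNUM of the spouse, with 0 meaning no spouse is present. The records, the helper `attach_from_pointer`, and the new variable name `AGE_SP` are invented for the example.

```python
# Toy rectangular records (invented values, single sample). SERIAL identifies
# the household, PERNUM the person within it, and SPLOC is assumed to hold the
# PERNUM of the person's spouse (0 when no spouse is present).
records = [
    {"SERIAL": 1, "PERNUM": 1, "AGE": 40, "SPLOC": 2},
    {"SERIAL": 1, "PERNUM": 2, "AGE": 38, "SPLOC": 1},
    {"SERIAL": 1, "PERNUM": 3, "AGE": 10, "SPLOC": 0},
    {"SERIAL": 2, "PERNUM": 1, "AGE": 65, "SPLOC": 0},
]


def attach_from_pointer(records, pointer, source, new_name):
    """Copy `source` from the person referenced by `pointer` onto each record."""
    # Index every person by (household, person number) for direct lookup.
    by_id = {(r["SERIAL"], r["PERNUM"]): r for r in records}
    out = []
    for r in records:
        # A pointer of 0 means the relative is not present in the household.
        target = by_id.get((r["SERIAL"], r[pointer])) if r[pointer] else None
        out.append({**r, new_name: target[source] if target else None})
    return out


with_spouse_age = attach_from_pointer(records, "SPLOC", "AGE", "AGE_SP")
for r in with_spouse_age:
    print(r["SERIAL"], r["PERNUM"], r["AGE"], r["AGE_SP"])
```

The same helper would work with MOMLOC or POPLOC in place of SPLOC. When an extract pools several samples, the lookup key would likely need to include the sample identifier as well, since this sketch assumes SERIAL is unique only within one sample.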
What are "weights"?
The IPUMS-CPS samples are weighted, with some records representing more cases than others. This means that persons and households with some characteristics are overrepresented in the samples, while others are underrepresented.
To obtain representative statistics from the samples, users must apply sample weights. Use whichever of the following procedures fits your analysis:
1. For most person-level analyses of the ASEC samples, apply the ASECWT variable. ASECWT gives the population represented by each individual in the sample.
2. For person-level analyses of non-supplement IPUMS-CPS data, apply the WTFINL variable. WTFINL gives the population represented by each individual in the sample.
3. For person-level analyses of the summary health insurance variables (HCOVANY, HCOVPRIV, HINSEMP, HINSPUR, HCOVPUB, HINSCAID, HINSCARE, and HINSMIL), apply the HINSWT variable rather than ASECWT.
4. For household-level analyses of the ASEC, weight the households using the ASECWTH variable. ASECWTH gives the number of households in the general population represented by each household in the sample.
5. For household-level analyses of non-supplement data, weight the households using the HWTFINL variable. HWTFINL gives the number of households in the general population represented by each household in the sample.
6. For analyses of variables from topical supplements, use the appropriate supplement-specific weight. For a list of supplement-specific weights, see the sample weights page.
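To make the effect of weighting concrete, here is a minimal sketch that compares a weighted and an unweighted employment rate. The three records and the 0/1 indicator `EMPLOYED` are invented for illustration; only the weight name ASECWT comes from the list above. Statistical packages have their own weighting options, so this is only meant to show the arithmetic.

```python
# Toy person records (invented values). ASECWT is the person-level weight:
# the number of people in the population that each sampled person represents.
people = [
    {"EMPLOYED": 1, "ASECWT": 1500.0},
    {"EMPLOYED": 0, "ASECWT": 500.0},
    {"EMPLOYED": 1, "ASECWT": 2000.0},
]


def weighted_total(rows, weight):
    """Estimated population size: the sum of the weights."""
    return sum(r[weight] for r in rows)


def weighted_mean(rows, value, weight):
    """Weighted mean of `value`, using `weight` as the sample weight."""
    return sum(r[value] * r[weight] for r in rows) / weighted_total(rows, weight)


population = weighted_total(people, "ASECWT")
employment_rate = weighted_mean(people, "EMPLOYED", "ASECWT")
unweighted_rate = sum(r["EMPLOYED"] for r in people) / len(people)

print(population, employment_rate, round(unweighted_rate, 3))
```

In this toy data the weighted rate (0.875) differs from the unweighted rate (about 0.667) because the non-employed person carries a smaller weight than the other two.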
What does "universe" mean in the variable descriptions?
The universe is the population at risk of having a response for the variable in question. In most cases these are the households or persons to whom the census question was asked, as reflected on the census questionnaire. For example, children are not usually asked employment questions, and men and children are not asked fertility questions. Cases that are outside of the universe for a variable are labeled "NIU" on the codes page. Differences in a variable's universe across samples are a common data comparability issue.
The universes will not always be entirely clean of apparently erroneous cases. Some persons or households that should not have answered the question did, and some that should have answered may be included in the "NIU" (not in universe) category. But until we perform comprehensive data editing and allocation in the future, we do not know whether the variable in question is in error or whether the variables that define the universe (for example, age or employment status) are incorrect.
What are "data quality flags"?
Many variables in the Current Population Survey have been edited for missing, illegible and inconsistent values. Data quality flags indicate which values are edited or allocated. More information on the procedures used for editing and allocating the data can be found in the Technical Papers for the Current Population Survey.
Each data quality flag corresponds to one or more variables, and the codes for each flag vary based on the sample. Data quality flags, and corresponding codes pages, can be viewed by clicking here for person flags and here for household flags. Data quality flags can be included in your extract as an option before you submit your extract request by clicking "Select data quality flags."
How do I obtain data?
All IPUMS data are delivered through our data extraction system. Users select the variables and samples they are interested in, and the system creates a custom-made extract containing only this information. To start, users can reference our instructions for the data extraction system and instructions for opening an IPUMS extract on your computer.
Data are generated on our server. The system sends out an email message to the user when the extract is completed. The user must download the extract and analyze it on their local machine. Access to the documentation is freely available without restriction; however, users must register before extracting data from the website.
What format are the data in?
IPUMS produces fixed-column ASCII data. Data are entirely numeric. By default, the extraction system rectangularizes the data: that is, it puts household information on the person records and does not retain the households as separate records. No information is lost, and this is the format preferred by most researchers; however, the extraction system includes the option of hierarchical data or household record only data.
In addition to the ASCII data file, the system creates a statistical package syntax file to accompany each extract. The syntax file is designed to read in the ASCII data while applying appropriate variable and value labels. SPSS, SAS, and Stata are supported. You must download the syntax file with the extract or you will be unable to read the data. The syntax file requires minor editing to identify the location of the data file on your local computer.
A codebook file is also created with each extract. It records the characteristics of your extract and should be downloaded for record-keeping.
All data files are created in gzip compressed format. You must uncompress the file to analyze it. Most data compression utilities will handle the files.
How long does a data extract take?
The time needed to make an extract differs depending on the number and size of samples requested, whether case selection is performed, and the load on our server. Extracts can take from a few minutes to an hour or more. The system sends an email when the extract is completed, so there is no need to stay active on the IPUMS site while the extract is being made.
What if the samples are too big for me to handle?
It is possible to make extracts that are extremely large. There are two ways to reduce file size: you can select fewer samples or variables, or you can use the case selection feature of the extract system to include only records with certain characteristics, such as females age 15 to 49. Simply selecting out cases is not always desirable, however, because you may want all the co-resident persons as well. Accordingly, the case selection function also lets you choose to include everyone living in a household with a person with the selected characteristics.
How does "sample selection" work on the IPUMS-CPS web site?
When a user first enters the variable documentation system, all samples are selected by default. Every variable in the system will display on all relevant screens.
Users can filter the information displayed by selecting only the samples of interest to them. Only the variables available in one of the selected samples will appear in the variable lists. The integrated variable descriptions and codes pages will also be filtered to display only the text and columns corresponding to the selected samples. Sample selections can be altered at any time in your session. Selections do not persist beyond the current session.
When a user enters the extract system after selecting samples, those selections are carried into the data extract system.
What does "add to cart" mean?
While browsing variables in the documentation system, you can place them into your data cart. Checkboxes and buttons labeled "Add to cart" are available in different contexts for this purpose. Any variables you identify in this way will be selected for you when you enter the data extract system. Once in the extract system, you can return to the variable list to make more selections.
Why can't I open the data file?
There are two likely explanations:
1) The data produced by the extract system are gzipped (the file has a .gz extension). You must use a data compression utility to uncompress the file before you can analyze it.
2) You cannot open the data file directly with a statistical package. The file is a simple ASCII file, not a system file in the format of any statistical package. The extract system does, however, generate a syntax (set-up) file to read the ASCII file into your statistical package. You must download the syntax file along with the data file from our server, open the syntax file with your statistical package, and edit the path in the syntax file to point to the location of the data on your local computer. Now you are ready to read in the data.
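For users who want to see what the two steps above amount to, the sketch below uncompresses a gzipped file and reads its fixed columns in Python. It is not a substitute for the generated syntax files: the column layout, the file name, and the two data lines are invented so that the example runs on its own, and the real column positions for an extract come from its codebook or syntax file.

```python
import gzip
import os
import tempfile

# Hypothetical layout: (name, start, end) as 0-based, end-exclusive slices.
# Real column positions come from the codebook or syntax file of your extract.
LAYOUT = [("YEAR", 0, 4), ("SERIAL", 4, 9), ("PERNUM", 9, 11), ("AGE", 11, 13)]


def read_fixed_width_gz(path, layout):
    """Read a gzipped fixed-column ASCII file into a list of dicts of ints."""
    rows = []
    # gzip.open in text mode uncompresses the file while it is being read.
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                rows.append({name: int(line[a:b]) for name, a, b in layout})
    return rows


# Build a tiny stand-in file so the sketch runs end to end (invented values).
sample_path = os.path.join(tempfile.mkdtemp(), "cps_example.dat.gz")
with gzip.open(sample_path, "wt", encoding="ascii") as handle:
    handle.write("2010000010140\n2010000010238\n")

rows = read_fixed_width_gz(sample_path, LAYOUT)
print(rows)
```

The sketch converts every field with `int()`, relying on the data being entirely numeric; a field that is blank in a real file would need separate handling.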
Is there a preferred statistical package for using the IPUMS?
IPUMS supports SPSS, SAS and Stata. The system does not make data files in those formats, but does generate syntax files with which to read in the ASCII data.
Can I analyze IPUMS-CPS data without a statistical package?
The IPUMS Online Data Analysis System allows users to analyze all IPUMS-CPS samples online. The system performs a wide range of operations on the data based on specifications made by the user, from simple operations such as tabulations to advanced statistical analyses. Examples and screenshots are available on our short instructions page.
Can I get the original data?
Original CPS data files are available from the National Bureau of Economic Research website: nber.org.
How is a record uniquely identified?
Using IPUMS data
Are there tricky aspects of IPUMS data to be particularly aware of?
The IPUMS-CPS samples are weighted: each individual does not represent the same number of persons in the population. It is important to use the weight variables when performing analyses with these samples.
It is important to examine the documentation for the variables you are using. The codes and labels for variable categories do not tell the whole story. In other words, the syntax labels are not enough. There are two things to pay particular attention to. The universe for a variable -- the population at risk for answering the question -- can differ subtly or markedly across samples. Also, read the variable comparability discussions for the samples you are interested in. Important comparability issues should be mentioned there. If a variable is of particular importance in your research (for example, it is your dependent variable), you are also well served to read the enumeration text associated with it. This text is linked directly to the variable, so it is quite easy to call it up.
By default, the extract system rectangularizes the data: it puts the household information on the person records and drops the separate household record. This can distort analyses at the household level, because the number of observations will be inflated to the number of person records. To get the proper number of household observations, you can either select the first person in each household (PERNUM = 1) or select the "hierarchical" box in the extract system. The rectangularizing feature also drops any vacant households, which are otherwise available in some samples. Despite these complications, the great majority of researchers prefer the rectangularized format, which is why it is the default output of our system.
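The inflation problem can be seen in a small sketch. The records below are invented, the household income variable is used purely as an illustration, and weights are left out to keep the example short. Keeping only the first person in each household yields one observation per household.

```python
# Toy rectangular extract (invented values): household income is repeated on
# every person record in the household.
rect = [
    {"SERIAL": 1, "PERNUM": 1, "HHINCOME": 50000},
    {"SERIAL": 1, "PERNUM": 2, "HHINCOME": 50000},
    {"SERIAL": 1, "PERNUM": 3, "HHINCOME": 50000},
    {"SERIAL": 2, "PERNUM": 1, "HHINCOME": 20000},
]

# Keep one record per household by selecting the first person.
households = [r for r in rect if r["PERNUM"] == 1]

# Averaging over person records counts the three-person household three times.
person_level_mean = sum(r["HHINCOME"] for r in rect) / len(rect)
household_mean = sum(r["HHINCOME"] for r in households) / len(households)

print(len(rect), len(households), person_level_mean, household_mean)
```

Here the mean over person records is 42500.0, while the mean over the two households is 35000.0. A real household-level analysis would also apply the household weight.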
What are the major limitations of the data?
The data are composed entirely of individual person and household records from population censuses. There are no macroeconomic, business, or aggregate statistics. We do not deliver the published statistics from the population censuses.
IPUMS is composed entirely of sample data, and some subpopulations may be too small to study with the sample data.
Because the data are public-use, measures have been taken to assure confidentiality. Names and other identifying information are suppressed. Most importantly for many researchers, geographic information is limited.
Can I find particular individuals in the IPUMS data?
No. A variety of steps have been taken to ensure the confidentiality of the data. Most fundamentally, the samples do not contain names or addresses. The data are only samples, so there is no guarantee any given individual will be in the dataset.
How do I cite IPUMS-CPS?
Citation information can be found on the IPUMS-CPS citation page. Reports and publications using IPUMS-CPS data must be cited appropriately.
Any publications, research reports, presentations, or educational material making use of the data or documentation should be added to our Bibliography. Continued funding for the IPUMS depends on our ability to show our sponsor agencies that researchers are using the data for productive purposes.
Can I use IPUMS for genealogy?
The IPUMS database was not designed for genealogy, and IPUMS-CPS data is limited in that you cannot search for names. Ancestry.com provides information from the census that can be used for genealogical research.
Using the variables page
Variables page menu
Use the "Variables" menu to browse or search variables:
Household: household variables by group
Person: person variables by group
A-Z: integrated variables by letter
Search: display only variables that contain specified text in particular fields
Use the links on the right side of the menu to:
Select Samples: limit the display of variable information to selected samples
Options: alter how the variable list is displayed or get help for this page
Variables page details
The variables page allows you to browse variables while limiting and controlling how the information is displayed.
The "Variables" menu is for browsing the variables. You may also search variables by specifying search terms for specific fields of variable metadata. The system will return a list of variables that include any of the search terms you indicate.
When you "Select Samples" you limit the variable list to display only variables that are available in at least one of those samples. But the effect of selecting samples extends into all the variable descriptions and codes pages you can access through the variable system. Only information relevant to your selected samples will be displayed in any context while you browse the variables. You can change your sample selections at any point.
Selecting samples is a good practice when exploring the IPUMS, because the amount of information can be unwieldy. On the other hand, sometimes you need to see everything to determine what kinds of research are possible using the database.
The final choices are "Options" and "Help." The "Display Options" item brings up a screen that offers a number of choices regarding the display of the variable list. Each selection has a default choice.
View one group / View all groups
Switch between viewing one variable group at a time and viewing all variable groups on one screen. Unless you have a limited number of samples selected, your browser may be slow to display all groups. The default view is one group at a time.
Show availability detail / Show availability summary
Switch between displaying the full sample-specific availability matrix, and a view that only displays the total number of samples that contain each variable. Both views only display or sum the samples that the user has selected in "Select samples." The default view is the detailed availability information.
View available variables / View all variables
Switch between a view that displays only variables present in at least one of your selected samples, and a view that displays every variable, including those not available in your selected samples. The default view is to display only available variables.
Samples are displayed chronologically / Samples . . . reverse chronologically
Display the samples columns indicating variable availability in chronological order (oldest to newest) or reverse chronological order (newest to oldest). The default is reverse chronological (newest to oldest).
The Variable List
As you browse the variables, they are displayed in a list containing a number of columns. The variable name links to the variable description, which includes detailed comparability discussions, universes, and enumeration text. The variable codes -- and their associated labels -- can be accessed directly using the "codes" links. The "type" column indicates if it is a person or household variable. In some contexts, like the alphabetic view, the two types are pooled together.
In the area to the right of the "codes" column is a column for every sample that the user chose in "Select samples." By default, the most commonly requested samples from each year are selected. The country abbreviation and last two digits of the sample year identify each sample at the top of every column. Hover over the year with the mouse to see the full country name. If a variable is available in a given sample, an "x" is printed in that column.
Each variable has a box on the far left in the column labeled "Add to cart." Use these to identify variables you wish to include in a data extract.
Using the data extract system
Your data cart
You must be logged in to use the data extract system. If you are not registered, you must apply for access.
At the top right corner of the variables page is a summary of your data cart. This box displays the number of variables and samples you have selected. Clicking the yellow circle next to a variable places it in your data cart. You can view your data cart at any time by clicking "View Cart." The "View Cart" link only becomes operative when you have selected a variable or sample.
Your data cart lists the variables pre-selected by the extract system as well as any variables you selected while browsing the documentation. As with the variable selection page, you can remove variables from your extract in this step by clicking the checkbox next to the variable in the "Add to cart" column. If you chose a variable but subsequently altered your sample selections in such a way that the variable is no longer available, it is indicated by an "i" icon.
The data cart also includes record type, links to codes pages, and sample availability for the variables in your cart.
Buttons are provided to return to the variable list to make more selections or to alter your sample choices. If you return to the variable list, click on "View Cart" again to return to the data cart.
When you are satisfied with your data selections, click "Create Data Extract" to finalize your extract request.
Why are some variables in my data cart preselected?
Certain variables appear in your data cart even if you did not select them, and they are not included in the constantly updated count of variables in your data cart.
Unless you are absolutely certain you will not need one of these variables, we recommend that you not remove them from your data cart.
What is "Type"?
The "Type" column on the variables selection pages and in your data cart indicates the record type of the variable. The variables with a "P" are from the person record, and the variables with an "H" are from the household record. Data at the household level pertain to each person in the household, and are identical on each person record within a household in the rectangular data file.
Extract request page
When you click "Create data extract" in the Data Cart, you come to the Extract Request page. All of the actions on this page are optional. If you wish, you can simply hit the "Submit" button and create your data extract. You will be prompted to log in if you have not done so already.
The page summarizes your data extract and provides a number of options for customizing it. A link at the top expands to show the samples you selected. If any samples have notes associated with them, a message will appear on the samples bar to encourage you to review that information. Click the appropriate links to go back to the variable browsing and sample selection pages to alter your choices. You return to the extract request page via the data cart, where you can review the availability matrix for selections and easily drop variables by unchecking them.
A separate link lets you choose the preferred data structure for your extract: rectangular or hierarchical. Rectangular format is the default.
Another row on the page estimates the size of your extract. If the estimated size is too large, click on the link to reduce extract size. One of the methods for reducing the size of extracts involves clicking on the "Select Cases" option button on the lower half of the extract request page.
When you submit an extract, there will be a delay ranging from minutes to hours, depending on the size of the job. You do not need to wait on our site for the job to be completed. Our system will send you an email when your extract is ready.
The definitions of every extract will remain on our server indefinitely, but the data files are subject to deletion after three days. However, the screen where you download extracts has a feature that lets you revise old extracts. When you click on "revise," all your selections for that extract will be loaded into the system, after which you can edit or regenerate it. Note, however, that each successive data release can create difficulties for recreating old extracts, because codes might change.
Extract definition: Data structure
You can choose the preferred file structure for your extract. Rectangular data contain only person records -- requested household information is attached to each household member. Hierarchical data contain a distinct household record followed by a separate person record for each member of the household. The system defaults to rectangular format, which is the overwhelming choice of researchers.
Vacant housing units can only be extracted using the hierarchical data structure.
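The relationship between the two structures can be sketched as a small conversion routine. This is only an illustration of the idea, not the extract system's own code: the record-type field `RECTYPE`, its "H" and "P" values, and all data values are assumptions made for the example.

```python
# Toy hierarchical file (invented values): each household record ("H") is
# followed by the person records ("P") of its members.
hierarchical = [
    {"RECTYPE": "H", "SERIAL": 1, "REGION": 11},
    {"RECTYPE": "P", "SERIAL": 1, "PERNUM": 1, "AGE": 40},
    {"RECTYPE": "P", "SERIAL": 1, "PERNUM": 2, "AGE": 38},
    {"RECTYPE": "H", "SERIAL": 2, "REGION": 21},  # vacant unit: no person records
    {"RECTYPE": "H", "SERIAL": 3, "REGION": 42},
    {"RECTYPE": "P", "SERIAL": 3, "PERNUM": 1, "AGE": 70},
]


def rectangularize(records):
    """Attach the preceding household record's fields to each person record."""
    out = []
    household = {}
    for rec in records:
        fields = {k: v for k, v in rec.items() if k != "RECTYPE"}
        if rec["RECTYPE"] == "H":
            household = fields  # remember the current household
        else:
            out.append({**household, **fields})
    return out


rectangular = rectangularize(hierarchical)
for r in rectangular:
    print(r)
```

Household 2 has no person records, so it leaves no trace in the rectangular output, which is why vacant units require the hierarchical structure.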
Extract option: Select cases
The "select cases" feature allows users to limit their dataset to contain only records with specific values for selected variables, such as persons age 65 and older. Multiple variables can be used in combination during case selection. Selections for multiple variables are additive, each being implicitly connected by a logical "AND" for processing purposes. You can only perform case selection on either the general or the detailed version of a variable, not both.
Simply extracting selected cases can be too crude, however, because you may need the people who co-resided with your selected population. Accordingly, the case selection function also lets you choose to include everyone living in a household with a person with the selected characteristics.
Users should be careful with the case selection feature. It is possible to select a specific variable category (e.g., polygamous marriage) that does not exist across all the samples in your extract, thereby inadvertently excluding those samples from your dataset.
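The selection logic described in this section can be sketched as follows. The function `select_cases`, its `include_household` switch, and the toy records are hypothetical stand-ins for the web options, and the coding of SEX (2 for female) is an assumption of the example.

```python
# Toy person records (invented values).
people = [
    {"SERIAL": 1, "PERNUM": 1, "AGE": 70, "SEX": 2},
    {"SERIAL": 1, "PERNUM": 2, "AGE": 40, "SEX": 1},
    {"SERIAL": 2, "PERNUM": 1, "AGE": 66, "SEX": 1},
    {"SERIAL": 3, "PERNUM": 1, "AGE": 30, "SEX": 2},
]


def select_cases(rows, criteria, include_household=False):
    """Keep rows matching every criterion (logical AND).

    `criteria` maps a variable name to the set of accepted values. With
    include_household=True, everyone living with a matching person is kept.
    """
    def matches(row):
        return all(row[var] in accepted for var, accepted in criteria.items())

    hits = [r for r in rows if matches(r)]
    if not include_household:
        return hits
    serials = {r["SERIAL"] for r in hits}
    return [r for r in rows if r["SERIAL"] in serials]


# Women age 65 and older: both conditions must hold.
criteria = {"AGE": set(range(65, 200)), "SEX": {2}}

selected_only = select_cases(people, criteria)
with_coresidents = select_cases(people, criteria, include_household=True)

print([(r["SERIAL"], r["PERNUM"]) for r in selected_only])
print([(r["SERIAL"], r["PERNUM"]) for r in with_coresidents])
```

Only the first person in household 1 satisfies both conditions. With co-residents included, the second person in that household is kept as well, while households 2 and 3 drop out entirely. If an accepted value never occurs in a sample, no record from that sample matches, which is the pitfall noted above.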
Extract option: Attach characteristics
The data extract system can attach a characteristic of a person's mother, father, or spouse as a new variable on the person's record. It can also attach the characteristics of the household head. For example, using the variable "Occupation," it can make a new variable for "Occupation of mother." All persons in the extract who reside in a household with their mother would receive a value for this new variable. Persons without a mother present in the household would receive a missing value. The extract system automatically generates a unique name for the new variable.
The attached-characteristics feature uses the constructed IPUMS family interrelationship "pointer variables" that identify co-resident mothers, fathers, and spouses for each person. The pointer variables identify social mothers and fathers, not strictly biological parents.
Extract option: Select data quality flags
The data extract system can include data quality flags for the variables for which they are available.
Extract option: Describe your extract
You can describe your extract for future reference. Our system will display the description on the page where you download your data extract.
What does it mean for a variable to have "multiple variables"?
A variable marked as having "multiple variables" is a container that stands for several related variables. By selecting one of these container variables, you will add the several variables it represents to your extract. In some cases, you may also be able to see a list of variables included in a container and select only those relevant to you.