Sample Designs

Differences and Similarities between IPUMS-CPS and IPUMS-USA Samples

While IPUMS-USA provides information about the total U.S. population, IPUMS-CPS provides information about the U.S. non-institutionalized population because the March CPS is a probability sample of this population (see "About IPUMS-CPS" for more details). Members of the armed forces who live in off-base housing or on base with their families are included in the March CPS, but persons in the military who reside in military barracks are excluded. Institutionalized persons, such as inmates in old age homes, prisons, and mental institutions, are excluded from the survey. To achieve comparability with IPUMS-CPS, IPUMS-USA users should exclude persons coded as 1, 2, 3, 4, and 6 on the IPUMS-USA variable GQTYPE. Non-institutional group quarters--defined as housing units that are not institutions and that contain nine or more persons unrelated to the householder--are sampled in the CPS.
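The GQTYPE restriction described above can be sketched as a simple filter. This is a minimal illustration, not an official IPUMS procedure: the records are hypothetical, and PERNUM is used here only as a convenient label for each row.

```python
# Codes the text says to exclude on the IPUMS-USA variable GQTYPE.
EXCLUDED_GQTYPE = {1, 2, 3, 4, 6}

def cps_comparable(records):
    """Keep IPUMS-USA person records whose GQTYPE is not an excluded code."""
    return [r for r in records if r["GQTYPE"] not in EXCLUDED_GQTYPE]

# Hypothetical records; a real extract carries many more variables.
sample = [
    {"PERNUM": 1, "GQTYPE": 0},
    {"PERNUM": 2, "GQTYPE": 2},
    {"PERNUM": 3, "GQTYPE": 5},
    {"PERNUM": 4, "GQTYPE": 6},
]
kept = cps_comparable(sample)
```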

Like the census samples included in IPUMS-USA, the March CPS datasets included in IPUMS-CPS are samples of households or dwellings. A household is defined as all persons who occupy a dwelling unit. A dwelling unit is a room or group of rooms intended for occupation as separate living quarters and having either a separate entrance or complete cooking facilities for the exclusive use of the occupants. These definitions are consistent with the definitions of households and dwelling units used in recent U.S. censuses. The provision of data about multiple individuals within the same household allows analysis of such topics as household composition, nuptiality, and the relative earnings of husbands and wives.

For the CPS, information is always collected by a trained interviewer, during face-to-face or telephone interviews of household members. In recent U.S. censuses, households are mailed census questionnaires, and household members fill in the forms themselves. Enumerators contact only the minority of households that do not send back completed census forms.

Housing units that were vacant or could not be interviewed (due to refusals to participate or absence of the residents) are included in the IPUMS-CPS data beginning in 1988. Such vacant and non-interview units have a weight of zero in the household weight (HHWT). Vacant households are also included in the IPUMS-USA database beginning with the 1970 census. They can be identified with the VACANT variable and should be excluded from analysis for statistics comparable to the weighted figures from IPUMS-CPS.
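As a rough illustration of why zero weights matter, the sketch below uses hypothetical household records (SERIAL is used only as a row label, and the weight values are invented). Weighted totals ignore zero-weight units automatically, while unweighted counts do not.

```python
# Hypothetical IPUMS-CPS household records; weights are made up.
households = [
    {"SERIAL": 1, "HHWT": 1500.0},
    {"SERIAL": 2, "HHWT": 0.0},   # vacant or non-interview unit
    {"SERIAL": 3, "HHWT": 2100.0},
]

# An unweighted count includes the zero-weight unit.
unweighted_count = len(households)

# A weighted count is unaffected by it, because it contributes zero.
weighted_count = sum(h["HHWT"] for h in households)

# For unweighted work, zero-weight units can be dropped explicitly.
interviewed = [h for h in households if h["HHWT"] > 0]
```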

Because the CPS is designed to measure unemployment in the civilian labor force, members of the armed forces are not part of the universe for many employment-related questions in the March CPS. Persons in the military provide demographic information, answer questions about their migration histories, and provide data about their incomes and primary jobs during the preceding calendar year. The census samples in IPUMS-USA do not treat persons in the military differently from civilian adults. Users who wish to work simultaneously with data from IPUMS-CPS and IPUMS-USA are strongly urged to read the universe restrictions and comparability issues discussed in the variable descriptions.

The application of comparable coding schemes for IPUMS-CPS and IPUMS-USA is designed to facilitate time-series analysis. Sample sizes in IPUMS-CPS are considerably smaller than in IPUMS-USA, but observations are available for every year, rather than at ten-year intervals. Users should not combine observations from the same year (1970, 1980, 1990, or 2000) from the two databases.

CPS Sample Design

The CPS samples are multi-stage stratified samples. The first stage of sampling involves dividing each U.S. state into "primary sampling units" (PSUs), most of which comprise a metropolitan area, a large county, or a group of smaller adjacent counties. The CPS consists of independent samples in each state and the District of Columbia. Within each state, the PSUs are grouped into homogeneous strata with respect to labor force and other social and economic characteristics that are highly correlated with unemployment. One PSU is sampled per stratum, where the probability of selection for each PSU in the stratum is proportional to its population.
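The first-stage selection rule (one PSU per stratum, with probability proportional to population) can be sketched as follows. The strata, PSU names, and populations are invented for illustration, and the sketch does not reproduce the Census Bureau's actual selection procedure.

```python
import random

def select_psus(strata, rng):
    """Pick one PSU per stratum with probability proportional to population."""
    chosen = {}
    for stratum, psus in strata.items():
        names = list(psus)
        populations = [psus[name] for name in names]
        # random.choices draws with the given relative weights.
        chosen[stratum] = rng.choices(names, weights=populations, k=1)[0]
    return chosen

# Hypothetical strata: PSU name -> population.
strata = {
    "stratum_A": {"psu_1": 500_000, "psu_2": 250_000, "psu_3": 250_000},
    "stratum_B": {"psu_4": 90_000, "psu_5": 10_000},
}
selection = select_psus(strata, random.Random(0))
```

In this toy setup, psu_1 would be chosen in about half of repeated draws for stratum_A, since it holds half of that stratum's population.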

In the second stage of sampling, a systematic sample of housing units is drawn from within each chosen PSU. Addresses for housing units are taken from sources such as lists of addresses obtained from the decennial censuses and building permits. "Ultimate sampling units" (USUs) are clusters of about four housing units. Usually, all households in the USU are in the sample. Occasionally, a third stage of sampling is necessary when actual USU size is extremely large. The multi-stage stratified sampling method is roughly equivalent to dividing the entire United States into USUs and selecting a clustered sample of these USUs for interviewing. Hence, the CPS sample is also a cluster sample.
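A systematic sample takes every k-th unit from an ordered list after a random start. The sketch below shows that idea on a hypothetical address list. It samples individual addresses rather than USU clusters, so it is a simplification of the second stage described above.

```python
import random

def systematic_sample(units, interval, rng):
    """Take every `interval`-th unit, starting from a random offset."""
    start = rng.randrange(interval)  # random start in [0, interval)
    return units[start::interval]

# Hypothetical ordered address list for one chosen PSU.
addresses = [f"address_{i}" for i in range(20)]
sampled = systematic_sample(addresses, 5, random.Random(1))
```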

The monthly CPS is a rotating panel design; households are interviewed for four consecutive months, are not in the sample for the next eight months, and then are interviewed for four more consecutive months. This means that, by design, about 50 percent of the households in any one March CPS also appear in the March CPS of the following year. There is no overlap for longer time intervals. This preliminary release of the IPUMS-CPS does not include the information needed to follow sampled households over time; later releases of the database will provide the information needed to link households across two adjoining years.
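The 50 percent figure follows directly from the four-on, eight-off, four-on rotation, as the short sketch below shows. It assumes that every monthly entry cohort is the same size and that no households drop out, which are simplifications; months are represented as running integers.

```python
def interview_months(entry_month):
    """Months in which a household entering at entry_month is interviewed."""
    first = range(entry_month, entry_month + 4)        # four months in
    second = range(entry_month + 12, entry_month + 16) # eight out, four more in
    return set(first) | set(second)

def entry_cohorts_in_sample(month):
    """Entry months of all cohorts interviewed in the given month."""
    return {e for e in range(month - 15, month + 1)
            if month in interview_months(e)}

march = 100  # arbitrary month index standing in for some March
this_year = entry_cohorts_in_sample(march)
next_year = entry_cohorts_in_sample(march + 12)
two_years_on = entry_cohorts_in_sample(march + 24)

overlap_share = len(this_year & next_year) / len(this_year)
```

Eight cohorts are in the sample in any month. The four in their first interview period return twelve months later, and none are still in the sample after twenty-four months.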

Beginning in 1976, the March CPS includes an oversample of Hispanics to increase the reliability of estimates for this group. Approximately twice as many Hispanics are interviewed as would be in the sample if it were exactly proportional to the U.S. population. Each Hispanic person therefore represents a smaller number of individuals than each non-Hispanic person. The use of weights, discussed below, corrects for this oversampling to yield representative national statistics from IPUMS-CPS.
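A toy example makes the correction concrete. The records, the is_hispanic flag, and the weight values below are all hypothetical. The oversampled group is given half the weight of the other group to mimic sampling at twice the rate.

```python
def share(people, flag, weight=None):
    """Share of people with `flag` set, optionally weighted by `weight`."""
    total = sum(p[weight] if weight else 1 for p in people)
    hits = sum((p[weight] if weight else 1) for p in people if p[flag])
    return hits / total

# Hypothetical sample: 4 oversampled persons and 6 others.
people = (
    [{"is_hispanic": True, "PERWT": 50.0} for _ in range(4)]
    + [{"is_hispanic": False, "PERWT": 100.0} for _ in range(6)]
)

unweighted = share(people, "is_hispanic")           # overstates the group
weighted = share(people, "is_hispanic", "PERWT")    # corrected by weights
```

In this invented sample the unweighted share is 0.4, while the weighted share is 0.25.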

CPS Weights

Due to the complex sampling design for the CPS, users of IPUMS-CPS data must make use of weights to produce representative statistics.

Most analyses based on individual-level data should use the PERWT variable. PERWT is based on the inverse probability of selection into the sample, with adjustments for the following factors: failure to obtain an interview; sampling within large sample units; the known distribution of the entire population according to age, sex, and race; the oversampling of Hispanic persons; the assignment of the same weight to husbands and wives; and an additional step to provide consistency with labor force estimates from the basic survey. PERWT is the person-level weight that is available for questions that were not part of the basic survey questions asked every month in the CPS.

If analysts wish to reproduce the monthly labor force statistics published by the Bureau of Labor Statistics, they should instead use the variable BLSWT to weight their data. For most other analyses using person-level data, however, PERWT is the appropriate choice. EARNWT should be used with a small number of variables, specifically, EARNWEEK, HOURWAGE, PAIDHOUR, and UNION.
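The guidance on person-level weights can be summarized as a small lookup. The function name and its parameter are hypothetical. Only the weight names and the four EARNWT variables come from the text above.

```python
# Variables the text says should be weighted with EARNWT.
EARNWT_VARIABLES = {"EARNWEEK", "HOURWAGE", "PAIDHOUR", "UNION"}

def person_weight_for(variable, reproduce_bls_labor_force=False):
    """Return the name of the person-level weight suggested for a variable."""
    if reproduce_bls_labor_force:
        return "BLSWT"   # reproducing published BLS labor force statistics
    if variable in EARNWT_VARIABLES:
        return "EARNWT"
    return "PERWT"       # default for most person-level analyses
```

For example, person_weight_for("UNION") returns "EARNWT", while a variable outside that short list falls back to "PERWT".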

For analyses focused on household-level variables, researchers should use the household weight, HHWT. HHWT generally has the same value as PERWT for the household head or reference person. As noted above, vacant housing units and households that could not be interviewed due to residents' absence or refusal to participate have a value of zero in HHWT.

Starting in 2005, the Census Bureau calculated household and person weights using more age detail for children. These calculations, which produce the official weights used in the March supplements for 2005 and later, provide better estimates of children by single year of age. The Census Bureau also recalculated the weights for 2004 based on these changes. For 2004, IPUMS-CPS makes available both the original weights (HHWT, PERWT) and the new weights (HHWT04, PERWT04).
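When pooling several years, one possible approach is to pick the weight variables by year, as sketched below. The helper and its use_revised_2004 parameter are hypothetical, and whether the revised 2004 weights are the right choice depends on the analysis.

```python
def weight_names(year, use_revised_2004=True):
    """Return the household and person weight variable names for a year."""
    if year == 2004 and use_revised_2004:
        # Recalculated 2004 weights, consistent with 2005 and later.
        return {"household": "HHWT04", "person": "PERWT04"}
    return {"household": "HHWT", "person": "PERWT"}
```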