Sample Designs

Differences and Similarities between IPUMS-CPS and IPUMS-USA Samples

While IPUMS-USA provides information about the total U.S. population, IPUMS-CPS provides information about the U.S. non-institutionalized population because the ASEC is a probability sample of this population (see "About IPUMS-CPS" for more details). Members of the armed forces who live in off-base housing or on base with their families are included in the ASEC, but persons in the military who reside in military barracks are excluded. Institutionalized persons, such as inmates in old age homes, prisons, and mental institutions, are excluded from the survey. To achieve comparability with IPUMS-CPS, IPUMS-USA users should exclude persons coded as 1, 2, 3, 4, and 6 on the IPUMS-USA variable GQTYPE. Non-institutional group quarters--defined as housing units that are not institutions and that contain nine or more persons unrelated to the householder--are sampled in the CPS.
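As a minimal sketch of this restriction, assuming IPUMS-USA person records have been loaded as Python dictionaries keyed by variable name, the excluded GQTYPE codes can be filtered out as follows. The sample records and the helper name are hypothetical; only the list of excluded codes comes from the text above.

```python
# GQTYPE codes that the text says to exclude for comparability with IPUMS-CPS.
EXCLUDED_GQTYPE = {1, 2, 3, 4, 6}

def comparable_to_cps(records):
    """Keep person records whose GQTYPE is not one of the excluded codes."""
    return [r for r in records if r["GQTYPE"] not in EXCLUDED_GQTYPE]

# Hypothetical person records; only GQTYPE matters for this filter.
people = [
    {"PERNUM": 1, "GQTYPE": 0},
    {"PERNUM": 2, "GQTYPE": 2},
    {"PERNUM": 3, "GQTYPE": 5},
    {"PERNUM": 4, "GQTYPE": 6},
]
kept = comparable_to_cps(people)
```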

Like the census samples included in IPUMS-USA, the ASEC datasets included in IPUMS-CPS are samples of households or dwellings. A household is defined as all persons who occupy a dwelling unit. A dwelling unit is a room or group of rooms intended for occupation as separate living quarters and having either a separate entrance or complete cooking facilities for the exclusive use of the occupants. These definitions are consistent with the definitions of households and dwelling units used in recent U.S. censuses. The provision of data about multiple individuals within the same household allows analysis of such topics as household composition, nuptiality, and the relative earnings of husbands and wives.

For the CPS, information is always collected by a trained interviewer, during face-to-face or telephone interviews of household members. In recent U.S. censuses, households are mailed census questionnaires, and household members fill in the forms themselves. Enumerators contact only the minority of households that do not send back completed census forms.

Housing units that were vacant or could not be interviewed (due to refusals to participate or absence of the residents) are included in the IPUMS-CPS data beginning in 1988. Such vacant and non-interview units have a weight of zero in the household weight (HWTSUPP). Vacant households are also included in the IPUMS-USA database beginning with the 1970 census. They can be identified with the VACANT variable and should be excluded from analysis for statistics comparable to the weighted figures from IPUMS-CPS.

Because the CPS is designed to measure unemployment in the civilian labor force, members of the armed forces are not part of the universe for many employment-related questions in the ASEC. Persons in the military provide demographic information, answer questions about their migration histories, and provide data about their incomes and primary jobs during the preceding calendar year. The census samples in IPUMS-USA do not treat persons in the military differently than civilian adults. Users who wish to work simultaneously with data from IPUMS-CPS and IPUMS-USA are strongly urged to read the universe restrictions and comparability issues discussed in the variable descriptions.

The application of comparable coding schemes for IPUMS-CPS and IPUMS-USA is designed to facilitate time-series analysis. Sample sizes in IPUMS-CPS are considerably smaller than in IPUMS-USA, but observations are available for every year, rather than at ten-year intervals. Users should not combine observations from the same year (1970, 1980, 1990, or 2000) from the two databases.

CPS Sample Design

The CPS samples are multi-stage stratified samples. The first stage of sampling involves dividing each U.S. state into "primary sampling units" (PSUs), most of which comprise a metropolitan area, a large county, or a group of smaller adjacent counties. The CPS consists of independent samples in each state and the District of Columbia. Within each state, the PSUs are grouped into strata that are homogeneous with respect to labor force and other social and economic characteristics that are highly correlated with unemployment. One PSU is sampled per stratum, where the probability of selection for each PSU in the stratum is proportional to its population.
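The first-stage selection rule (one PSU per stratum, with probability proportional to population) can be illustrated with a short sketch. This is not the Census Bureau's actual procedure; the strata, PSU names, and population counts below are hypothetical, and the function name is an assumption made for illustration.

```python
import random

def select_psu_per_stratum(strata, seed=0):
    """Pick one PSU per stratum with probability proportional to population.

    `strata` maps a stratum label to a dict of {PSU name: population}.
    """
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    chosen = {}
    for stratum, psus in strata.items():
        names = list(psus)
        populations = [psus[name] for name in names]
        # random.choices draws with probability proportional to the weights.
        chosen[stratum] = rng.choices(names, weights=populations, k=1)[0]
    return chosen

# Hypothetical strata within one state.
strata = {
    "S1": {"metro_A": 900_000, "county_B": 150_000, "county_group_C": 50_000},
    "S2": {"county_D": 80_000},
}
chosen = select_psu_per_stratum(strata)
```

In this sketch a stratum containing a single PSU always yields that PSU, and larger PSUs are selected more often across repeated draws.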

In the second stage of sampling, a systematic sample of housing units is drawn from within each chosen PSU. Addresses for housing units are taken from sources such as lists of addresses obtained from the decennial censuses and building permits. "Ultimate sampling units" (USUs) are clusters of about four housing units. Usually, all households in the USU are in the sample. Occasionally, a third stage of sampling is necessary when actual USU size is extremely large. The multi-stage stratified sampling method is roughly equivalent to dividing the entire United States into USUs and selecting a clustered sample of these USUs for interviewing. Hence, the CPS sample is also a cluster sample.
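The second stage can be sketched in the same spirit: group an address list into clusters of about four housing units, then take a systematic sample of those clusters (a random start followed by every k-th cluster). The address list, sampling interval, and function names are hypothetical, and the real CPS procedure is more elaborate than this.

```python
import random

def make_usus(addresses, cluster_size=4):
    """Group consecutive addresses into clusters ("ultimate sampling units")."""
    return [addresses[i:i + cluster_size]
            for i in range(0, len(addresses), cluster_size)]

def systematic_sample(units, interval, seed=0):
    """Take every `interval`-th unit after a random start in [0, interval)."""
    rng = random.Random(seed)
    start = rng.randrange(interval)
    return units[start::interval]

# Hypothetical address list for one selected PSU.
addresses = [f"unit-{i:03d}" for i in range(80)]
usus = make_usus(addresses)                    # 20 clusters of 4 housing units
sampled = systematic_sample(usus, interval=5)  # 4 clusters, spaced 5 apart
```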

The monthly CPS is a rotating panel design; households are interviewed for four consecutive months, are not in the sample for the next eight months, and then are interviewed for four more consecutive months. The point in the rotation at which a household is interviewed, or the household's "Month in Sample" for a given interview, is indicated by MISH. The rotating panel design means that, in any given month of the CPS, 50 percent of households were also in the CPS in the same month one year earlier, and the other 50 percent will be in the CPS in the same month one year later. Any CPS household is in the survey up to 8 times over a 16-month period. There is no overlap for longer time intervals.
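The 4-8-4 rotation pattern and the resulting year-to-year overlap can be checked with a small sketch. Months are represented here as plain integer indices, and each rotation group is identified by the month in which it enters the sample; both conventions are assumptions made for illustration.

```python
# Offsets (in months since entering the sample) at which a household is interviewed:
# four months in, eight months out, four months in.
IN_SAMPLE_OFFSETS = (0, 1, 2, 3, 12, 13, 14, 15)

def mish(entry_month, survey_month):
    """Month in Sample (1-8) for a household, or None if it is not interviewed."""
    offset = survey_month - entry_month
    if offset in IN_SAMPLE_OFFSETS:
        return IN_SAMPLE_OFFSETS.index(offset) + 1
    return None

def groups_in_sample(survey_month):
    """Entry months of the eight rotation groups interviewed in a given month."""
    return {survey_month - offset for offset in IN_SAMPLE_OFFSETS}

def overlap_share(month_a, month_b):
    """Share of month_a's rotation groups that are also interviewed in month_b."""
    a = groups_in_sample(month_a)
    return len(a & groups_in_sample(month_b)) / len(a)

one_year_back = overlap_share(100, 88)      # 0.5
one_year_forward = overlap_share(100, 112)  # 0.5
two_years_forward = overlap_share(100, 124) # 0.0
```

The groups shared with the previous year are those in their fifth through eighth interviews, and the groups shared with the following year are those in their first through fourth.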

Beginning in 1976, the ASEC includes an oversample of Hispanics to increase the reliability of estimates for this group. Approximately twice as many Hispanics are interviewed as would be in the sample if it were exactly proportional to the U.S. population. As a result, each sampled Hispanic person represents a smaller number of individuals than each sampled non-Hispanic person. The use of weights, discussed below, corrects for this oversampling to yield representative national statistics from IPUMS-CPS.
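A small worked example shows why the oversample requires weighting. The records, the boolean `hispanic` flag, and the weight values below are all hypothetical; they simply mimic a group sampled at about twice the rate of everyone else, so that each of its members carries half the weight.

```python
def unweighted_share(records, flag):
    """Share of sampled records for which `flag` is true."""
    return sum(1 for r in records if r[flag]) / len(records)

def weighted_share(records, flag, weight="WTSUPP"):
    """Share of the represented population for which `flag` is true."""
    total = sum(r[weight] for r in records)
    return sum(r[weight] for r in records if r[flag]) / total

# Hypothetical sample: the oversampled group has half the weight per person.
sample = (
    [{"hispanic": True, "WTSUPP": 500} for _ in range(4)]
    + [{"hispanic": False, "WTSUPP": 1000} for _ in range(8)]
)

raw = unweighted_share(sample, "hispanic")      # 4 / 12, overstates the group
adjusted = weighted_share(sample, "hispanic")   # 2000 / 10000
```

The unweighted share reflects the sample's composition, while the weighted share reflects the population that the sample represents.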

In 2002, the ASEC underwent a sample expansion to improve state estimates of children's health insurance coverage. In addition to increasing the monthly CPS sample in states with high sampling errors for uninsured children, this expansion involved two changes: asking the ASEC supplement questions of one quarter of the February and April CPS samples (that is, of the households not also included in the March sample; see above for discussion of Month in Sample), and interviewing selected sample households from the preceding November CPS sample during the February-April period using the ASEC supplement. Even though the ASEC data have been collected in several different months since 2002, the file is still generally referred to as the March supplement. IPUMS-CPS refers to it as the ASEC, and it is listed as such on the "select samples" page of the extract system. Basic monthly survey data for the month of March are also available. Read further for more information about the relationship between the March basic and the ASEC.

CPS Weights

Due to the complex sampling design for the CPS, users of IPUMS-CPS data must make use of weights to produce representative statistics.

Most analyses based on individual-level ASEC data should use the WTSUPP variable. WTSUPP is based on the inverse probability of selection into the sample, with adjustments for the following factors: failure to obtain an interview; sampling within large sample units; the known distribution of the entire population according to age, sex, and race; over-sampling of Hispanic persons; giving husbands and wives the same weight; and an additional step to provide consistency with labor force estimates from the basic survey. WTSUPP is the person-level weight that is available for questions that were not part of the basic monthly survey questions asked every month in the CPS.

Analysts using non-ASEC data or combining March and non-ASEC data should use the variable WTFINL to weight their data. EARNWT should be used with a small number of variables, specifically, EARNWEEK, HOURWAGE, PAIDHOUR, and UNION.
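The person-level guidance above can be summarized as a small lookup. This is a simplified sketch, not an official rule set: the function name and the `uses_only_asec` flag are hypothetical, and the variable descriptions remain the authority for any particular variable.

```python
# Variables that the text says should be weighted with EARNWT.
EARNWT_VARIABLES = {"EARNWEEK", "HOURWAGE", "PAIDHOUR", "UNION"}

def person_weight_for(variable, uses_only_asec):
    """Suggest a person-level weight for one analysis variable.

    uses_only_asec: True when the analysis uses ASEC data alone; False when it
    uses non-ASEC data or combines March and non-ASEC data.
    """
    if variable in EARNWT_VARIABLES:
        return "EARNWT"
    return "WTSUPP" if uses_only_asec else "WTFINL"

# "AGE" and "EMPSTAT" stand in for any variable outside the EARNWT list.
choices = {
    "AGE (ASEC only)": person_weight_for("AGE", uses_only_asec=True),
    "EMPSTAT (monthly)": person_weight_for("EMPSTAT", uses_only_asec=False),
    "HOURWAGE": person_weight_for("HOURWAGE", uses_only_asec=False),
}
```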

For analyses of the ASEC focused on household-level variables, researchers should use the household weight, HWTSUPP. HWTSUPP generally has the same value as WTSUPP for the household head or reference person. As noted above, vacant housing units and households that could not be interviewed due to residents' absence or refusal to participate have a value of zero in HWTSUPP.
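Because vacant and non-interview units carry a household weight of zero, they drop out of weighted totals automatically but still inflate unweighted record counts. A minimal sketch, using hypothetical household records and weight values:

```python
def weighted_household_count(households):
    """Estimated number of households represented, using HWTSUPP."""
    return sum(h["HWTSUPP"] for h in households)

def interviewed(households):
    """Drop vacant and non-interview units, which have HWTSUPP equal to zero."""
    return [h for h in households if h["HWTSUPP"] > 0]

# Hypothetical household records; the second is a vacant or non-interview unit.
households = [
    {"SERIAL": 1, "HWTSUPP": 1500.0},
    {"SERIAL": 2, "HWTSUPP": 0.0},
    {"SERIAL": 3, "HWTSUPP": 2200.5},
]

total = weighted_household_count(households)   # 3700.5, unaffected by SERIAL 2
n_records = len(households)                    # 3 records in the file
n_interviewed = len(interviewed(households))   # 2 interviewed households
```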

Starting in 2005, the Census Bureau calculated household and person weights using more age detail for children. These calculations, which produce the official weights used in the ASEC for 2005 and later, provide better estimates of children by single year of age. The Census Bureau also recalculated the weights for the 2004 ASEC based on these changes. For the 2004 ASEC, IPUMS-CPS makes available both the original weights (HWTSUPP, WTSUPP) and the new weights (HHWT04, PERWT04).
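For code that pools several ASEC years, the choice of weight variables for 2004 can be made explicit. The helper below is a hypothetical sketch; whether to use the original or the recalculated 2004 weights is left to the analyst, and it is exposed here as a parameter.

```python
def asec_weights(year, use_revised_2004=True):
    """Return the household and person weight variable names for an ASEC year."""
    if year == 2004 and use_revised_2004:
        # Recalculated 2004 weights based on more age detail for children.
        return {"household": "HHWT04", "person": "PERWT04"}
    return {"household": "HWTSUPP", "person": "WTSUPP"}

weights_by_year = {year: asec_weights(year) for year in (2003, 2004, 2005)}
```

Using the recalculated 2004 weights keeps that year on the same basis as 2005 and later, while the original weights reproduce estimates as first published.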
