Differences and Similarities between IPUMS-CPS and IPUMS-USA Samples
While IPUMS-USA provides information about the
total U.S. population, IPUMS-CPS provides information
about the U.S. non-institutionalized population
because the March CPS is a probability sample
of this population (see "About IPUMS-CPS" for more details). Members of the
armed forces who live in off-base housing or on base with their families are
included in the March CPS, but persons in the military who reside in military
barracks are excluded. Institutionalized persons, such as persons living in old age
homes, prisons, and mental institutions, are excluded from the survey. To achieve
comparability with IPUMS-CPS, IPUMS-USA users should exclude persons coded
1, 2, 3, 4, or 6 on the IPUMS-USA variable GQTYPE. Non-institutional group
quarters--defined
as housing units that are not institutions and that contain nine or more persons
unrelated to the householder--are sampled in the CPS.
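The GQTYPE exclusion described above can be sketched as a simple filter. This is a minimal illustration, assuming person records are held as Python dictionaries with a `GQTYPE` field; the sample records and the `id` field are made up, and only the excluded codes (1, 2, 3, 4, and 6) come from the text.

```python
# GQTYPE codes that the text says to drop for comparability with IPUMS-CPS.
EXCLUDED_GQTYPE = {1, 2, 3, 4, 6}


def cps_comparable(records):
    """Return only the records whose GQTYPE is not in the excluded set."""
    return [r for r in records if r["GQTYPE"] not in EXCLUDED_GQTYPE]


# Made-up person records; only the GQTYPE value matters here.
people = [
    {"id": 1, "GQTYPE": 0},
    {"id": 2, "GQTYPE": 2},
    {"id": 3, "GQTYPE": 5},
    {"id": 4, "GQTYPE": 6},
]

kept = cps_comparable(people)
print([p["id"] for p in kept])  # prints [1, 3]
```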
Like the census samples included in IPUMS-USA, the March CPS datasets included
in IPUMS-CPS are samples of households or dwellings. A household is defined as
all persons who occupy a dwelling unit. A dwelling unit is a room or group of
rooms intended for occupation as separate living quarters and having either a
separate entrance or complete cooking facilities for the exclusive use of the
occupants. These definitions are consistent with the definitions of households
and dwelling units used in recent U.S. censuses. The provision of data about
multiple individuals within the same household allows analysis of such topics
as household composition, nuptiality, and the relative earnings of husbands and
wives.
For the CPS, information is always collected by a trained interviewer, during
face-to-face or telephone interviews of household members. In recent U.S. censuses,
households are mailed census questionnaires, and household members fill in the
forms themselves. Enumerators contact only the minority of households that do
not send back completed census forms.
Housing units that were vacant or could not be interviewed (due to refusals to
participate or absence of the residents) are included in the IPUMS-CPS data beginning
in 1988. Such vacant and non-interview units have a value of zero in the household
weight (HHWT). Vacant households are also included in the IPUMS-USA database
beginning with the 1970 census. They can be identified with the VACANT variable
and should be excluded from analysis for statistics comparable to the weighted
figures from IPUMS-CPS.
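A small sketch of why the two databases need different handling: zero-weight CPS units drop out of any weighted total on their own, while vacant IPUMS-USA units have to be filtered explicitly. The records below are invented, and the coding of VACANT (0 for an occupied unit) is an assumption made for illustration; check the variable description for the actual codes.

```python
# Invented household records for illustration only.
cps_households = [
    {"id": "a", "HHWT": 1500.0},
    {"id": "b", "HHWT": 0.0},  # vacant or non-interview unit: weight is zero
    {"id": "c", "HHWT": 2100.0},
]
usa_households = [
    {"id": "x", "HHWT": 100.0, "VACANT": 0},
    {"id": "y", "HHWT": 100.0, "VACANT": 1},  # assumed: nonzero means vacant
]


def weighted_count(households):
    """Sum of household weights, i.e. the estimated number of households."""
    return sum(h["HHWT"] for h in households)


def drop_vacant(households):
    """Keep occupied units only (assumes VACANT == 0 marks an occupied unit)."""
    return [h for h in households if h["VACANT"] == 0]


cps_total = weighted_count(cps_households)  # zero-weight unit adds nothing
usa_total = weighted_count(drop_vacant(usa_households))
print(cps_total, usa_total)  # prints 3600.0 100.0
```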
Because the CPS is designed to measure unemployment in the civilian labor force,
members of the armed forces are not part of the universe for many employment-related
questions in the March CPS. Persons in the military provide demographic information,
answer questions about their migration histories, and provide data about their
incomes and primary jobs during the preceding calendar year. The census samples
in IPUMS-USA do not treat persons in the military differently from civilian adults.
Users who wish to work simultaneously with data from IPUMS-CPS and IPUMS-USA
are strongly urged to read the universe restrictions and comparability issues
discussed in the variable descriptions.
The application of comparable coding schemes for IPUMS-CPS and IPUMS-USA is designed
to facilitate time-series analysis. Sample sizes in IPUMS-CPS are considerably
smaller than in IPUMS-USA, but observations are available for every year, rather
than at ten-year intervals. Users should not combine observations from the same
year (1970, 1980, 1990, or 2000) from the two databases.
CPS Sample Design
The CPS samples are multi-stage stratified samples. The first stage of sampling
involves dividing each U.S. state into "primary sampling units" (PSUs), most
of which comprise a metropolitan area, a large county, or a group of smaller
adjacent counties. The CPS consists of independent samples in each state and
the District of Columbia. Within each state, the PSUs are grouped into homogeneous
strata with respect to labor force and other social and economic characteristics
that are highly correlated with unemployment. One PSU is sampled per stratum,
where the probability of selection for each PSU in the stratum is proportional
to its population.
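The first-stage rule (one PSU per stratum, with probability proportional to population) can be sketched as follows. The strata, PSU names, and population figures are hypothetical, and `random.choices` stands in for whatever selection procedure the Census Bureau actually uses; the sketch only illustrates the proportional-to-size idea.

```python
import random


def selection_probabilities(stratum):
    """Probability of selecting each PSU, proportional to its population."""
    total = sum(stratum.values())
    return {psu: pop / total for psu, pop in stratum.items()}


def sample_one_psu(stratum, rng):
    """Draw a single PSU from a stratum with probability proportional to size."""
    names = list(stratum)
    weights = [stratum[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical strata: PSU name -> population.
strata = {
    "stratum_A": {"psu_1": 500_000, "psu_2": 300_000, "psu_3": 200_000},
    "stratum_B": {"psu_4": 120_000, "psu_5": 80_000},
}

rng = random.Random(0)  # fixed seed so the draw is repeatable
chosen = {name: sample_one_psu(psus, rng) for name, psus in strata.items()}
probs_a = selection_probabilities(strata["stratum_A"])
print(chosen)
print(probs_a)
```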
In the second stage of sampling, a systematic sample of housing units is drawn
from within each chosen PSU. Addresses for housing units are taken from sources
such as lists of addresses obtained from the decennial censuses and building
permits. "Ultimate sampling units" (USUs) are clusters of about four housing
units. Usually, all households in the USU are in the sample. Occasionally,
a third stage of sampling is necessary when actual USU size is extremely large.
The multi-stage stratified sampling method is roughly equivalent to dividing
the entire United States into USUs and selecting a clustered sample of these
USUs for interviewing. Hence, the CPS sample is also a cluster sample.
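The second stage can be pictured with a toy systematic sample of clusters. This is only a sketch under simplifying assumptions: the address list, the cluster size of exactly four, and the sampling interval of ten are all invented, and the real CPS procedure is more involved.

```python
import random


def make_usus(addresses, size=4):
    """Group consecutive addresses into clusters (USUs) of the given size."""
    return [addresses[i:i + size] for i in range(0, len(addresses), size)]


def systematic_sample(units, interval, rng):
    """Pick a random start, then take every `interval`-th unit after it."""
    start = rng.randrange(interval)
    return units[start::interval]


# Hypothetical address list for one selected PSU.
addresses = [f"address_{n}" for n in range(400)]
usus = make_usus(addresses)          # 100 clusters of four housing units
rng = random.Random(1)
sampled_usus = systematic_sample(usus, interval=10, rng=rng)

# Every housing unit in a sampled cluster enters the sample.
sampled_units = [unit for usu in sampled_usus for unit in usu]
print(len(usus), len(sampled_usus), len(sampled_units))  # prints 100 10 40
```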
The monthly CPS is a rotating panel design; households are interviewed for four
consecutive months, are not in the sample for the next eight months, and then
are interviewed for four more consecutive months. This means that for the March
CPS, 50 percent of the households in one year's sample also appear in the
following year's sample. There is no
overlap for longer time intervals. This preliminary release of the IPUMS-CPS
does not include the information needed to follow sampled households over time;
later releases of the database will provide the information needed to link households
across two adjoining years.
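The overlap figures follow directly from the 4-8-4 rotation pattern, as the short calculation below illustrates. It assumes that a new rotation group of equal size enters the sample every month and numbers months with arbitrary integers (March of the reference year is month 0); these conventions are for illustration only.

```python
def interview_months(entry_month):
    """Months in which a group entering at entry_month is interviewed (4-8-4)."""
    first_spell = range(entry_month, entry_month + 4)
    second_spell = range(entry_month + 12, entry_month + 16)
    return set(first_spell) | set(second_spell)


def groups_in_sample(month, earliest=-40, latest=60):
    """Entry months of all rotation groups interviewed in the given month."""
    return {e for e in range(earliest, latest) if month in interview_months(e)}


march = 0  # arbitrary index for March of the reference year
this_year = groups_in_sample(march)
next_year = groups_in_sample(march + 12)
two_years_on = groups_in_sample(march + 24)

overlap_share = len(this_year & next_year) / len(this_year)
print(len(this_year), overlap_share, len(this_year & two_years_on))
# prints 8 0.5 0
```

Eight rotation groups are in the sample in any given month; four of them return the following March, and none return two years later.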
Beginning in 1976, the March CPS includes an oversample of Hispanics to increase
the reliability of estimates for this group. Approximately twice as many Hispanics
are interviewed as would be in the sample if it were exactly proportional to
the U.S. population. Each Hispanic person represents a smaller number of individuals
than each non-Hispanic person. The use of weights, discussed below, corrects
for this oversampling to yield representative national statistics from IPUMS-CPS.
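A toy calculation shows how weighting undoes the oversample. The numbers are invented: a population of 1,000 persons of whom 100 are Hispanic, sampled at a rate of 1 in 5 for Hispanic persons and 1 in 10 for everyone else, with each weight set to the inverse of the sampling rate. Actual CPS weights include further adjustments.

```python
# Invented sample: 20 Hispanic persons (weight 5) and 90 others (weight 10).
sample = (
    [{"hispanic": True, "weight": 5.0}] * 20
    + [{"hispanic": False, "weight": 10.0}] * 90
)


def weighted_share(records, flag, weight_key):
    """Share of the weighted total carried by records where `flag` is true."""
    total = sum(r[weight_key] for r in records)
    flagged = sum(r[weight_key] for r in records if r[flag])
    return flagged / total


unweighted = sum(r["hispanic"] for r in sample) / len(sample)
weighted = weighted_share(sample, "hispanic", "weight")
print(round(unweighted, 3), weighted)  # prints 0.182 0.1
```

The unweighted share overstates the Hispanic proportion, while the weighted share recovers the assumed population value of 10 percent.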
CPS Weights
Due to the complex sampling design for the CPS, users of IPUMS-CPS data must
make use of weights to produce representative statistics.
Most analyses based on individual-level data should use the PERWT variable. PERWT
is based on the inverse probability of selection into the sample, with adjustments
for the following factors: failure to obtain an interview; sampling within large
sample units; the known distribution of the entire population according to age,
sex, and race; over-sampling of Hispanic persons; giving husbands and wives the
same weight; and an additional step to provide consistency with labor force estimates
from the basic survey. PERWT is the person-level weight for questions that were
not part of the basic survey asked every month in the CPS.
If analysts wish to reproduce the monthly labor force statistics
published by the Bureau of Labor Statistics, they should instead
use the variable BLSWT to weight their data. For most other analyses
using person-level data, however, PERWT is the appropriate choice.
EARNWT should be used with a small number of variables, specifically
EARNWEEK, HOURWAGE, PAIDHOUR, and UNION.
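The guidance on person-level weights can be condensed into a small lookup. The function name and its `reproduce_bls_monthly` flag are hypothetical helpers written for this illustration; only the weight and variable names come from the text.

```python
# Variables the text says should be weighted with EARNWT.
EARNWT_VARIABLES = {"EARNWEEK", "HOURWAGE", "PAIDHOUR", "UNION"}


def choose_person_weight(variable, reproduce_bls_monthly=False):
    """Return the name of the person-level weight suggested by the text."""
    if reproduce_bls_monthly:
        # Reproducing published BLS monthly labor force statistics.
        return "BLSWT"
    if variable.upper() in EARNWT_VARIABLES:
        return "EARNWT"
    # Default for most other person-level analyses.
    return "PERWT"


print(choose_person_weight("AGE"))       # prints PERWT
print(choose_person_weight("hourwage"))  # prints EARNWT
```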
For analyses focused on household-level variables, researchers should use the
household weight, HHWT.
HHWT generally has the same value as PERWT for the household
head or reference person. As noted above, vacant housing units and households
that could not be interviewed due to residents' absence or refusal to participate
have a value of zero in HHWT.
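When household statistics are computed from a person-level file, each household should contribute its HHWT once rather than once per member. The sketch below assumes a file layout in which every person record repeats the household's HHWT; the `hh_id` and `hh_size` fields and all values are hypothetical.

```python
# Hypothetical person records; hh_id and hh_size are illustrative field names.
persons = [
    {"hh_id": 1, "HHWT": 1000.0, "hh_size": 2},
    {"hh_id": 1, "HHWT": 1000.0, "hh_size": 2},
    {"hh_id": 2, "HHWT": 3000.0, "hh_size": 1},
    {"hh_id": 3, "HHWT": 1000.0, "hh_size": 3},
    {"hh_id": 3, "HHWT": 1000.0, "hh_size": 3},
    {"hh_id": 3, "HHWT": 1000.0, "hh_size": 3},
]


def one_record_per_household(records):
    """Keep the first record seen for each household identifier."""
    first_seen = {}
    for r in records:
        first_seen.setdefault(r["hh_id"], r)
    return list(first_seen.values())


def weighted_mean(records, value_key, weight_key):
    """Weighted mean of value_key using weight_key as the weight."""
    total_weight = sum(r[weight_key] for r in records)
    return sum(r[value_key] * r[weight_key] for r in records) / total_weight


households = one_record_per_household(persons)
household_mean = weighted_mean(households, "hh_size", "HHWT")
naive_mean = weighted_mean(persons, "hh_size", "HHWT")  # counts HHWT per person
print(household_mean, naive_mean)  # prints 1.6 2.0
```

Applying HHWT across all person records overweights large households, which is why the naive figure is higher.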
Starting in 2005, the Census Bureau calculated household and person
weights using more age detail for children. These calculations,
which produce the official weights used in the March supplements
for 2005 and later, provide better estimates of children by single
year of age. The Census Bureau also recalculated the weights for
2004 based on these changes. For 2004, IPUMS-CPS makes available
both the original weights (HHWT, PERWT) and the new weights (HHWT04, PERWT04).
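For code that processes several survey years, the choice between the two sets of 2004 weights can be made explicit. The helper below is a hypothetical sketch: the function name and the `prefer_revised_2004` flag are assumptions, and whether the revised weights are the better choice depends on the analysis (for example, on whether consistency with 2005 and later years matters).

```python
def weight_names(year, prefer_revised_2004=True):
    """Return the household and person weight variable names for a survey year."""
    if year == 2004 and prefer_revised_2004:
        # Recalculated weights using more age detail for children.
        return {"household": "HHWT04", "person": "PERWT04"}
    return {"household": "HHWT", "person": "PERWT"}


print(weight_names(2004))
print(weight_names(2004, prefer_revised_2004=False))
print(weight_names(2005))
```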