Sample Notes
The samples and variables included in the March Current Population Survey (CPS)
vary over time. This document notes important characteristics of and changes
in the samples of CPS data included in IPUMS-CPS. For an overview of the sampling
strategy, see sample designs.
1962-1967: Children under age 14 are not included in the samples in these
years. These datasets were not officially released by the U.S. Census Bureau
as public use files. Because the datasets were used by researchers at the University
of Wisconsin, they were preserved in the data archive at the Center for Demography
and Ecology at the University of Wisconsin. Documentation for these files is
particularly sparse.
1962: The original 1962 dataset lacked "NIU" (not in universe)
codes for individuals outside the universe for numerous variables. IPUMS-CPS
has imposed such codes. In many cases (e.g., veteran status), a variable was
supposedly available in the original data for 1962, but the codes were not
reliable and the variable was excluded from IPUMS-CPS. For a few variables,
such as relationship to household head, more than one variable appeared to
cover the same material in the original dataset. Through diagnostic cross-tabs
(e.g., cross-tabulating marital status with relationship to household head),
we identified the most consistent variable and included it in IPUMS-CPS.
1963: 1600 records have a code of "62" for the original "year" variable
in the 1963 CPS dataset. Usually one record within a household, rather than all
members of the household, had the value "62." On the basis of careful examination
of many cases, we decided to leave these cases in the dataset and coded them
as "1963" in the IPUMS-CPS YEAR variable. In the original dataset, these 1600
cases were truncated, with blanks for many variables relating to work and earnings
during the previous calendar year. For such variables, the 1600 truncated records
were assigned the codes for "Not in universe" (NIU). These "1962" cases in the 1963 dataset can be identified using the REPORTYR variable.
Given the absence of children under 14 and the institutionalized population,
one would expect the weighted counts of the CPS for 1963 to equal about two-thirds
of the total U.S. population. In fact, the original weighted population count for the 1963
March CPS, using individual-level records, was only about half as large as
the U.S. population total for 1963. The cases present in the 1963
sample were representative of the 1963 U.S. population in every way we could measure. For this
reason, we adjusted all original weighting values by a constant (1.3262), so that the weighted totals
for the 1963 dataset accurately reflect the absolute number of persons with
any given characteristic in the U.S. adult, non-institutionalized population.
1966: The number of cases in the 1966 dataset is almost twice as large
as in any other March CPS sample prior to 1968. The original weighted population count
for the 1966 March CPS, using individual-level records, is about twice as large
as the expected non-institutionalized U.S. population age 14+ in that year. IPUMS-CPS
created a revised weight for 1966, multiplying all original weighting values by a constant (0.5043).
1967: The original weighted population count for the 1967 March CPS, using individual-level
records, was only about half as large as the U.S. population total for 1967. Since the cases present were
representative of the 1967 population in every measurable way, IPUMS-CPS
created a revised weight for 1967. The revised weight multiplies all original weighting values by a constant (1.5333).
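The arithmetic behind the revised weights for 1963, 1966, and 1967 can be sketched as follows. This is only an illustration of the scaling described above, which the notes describe as already carried out by IPUMS-CPS; the function and dictionary names are hypothetical, and the treatment of other years (left unchanged) is an assumption of the sketch.

```python
# Constants quoted in the notes for the three adjusted samples.
WEIGHT_CONSTANTS = {1963: 1.3262, 1966: 0.5043, 1967: 1.5333}


def revised_weight(year, original_weight):
    """Scale an original person weight by the constant for its sample year.

    Years without a documented constant are returned unchanged; that
    default is an assumption of this sketch.
    """
    return original_weight * WEIGHT_CONSTANTS.get(year, 1.0)


def weighted_count(records):
    """Sum revised weights over (year, original_weight) pairs."""
    return sum(revised_weight(year, weight) for year, weight in records)
```

For example, a 1963 record with an original weight of 1000 would carry a revised weight of 1326.2, while a 1966 record with the same original weight would carry 504.3.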
1968-1975: These datasets were not officially released by the U.S. Census
Bureau as public use files. Children under 14 were included in the March CPS
datasets beginning in 1968. The relationships of children and of members of
the Armed Forces to the household head were left blank in the original data;
IPUMS-CPS codes such persons as "Under 14, relationship unknown" and "Armed forces, relationship
unknown" for the RELATE variable.
1968: This dataset is the first in IPUMS-CPS to use the full set of occupation
and industry codes generally used by the Census Bureau.
1976: The 1976 March CPS file is the first dataset to include household-level
records in the original data. For earlier files, IPUMS-CPS created a household-level
record using the record of the household head. As with earlier CPS data files
included in IPUMS-CPS, the format of the original data from the Census Bureau
was modified by programmers at the University of Wisconsin.
An oversample of Hispanic persons was first conducted for the 1976 March survey. The
person-level weights, PERWT and BLSWT, and the household-level weight, HHWT,
correct for this oversample, so that weighted totals of Hispanic persons and
households are consistent with their numbers in the non-institutionalized
U.S. population.
The March 1976 dataset, and datasets for subsequent years, include two different
individual-level weights: PERWT and BLSWT. In prior years, only PERWT is available.
For most purposes, researchers should rely on PERWT, which must be used to produce
statistics representative of the non-institutionalized population of the United
States. Analysts who wish to replicate published BLS statistics relating to
variables included in the basic monthly survey, which is repeated each month in the
CPS, should use BLSWT.
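The guidance on choosing between the two person-level weights can be expressed as a small helper. This is a minimal sketch: the function name and the `replicate_bls` flag are hypothetical, and the only facts it encodes are those stated above (PERWT for general use, BLSWT for replicating published BLS statistics, BLSWT available from the March 1976 dataset onward).

```python
def person_weight_name(year, replicate_bls=False):
    """Return the name of the person-level weight to use for a sample year.

    replicate_bls: True when the goal is to reproduce published BLS
    statistics based on the basic monthly survey.
    """
    if replicate_bls:
        if year < 1976:
            # Before the March 1976 dataset only PERWT is available.
            raise ValueError("BLSWT is not available before 1976")
        return "BLSWT"
    return "PERWT"
```

Raising an error for pre-1976 BLS replication, rather than silently falling back to PERWT, is a design choice of this sketch; it makes the unavailable case visible to the analyst.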
1977: The 1977 dataset included in IPUMS-CPS is the first dataset that
was released as a public use file by the Census Bureau and was not modified in
its original data formatting by programmers at the University of Wisconsin.
1980: Up through 1979, persons age 14 and older were considered adults;
beginning in 1980, persons age 15 and older were classified as adults. The lower
age limit of the universes for income variables and for many variables relating
to employment rose from 14 to 15 beginning in 1980. Exceptions are the ABSENT, CLASSWKR,
EMPSTAT, LABFORCE, LOOKING, OCC, and OCC1950 variables; these kept
age 14 as the lower bound of the variable universe through 1987.
1988: As noted directly above, the universe for some work-related variables
(ABSENT, CLASSWKR, EMPSTAT, LABFORCE, LOOKING, OCC, and OCC1950) changed from
age 14+ to age 15+ beginning in 1988.
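The two switch dates for the lower age bound can be summarized in a lookup. This is a sketch under stated assumptions: the function name is hypothetical, and it applies only to variables whose universe is governed by this age rule (income variables and the employment variables described above), not to every variable in IPUMS-CPS.

```python
# Variables named in the notes as keeping the age-14 bound through 1987.
LATE_SWITCH_VARIABLES = {
    "ABSENT", "CLASSWKR", "EMPSTAT", "LABFORCE", "LOOKING", "OCC", "OCC1950",
}


def min_universe_age(variable, year):
    """Return the lower age bound (14 or 15) for an age-restricted variable.

    Assumes `variable` is one whose universe follows the rule in the notes;
    other variables are outside the scope of this sketch.
    """
    switch_year = 1988 if variable.upper() in LATE_SWITCH_VARIABLES else 1980
    return 14 if year < switch_year else 15
```

For instance, `min_universe_age("EMPSTAT", 1985)` gives 14, while an income variable in the same year would give 15.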
Particularly notable is the increase in 1988 in the number of variables relating
to income from specific sources during the previous calendar year. The 1988 dataset
is the first to include separate variables on income from the following sources:
unemployment insurance; workers' compensation; veterans' benefits; disability
income; dividends; rent; educational assistance; child support; alimony; and
personal assistance from persons outside the household. Prior to 1988, income
from these sources was subsumed into a smaller number of income variables with
a broader focus.
Past U.S. military service was first reported by civilian women in the 1988 March
CPS data. Data for the VETSTAT and VETLAST variables are available only for civilian
men in earlier years.
More detailed information on the relationship of unrelated persons to the householder
is available beginning in 1988, with the inclusion of "partner/roommate" and "foster
child" in the codes for RELATE.
1994: A major redesign of the Current Population Survey was implemented
in 1994. One aspect of the redesign was revised question wording. The new
wording reduced underreporting of labor force participation by women working
part-time and more precisely measured the number of persons on temporary lay-off
from jobs. A second aspect of the redesign was that CPS interviewers switched
from using paper questionnaires to computer-assisted interviewing technology
with skip patterns programmed into the interview format.
The 1994 March survey added new questions about the date of immigration for foreign-born
persons and about the birthplaces of each respondent's mother and father.
1995: The computer-assisted interviewing format instituted in 1994 facilitated
another change beginning in 1995: respondents could report income from various
sources for a number of short periods (e.g., bi-weekly or monthly) rather than
as a lump sum for the previous calendar year.
New response categories for the relationship to the householder were put in place
in 1995. The partner/roommate category was dropped and replaced with unmarried
partner, housemate/roommate, and roomer/boarder in the RELATE variable.
2001: A number of questions designed to measure participation in welfare
reform programs (e.g., job training, transportation assistance) were first included
in the March 2001 survey.
2003: For the first time, multiple race responses were allowed. Hispanic
origin was ascertained through two questions, rather than the single question
used in earlier years. The occupation and industry coding schemes of Census 2000
were adopted, with minor modifications.
2004 to 2007: See the Revision History page.