IPUMS-CPS Unharmonized Variables

What are unharmonized variables?

IPUMS-CPS unharmonized variables are original CPS data, packaged to be easier to find and use. Regular IPUMS-CPS variables, by contrast, are harmonized for comparability across time: similar concepts receive consistent codes across months and years, unknown and NIU (not in universe) categories are coded consistently, and unexpected values are recoded. Unharmonized variables instead correspond directly to the original public use datasets released by the Census Bureau and the Bureau of Labor Statistics. IPUMS-USA and IPUMS-International users may be familiar with the source variables available from those products. Like source variables, IPUMS-CPS unharmonized variables deliver original data; however, rather than being unique to each sample, a single unharmonized variable covers every sample in which a given original CPS variable has identical codes and value labels.

The Current Population Survey is fielded monthly, and variables rarely change from month to month. However, the large number of samples and the inconsistent variable names across time make the source data cumbersome to use. IPUMS-CPS has created unharmonized variables to ease browsing and use of CPS source data. Unharmonized variables are (mostly) un-recoded variables that give a single consistent name to variables with the same meaning and codes across multiple samples. IPUMS-CPS unharmonized variables are denoted with a "UH_" prefix and a "_[number]" suffix. When codes are added or removed, or have different labels between years, a new unharmonized variable is created and the suffix increments by one.
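
Because every unharmonized variable name follows the same pattern, it can be split mechanically. The Python sketch below assumes that a name is exactly "UH_", an upper-case stem (letters, digits, or underscores), an underscore, and a decimal suffix; the helper names parse_uh_name and next_version are hypothetical and are not part of any IPUMS tool.

```python
import re

# Assumed pattern: "UH_" + stem + "_" + numeric suffix, e.g. "UH_USLFT_2".
# The allowed stem characters are an assumption, not an IPUMS specification.
UH_NAME = re.compile(r"^UH_([A-Z0-9_]+)_(\d+)$")


def parse_uh_name(name: str) -> tuple[str, int]:
    """Split an unharmonized variable name into (stem, suffix)."""
    match = UH_NAME.match(name)
    if match is None:
        raise ValueError(f"not an unharmonized variable name: {name!r}")
    return match.group(1), int(match.group(2))


def next_version(name: str) -> str:
    """Name the variable would get if codes changed: suffix plus one."""
    stem, version = parse_uh_name(name)
    return f"UH_{stem}_{version + 1}"


print(parse_uh_name("UH_USLFT_1"))  # ('USLFT', 1)
print(next_version("UH_USLFT_1"))  # UH_USLFT_2
```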

For example, the variable that indicates whether the respondent worked for 35 hours or more during the week at a given job exists in all CPS months from 1976-2009. However, it is called I20 from 1976-1988, A-USLFT from 1989-1993, and PEHRFTPT from 1994-2009. Codes are also inconsistent across time. NIU is indicated by a blank space from 1976-1993 and by -1 in 1994-2009. In 1994, a new value was introduced to indicate that hours at work varied.

Usually work 35 or more hours a week
Label        1976-1988   1989-1993   1994-2009
NIU          ' '         ' '         -1
Yes          1           1           1
No           2           2           2
Hours vary                           3

IPUMS-CPS synthesizes this information into two unharmonized variables: UH_USLFT_1, available from 1976-1993, and UH_USLFT_2, available from 1994-2009. These variables bundle the source variables from all samples that share the same possible codes into two easily selected variables.
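
The grouping rule can be sketched in Python using the codes from the table above. The code/label dictionaries, the SOURCES list, and assign_uh_names are hypothetical illustrations. The sketch also assumes that samples are grouped purely by identical code/label sets, so a set that reappeared after a change would reuse its earlier name; the documentation does not say how that case is handled.

```python
# Codes and labels from the table above, with raw codes kept as strings.
CODES_1976_1993 = {" ": "NIU", "1": "Yes", "2": "No"}
CODES_1994_2009 = {"-1": "NIU", "1": "Yes", "2": "No", "3": "Hours vary"}

# (sample years, original variable name, codes/labels)
SOURCES = [
    ("1976-1988", "I20", CODES_1976_1993),
    ("1989-1993", "A-USLFT", CODES_1976_1993),
    ("1994-2009", "PEHRFTPT", CODES_1994_2009),
]


def assign_uh_names(stem, sources):
    """Map each sample range to an unharmonized variable name.

    Samples whose codes and labels are identical share one name; each new
    code/label set gets the next suffix. The original variable name is
    ignored, because differing names alone do not start a new variable.
    """
    names_by_codes = {}
    assignment = {}
    for years, _original_name, codes in sources:
        key = frozenset(codes.items())
        if key not in names_by_codes:
            names_by_codes[key] = f"UH_{stem}_{len(names_by_codes) + 1}"
        assignment[years] = names_by_codes[key]
    return assignment


for years, name in assign_uh_names("USLFT", SOURCES).items():
    print(years, name)
# 1976-1988 UH_USLFT_1
# 1989-1993 UH_USLFT_1
# 1994-2009 UH_USLFT_2
```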

Recoding in Unharmonized Variables

Unharmonized variables are intended to deliver raw CPS data; however, IPUMS-CPS has done some recoding to increase the usability of the data. These recodes address blank spaces in the original data, unexpected strings in the original data, and meaningful strings that would otherwise force the variable to be stored as a string type.

Blank Spaces and Unexpected Values

We recode string representations of missing values in variables that are otherwise numeric. Consider UH_USLFT_1 from the example above. In the original data, "Not in universe" is designated by a blank space. These records are recoded to -9 ("Missing") in UH_USLFT_1 so that every case in the data has a numeric value.

Recoding in UH_USLFT_1
Input Value   Input Label   Output Value   Output Label
' '           NIU           -9             Missing
1             Yes           1              Yes
2             No            2              No
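
For users working from the raw source files, the same recode can be sketched in Python. This assumes the 1976-1993 field arrives as a one-character string; recode_uslft_1 is a hypothetical helper, not an IPUMS function.

```python
MISSING = -9  # IPUMS-CPS-created "Missing" code in UH_USLFT_1


def recode_uslft_1(raw: str) -> int:
    """Recode a raw 1976-1993 field value into UH_USLFT_1."""
    if raw.strip() == "":
        # A blank field meant "Not in universe" in the source data.
        return MISSING
    return int(raw)


print([recode_uslft_1(v) for v in [" ", "1", "2"]])  # [-9, 1, 2]
```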

Unexpected Strings

We use a similar approach to recode meaningless strings. In the CPS source data from years prior to 1989, low-frequency junk strings occasionally show up in the data. When this occurs, we assign these values to the IPUMS-CPS-created "Missing" category. We note these recodes in the unharmonized variable descriptions.

There are also occasionally numeric values in the source data that are not given a label in any of the original CPS documentation. We leave these unexpected values un-recoded and un-labeled and, as a result, they may not appear in the 'Codes' tab of a variable page.
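
Because unexpected numeric values are kept but may not appear in the 'Codes' tab, it can help to tabulate them before analysis. The sketch below assumes data read as raw strings. recode_raw mirrors the rule described above (non-numeric junk goes to the variable's Missing code, numeric values are kept as-is), and unlabeled_counts flags values outside a label dictionary. Both helpers, the LABELED_CODES dictionary, and the sample raw values (such as "*" and "7") are hypothetical.

```python
from collections import Counter

LABELED_CODES = {-9: "Missing", 1: "Yes", 2: "No"}  # UH_USLFT_1 codes


def recode_raw(raw: str, missing_code: int) -> int:
    """Parse a raw field; anything that is not an integer becomes missing_code."""
    try:
        return int(raw.strip())
    except ValueError:
        # Blank fields and junk strings land in the "Missing" category.
        return missing_code


def unlabeled_counts(values, labeled_codes):
    """Count values that have no label, i.e. codes absent from the label set."""
    return Counter(v for v in values if v not in labeled_codes)


raw_values = [" ", "1", "2", "*", "7", "7"]  # hypothetical raw fields
recoded = [recode_raw(v, missing_code=-9) for v in raw_values]
print(recoded)  # [-9, 1, 2, -9, 7, 7]
print(unlabeled_counts(recoded, LABELED_CODES))  # Counter({7: 2})
```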

Meaningful Strings

Several variables have both numeric and meaningful string values. We recode the meaningful strings to numeric values so that the variable can be treated as numeric. For example, UH_FAMINCX_1 uses the letters A through D for its four highest income categories; these become 10 through 13.

Recoding in UH_FAMINCX_1
Input Value   Input Label        Output Value   Output Label
0             Under $5,000       0              Under $5,000
1             $5,000-7,499       1              $5,000-7,499
2             $7,500-9,999       2              $7,500-9,999
3             $10,000-12,499     3              $10,000-12,499
4             $12,500-14,999     4              $12,500-14,999
5             $15,000-17,499     5              $15,000-17,499
6             $17,500-19,999     6              $17,500-19,999
7             $20,000-24,999     7              $20,000-24,999
8             $25,000-29,999     8              $25,000-29,999
9             $30,000-34,999     9              $30,000-34,999
A             $35,000-39,999     10             $35,000-39,999
B             $40,000-49,999     11             $40,000-49,999
C             $50,000-74,999     12             $50,000-74,999
D             $75,000 and over   13             $75,000 and over
' '           NIU                -99            Missing
-                                -99            Missing
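
The table above amounts to a small lookup. A minimal Python sketch, assuming raw one-character fields read as strings; recode_famincx_1 is a hypothetical helper. Any string outside the table falls through to -99, consistent with the handling of junk strings described earlier, while unexpected numeric strings keep their value.

```python
# Letter codes from the UH_FAMINCX_1 recoding table.
LETTER_CODES = {"A": 10, "B": 11, "C": 12, "D": 13}
MISSING = -99


def recode_famincx_1(raw: str) -> int:
    value = raw.strip()
    if value.isdecimal():
        # Numeric codes ("0" through "9" in this table) keep their value.
        return int(value)
    if value in LETTER_CODES:
        # "A" through "D" continue the sequence as 10 through 13.
        return LETTER_CODES[value]
    # Blank (NIU), "-", and any other string become -99 "Missing".
    return MISSING


print([recode_famincx_1(v) for v in ["0", "9", "A", "D", " ", "-"]])
# [0, 9, 10, 13, -99, -99]
```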


Some variables in some samples are missing in the original data files, contradicting the original metadata. In these instances, the unharmonized variable is available for all samples for which the original metadata suggests it should exist, but is entirely missing in samples where the original data is not meaningful. For example, according to original documentation, the variable for "Relationship to owner of business" (UH_BUSOWN_1) should be available from 1994 to July of 2005. However, the columns specified for this variable in the data files from 1999 to July of 2005 contain only -1. These samples are still listed as available in UH_BUSOWN_1 even though they are entirely missing.
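
Because such samples are still listed as available, a quick check for samples in which a variable holds nothing but -1 can keep them from being treated as valid data. The sketch assumes values are already loaded as integer lists keyed by a sample identifier; the function name, the sample keys, and the values are all hypothetical.

```python
def samples_all_niu(values_by_sample, niu_code=-1):
    """Return samples where every value of a variable equals niu_code."""
    return sorted(
        sample
        for sample, values in values_by_sample.items()
        if values and all(v == niu_code for v in values)
    )


# Hypothetical extract of UH_BUSOWN_1 values keyed by sample.
extract = {
    "1998-01": [1, -1, 2, -1],
    "1999-01": [-1, -1, -1, -1],
    "2005-07": [-1, -1],
}
print(samples_all_niu(extract))  # ['1999-01', '2005-07']
```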
