CPS Income and Tax Variables User's Note: Missing Cases, N.I.U. Cases, Top Codes and Bottom Codes
Overall Structure (Consult the codes page for each variable before use):
99998 or 9999998 = Missing
99999 or 9999999 = N.I.U.
-9997 = Bottom Code
Missing cases, where applicable, are identified in CPS Income and Tax variables (as they are elsewhere in the CPS) by a notation of mostly "9's," where the last digit is an "8." Missing cases are those that meet the universe criteria but, for one reason or another, do not have legitimate values. Missing cases may have been determined through original CPS coding or through IPUMS coding during data harmonization across survey years.
N.I.U. stands for "Not in Universe." These are cases that do not meet the universe criteria for particular variables. In CPS Income and Tax variables, the most common criterion for determining N.I.U. status is the age of the respondent, but there are others (consult the universe statement for each variable before use). N.I.U. cases are identified in CPS Income and Tax variables (as they are elsewhere in the CPS) by a notation of all "9's." N.I.U. cases may have been determined through original CPS coding. However, original CPS coding often grouped "meaningful zero" cases (i.e., those meeting universe criteria but having no Income or Tax value for the specific variable being examined) together with N.I.U. cases under a value of "zero." IPUMS coding therefore separated these N.I.U. cases from the meaningful zero-value cases, giving the N.I.U. cases an all "9's" value. As a result, most N.I.U. cases were determined through IPUMS coding during data harmonization across survey years.
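The Missing and N.I.U. notations described above can be screened out before any statistics are computed. The following is a minimal sketch in Python, not an IPUMS tool. It assumes that the codes follow the digit width of the variable (99998/99999 for a 5-digit variable, 9999998/9999999 for a 7-digit variable), and the helper names `sentinel_codes`, `classify`, and `mean_of_valid` are hypothetical. The codes page for each variable remains the authority on which values actually apply.

```python
def sentinel_codes(width):
    """Return the assumed Missing and N.I.U. codes for a variable of `width` digits."""
    return {"missing": 10 ** width - 2, "niu": 10 ** width - 1}


def classify(value, width):
    """Label a raw value as 'missing', 'niu', or 'valid'."""
    codes = sentinel_codes(width)
    if value == codes["missing"]:
        return "missing"
    if value == codes["niu"]:
        return "niu"
    return "valid"


def mean_of_valid(values, width):
    """Average only the valid values; meaningful zeros are kept."""
    valid = [v for v in values if classify(v, width) == "valid"]
    return sum(valid) / len(valid) if valid else None


sample = [0, 12000, 99999, 35000, 99998]  # made-up values for a 5-digit variable
print([classify(v, 5) for v in sample])   # ['valid', 'valid', 'niu', 'valid', 'missing']
print(round(mean_of_valid(sample, 5), 2)) # 15666.67
```

Passing the width explicitly keeps a legitimate value such as 99999 in a 7-digit variable from being mistaken for an N.I.U. case.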
Top Codes represent, where applicable, a determination by the CPS that some high values were too sparse and specific to be recorded as they were reported to the CPS without the possibility of identifying the respondents. In these cases, the CPS put numerous high value cases together under one particular high value to protect respondent anonymity. For most variables, and for most years, IPUMS coding has retained the original coding utilized by the CPS in this regard. Topcodes vary by income variable and by year. Values above the topcode have also been given different treatments by the Census Bureau over time.
From 1962 through 1995, values exceeding the topcode threshold were simply recoded to the threshold. For example, all responses for INCWAGE greater than or equal to 50,000 in the 1976 March CPS Survey were replaced with 50,000 in the public-use Census data. In most cases, IPUMS retains this coding. IPUMS coding does not identify these values as topcodes.
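The threshold recoding used in these years amounts to capping each value at the threshold. A minimal sketch in Python follows, using the 1976 INCWAGE threshold of 50,000 from the example above. The function names and the reported values are made up for illustration. Because the data carry no topcode flag for these years, the sketch also shows how a user might flag capped cases by comparing against the published threshold.

```python
def apply_topcode(value, threshold):
    """Recode any value at or above the threshold to the threshold itself."""
    return min(value, threshold)


def is_topcoded(value, threshold):
    """Flag a public-use value that sits at the topcode threshold."""
    return value >= threshold


INCWAGE_1976_THRESHOLD = 50_000  # taken from the example in the text

reported = [18_500, 50_000, 72_300, 125_000]  # hypothetical reported wages
public_use = [apply_topcode(v, INCWAGE_1976_THRESHOLD) for v in reported]
print(public_use)  # [18500, 50000, 50000, 50000]
print([is_topcoded(v, INCWAGE_1976_THRESHOLD) for v in public_use])
# [False, True, True, True]
```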
Topcodes Tables: Although topcodes are often not identified with a value of 99997, topcode thresholds by year are extensively documented in our Topcodes Tables page.
There are instances when IPUMS Top Codes were imposed on specific variables during specific years. This occurred for a number of reasons. At times, original CPS coding had put Missing, N.I.U. and Top Code cases together under the same value. At other times, IPUMS coding of Missing or N.I.U. cases would have resulted in Missing, N.I.U. and Top Code cases being classified together under the same value. In these instances, IPUMS coding imposed a Top Code value, identified by a notation of mostly "9's," where the last digit is a "7." For example, where the topcode value is 99,999 in a 5-digit numeric variable, IPUMS recodes the value to 99997 to avoid confusion with our N.I.U. codes. With the exception of the variable EARNWEEK, IPUMS code notation only identifies Top Codes that have been imposed by IPUMS coding itself.
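The collision described in this paragraph can be illustrated with a short Python sketch. It is an illustration of the stated rule only, not the procedure IPUMS actually runs. The function names are hypothetical, and extending the 5-digit example (99999 becomes 99997) to other widths is an assumption.

```python
def niu_code(width):
    """All-9s N.I.U. code for a variable of the given digit width."""
    return 10 ** width - 1


def ipums_topcode(width):
    """Mostly-9s code ending in 7, such as 99997 for a 5-digit variable."""
    return 10 ** width - 3


def resolve_topcode(original_topcode, width):
    """Return the topcode value to store, avoiding a clash with the N.I.U. code."""
    if original_topcode == niu_code(width):
        return ipums_topcode(width)
    return original_topcode


print(resolve_topcode(99_999, 5))  # 99997, the example from the text
print(resolve_topcode(50_000, 5))  # 50000, no clash so the value is kept
```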
Starting in 1996, the Census Bureau introduced replacement values to take the place of topcoded values. Topcoded individuals are divided into twelve groups depending on characteristics such as race, gender, and full-time status. Income values are reassigned according to the mean income within each group. If fewer than five individuals are topcoded within a characteristic group, groups are pooled and a new average is given to each group. You can find the groups and respective replacement values by year on the Topcodes Tables page.
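The group-mean replacement can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Census Bureau's procedure: the text does not say which groups are pooled with which, so the sketch pools all undersized groups into a single pool that shares one mean. The function name, group labels, and income values are hypothetical.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 5  # from the text: groups with fewer than five topcoded cases are pooled


def replacement_values(topcoded_records):
    """Map each group to a replacement value for its topcoded members.

    `topcoded_records` is a list of (group, true_income) pairs covering
    topcoded individuals only. Pooling every small group into one shared
    pool is an assumption made for this sketch.
    """
    by_group = defaultdict(list)
    for group, income in topcoded_records:
        by_group[group].append(income)

    # Collect the incomes of all groups that are too small to stand alone.
    pooled = [x for incomes in by_group.values()
              if len(incomes) < MIN_GROUP_SIZE for x in incomes]
    pooled_mean = sum(pooled) / len(pooled) if pooled else None

    result = {}
    for group, incomes in by_group.items():
        if len(incomes) >= MIN_GROUP_SIZE:
            result[group] = sum(incomes) / len(incomes)
        else:
            result[group] = pooled_mean
    return result


records = [("A", v) for v in (100, 200, 300, 400, 500)]  # made-up values
records += [("B", 1000), ("B", 2000), ("C", 3000)]
print(replacement_values(records))  # {'A': 300.0, 'B': 2000.0, 'C': 2000.0}
```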
Income Cell Means: A 2008 paper generated replacement values using restricted use CPS data from 1976 to 2002. This data, including IPUMS identifiers, can be downloaded from this page.
In 2011, the income topcoding system changed again. According to the Census Bureau's documentation, all incomes above the topcode are rounded to two significant digits and then exchanged among individuals within a bounded interval. This is called a "rank proximity procedure."
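The rounding step of this procedure can be illustrated in Python. The sketch below covers only the rounding to two significant digits. The exchange of values within a bounded interval is not shown, because the interval bounds are not described here. Rounding halves upward and restricting the input to non-negative whole-dollar amounts are both assumptions, and the function name is hypothetical.

```python
def round_two_significant(value):
    """Round a non-negative integer to two significant digits (halves round up)."""
    digits = len(str(value))
    if digits <= 2:
        return value  # already has at most two significant digits
    factor = 10 ** (digits - 2)
    return (value + factor // 2) // factor * factor


print(round_two_significant(153_400))    # 150000
print(round_two_significant(1_287_000))  # 1300000
print(round_two_significant(87))         # 87
```

Integer arithmetic is used instead of the built-in `round` so that the result does not depend on Python's round-half-to-even behavior.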
Bottom Code values in the CPS, where applicable, operate under the same philosophy as that described for Top Code values. This means that most Bottom Codes are those determined by original CPS coding and left for users to identify through their own examination of the data. There are some instances where, either to avoid potential confusion caused by awkward implementation of original CPS coding or to create consistency with IPUMS-imposed Top Codes, IPUMS has imposed Bottom Codes. These imposed Bottom Codes, where applicable, are identified in CPS Income and Tax variables by a notation of mostly negative "9's," where the last digit is a "7." Therefore, all Bottom Code cases that use this notation are the result of IPUMS coding during data harmonization across survey years.
As with the IPUMS-imposed Top Codes, when Bottom Code notation is identified in the Variable Description Codes notation, imposed Bottom Codes may exist only in particular years.
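IPUMS-imposed Bottom Codes can be set aside before analysis in much the same way as the other special values. The Python sketch below assumes the notation is a minus sign, three or more 9's, and a final 7, which matches the -9997 value listed under Overall Structure. That pattern, the function names, and the sample values are assumptions for illustration, and the codes page for each variable and year should be checked before relying on it.

```python
import re

# Assumed pattern: a minus sign, three or more 9s, then a final 7 (e.g. -9997).
BOTTOM_CODE_PATTERN = re.compile(r"-9{3,}7")


def is_ipums_bottom_code(value):
    """Return True if an integer value matches the assumed bottom-code notation."""
    return BOTTOM_CODE_PATTERN.fullmatch(str(value)) is not None


def split_bottom_coded(values):
    """Separate ordinary values from those flagged as IPUMS bottom codes."""
    kept = [v for v in values if not is_ipums_bottom_code(v)]
    flagged = [v for v in values if is_ipums_bottom_code(v)]
    return kept, flagged


kept, flagged = split_bottom_coded([-9997, -1200, 0, 4500])  # made-up values
print(kept)     # [-1200, 0, 4500]
print(flagged)  # [-9997]
```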