*******************************************************
*This do file is a solution to the Validating Links
*exercise from the 2018 IPUMS-CPS Summer Data Workshop
*
*Written by the IPUMS-CPS team
*31 May 2018
*******************************************************

qui{
cap log close
log using logs/04a_linking_possibilities_validated.log, t replace

clear
set more off

/*Questions 1-3 of this exercise will be answered
with the cps rotation exploration station*/

//veterans and volunteers data
cd data
qui do cps_00203.do

noisily: di "...now we have IPUMS data ready to go"
noisily: list cpsidp mish age sex in 1/10

noisily: di "...generate index and count variables to find how many times an individual appears in the data"
bysort cpsidp (mish) : gen time = _n
egen count=max(time),by(cpsidp)
noisily: list year month time in 1/10
noisily: tab count month, col

noisily: di "...drop records that don't successfully link"
drop if count==1

noisily: di "...and now we have only successful, mechanical links"
noisily: count

noisily: di "...now we can see how month-in-sample values change over time"
noisily: tab mish month
egen next_mis = max(mish),by(cpsidp)

/*Question 4*/
noisily: tab mish next_mis if time==2

noisily: di "...now that we have successful mechanical links, we need to validate based on age, sex, and race"
qui do ../validate_long.txt 2

/*Question 6*/
noisily: di "...the variable allowable_age_diff is created in the validate_long do file"
noisily: tab allowable_age_diff

/*Question 7*/
noisily: di "...we can break down links by individual demographic variables"
noisily: tab age_total_match if time == 2
noisily: tab sex_total_match if time == 2
noisily: tab race_total_match if time == 2
noisily: tab all_match if time == 2

noisily: di "...let's keep just the matches that are valid on all three demographic characteristics"
drop if all_match !=1
noisily: tab mish next_mis if time==2

noisily: di "...this is the number of valid links"
noisily: count

clear

//now the 8-month panel
noisily: di "Now let's return to the 8-month panel"
qui do cps_00224.do

noisily: di "...generate index and count variables to find how many times an individual appears in the data"
//sort by year, then month, so that time indexes each person's interviews chronologically
bysort cpsidp (year month) : gen time = _n
egen count=max(time),by(cpsidp)
noisily: tab count

noisily: di "...note that in practice less than 12.5% of the original cross section make it through all 8 interviews"
noisily: tab count month, col

noisily: di "...keep only the records that appear in all 8 interviews"
keep if count == 8
noisily: count

/*Question 8*/
noisily: di "...now that we have successful mechanical links, we need to validate based on age, sex, and race"
qui do ../validate_long.txt 8

noisily: di "...we can break down links by individual demographic variables"
noisily: tab age_total_match if time==8
noisily: tab sex_total_match if time==8
noisily: tab race_total_match if time==8
noisily: tab all_match if time==8

noisily: di "...not a question answer, but..."
noisily: di "...note that age topcoding is accounted for in the validation code!"
noisily: list cpsidp mish age age_match age_total_match in 1865/1872, nol

/* For Question 9, see 04b_asec_to_basic_partI_validated.txt */

log close
cd ..
}
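
*******************************************************
*Illustrative sketch (not part of the workshop solution)
*
*validate_long.txt is not reproduced in this file, so the
*code below is only an assumption about the kind of check
*it performs: a link is treated as valid when sex and race
*are unchanged and age rises by a plausible amount. The toy
*data, the variable names (id, visit, toy_*), and the age
*window of 0 to 2 years are all hypothetical.
*******************************************************
clear
input id visit age sex race
1 1 34 1 100
1 2 35 1 100
2 1 40 2 200
2 2 52 2 200
end

*compare each record with the same person's first record
bysort id (visit): gen toy_age_ok  = inrange(age - age[1], 0, 2)
bysort id (visit): gen toy_sex_ok  = sex == sex[1]
bysort id (visit): gen toy_race_ok = race == race[1]

*a person is a valid link only if every record passes all three checks
gen toy_row_ok = toy_age_ok & toy_sex_ok & toy_race_ok
egen toy_all_match = min(toy_row_ok), by(id)

*person 1 passes all checks; person 2 fails because age jumps by 12 years
list, sepby(id)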