*******************************************************
*This do file is a solution to the Validating Links
*exercise from the 2018 IPUMS-CPS Summer Data Workshop
*
*Written by the IPUMS-CPS team
*31 May 2018
*******************************************************
qui{
cap log close
set more off
clear

/************************************************************************************************************
Re-tooled to work with pre-written validation code. This is just another example of how to do the same thing.
*************************************************************************************************************/

log using logs/03_asec_to_basic_partI.log, t replace

local mar_exnum 00205

//pull in ipums extract (2015 asec and 2014 December bms; include CPSIDP, ASECOVERP, MISH, FSSTATUS, CTCCRD)
cd data
qui do cps_`mar_exnum'.do
noisily: di "...now we have IPUMS data ready to go"
noisily: list cpsidp mish age sex in 1/10

/********************************************************
long file - each person has two observations in the file.
********************************************************/
noisily: di "First, we will work with a long file where each person has two observations in the data..."
noisily: di "...sort the data by cpsidp, year, and month to get records in chronological order (December on top of ASEC)"
sort cpsidp year month

/*Question 2*/
noisily: di "...check for duplicate values of cpsidp"
noisily: duplicates report cpsidp

/*Question 3*/
noisily: di "...yikes. Ok. Let's just look at the values that appear more than twice"
noisily: duplicates tag cpsidp, generate(dup_cpsid)
//dup_cpsid counts the surplus copies of each cpsidp, so >1 means three or more appearances
noisily: tab cpsidp if dup_cpsid>1
noisily: tab dup_cpsid asecoverp
noisily: di "...notice that there are records that have a cpsidp value of 0 from the ASEC"
noisily: list cpsidp year month mish asecflag in 1/10
noisily: di "...and that these records are from the ASEC oversample."
*list cpsidp year month mish asecoverp in 1/10
noisily: tab asecoverp if cpsidp == 0
noisily: di "...ASEC oversample records cannot be linked to Basic Monthly files. Drop them."
/*ASEC oversample records are drawn from non-March months (see MARBASECID paper)*/
keep if asecoverp != 1
noisily: list cpsidp year month mish asecoverp in 1/10

//preserve for later use
noisily: di "(...preserve data in its current state for later use...)"
preserve

noisily: di "...now create a variable for the number of times an individual appears...you know the drill!"
//within each cpsidp, the December record has the lower mish, so it gets time==1
bysort cpsidp (mish) : gen time = _n
egen count=max(time),by(cpsidp)
noisily: list cpsidp year month mish age sex race time count in 1/10

/*Question 4*/
noisily: di "...you can see that, after dropping the oversample records..."
noisily: di "...the ASEC merges to the December BMS just as the March BMS would!"
noisily: tab mish count if month==12
noisily: tab mish count if month==3

/*Question 5*/
noisily: di "...count the successful links between the ASEC and the FSS"
noisily: count if month == 12 & count == 2
noisily: di "...count the December records eligible to link to the ASEC"
noisily: count if month==12 & (mish==1 | mish==5)
/*eligible unlinked December records*/
noisily: di "...count the number of eligible records that didn't successfully link"
noisily: count if month == 12 & count == 1 & (mish == 1 | mish == 5)
/*end Question 5*/

/*to restrict the sample to just linked records, drop those cpsidps that only appear once*/
drop if count==1

/***********************************************************
wide file - each observed time point has its own set of vars
************************************************************/
noisily: di "Another tactic, perhaps one that is more useful for attaching data available in one file to records appearing in another file,"
noisily: di "is a wide merge - in this case, each linked record appears once in the file, and this one record's different time observations"
//Stata's display does not interpret \n; the _n directive prints the blank line instead
noisily: di "are all separate variables." _n
noisily: di "To do this we will generate a new set of variables for the December records, then roll them up onto the ASEC records."
noisily: di "This could be done for multiple subsequent months."
noisily: di "To do this, let's return to the point where we have just dropped the oversample records."
restore

noisily: di "...first make lists of variables that we need to rename for a wide file to work"
noisily: di "...these should be both variables common across files and variables that appear only in the ASEC"
local vars year month mish age sex race
noisily: di "...and the variables that appear only in December"
local dec_vars fsstatus fshwtscale
noisily: di "...sort by cpsidp, year, and month, and generate the index and counter variables to identify the number of times a record appears in the long file"
bysort cpsidp (year month) : gen time = _n
egen count=max(time),by(cpsidp)
noisily: tab mish count
noisily: di "...drop records that do not link from the long file"
drop if count==1

noisily: di "...generate new variables for those that are common across months based on time"
foreach var in `vars'{
	noisily: di "..."
	forvalues i = 1/2{
		//`var'_1 holds the December value, `var'_2 the ASEC value; each is missing on the other record
		qui gen `var'_`i' = `var' if time==`i'
	}
}
noisily: di "...now common variables have been renamed"
noisily: list cpsidp month_1 month_2 race_1 race_2 ctccrd fsstatus time in 1/10
noisily: di "...notice we now have a set of variables with a _1 suffix that have values for December and are missing for the ASEC."
noisily: di "We will now 'roll up' those meaningful values on the December records to replace the missing values on the ASEC records."
local vars_to_roll_up `vars'
foreach var in `vars_to_roll_up'{
	//di "`var'"
	//data are still sorted by cpsidp year month, so [_n-1] is the same person's December record
	qui replace `var'_1 = `var'_1[_n-1] if `var'_1==.
	drop `var'
}
//incomplete statement in the original (unterminated string), commented out so the file runs:
//noisily: di"...do the same thing for variables that appear only in
noisily: di "...now we can see that the values for the first time point have been moved down to the second time point"
noisily: di "Values for the _1 variables are now on the March records along with the ctccrd variable."
noisily: list cpsidp month_1 month_2 race_1 race_2 ctccrd fsstatus time in 1/10

noisily: di "...now we have to roll up vars that appear in December only onto the March record. No need to rename these."
foreach var in `dec_vars'{
	noisily: di "`var' values moving to ASEC record"
	qui replace `var' = `var'[_n-1] if `var'==.
}
noisily: list cpsidp month_1 month_2 race_1 race_2 ctccrd fsstatus time in 1/10

noisily: di "...now that we have everything that we want on the second of the two records..."
noisily: di "...we can drop those records which originally came from the December sample, as all of their information is now on the ASEC record."
drop if time == 1
noisily: list cpsidp month_1 month_2 race_1 race_2 ctccrd fsstatus time in 1/10

/*Question 6*/
noisily: di "...and we SHOULD have the same number of successful links!"
noisily: count

/*Question 9*/
//validate
qui do ../validate_wide.txt 2
noisily: tab age_match
noisily: tab sex_match
noisily: tab race_match
noisily: tab all_match

log close
cd ..
}
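
/*----------------------------------------------------------------------
Alternative sketch (not part of the original solution): the wide file built
by the roll-up above could probably also be built with -merge-.
Assumptions, all illustrative: the long file is in memory exactly as it is
right after the oversample records are dropped (before time and count are
generated), so cpsidp is unique within each month; the tempfile name dec is
hypothetical. The sketch is left commented out so that it does not change
what this do file runs.

	preserve
	keep if month == 12
	keep cpsidp year month mish age sex race fsstatus fshwtscale
	rename (year month mish age sex race) (year_1 month_1 mish_1 age_1 sex_1 race_1)
	tempfile dec
	save `dec'
	restore

	keep if month == 3
	drop fsstatus fshwtscale      // December-only vars come from the using file
	rename (year month mish age sex race) (year_2 month_2 mish_2 age_2 sex_2 race_2)
	merge 1:1 cpsidp using `dec', keep(match) nogenerate
	count                         // expected to match the Question 6 count

keep(match) retains only the linked records, which plays the same role as
"drop if count==1" in the roll-up version.
----------------------------------------------------------------------*/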