*******************************************************
*This do file is a solution to the Outgoing Rotation Group
*exercise from the 2018 IPUMS-CPS Summer Data Workshop
*
*Written by the IPUMS-CPS team
*31 May 2018
*******************************************************

cap log close
set more off
log using logs/07_org.log, t replace

****************
*Run the extract
****************
cd data
quietly do cps_00175.do
numlabel , add

****************
*1. How many observations are there per month/year combination (Fill out Table 1)?
****************
tab month year

****************
*2. How many volunteer supplement respondents are in the data?
****************
tab vlstatus year

****************
*4. Using ELIGORG, how many respondents are eligible for the ORG questions? Fill in Table 2.
****************
tab eligorg
tab month year if eligorg==1

****************
*6. How many people are members of a union in the 2007 Volunteer supplement? How many people are members of a union and volunteer? How many people are not members of a union and volunteer? Fill in Table 3 below.
****************
tab vlstatus union if year==2007

****************
*7. What are the MISH values of people in the Volunteer supplement with information on their union membership? About how many could we expect to retain in the sample if we linked forward to ORG responses for people in MISH 1-3 and 5-7 in the Volunteer supplement?
****************
tab mish if year==2007 & month==9 & union!=0

*******************************
*Linking code
*To link this, we first create temp files for each month-year, link the "focal month" (September) sequentially to October, November, and December, and then keep only the ORG variables. The key steps that are new to this exercise relative to the previous one are the variable renaming steps.

*STEP 1: create temp files. For this we use two sets of loops since our extract contains 8 months of data from 2 different years.
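*Optional check (an addition, not part of the original solution): the loops below
*assume the extract holds only September-December of 2007 and 2008. This sketch
*counts any records outside that range without changing the data.
count if !inlist(month, 9, 10, 11, 12) | !inlist(year, 2007, 2008)
display "Records outside the expected month/year range: " r(N)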
foreach x in 2007 2008 {
    foreach y in 9 10 11 12 {
        tempfile extract`x'_`y'
        preserve
        *We keep the entire Volunteer supplement month
        if (`y'==9 & `x'==2007) | (`y'==9 & `x'==2008) {
            keep if year==`x' & month==`y'
        }
        *But we only need the ORG responses from the other months
        if (`y'!=9 & `x'==2007) | (`y'!=9 & `x'==2008) {
            keep if year==`x' & month==`y'
            *Notice that we rename these variables so that they are retained when we merge
            rename * *_`x'_`y'
            rename cpsidp_`x'_`y' cpsidp
            *Note: after the rename, eligorg here (and month, year, and mish in the tabs below) resolve through Stata's variable-name abbreviation, which is on by default.
            keep if eligorg==1
        }
        *These tabs are done to ensure we've done everything right. September should have the full sample and the other months should have only MIS 4 and 8.
        tab month year
        tab mish eligorg
        save "`extract`x'_`y''"
        restore
    }
}
clear

*STEP 2: Link the files. Start with the 2007 Volunteer Supplement and then link each other month sequentially.
use "`extract2007_9'"

*We use the "ds" command to create a complete list of variables that we want to rename. This should exclude CPSIDP since we need that to be common across all the samples.
ds cpsidp, not
local final_vars = r(varlist)
display "`final_vars'"

*We create two sets of variables. The "_orig" set holds the original values from September, and the "_final" set will hold the complete set of ORG responses from future months.
rename * *_orig
rename cpsidp_orig cpsidp
foreach var in `final_vars' {
    gen `var'_final = `var'_orig
}

*Linking
foreach y in 9 10 11 12 {
    if `y'!=9 {
        merge 1:1 cpsidp using "`extract2007_`y''"
        drop if _merge==2
        drop _merge
    }
}

*Here we update the "_final" variables to include the variable values from future months. Note that a value is replaced only when the linked later month has a nonmissing value for it.
foreach var in `final_vars' {
    replace `var'_final = `var'_2007_10 if `var'_2007_10!=.
    replace `var'_final = `var'_2007_11 if `var'_2007_11!=.
    replace `var'_final = `var'_2007_12 if `var'_2007_12!=.
    label values `var'_final `var'_lbl
}

*Now we can keep only the updated variables. We keep the "_orig" for validation.
keep cpsidp *_final *_orig

*STEP 3: Validate!
generate sex_valid=0
replace sex_valid=1 if sex_orig==sex_final
generate race_valid=0
replace race_valid=1 if race_orig==race_final
generate age_diff= age_final - age_orig
*For adjacent months before year break
generate age_valid=0
replace age_valid=1 if age_diff==1 | age_diff==0
replace age_valid=1 if age_diff==5 & age_orig==80 & age_final==85

*****************************
*Summarize the errors
*****************************
generate summary=0
replace summary=1 if sex_valid==0
replace summary=2 if race_valid==0
replace summary=3 if age_valid==0
replace summary=4 if sex_valid==0 & race_valid==0
replace summary=5 if age_valid==0 & race_valid==0
replace summary=6 if age_valid==0 & sex_valid==0
replace summary=7 if age_valid==0 & sex_valid==0 & race_valid==0
label define stuff 0 "all good" 1 "sex wrong" 2 "race wrong" 3 "age wrong" 4 "sex and race" 5 "age and race" 6 "age and sex" 7 "all wrong"
label values summary stuff

******************
*8. How many of these links validate based on Sex? Race? Age? All three characteristics?
******************
tab sex_valid
tab race_valid
tab age_valid
tab summary

*STEP 4: Keep only validated individuals
keep if summary==0

******************
*9. Using the validated dataset linking Volunteer supplement and ORG data, how many respondents are included in the Volunteer supplement? Fill in Table 3.
*10. Using the validated dataset linking Volunteer supplement and ORG data, how many people are members of a union and volunteer? How many people are not members of a union and volunteer? Fill in Table 3.
******************
tab vlstatus_final union_final if union_final!=0 & vlstatus_final!=99

******************
*11. How much larger is the validated dataset linking Volunteer supplement and ORG data from subsequent months than the sample of only those members whose ORG data are collected in the same month as the Volunteer supplement data?
******************
tab vlstatus_final union_orig if union_orig!=0 & vlstatus_final!=99

*STEP 5: Validating jobs
generate dif_occ=0
replace dif_occ=1 if occ1990_orig!=occ1990_final
generate dif_ind=0
replace dif_ind=1 if ind1990_orig!=ind1990_final

******************
*12. Compare OCC1990 and IND1990 in September to the appropriate linked ORG data in subsequent months. How many people match occupations between the two time periods? Industry? Both?
******************
tab dif_occ
tab dif_ind
tab dif_occ dif_ind

*STEP 6: Keep only those who validate on jobs.
keep if dif_occ==0 & dif_ind==0
save vol_union_2007.dta, replace

******************
*13. How many respondents in the Volunteer supplement have the same job when they respond to UNION? How many people are members of a union and volunteer? How many people are not members of a union and volunteer? Fill in the final columns of Table 3 from part 1 of this exercise.
******************
tab vlstatus_final union_final if union_final!=0 & vlstatus_final!=99
clear

*******************************
*STEP 7: Create the ORG-enhanced dataset for 2008. This just repeats the steps above.
use "`extract2008_9'"

******************************
*Create variables
******************************
ds cpsidp, not
local final_vars = r(varlist)
display "`final_vars'"
rename * *_orig
rename cpsidp_orig cpsidp
foreach var in `final_vars' {
    gen `var'_final = `var'_orig
}
foreach y in 9 10 11 12 {
    if `y'!=9 {
        merge 1:1 cpsidp using "`extract2008_`y''"
        drop if _merge==2
        drop _merge
    }
}
foreach var in `final_vars' {
    replace `var'_final = `var'_2008_10 if `var'_2008_10!=.
    replace `var'_final = `var'_2008_11 if `var'_2008_11!=.
    replace `var'_final = `var'_2008_12 if `var'_2008_12!=.
    label values `var'_final `var'_lbl
}
keep cpsidp *_final *_orig

******************************
*Validate
******************************
generate sex_valid=0
replace sex_valid=1 if sex_orig==sex_final
generate race_valid=0
replace race_valid=1 if race_orig==race_final
generate age_diff= age_final - age_orig
*For adjacent months before year break
generate age_valid=0
replace age_valid=1 if age_diff==1 | age_diff==0
replace age_valid=1 if age_diff==5 & age_orig==80 & age_final==85

*****************************
*Summarize the errors
*****************************
generate summary=0
replace summary=1 if sex_valid==0
replace summary=2 if race_valid==0
replace summary=3 if age_valid==0
replace summary=4 if sex_valid==0 & race_valid==0
replace summary=5 if age_valid==0 & race_valid==0
replace summary=6 if age_valid==0 & sex_valid==0
replace summary=7 if age_valid==0 & sex_valid==0 & race_valid==0
label define stuff 0 "all good" 1 "sex wrong" 2 "race wrong" 3 "age wrong" 4 "sex and race" 5 "age and race" 6 "age and sex" 7 "all wrong"
label values summary stuff
tab sex_valid
tab race_valid
tab age_valid
tab summary
keep if summary==0

generate dif_occ=0
replace dif_occ=1 if occ1990_orig!=occ1990_final
generate dif_ind=0
replace dif_ind=1 if ind1990_orig!=ind1990_final
keep if dif_occ==0 & dif_ind==0
save vol_union_2008.dta, replace
clear

*STEP 8: Link across years
use vol_union_2007.dta
*We keep only the final variables and the original MISH. The MISH_final will be only 4/8.
keep cpsidp mish_orig *_final

******************
*14. About how many of our currently-linked and validated 2007 records are eligible to link to our 2008 linked ORG/Volunteer supplement sample?
******************
tab mish_orig

*To link, we create two temp files.
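*Optional check (an addition, not part of the original solution): the 1:1 merge
*further below needs CPSIDP to uniquely identify records in each file. As a
*minimal sketch, isid confirms this for the 2007 file and exits with an error if not.
isid cpsidp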
rename * *_t1
rename cpsidp_t1 cpsidp
tempfile forlinking_t1
save "`forlinking_t1'"
clear

use vol_union_2008.dta
keep cpsidp mish_orig *_final
tab mish_orig
rename * *_t2
rename cpsidp_t2 cpsidp
tempfile forlinking_t2
save "`forlinking_t2'"
clear

*Now we link them.
use "`forlinking_t1'"
merge 1:1 cpsidp using "`forlinking_t2'"
keep if _merge==3
drop _merge

******************************
*Validate
******************************
generate sex_valid=0
replace sex_valid=1 if sex_final_t1==sex_final_t2
generate race_valid=0
*Compare race across the two years (t1 against t2)
replace race_valid=1 if race_final_t1==race_final_t2
generate age_diff= age_final_t2 - age_final_t1
*As in the within-year validation, we accept an age difference of 0 or 1, plus the jump from 80 to 85
generate age_valid=0
replace age_valid=1 if age_diff==1 | age_diff==0
replace age_valid=1 if age_diff==5 & age_final_t1==80 & age_final_t2==85

*****************************
*Summarize the errors
*****************************
generate summary=0
replace summary=1 if sex_valid==0
replace summary=2 if race_valid==0
replace summary=3 if age_valid==0
replace summary=4 if sex_valid==0 & race_valid==0
replace summary=5 if age_valid==0 & race_valid==0
replace summary=6 if age_valid==0 & sex_valid==0
replace summary=7 if age_valid==0 & sex_valid==0 & race_valid==0
label define stuff 0 "all good" 1 "sex wrong" 2 "race wrong" 3 "age wrong" 4 "sex and race" 5 "age and race" 6 "age and sex" 7 "all wrong"
label values summary stuff

*****************************
*15. How many people link across these two years before validation? How many links validate based on Sex? Race? Age? All three characteristics?
*****************************
tab sex_valid
tab race_valid
tab age_valid
tab summary
keep if summary==0

*****************************
*16. For those whose links are validated, examine transitions in volunteering and union status. How many people change volunteer status across these two years? How many people change union status across these two years?
*****************************
tab vlstatus_final_t1 vlstatus_final_t2 if vlstatus_final_t1!=99 & vlstatus_final_t2!=99
tab union_final_t1 union_final_t2 if union_final_t1!=0 & union_final_t2!=0

***************************
*End of exercise
***************************
log close
cd ..