INSTRUCTIONS FOR REPLICATION Socio-Behavioral Factors Contributing to Recent Mortality Trends in the United States Demographic Research, Summer 2024 Authors: S. Preston, Y. Vierboom, & M. Myrskylä Replication files by Y. Vierboom Table of Contents : Part 1. Overview of the sub-folders in the “Replication_analysis” folder Part 2. How to download the data Part 3. What each do-file does *-------------------------------------------------------------------------------------------------* PART 1: Overview Welcome to the replication files! In very basic terms, you get the data, save it where specified, and run the master do-file. The master do-file runs all the component do-files to produce all the numbers, tables, and figures in the text. This analysis was run using Stata 17. If you don’t have Stata, you can open the “Logfiles” folder and see a history of the codes run by the analysis. If you should run into any questions, please send an email to yvierboom@gmail.com. Overview of the folders: Data/Original. This sub-folder is empty. This where you will store the dataset you download from IPUMS. Data/Formatted. This is where the do-files will save created datasets. Dofiles. This sub-folder contains the master do-file, along with its components. See the end of this document for a more in-depth look at what each do-file does. Logfiles. In case you don’t have Stata, I’ve stored all logfiles from the analyses in this folder. If you run the do-files, they will save and overwrite new versions of the logfiles. Results. This is empty now. This is where the do-files save their results. *** PART 2: Get the Data Below, I outline how to build the raw dataset needed to run the analyses. The data is publicly available through IPUMS (Blewett et al. 2023) and will be saved in the “Data” folder. 2A. Make an account to access the IPUMS Health Surveys (https://healthsurveys.ipums.org/) and select the NHIS. 2B. Select the following variables: year strata psu hhweight pernum nhispid hhx fmx px perweight sampweight fweight intervmo intervwyr astatflg cstatflg age sex racea hispeth educrec2 bmicalc usualpl hinotcove diabeticev alc5upyr alcamt alcstat1 alcdayswk cigsday1 smokestatus2 hearing aeffort ahopeless anervous arestless asad aworthless mortelig mortstat mortdodq mortdody mortucodld mortwt mortwtsa 2C. Select samples 1997-2019 2D. Submit extract 2E. Download data 2F. Extract and store data in Replication_files -> Data -> Original *** PART 3. What each do-file does ***********0_MASTER_DOFILE************ This is the only do-file you need to run. But if you’re looking for more information on what each component does, see the list below. **************1_raw_format************** *GOAL: To format the original data. *INPUT: Raw NHIS data downloaded from IPUMS (and saved in "Data -> Original" folder). *STEPS: 1. Get data 2. Format and clean variables 3. Tell Stata it's survey data 4. Tell Stata it's surival data 5. Expand into person year observations 6. Variable for analysis subpop 7. Save *OUTPUT: NHIS_PYL.dta in "Data -> Formatted" **************2_Table1: ************** *GOAL: To create Table 1, "Characteristics fo the sample, by period of interview." *INPUT: -Table1data.dta created by dofile 1_raw_format.do and saved in Data/Formatted -NHIS_PYL.dta created by dofile 1_raw_format.do and saved in Data/Formatted *STEPS: 1. Open data 2. Create new variables 3. Count N and put that in Table 1 4. Estimate mean age and put that in Table 1 5. Arrange other variables and their means 6. Estimate mean year of follow-up (using the PYL file) 7. Put steps 5-6 in Table 1 *OUTPUT: 1. Table1data in Data/Formatted (later used to create Table 3) 2. Table1 in Results folder ************3_Table2************ *GOAL: Produce Table 2: "Coefficients from Cox proportional hazard models predicting deaths per person per year." *INPUT: -NHIS_PYL in Data/Formatted (made by dofile 1) *STEPS: 1. Open data 2. Display person years 3. Run models 4. Print models in Table 2 5. Estimate model halves (later needed for Table 4) *OUTPUT: [Requires running four different models (Three in table and one cited in footnote)] -saves each model as .dta in Data/Formatted -saves first and second half of M3 at .dta in Data/Formatted -Table 2 in Results ************4_Table3-4_Figure1: ************* *GOAL: Make Tables 3 and 4. Make Figure 1. *INPUT: The idea of Tables 3 and 4 is that it's: distributions * coefficents (and Figure 1 is just a graphical representation of that). -Table1.dta -- created by 2_Table1.do -m1.dta -- all models creaded by 3_Table2.do -m2.dta -m3.dta -m3_1.dta (first half of period modeled with m3) -m3_2.dta (second half of period modeled with m3) *OUTPUT: -Results/Table3 -Results/Table4 -Results/Figure1 -Figure.dta used in sensitivity analysis in *************5_AppFigure1************** *STEPS: 1. Adjust distributions dataset (from Table 1) 2. Adjust coefficients dataset 3. Merge 1 & 2 4. Do some calculations (ie, one times the other, divided by year) 5. Export for Tables 3 and 4 6. Save there for later sensitivity analysis 7. Create Figure 1 5_AppFigure1: *GOAL: Make Appendix Figure 1 (test of 5 vs 3 * years of follow-up. *INPUT: -Table1data.dta (created by 2_Table1.do) -Table1.dta (created by 2_Table1.do) -Figure.dta (created by 4_Table3-4_Figure1.do *OUTPUT: -AppFigure1 *STEPS: Basically, we need to repeat the entire analysis, using data that's been censored at 3 years, instead of 5. So it's a lot of steps. 1. Open and format the original data 2. Tell Stata it's survival data 3. Expand into person years 4. Subpop variable 5. Calculate sum of mean observation 6. Merge this data with other variable distributions 7. Regression 8. Adjust regression data 9. Merge distribution and coefficient data 10. Estimate contributions 11. Merge with original contributions 12. Graph differences in contributions from 3 vs 5 year censoring // End