**The demography of sexual identity development and disclosure among LGB people in Europe**

Authors: Anna Caprinali and Agnese Vitali, University of Trento (Italy)

The .zip folder contains:
- the do-file used to conduct the analyses (both in .do and in .txt, named "LGBSexualIdentityDevelopment_ReplicationFile_STATA")
- the R script used to create Figure 1 (Kaplan-Meier curves) and Figure 2 (Maps of Europe) (both in .R and in .txt, named "LGBSexualIdentityDevelopment_ReplicationFile_R")

The data were analysed with Stata 18 (Stata/SE 18.0) and RStudio 2023.09.1+494. Earlier versions of the software can be used as well.

For any inquiries, please feel free to contact the corresponding author: anna.caprinali@unitn.it

*-------------------------------------------------------------------------------------------------------------------------------*

DATA AVAILABILITY

This paper uses individual-level microdata collected by the European Union Agency for Fundamental Rights (FRA). The EU LGBTI II Survey is a cross-sectional follow-up to a survey conducted by the FRA in 2012.

Data from the EU LGBTI II Survey are stored in the GESIS data archive: https://doi.org/10.4232/1.13733. The data are available upon request, subject to a fee and the signing of a contract with GESIS. Data and documents are only released for academic research and teaching after the data depositor’s written authorization. For this reason, neither the data nor a sample dataset can be provided for illustrative purposes.

Further information can be found at: https://search.gesis.org/research_data/ZA7604

The technical report and the questionnaire are available on the GESIS website, even without obtaining the data. Data can be downloaded in SPSS or Stata format.

The dataset contains 139,799 unique observations.
- After restricting the sample to L/G/B respondents only, the sample comprises 118,543 respondents, divided as follows:
	- Gay men: 58,908 (49.69%)
	- Lesbian women: 22,707 (19.16%)
	- Bisexual men: 9,711 (8.19%)
	- Bisexual women: 27,217 (22.96%)
- In the Kaplan-Meier analyses we further restricted the sample to individuals who self-disclosed their sexual identity before age 25 and whose age at first coming out is subsequent to that of self-disclosure. This yields a sample size of:
	- 113,181 subjects for the outcome age at self-disclosure;
	- 112,784 subjects for the outcome age at first coming out.

The variables used in the analysis are listed below. The timing of events of the transition to adulthood can be retrieved from:
- RESPONDENT_CATEGORY (categorization of respondents as either lesbian, gay, intersex, bisexual (female), bisexual (male) or trans made by FRA)
- A1 (age, recoded in categories)
- A3 (gender identity)
- A4 (sexual identity)
- A13 (age when respondent realized they were L/G/B)
- A14 (age when respondent did their first coming out as L/G/B)
- A10 (country where you live in)
- BENAFFWT (Benchmark X Propensity Combination Weight - standardised and trimmed)
- BENAFFWTEU30 (EU-30 country weight X Benchmark X Propensity Combination Weight - 2019)

*-------------------------------------------------------------------------------------------------------------------------------*

STEPS FOR REPLICATION

Obtain the data from GESIS following the procedure described above.

-- Stata --

The do-file allows you to:

A. FIRST PART: EXPLORE and PREPARE the data
- explore the distribution of the variables related to gender and sexual orientation;
- recode the variables used in the analyses, in particular the three outcomes: age at self-disclosure, age at first coming out, and the gap between the two.

B. SECOND PART: ANALYSE the data
We provide both bivariate and trivariate analyses (whenever possible).
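
The outcome recoding and sample restriction described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not part of the replication package, which uses Stata and R. The `Respondent` class and the function names are hypothetical. The sketch assumes that "subsequent" means "not earlier than", and that respondents with no reported coming out are kept. The do-file remains the authoritative definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Respondent:
    a13: Optional[int]  # A13: age at self-disclosure; None if missing
    a14: Optional[int]  # A14: age at first coming out; None if missing
    weight: float       # survey weight, e.g. BENAFFWT


def age_gap(r: Respondent) -> Optional[int]:
    """Years between self-disclosure (A13) and first coming out (A14)."""
    if r.a13 is None or r.a14 is None:
        return None
    return r.a14 - r.a13


def in_restricted_sample(r: Respondent, max_age: int = 25) -> bool:
    """Assumed rule: self-disclosure before max_age, and coming out
    (when reported) not earlier than self-disclosure."""
    if r.a13 is None or r.a13 >= max_age:
        return False
    return r.a14 is None or r.a14 >= r.a13


def weighted_mean_gap(sample: List[Respondent]) -> float:
    """Weighted mean gap among restricted respondents with both ages."""
    pairs = [(age_gap(r), r.weight) for r in sample
             if in_restricted_sample(r) and age_gap(r) is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no respondents with a defined age gap")
    return sum(g * w for g, w in pairs) / total_weight
```

The restriction is kept separate from the gap computation so that each assumption (the age cut-off, the handling of missing coming-out ages) can be changed independently to match the do-file.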
The replication code for Stata allows you to recreate:
- The estimates for Table 1 (Mean age of self-disclosure, first coming out and gap between the two by sexual identity)
- The t-tests for differences by LGB sexual identity in Table 1, namely (1) Gay men vs LB respondents; (2) Lesbian/Gay vs Bisexual respondents; (3) LGB men vs LGB women.
- Data preparation for survival analyses and the export of data in Excel format to run the code for Kaplan-Meier curves in RStudio.
- The estimates for Table 2 (Mean age of self-disclosure, first coming out and gap between the two by country)
- The t-tests for differences between each country and the aggregate mean in Europe.
- Data preparation for descriptive maps of Europe and the export of data in Excel format to run the code for maps in RStudio.
- Data preparation for Figure 3 (distribution of age gap by sexual identity and cohort) and export of data in Excel format to run the code in RStudio.

-- RStudio --

The R script allows you to:

A. FIRST PART: Kaplan-Meier curves
- plot the Kaplan-Meier curves by birth cohort for the two outcomes ("Age at self-disclosure" and "Age at first coming out") with weighted data, 95% confidence intervals, and the sample restricted to people who disclosed their sexual identity before age 25;
- combine the two plots.

B. SECOND PART: Maps of age gap by country and sexual identity
- create the maps of mean age gap (i.e., the time between age at self-disclosure and age at first coming out) by European country and sexual identity with weighted data.

C. THIRD PART: Boxplot of age gap by sexual identity and cohort
- create the boxplot of the age gap between self-disclosure and first coming out by sexual identity and cohort.

------------- end of the read_me file ---------------------------