Volume 37 - Article 46 | Pages 1477–1514
Using Twitter data for demographic research
By Dilek Yildiz, Jo Munson, Agnese Vitali, Ramine Tinati, Jennifer A. Holland
Abstract
Background: Social media data is a promising source of social science data. However, deriving the demographic characteristics of users and dealing with the nonrandom, nonrepresentative populations from which they are drawn represent challenges for social scientists.
Objective: Given the growing use of social media data in social science research, this paper asks two questions: 1) to what extent are findings obtained with social media data generalizable to broader populations, and 2) what is the best practice for estimating demographic information from Twitter data?
Methods: Our analyses use information gathered from 979,992 geo-located Tweets sent by 22,356 unique users in South East England between 23 June and 4 July 2014. We estimate demographic characteristics of the Twitter users with the crowd-sourcing platform CrowdFlower and the image-recognition software Face++. To evaluate bias in the data, we run a series of log-linear models with offsets and calibrate the nonrepresentative sample of Twitter users with mid-year population estimates for South East England.
Results: CrowdFlower proves to be more accurate than Face++ for the measurement of age, whereas both tools are highly reliable for measuring the sex of Twitter users. The calibration exercise allows bias correction in the age-, sex-, and location-specific population counts obtained from the Twitter population by augmenting Twitter data with mid-year population estimates.
Contribution: The paper proposes best practices for estimating Twitter users’ basic demographic characteristics and a calibration method to address the selection bias in the Twitter population, allowing researchers to generalize findings based on Twitter to the general population.
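The Methods paragraph mentions log-linear models with offsets. As a rough illustration only, the sketch below fits a main-effects model of the form log(expected Twitter count) = log(population) + age effect + sex effect using iterative proportional fitting, seeded with the population table as the offset. This is one standard way to fit such a model and is not necessarily the authors' specification. The function name `ipf_with_offset`, the three age groups, and all counts are hypothetical and chosen purely for demonstration.

```python
def ipf_with_offset(observed, offset, tol=1e-9, max_iter=1000):
    """Fit log(mu_ij) = log(offset_ij) + a_i + b_j by iterative proportional fitting.

    observed: 2-D list of sample counts (rows = age groups, columns = sexes).
    offset:   2-D list of population counts with the same shape (the seed table).
    Returns a fitted table whose row and column sums match those of `observed`.
    """
    n_rows, n_cols = len(observed), len(observed[0])
    row_targets = [sum(row) for row in observed]
    col_targets = [sum(observed[i][j] for i in range(n_rows)) for j in range(n_cols)]

    # Start from the offset, so fitted cells keep the form offset * a_i * b_j.
    fitted = [[float(v) for v in row] for row in offset]

    for _ in range(max_iter):
        # Rescale each row to match the observed row total.
        for i in range(n_rows):
            factor = row_targets[i] / sum(fitted[i])
            fitted[i] = [v * factor for v in fitted[i]]
        # Rescale each column to match the observed column total.
        for j in range(n_cols):
            col_sum = sum(fitted[i][j] for i in range(n_rows))
            factor = col_targets[j] / col_sum
            for i in range(n_rows):
                fitted[i][j] *= factor
        # Columns now match exactly; stop once the rows also match.
        worst = max(abs(sum(fitted[i]) - row_targets[i]) for i in range(n_rows))
        if worst < tol:
            break
    return fitted


# Hypothetical counts: rows are three age groups, columns are two sexes.
twitter_users = [[120, 100], [300, 260], [80, 60]]
population = [[1000, 950], [1200, 1250], [1500, 1600]]

fitted = ipf_with_offset(twitter_users, population)

# Modelled Twitter users per resident in each age-sex cell.
rates = [[fitted[i][j] / population[i][j] for j in range(2)] for i in range(3)]
for row in rates:
    print([round(r, 4) for r in row])
```

The inverse of each modelled rate could serve as a calibration weight that scales Twitter-based counts in a cell up to the population level. A model with interaction terms or a location dimension would need more margins or a general-purpose GLM routine.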
Authors' Affiliations
- Dilek Yildiz - International Institute for Applied Systems Analysis (IIASA), Austria
- Jo Munson - University of Southampton, United Kingdom
- Agnese Vitali - Università degli Studi di Trento, Italy
- Ramine Tinati - University of Southampton, United Kingdom
- Jennifer A. Holland - Erasmus Universiteit Rotterdam, the Netherlands
Other articles by the same authors in Demographic Research
The demography of sexual identity development and disclosure among LGB people in Europe
Volume 52 - Article 5
The transition to adulthood in Europe at the intersection of gender and parental socioeconomic status
Volume 51 - Article 23
A multidimensional global migration model for use in cohort-component population projections
Volume 51 - Article 11
A Bayesian model for the reconstruction of education- and age-specific fertility rates: An application to African and Latin American countries
Volume 49 - Article 31
Retraditionalisation? Work patterns of families with children during the pandemic in Italy
Volume 45 - Article 31
Unpacking intentions to leave the parental home in Europe using the Generations and Gender Survey
Volume 45 - Article 2
The timing of marriage vis-à-vis coresidence and childbearing in Europe and the United States
Volume 36 - Article 20
Who brings home the bacon? The influence of context on partners' contributions to the household income
Volume 35 - Article 41
Youth prospects in a time of economic recession
Volume 29 - Article 36
Love, marriage, then the baby carriage? Marriage timing and childbearing in Sweden
Volume 29 - Article 11
Most recent similar articles in Demographic Research
Which definition of migration better fits Facebook ‘expats’? A response using Mexican census data
Volume 50 - Article 39 | Keywords: census data, Facebook, international migration, Mexico, social media
Traditional versus Facebook-based surveys: Evaluation of biases in self-reported demographic and psychometric information
Volume 42 - Article 5 | Keywords: bias, demography, Facebook, moral foundations, personality, psychometrics, recruitment bias, self-reporting bias, self-selection bias, social media, survey
Happy parents’ tweets: An exploration of Italian Twitter data using sentiment analysis
Volume 40 - Article 25 | Keywords: parenthood, social network, subjective well-being, Twitter
WhatsApp usage patterns and prediction of demographic characteristics without access to message content
Volume 39 - Article 22 | Keywords: demographics, social media, social network, usage prediction, WhatsApp
Estimates of mortality and population changes in England and Wales over the two World Wars
Volume 13 - Article 16 | Keywords: England, population estimates, Wales, World War I, World War II
Cited References: 33