Methodology
The PREMIUM_EU Dashboard integrates data from multiple sources and employs advanced analytical methods to provide comprehensive insights into demographic trends, migration patterns, and regional development across Europe.
Our approach combines official statistics, novel data sources, and sophisticated modelling techniques to ensure accuracy, comparability, and usability for policy makers and researchers.
Regional Development Indicators
Regional development data were collected from authoritative sources including Eurostat and the OECD. Indicators are organized into three categories: economic (including GDP per capita), social (such as hospital beds per 1,000 people), and living environment (including air pollution). Each region receives a score from 0 to 1 for each indicator, with the highest-performing region scoring 1 and the lowest scoring 0.
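The 0 to 1 scoring described above corresponds to min-max normalization. The sketch below is illustrative only. The function name, the dictionary input format, and the `higher_is_better` flag are assumptions. The flag inverts indicators such as air pollution, where a lower value is better, because the exact handling of indicator direction is not documented here.

```python
def min_max_scores(values, higher_is_better=True):
    """Rescale one indicator so the best region scores 1.0 and the worst 0.0.

    values: dict mapping a region code to its raw indicator value.
    higher_is_better: hypothetical flag; pass False for indicators where a
        lower raw value means better performance (e.g. air pollution).
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # No variation across regions: the 0-1 rescaling is undefined.
        raise ValueError("indicator has the same value in every region")
    scores = {}
    for region, value in values.items():
        scaled = (value - lo) / (hi - lo)
        scores[region] = scaled if higher_is_better else 1.0 - scaled
    return scores


# Illustrative values for three hypothetical regions (not real data).
gdp_per_capita = {"A": 20000, "B": 50000, "C": 80000}
print(min_max_scores(gdp_per_capita))  # {'A': 0.0, 'B': 0.5, 'C': 1.0}
```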
Based on overall development scores calculated across all indicators (2010–2021), regions are classified into four categories: Vulnerable (lowest 25%), Underdeveloped (25–50%), Leading (50–75%), and Advanced (top 25%). Regions are further categorized as improving, declining, or stagnant based on statistical trend analysis over the data period.
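The two classification steps can be sketched as follows, under stated assumptions. The overall development score is taken as an input that has already been computed, for example as the mean of a region's indicator scores. Quartiles are assigned by rank. The "statistical trend analysis" is replaced here by a plain least-squares slope compared against a hypothetical tolerance, because the actual test is not specified.

```python
LEVELS = ["Vulnerable", "Underdeveloped", "Leading", "Advanced"]


def classify_levels(overall_scores):
    """Map each region to a quartile-based category by rank (lowest first)."""
    ranked = sorted(overall_scores, key=overall_scores.get)
    n = len(ranked)
    return {region: LEVELS[(4 * i) // n] for i, region in enumerate(ranked)}


def ols_slope(years, scores):
    """Ordinary least-squares slope of score on year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, scores))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def classify_trend(years, scores, tolerance=0.005):
    """Label a region's trajectory; `tolerance` (score units per year) is hypothetical."""
    slope = ols_slope(years, scores)
    if slope > tolerance:
        return "improving"
    if slope < -tolerance:
        return "declining"
    return "stagnant"


print(classify_levels({"A": 0.1, "B": 0.4, "C": 0.6, "D": 0.9}))
print(classify_trend([2010, 2011, 2012], [0.40, 0.42, 0.44]))  # improving
```

A significance test on the slope would be a natural alternative to the fixed tolerance. The tolerance is used here only to keep the sketch dependency-free.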
Population Estimates and Projections
Population data are constructed by integrating multiple data sources to capture the distribution of populations by age, gender, education level, and region. We combined population estimates for 2010, 2015, and 2020 from Eurostat (by NUTS3 region, age, and gender) with national population estimates from the Wittgenstein Centre Human Capital Data Explorer (WIC) (by age, gender, and education level). Since Eurostat data do not include regional variation in education levels, we developed a predictive model that estimates regional shares of low, middle, and high education based on economic, social, and environmental indicators from our Regional Development Module.
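The text does not spell out how the predicted regional education shares are reconciled with the WIC national totals. One common option is iterative proportional fitting (raking), shown here purely as an assumption. Each region is seeded with its Eurostat total times the predicted shares. The cells are then alternately rescaled so that education totals match WIC and regional totals match Eurostat. The sketch covers a single age-sex group, all names are hypothetical, and both margins must sum to the same grand total.

```python
def rake_education(regional_totals, predicted_shares, national_by_edu,
                   max_iter=100, tol=1e-9):
    """Iterative proportional fitting for one age-sex group (hypothetical helper).

    regional_totals:  {region: persons}, e.g. from Eurostat NUTS3 data
    predicted_shares: {region: {edu: share}} from a regional education model
    national_by_edu:  {edu: persons}, e.g. from WIC national estimates
    Seed cells and margins are assumed positive, and both margins are
    assumed to sum to the same national total.
    """
    # Seed: regional population split by the model's predicted shares.
    cells = {
        region: {edu: regional_totals[region] * share for edu, share in shares.items()}
        for region, shares in predicted_shares.items()
    }
    for _ in range(max_iter):
        # Column step: match national totals by education level.
        for edu, target in national_by_edu.items():
            current = sum(cells[region][edu] for region in cells)
            for region in cells:
                cells[region][edu] *= target / current
        # Row step: match each region's total population.
        for region, target in regional_totals.items():
            current = sum(cells[region].values())
            for edu in cells[region]:
                cells[region][edu] *= target / current
        # Rows now match exactly; stop once the columns match as well.
        gap = max(
            abs(sum(cells[region][edu] for region in cells) - target)
            for edu, target in national_by_edu.items()
        )
        if gap < tol:
            break
    return cells


result = rake_education(
    regional_totals={"R1": 100.0, "R2": 300.0},
    predicted_shares={"R1": {"low": 0.6, "high": 0.4}, "R2": {"low": 0.3, "high": 0.7}},
    national_by_edu={"low": 120.0, "high": 280.0},
)
print(result)
```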
Population projections follow the same integration approach, combining Eurostat NUTS3 projections (by age and gender) with WIC national projections (by age, gender, and education), supplemented by model predictions of regional education shares. Projections span 2020–2024 through 2035–2039 in five-year increments, capturing key population characteristics including place of birth (same country, other EU countries, or outside the EU) and education level. The projection model applies survival rates to estimate mortality, educational transition rates to model progression to higher education levels, migration flows based on long-term averages (1990–2020), and fertility rates to calculate births.
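The four components of the projection model (survival, educational transitions, migration, fertility) can be illustrated with one simplified cohort-component step. This is a hypothetical sketch, not the Dashboard's model. It drops sex and place of birth and treats fertility rates as births per person-year applied to the starting population. It also assigns all newborns to the youngest age group with the lowest education level (`"low"` here) and adds net migration as a fixed count. All parameter names and the dictionary layout are assumptions.

```python
from collections import defaultdict

STEP_YEARS = 5  # five-year projection periods and five-year age groups


def project_step(pop, survival, transitions, net_migration, fertility, n_age_groups):
    """Advance the population by one five-year period (simplified sketch).

    pop:           {(age_group, edu): persons} at the start of the period
    survival:      {age_group: share surviving to the end of the period}
    transitions:   {(age_group, from_edu): (to_edu, share)}, applied after ageing
    net_migration: {(age_group, edu): net migrants over the period}
    fertility:     {age_group: births per person-year}
    """
    new = defaultdict(float)
    # 1) Mortality and ageing; the oldest group is open-ended.
    for (age, edu), persons in pop.items():
        next_age = min(age + 1, n_age_groups - 1)
        new[(next_age, edu)] += persons * survival[age]
    # 2) Educational transitions among the survivors.
    for (age, from_edu), (to_edu, share) in transitions.items():
        moving = new[(age, from_edu)] * share
        new[(age, from_edu)] -= moving
        new[(age, to_edu)] += moving
    # 3) Net migration.
    for cell, migrants in net_migration.items():
        new[cell] += migrants
    # 4) Births over the period enter the youngest age group.
    births = sum(
        persons * fertility.get(age, 0.0) * STEP_YEARS
        for (age, _edu), persons in pop.items()
    )
    new[(0, "low")] += births
    return dict(new)


next_pop = project_step(
    pop={(0, "low"): 100.0, (1, "low"): 100.0, (2, "low"): 50.0},
    survival={0: 1.0, 1: 0.9, 2: 0.5},
    transitions={(1, "low"): ("high", 0.2)},
    net_migration={(1, "high"): 10.0},
    fertility={1: 0.01},
    n_age_groups=3,
)
print(next_pop)
```

Repeating the step with period-specific rates would carry the population from one five-year period to the next.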
Net Migration Estimates
Net migration is estimated as the residual of the population balancing equation, so it captures the combined effect of internal and international migration. For historical periods (2010–2015 and 2015–2020), net migration is calculated as:
N = P₂ − P₁ − B + D, where P₁ and P₂ are populations at the beginning and end of the period, B represents births, and D represents deaths, with adjustments for aging and educational transitions.
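The balancing equation translates directly into code. The first function below applies it to totals. The second illustrates the ageing adjustment: among cohorts already alive at the start of the period, members of five-year age group a at the start are in group a + 1 at the end, so births drop out. The function names are hypothetical. The sketch omits educational transitions and the open-ended oldest age group.

```python
def net_migration(p_start, p_end, births, deaths):
    """Residual net migration from the balancing equation N = P2 - P1 - B + D."""
    return p_end - p_start - births + deaths


def cohort_net_migration(p_start, p_end, deaths):
    """Net migration per cohort over one five-year period.

    p_start, p_end: populations by five-year age group at the start and end.
    deaths: deaths during the period, indexed by the cohort's starting group.
    People in group a at the start are in group a + 1 at the end, so births
    do not enter; the open-ended oldest group is left out for simplicity.
    """
    return [
        p_end[a + 1] - p_start[a] + deaths[a]
        for a in range(len(p_start) - 1)
    ]


print(net_migration(p_start=1000, p_end=1050, births=40, deaths=30))  # 40
print(cohort_net_migration([100, 200, 300], [50, 110, 190], [2, 5]))  # [12, -5]
```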
Migration projections (2020–2025, 2025–2030, 2030–2035, 2035–2040) use the same balancing equation combined with Eurostat projections of births and deaths at NUTS3 level (by age and gender) and WIC national projections under the SSP2 "middle of the road" scenario. These projections incorporate demographic, educational, and geographic dimensions.
Migration Flows
Migration flow data are derived from the Human Migration Database (HMigD), developed at the Max Planck Institute for Demographic Research and partially supported by PREMIUM_EU. The HMigD combines bilateral flow data from Eurostat, the UN, and National Statistical Institutes with information from the EU Labour Force Survey and online-based migration estimates. Since these sources differ in definitions, coverage, and data quality, they are integrated using a hierarchical Bayesian model that corrects for undercounting, duration-of-stay rules, and other measurement issues. The model predicts missing corridors and captures sudden changes in migration patterns.
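The HMigD model is hierarchical Bayesian and considerably richer than anything that fits here: it estimates source biases, predicts missing corridors, and models changes over time. The sketch below is a deliberately simplified, hypothetical illustration of the core measurement idea only. Each source is corrected by an assumed coverage factor (reported flow divided by true flow) and the corrected values are combined on the log scale by inverse-variance weighting. That weighted average equals the posterior mean of the log flow under a normal measurement model with known error variances and a flat prior. In a hierarchical model the coverage factors and error variances would be estimated rather than supplied.

```python
import math


def combine_sources(reports):
    """Combine several reports of one corridor's flow into a single estimate.

    reports: list of (reported_flow, coverage_factor, log_sd) tuples, where
      coverage_factor is an assumed ratio of reported to true flow
      (e.g. 0.8 for a 20% undercount or a stricter duration-of-stay rule), and
      log_sd is the assumed error standard deviation on the log scale.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for reported, coverage_factor, log_sd in reports:
        corrected_log = math.log(reported / coverage_factor)  # undo the assumed bias
        weight = 1.0 / log_sd ** 2  # more precise sources count more
        weighted_sum += weight * corrected_log
        total_weight += weight
    return math.exp(weighted_sum / total_weight)


# Two hypothetical sources for the same corridor and year.
estimate = combine_sources([(800.0, 0.8, 0.1), (1100.0, 1.0, 0.3)])
print(round(estimate))
```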
Age and education proportions of migration flows are based on census samples from the IPUMS-International Database. Missing years and countries are estimated using Random Forest models that consider population size and structure by education levels, existing migrant populations, and socio-economic variables such as GDP per capita, human development index, fertility rates, and life expectancy. Final migration flows are harmonized to represent long-term migration (minimum 12-month stay) between European countries, ensuring cross-country comparability.
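Once a corridor's harmonized long-term flow and its age and education proportions are available, the flow can be split into cells. The proportions may be observed in IPUMS samples or imputed by the Random Forest models, which are not reproduced here. This hypothetical helper takes one proportion per (age group, education) cell. It renormalizes them because independently predicted shares need not sum exactly to one. That renormalization is an assumption of the sketch.

```python
def disaggregate_flow(total_flow, proportions):
    """Split a harmonized total flow across (age_group, education) cells.

    proportions: {(age_group, edu): predicted share}. Shares predicted
    independently for each cell need not sum to one, so they are
    renormalized here (an assumption of this sketch).
    """
    share_sum = sum(proportions.values())
    if share_sum <= 0:
        raise ValueError("proportions must have a positive sum")
    return {cell: total_flow * share / share_sum for cell, share in proportions.items()}


cells = disaggregate_flow(
    1000.0,
    {("20-24", "high"): 0.3, ("20-24", "low"): 0.3, ("25-29", "high"): 0.6},
)
print(cells)
```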
Policy Data Collection
Policy data were collected using both automated and manual methods to ensure comprehensive coverage of relevant policy initiatives across Europe. Following Knoepfel (2011), we define policies as "a connection of intentionally consistent decisions and activities taken from different public actors, and sometimes private ones, in order to solve in a targeted way a problem which, politically, is defined as collective."
Policy documents must contain: (1) a time frame or time period, (2) a geographical area, and (3) a specific goal.
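The three inclusion criteria can be expressed as a simple check on a document record. The record type and its field names are hypothetical. The sketch only shows that a document qualifies when all three elements are present, not how they are extracted from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyCandidate:
    """Hypothetical record for a collected document; field names are illustrative."""
    title: str
    time_frame: Optional[str] = None       # e.g. "2021-2027"
    geographic_area: Optional[str] = None  # e.g. a NUTS region or country
    goal: Optional[str] = None             # the specific objective stated


def meets_policy_definition(doc: PolicyCandidate) -> bool:
    """True only if all three required elements are present and non-blank."""
    required = (doc.time_frame, doc.geographic_area, doc.goal)
    return all(value is not None and value.strip() != "" for value in required)


plan = PolicyCandidate("Regional plan", "2021-2027", "Region X", "Reduce youth out-migration")
draft = PolicyCandidate("Strategy note", geographic_area="Region X")
print(meets_policy_definition(plan), meets_policy_definition(draft))  # True False
```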
We developed a Python-based web scraper that automatically downloads PDF documents from specified websites. A trained machine learning model then identifies policy documents among the downloaded files. We complemented this automated approach with manual collection of policies that meet the same definitional criteria, extending coverage of regional and national policy initiatives. Additionally, we conducted Policy Labs that brought together policymakers involved in regional development and mobility to discuss their experiences, expectations, and ideas in moderated sessions. Insights gathered during these sessions informed the policy recommendations presented on the Dashboard.
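The project's scraper is not reproduced here. The following is a hypothetical sketch of its link-discovery step using only Python's standard library: it parses a page's HTML and collects absolute URLs of links ending in `.pdf`. The class and function names are illustrative. The download step and the trained classifier are not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of links that point to PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.split("?", 1)[0].lower().endswith(".pdf"):
            self.pdf_links.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    parser = PdfLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_links


page = (
    '<a href="/docs/plan.pdf">Plan</a>'
    '<a href="about.html">About</a>'
    '<a href="https://example.org/x.PDF?v=2">X</a>'
)
print(find_pdf_links(page, "https://example.org/policies/"))
# ['https://example.org/docs/plan.pdf', 'https://example.org/x.PDF?v=2']
```

The collected URLs could then be fetched, for example with `urllib.request.urlopen`, before being passed to the document classifier.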
Integration Index
The Integration Index provides a comprehensive measure of how well first- and second-generation migrants are integrated across European regions. The index is based on responses to selected questionnaires from the European Social Survey (ESS), collected at different time points depending on when specific modules were fielded in each country.
To capture integration as a multidimensional concept, the index combines five equally weighted components (following Jonsson et al., 2018; Heath & Schneider, 2021):
Each component is scaled between 0 and 1 to ensure comparability. The Integration Index for each respondent is calculated as the simple average of these five dimensions, resulting in scores ranging from 0 (lowest integration) to 1 (highest integration).
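The index arithmetic is simple enough to state in code. In this sketch, `scale_item` assumes a component built from a single response on a bounded numeric scale (for example 0 to 10) and rescales it linearly. How each of the five components is actually constructed from ESS items is not specified here, so both function names and the rescaling rule are assumptions.

```python
from statistics import fmean


def scale_item(response, scale_min, scale_max):
    """Linearly rescale a response on [scale_min, scale_max] to [0, 1] (assumed rule)."""
    return (response - scale_min) / (scale_max - scale_min)


def integration_index(components):
    """Equally weighted mean of five component scores, each in [0, 1]."""
    if len(components) != 5:
        raise ValueError("expected exactly five component scores")
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("component scores must lie in [0, 1]")
    return fmean(components)


# One hypothetical respondent: a 0-10 item rescaled, plus four other components.
components = [scale_item(7, 0, 10), 1.0, 0.5, 0.0, 0.8]
print(round(integration_index(components), 2))  # 0.6
```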