Regions are more specialised and open than countries
Important external trade dependences (Thissen et al. 2016)
Regions vary in terms of their specialisation patterns and, therefore, in their trade relationships and openness
Regional trade flows
Knowing and predicting regional trade flows helps us understand:
regional economic performance
exposure to external shocks
place-based development
Employment vulnerability and the transmission of internal and external shocks differ across regions.
Workers in US regions specialised in specific manufacturing industries were more vulnerable to the rise of import competition from China (Autor et al. 2013)
Regional trade flows: hardly any data
Big caveat: interregional trade data
Europe: spatially disaggregated input-output (IO) tables for NUTS2 regions (Thissen et al. 2018)
Costly, difficult exercise
Our contribution
Utilise the digital traces that interregional trade leaves behind
Model and predict interregional trade flows for the UK
Scrape open web data
Hyperlinks between commercial websites
Machine learning techniques for out-of-sample predictions
Hypothesis: such hyperlinks reflect business and trade relations
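A minimal sketch of how scraped hyperlinks could be aggregated into a region-to-region link matrix, the quantity that would then be related to trade flows. It assumes each commercial website has already been geolocated to a region. The site names, region codes and links below are hypothetical illustrations, not project data.

```python
from collections import Counter

# Hypothetical lookup: website domain -> region code (illustrative NUTS2-style labels).
# Assumes geolocation of each commercial website has already been done.
SITE_REGION = {
    "alpha-tools.co.uk": "UKI3",
    "beta-steel.co.uk": "UKD3",
    "gamma-foods.co.uk": "UKI3",
    "delta-logistics.co.uk": "UKM7",
}

# Hypothetical scraped hyperlinks: (source website, target website).
HYPERLINKS = [
    ("alpha-tools.co.uk", "beta-steel.co.uk"),
    ("alpha-tools.co.uk", "delta-logistics.co.uk"),
    ("gamma-foods.co.uk", "beta-steel.co.uk"),
    ("beta-steel.co.uk", "alpha-tools.co.uk"),
    ("gamma-foods.co.uk", "alpha-tools.co.uk"),  # link within one region
    ("alpha-tools.co.uk", "unknown-site.com"),   # target not geolocated
]


def region_link_counts(links, site_region):
    """Count directed hyperlinks between regions, skipping ungeolocated sites."""
    counts = Counter()
    for src, dst in links:
        r_src = site_region.get(src)
        r_dst = site_region.get(dst)
        if r_src is None or r_dst is None:
            continue  # cannot place this link on the regional map
        counts[(r_src, r_dst)] += 1
    return counts


counts = region_link_counts(HYPERLINKS, SITE_REGION)
for (origin, dest), n in sorted(counts.items()):
    print(origin, "->", dest, n)
```

Whether intra-regional links (such as the UKI3 to UKI3 pair) are kept depends on whether the target is interregional trade only.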
Web data and spatial research
Web data and businesses
Businesses may not expose all of their strategies on their websites, but neither do they in surveys (Arora et al. 2013)
Business websites:
spreading information
establishing a public image
supporting online transactions
sharing opinions
Spatial studies using hyperlinks
Hyperlinks tend to follow national borders and gravitate towards the US (Halavais 2000)
Keßler (2017) used the hyperlinks between German Wikipedia webpages to represent the hierarchy of urban centres in Germany
Salvini and Fabrikant (2016) used the English Wikipedia to build a graph of world cities
Hyperlinks between and to administrative websites have been used to study spatial relationships and structure (Holmberg and Thelwall 2009; Holmberg 2010; Janc 2015)
Lin, Halavais, and Zhang (2007) used weblog hyperlinks to analyse the spatial reflections of the blogosphere
Jones, Spigel, and Malecki (2010) focused on the New York City theatre scene to investigate the existence and role of a ‘virtual buzz’
Business studies using hyperlinks
Hyperlinks to business websites reflect business motivations and contain useful business information (Vaughan, Gao, and Kipp 2006)
Significant correlations between the number of incoming links and business performance (Vaughan 2004; Vaughan and Wu 2004)
Krüger et al. (2020) used hyperlinks between business websites in Germany to test the role of different proximity frameworks
Innovative businesses share more hyperlinks with other businesses, which also tend to be innovative
Data science: new wine in old bottles?
Spatial interaction predictions
Plenty of ML applications predicting out-of-sample flows:
Robinson and Dilkina (2018) used XGBoost and artificial neural network models to predict global migration
Tribby et al. (2017) used random forests (RF) to select variables for walking route choice models
Guns and Rousseau (2014) used RF to predict and recommend high-potential research collaborations that have not yet materialised
Current economic thinking advocates the use of ML algorithms such as random forests
They tend to outperform ordinary least squares (OLS) in out-of-sample prediction, even with moderately sized training datasets and a limited number of predictors (Mullainathan and Spiess 2017; Athey and Imbens 2019).
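Comparisons of this kind rest on holding data back from model fitting. The sketch below shows only that held-out evaluation loop, on synthetic (invented) region-pair data, with a one-predictor OLS baseline written from scratch. It does not implement a random forest; one would be slotted into the fitting step of the same loop and scored on the same held-out pairs.

```python
import math
import random


def fit_ols(xs, ys):
    """Closed-form simple OLS for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def rmse(pred, actual):
    """Root mean squared error of predictions."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, actual)) / len(actual))


# Synthetic region pairs (NOT real data): log hyperlink count -> log trade flow.
rng = random.Random(42)
pairs = []
for _ in range(200):
    log_links = rng.uniform(0.0, 5.0)
    log_trade = 1.0 + 0.8 * log_links + rng.gauss(0.0, 0.3)
    pairs.append((log_links, log_trade))

# Hold out 25% of pairs; the model never sees them during fitting.
rng.shuffle(pairs)
cut = int(0.75 * len(pairs))
train, test = pairs[:cut], pairs[cut:]

a, b = fit_ols([x for x, _ in train], [y for _, y in train])
preds = [a + b * x for x, _ in test]
score = rmse(preds, [y for _, y in test])
print(f"intercept={a:.2f} slope={b:.2f} out-of-sample RMSE={score:.3f}")
```

The out-of-sample RMSE, not the in-sample fit, is the number on which competing models would be compared.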
Empirical strategy
Web data: The Internet Archive
The largest archive of webpages in the world
273 billion webpages from over 361 million websites, 15 petabytes of storage (since 1996)
A web crawler starts with a list of URLs (a seed list) to crawl and downloads a copy of their content
Using the hyperlinks contained in the crawled pages, new URLs are identified and crawled (snowball sampling)
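The seed-list-plus-snowball process can be illustrated as a breadth-first traversal. This toy version assumes a hypothetical in-memory link graph in place of real page downloads; it is not the Internet Archive's actual crawler.

```python
from collections import deque

# Hypothetical in-memory web graph standing in for downloaded pages:
# each URL maps to the hyperlinks found on that page.
WEB = {
    "https://a.example": ["https://b.example", "https://c.example"],
    "https://b.example": ["https://d.example"],
    "https://c.example": ["https://a.example", "https://d.example"],
    "https://d.example": ["https://e.example"],
    "https://e.example": [],
}


def snowball_crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first crawl: visit the seeds, then every newly discovered URL."""
    queue = deque(seeds)
    seen = set(seeds)
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)  # stands in for downloading a copy of the page
        for link in fetch_links(url):
            if link not in seen:  # each URL is queued at most once
                seen.add(link)
                queue.append(link)
    return order


crawled = snowball_crawl(["https://a.example"], lambda u: WEB.get(u, []))
print(crawled)
```

The `max_pages` cap is a design choice for the sketch: on the open web, snowball sampling keeps discovering new URLs, so some stopping rule is needed.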