Material and immaterial regional interdependencies: using the web to predict regional trade flows

Emmanouil Tranos, Andre Carrascal Incera & George Willis

University of Bristol, Alan Turing Institute
@EmmanouilTranos


  • Introduction
  • Empirical strategy
  • Descriptive statistics
  • Results
  • Conclusions


Regional trade flow

  • Regions are more specialised and open than countries
  • Important external trade dependencies (Thissen et al. 2016)
  • Regions vary in terms of their specialisation patterns and, therefore, in their trade relationships and openness

Regional trade flow

  • Knowing and predicting regional trade then helps to understand:
    • regional economic performance
    • exposure to external shocks
    • place-based development
  • Employment vulnerability and the transmission of internal and external shocks differ across regions.
  • Workers in US regions specialised in specific manufacturing industries were more vulnerable to the emergence of China (Autor et al. 2013)

Regional trade flow: hardly any data

  • Big caveat: interregional trade data are scarce
  • Europe: spatially disaggregated input-output (IO) tables for NUTS2 regions (Thissen et al., 2018)
  • Costly, difficult exercise

Our contribution

  • Utilise the digital traces that interregional trade leaves behind
  • Model and predict interregional trade flows for the UK
  • Scrape open web data
  • Hyperlinks between commercial websites
  • Machine learning techniques for out-of-sample predictions
  • Hypothesis: such hyperlinks reflect business and trade relations
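
One way to picture the hypothesis is to aggregate website-to-website hyperlinks into region-to-region counts, which could then serve as a predictor of trade flows. The sketch below is illustrative only: the website names, region labels and the `interregional_link_counts` helper are hypothetical, and it assumes each website has already been assigned to a single region.

```python
from collections import Counter

# Hypothetical lookup: commercial website -> region where the business is located.
site_region = {
    "bakery.example": "Region A",
    "mill.example": "Region B",
    "farm.example": "Region B",
    "retail.example": "Region C",
}

# Hypothetical hyperlinks as (source website, target website) pairs.
hyperlinks = [
    ("bakery.example", "mill.example"),
    ("bakery.example", "farm.example"),
    ("mill.example", "farm.example"),       # same region: not interregional
    ("retail.example", "bakery.example"),
    ("retail.example", "unknown.example"),  # target has no known region
]


def interregional_link_counts(hyperlinks, site_region):
    """Count hyperlinks per (origin region, destination region) pair."""
    counts = Counter()
    for source, target in hyperlinks:
        origin = site_region.get(source)
        destination = site_region.get(target)
        # Skip links we cannot place in space and links within one region.
        if origin is None or destination is None or origin == destination:
            continue
        counts[(origin, destination)] += 1
    return counts


flows = interregional_link_counts(hyperlinks, site_region)
for (origin, destination), n in sorted(flows.items()):
    print(origin, "->", destination, n)
```

The resulting origin-destination counts have the same shape as an interregional trade matrix, which is what makes them a candidate input for a flow prediction model.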

Web data and spatial research

Web data and businesses

  • Businesses may not reveal all of their strategies on their websites, but neither do they in surveys (Arora et al. 2013)
  • Business websites:
    • spreading information
    • establishing a public image
    • supporting online transactions
    • sharing opinions

Data science: new wine in old bottles?

Spatial interaction predictions

  • Plenty of ML applications predicting out-of-sample flows:
    • Robinson and Dilkina (2018) used XGBoost and Artificial Neural Network models to predict global migration
    • Tribby et al. (2017) used RF to select variables associated with walking route choice models
    • Guns and Rousseau (2014) used RF to predict and recommend high-potential research collaborations that have not yet materialised

Spatial interaction predictions

  • Current economic thinking advocates the use of ML algorithms such as Random Forest
  • They tend to outperform ordinary least squares in out-of-sample predictions, even with moderately sized training datasets and a limited number of predictors (Mullainathan and Spiess 2017; Athey and Imbens 2019).

Empirical strategy

Web data: The Internet Archive

  • The largest archive of webpages in the world
  • 273 billion webpages from over 361 million websites, 15 petabytes of storage (since 1996)
  • A web crawler starts with a list of URLs (a seed list) to crawl and downloads a copy of their content
  • Using the hyperlinks included in the crawled URLs, new URLs are identified and crawled (snowball sampling)
  • Each archived copy is time-stamped
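
The crawling logic described above can be sketched as a breadth-first traversal from a seed list. This is a minimal illustration, not the Internet Archive's actual crawler: the `TOY_WEB` dictionary stands in for the live web, and `crawl` and `toy_fetch` are hypothetical names. A real crawler would download each page, extract its links and store a time-stamped copy.

```python
from collections import deque

# Hypothetical toy "web": each URL maps to the hyperlinks found on that page.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
    "e.example": ["a.example"],  # nothing links to it, so it is never found
}


def toy_fetch(url):
    """Stand-in for downloading a page and extracting its hyperlinks."""
    return TOY_WEB.get(url, [])


def crawl(seed_list, fetch_links, max_pages=100):
    """Visit the seeds, then follow newly discovered links (snowballing)."""
    queue = deque(seed_list)
    seen = set(seed_list)   # URLs already queued, to avoid repeat visits
    visited = []            # URLs in the order they were crawled
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


order = crawl(["a.example"], toy_fetch)
print(order)
```

Because discovery depends on hyperlinks, pages that no crawled page links to (here `e.example`) stay outside the archive unless they appear in the seed list, which is one reason coverage resembles a snowball sample rather than a census.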

Web data: The Internet Archive