Using the web to predict regional trade flows: material and immaterial regional interdependencies

Emmanouil Tranos, Andre Carrascal Incera & George Willis

University of Bristol, Alan Turing Institute
@EmmanouilTranos


  • Introduction
  • Web data and spatial research
  • Empirical strategy
  • Descriptive statistics
  • Results
  • Conclusions


Regional trade flows

  • Bilateral trade is a complex phenomenon (Serrano and Boguñá 2003)
  • Its complexity increases when it is approached from a spatially disaggregated perspective
  • Regions are more specialised and open than countries
  • Regions are more open to trade with other regions than national economies are
  • Important external trade dependencies
  • Regions vary a lot in terms of their specialisation patterns, trade relationships and openness

Regional trade flows

  • Knowing and predicting regional trade helps to understand:
    • regional economic performance
    • exposure to external shocks
    • place-based development
  • Employment vulnerability and the transmission of internal and external shocks differ across regions.

Regional trade flows: hardly any data

  • Big caveat: interregional trade data are scarce
  • Europe: spatially disaggregated input-output (IO) tables for NUTS2 regions (Thissen, Diodato, and Van Oort 2013b, 2013a)
  • Costly, difficult exercise

Our contribution

  • Utilise the digital traces that interregional trade leaves behind
  • Model and predict trade flows for the UK NUTS2 regions
  • Scrape open web data
  • Hyperlinks between commercial websites
  • ML techniques to predict unseen interregional trade flows
  • Spatially disaggregated trade data
  • Hypothesis: such hyperlinks reflect business and trade relations
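
To make the hypothesis concrete, one way to turn website-to-website hyperlinks into region-to-region link counts is sketched below. This is only an illustration, not the authors' pipeline: the domains, the links, the `SITE_REGION` lookup and the function name `region_link_counts` are all hypothetical, and the slides do not say how websites are assigned to NUTS2 regions.

```python
from collections import Counter

# Hypothetical lookup from a commercial website to a UK NUTS2 region code.
# How websites are geolocated is an assumption, not something stated in the slides.
SITE_REGION = {
    "acme-tools.co.uk": "UKK1",
    "bristol-parts.co.uk": "UKK1",
    "leeds-steel.co.uk": "UKE4",
    "london-logistics.co.uk": "UKI3",
}

# Hypothetical hyperlinks as (source website, target website) pairs.
LINKS = [
    ("acme-tools.co.uk", "leeds-steel.co.uk"),
    ("bristol-parts.co.uk", "leeds-steel.co.uk"),
    ("leeds-steel.co.uk", "london-logistics.co.uk"),
    ("acme-tools.co.uk", "bristol-parts.co.uk"),
    ("acme-tools.co.uk", "unknown.example"),  # no known region, so it is skipped
]


def region_link_counts(links, site_region):
    """Count hyperlinks per (origin region, destination region) pair."""
    counts = Counter()
    for source, target in links:
        origin = site_region.get(source)
        destination = site_region.get(target)
        # Drop links whose source or target cannot be placed in a region.
        if origin is None or destination is None:
            continue
        counts[(origin, destination)] += 1
    return counts


flows = region_link_counts(LINKS, SITE_REGION)
for (origin, destination), n in sorted(flows.items()):
    print(origin, destination, n)
```

Counts of this kind could then serve as one predictor of interregional trade flows. Intraregional pairs such as (UKK1, UKK1) are kept here, although whether to keep them is a modelling choice.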

Web data and spatial research

Web data and business studies

  • Businesses may not expose all of their strategies on their websites, but neither do they in surveys (Arora et al. 2013)
  • Business websites:
    • spreading information
    • establishing a public image
    • supporting online transactions
    • sharing opinions

Empirical strategy

Web data: The Internet Archive

  • The largest archive of webpages in the world
  • 273 billion webpages from over 361 million websites, 15 petabytes of storage (1996 to present)
  • A web crawler starts with a list of URLs (a seed list) to crawl and downloads a copy of their content
  • Using the hyperlinks included in the crawled URLs, new URLs are identified and crawled (snowball sampling)
  • Each archived copy is time-stamped
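
The crawling logic described above can be sketched as a breadth-first traversal that starts from a seed list and follows newly discovered hyperlinks. This is a minimal sketch and not the Internet Archive's actual crawler: `TOY_WEB` is a hypothetical in-memory web used instead of real HTTP requests, and a simple crawl-order counter stands in for the time-stamp.

```python
from collections import deque

# Hypothetical toy web: each URL maps to the hyperlinks found on that page.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}


def crawl(seed_list, fetch_links, max_pages=100):
    """Crawl from a seed list, following hyperlinks to discover new URLs."""
    queue = deque(dict.fromkeys(seed_list))  # de-duplicated seeds, order kept
    seen = set(queue)
    archive = []  # records of (crawl order, url, outgoing links)
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        links = fetch_links(url)
        # The crawl-order index is a stand-in for a real time-stamp.
        archive.append((len(archive), url, links))
        for link in links:
            if link not in seen:  # snowballing: only new URLs are queued
                seen.add(link)
                queue.append(link)
    return archive


snapshots = crawl(["a.example"], lambda url: TOY_WEB.get(url, []))
order = [url for _, url, _ in snapshots]
print(order)
```

Pages that no crawled page links to are never reached, which is why the choice of seed list shapes what ends up in the archive.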

Web data: The Internet Archive