A multi-scale story of the diffusion of a new technology: the web


Emmanouil Tranos

University of Bristol, Alan Turing Institute
@EmmanouilTranos, etranos.info

Contents

  • Introduction

  • Web data

  • Methods

  • Results

    • S-shaped diffusion curves
    • Rank dynamics
    • Exploratory spatial analysis
    • Modelling
  • Conclusions

Introduction

Aim

  • Diffusion of a new technology: the web
  • Geographers used to be interested in diffusion
  • Hägerstrand et al. (1968)
  • Passed the torch to economists and sociologists
  • Why? Lack of granular data:

Because new digital activities are rarely—if ever—captured in official state data, researchers must rely on information gathered from alternative sources (Zook and McCanless 2022).

Importance

  • Guide policies for deployment of new technologies

  • Predictions of introduction times for future technologies (Meade and Islam 2021):

    • Network operators

    • Suppliers of network equipment

    • Regulatory authorities

Technological diffusion


Spatial diffusion processes

  • As in temporal diffusion models, an S-shaped pattern in the cumulative level of adoption

  • A hierarchy effect: from main centres to secondary ones – central places

  • A neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)

Hägerstrand (1965): from innovative centres (core) through a hierarchy of sub-centres, to the periphery

Diffusion of a new digital technology


  • Diffusion of an intangible, digital technology [web]

  • Map the active engagement with the digital

  • Over time, early stages of the internet [1996-2012]

  • Granular and multi-scale spatial processes

Web data

Long story short

  • Data from the Internet Archive, the oldest web archive

  • Observe commercial websites 1996 - 2012 in the UK (.co.uk)

  • Geolocation: postcode references in the text

  • Timestamp: archival year

  • Counts

Web data: The Internet Archive

  • The largest archive of webpages in the world
  • 273 billion webpages from over 361 million websites, 15 petabytes of storage (1996 onwards)
  • A web crawler starts with a list of URLs (a seed list) to crawl and downloads a copy of their content
  • Using the hyperlinks included in the crawled URLs, new URLs are identified and crawled (snowball sampling)
  • Each archived copy is time-stamped
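
The crawling logic described above (seed list, then following newly discovered hyperlinks) can be sketched as a breadth-first traversal. This is only an illustration on a hypothetical in-memory link graph; `crawl`, `get_links` and `TOY_WEB` are made-up names, and the Internet Archive's actual crawler is far more elaborate.

```python
from collections import deque


def crawl(seed_urls, get_links, max_pages=1000):
    """Breadth-first crawl: visit the seeds, then follow newly found links."""
    queue = deque(dict.fromkeys(seed_urls))  # de-duplicated seed list
    seen = set(queue)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        # A real crawler would download and store a time-stamped copy here.
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical toy web: each URL maps to the URLs it links to.
TOY_WEB = {
    "a.co.uk": ["b.co.uk", "c.co.uk"],
    "b.co.uk": ["c.co.uk", "d.co.uk"],
    "c.co.uk": [],
    "d.co.uk": ["a.co.uk"],
    "e.co.uk": [],  # nobody links here, so it is never discovered
}

pages = crawl(["a.co.uk"], lambda url: TOY_WEB.get(url, []))
print(pages)
```

The unreachable `e.co.uk` illustrates why snowball-style crawling yields a sample of the web rather than a census.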

Our web data

  • JISC UK Web Domain Dataset: all archived webpages from the .uk domain 1996-2012

  • Curated by the British Library

Our web data

  • All .uk archived webpages which contain a UK postcode in the web text

  • Circa 0.5 billion URLs with valid UK postcodes



20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD
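
A record such as the one above can be split into an archival year, a host and a postcode. The sketch below assumes the pipe-separated layout shown on the slide; the regular expression is a simplified postcode pattern chosen for illustration, not the one used to build the dataset, and it does not check that a postcode actually exists.

```python
import re
from datetime import datetime
from urllib.parse import urlparse

# Simplified UK postcode pattern (an assumption, not the dataset's own rule).
POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}\b")


def parse_record(line):
    """Split 'timestamp | url | postcode' into year, host and postcode."""
    timestamp, url, postcode = [part.strip() for part in line.split("|")]
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return {"year": captured.year, "host": urlparse(url).netloc, "postcode": postcode}


def find_postcodes(text):
    """Return postcode-like strings found in free text."""
    return POSTCODE_RE.findall(text.upper())


record = parse_record("20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD")
found = find_postcodes("Hypothetical contact page text: write to us at IG8 8HD.")
print(record, found)
```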

Data cleaning

Unique postcodes frequencies, 2000

level            freq    perc   cumfreq  cumperc
(0,1]            41,596  0.718  41,596   0.718
(1,2]             6,451  0.111  48,047   0.830
(2,10]            6,163  0.106  54,210   0.936
(10,100]          2,975  0.051  57,185   0.988
(100,1000]          646  0.011  57,831   0.999
(1000,10000]         62  0.001  57,893   1.000
(10000,100000]        4  0.000  57,897   1.000


  • Websites with a large number of postcodes: e.g. directories, real estate websites

  • Focus on websites with one unique postcode per year
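
The filtering rule above (keep websites with one unique postcode per year) can be sketched as follows. The tuple layout `(host, year, postcode)` and the toy records are assumptions for illustration only.

```python
from collections import defaultdict


def single_postcode_sites(records):
    """Keep (host, year) pairs that reference exactly one distinct postcode."""
    postcodes = defaultdict(set)
    for host, year, postcode in records:
        postcodes[(host, year)].add(postcode)
    return {key: next(iter(pcs)) for key, pcs in postcodes.items() if len(pcs) == 1}


# Hypothetical records: (host, archival year, postcode found in the page text).
records = [
    ("shop.co.uk", 2000, "IG8 8HD"),
    ("shop.co.uk", 2000, "IG8 8HD"),       # same postcode on another page
    ("directory.co.uk", 2000, "IG8 8HD"),  # directory-style site: many postcodes
    ("directory.co.uk", 2000, "BS8 1TH"),
    ("shop.co.uk", 2001, "BS8 1TH"),
]

kept = single_postcode_sites(records)
print(kept)
```

Repeated mentions of the same postcode do not disqualify a website; only distinct postcodes are counted.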

Directory website with a lot of postcodes

Website with a unique postcode in London

Methods

Reminder: diffusion mechanisms

  • S-shaped pattern in the cumulative level of adoption

  • A hierarchy effect: from main centres to secondary ones

  • A neighbourhood effect: first “hitting” nearby locations

Methods

  • Cumulative adoption: Self-starting logistic growth model [nls and SSlogis]

  • Descriptive statistics & ESDA

  • Machine learning framework [random forests]

  • Two scales:

    • websites per firm in a Local Authority
    • websites in an Output Area
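
For readers less familiar with the logistic growth curve, the sketch below writes it in the same parameterisation as R's `SSlogis` (asymptote `Asym`, midpoint `xmid`, scale `scal`) and recovers the midpoint and scale by ordinary least squares on the logit scale. This is a rough illustration in Python that assumes the asymptote is known; it is not the nonlinear least-squares fit that `nls` performs, and the toy series is synthetic.

```python
import math


def sslogis(t, asym, xmid, scal):
    """Logistic curve: asym / (1 + exp((xmid - t) / scal))."""
    return asym / (1.0 + math.exp((xmid - t) / scal))


def rough_logistic_fit(ts, ys, asym):
    """Estimate xmid and scal via OLS on logit(y / asym) = (t - xmid) / scal.

    Requires 0 < y < asym for every observation.
    """
    zs = [math.log((y / asym) / (1.0 - y / asym)) for y in ys]
    n = len(ts)
    t_bar = sum(ts) / n
    z_bar = sum(zs) / n
    slope = sum((t - t_bar) * (z - z_bar) for t, z in zip(ts, zs)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    intercept = z_bar - slope * t_bar
    return -intercept / slope, 1.0 / slope  # (xmid, scal)


# Synthetic, noise-free adoption shares for 1996-2012 (illustrative values).
years = list(range(1996, 2013))
shares = [sslogis(t, asym=1.0, xmid=2004.0, scal=1.5) for t in years]

xmid_hat, scal_hat = rough_logistic_fit(years, shares, asym=1.0)
print(round(xmid_hat, 3), round(scal_hat, 3))
```

A smaller `scal` means a steeper curve, so it is one way to read diffusion speed off a fitted model.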

S-shaped diffusion curves

Diffusion speed


  • Spatial heterogeneity

  • No clear, easy-to-explain pattern

Rank dynamics: stability vs. volatility


  • Adoption heterogeneity

  • Different perceptions of risk and economic returns from new technologies

  • Early adopters vs. laggards, leapfrogging
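
One simple way to look at stability versus volatility is to rank areas by adoption in two years and compare positions. The slides do not state how ranks were computed, so the function names, the tie-breaking rule and the toy values below are all assumptions.

```python
def ranks(values):
    """Rank areas from 1 (highest value) downward; ties broken by area name."""
    ordered = sorted(values, key=lambda area: (-values[area], area))
    return {area: position for position, area in enumerate(ordered, start=1)}


def rank_changes(early, late):
    """Positions gained (positive) or lost (negative) between two years."""
    r_early, r_late = ranks(early), ranks(late)
    return {area: r_early[area] - r_late[area] for area in r_early}


# Hypothetical websites-per-firm values for three areas in two years.
early = {"A": 0.30, "B": 0.10, "C": 0.05}
late = {"A": 0.60, "B": 0.40, "C": 0.70}

changes = rank_changes(early, late)
print(changes)
```

In this toy case area C leapfrogs from last to first place, while the early leader A drops one position.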

Rank dynamics and diffusion speed

  • Spatial heterogeneity

  • Expected volatility

Spatial mechanisms

Neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)

  • Spatial dependency (Moran’s I & LISA maps)
  • Websites per firm in Local Authorities (c. 400)

  • Websites in Output Areas (c. 230,000)
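
As a reminder of what global Moran's I measures, the sketch below computes it from a value vector and a spatial weights matrix. The binary contiguity weights and the four-area toy layout are assumptions; the slides do not specify which weights were used.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Four hypothetical areas in a row; each area neighbours the adjacent ones.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1, 2, 3, 4], W)    # similar values sit next to each other
alternating = morans_i([1, 4, 1, 4], W)  # neighbours are always dissimilar
print(clustered, alternating)
```

Positive values indicate that similar adoption levels cluster in space; negative values indicate a checkerboard-like pattern.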

Neighbourhood effect

  • Spatial dependency
    • Relatively small, consistent over time / scales
    • London hot spot early on
    • At local scale, consistent hotspots over time
    • Granular analysis reveals other hotspots


Hierarchy effect: from main centres to secondary ones – central places

  • Gini coefficient

Hierarchy

  • Almost perfect polarisation of web adoption in the early stages at a granular level

  • Polarisation decreases over time

  • More equally diffused at the Local Authority level
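
The Gini coefficient behind these statements can be computed from area-level website counts as sketched below. The formula is the standard one for sorted non-negative values; the toy counts are made up.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = polarised)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


equal = gini([1, 1, 1, 1])      # every area hosts the same number of websites
polarised = gini([0, 0, 0, 1])  # all websites sit in a single area
print(equal, polarised)
```

With n areas the maximum is (n - 1) / n, so with many small areas and very few early websites the coefficient approaches 1.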

Putting all of these together

Modelling framework

  • A hierarchy effect: from main centres to secondary ones

  • A neighbourhood effect: first “hitting” nearby locations

  • S-shaped pattern in the cumulative level of adoption


\[
\begin{aligned}
Website\,Density_{i,t} \sim{} & \color{orange}{Distance\,London_{i}} + \color{orange}{Website\,density\,London_{t-1}} + \\
& \color{orange}{Distance\,Nearest\,City_{i}} + \color{orange}{Website\,density\,Nearest\,City_{t-1}} + \\
& \color{orange}{Distance\,Nearest\,Retail_{i}} + \color{orange}{Website\,density\,Nearest\,Retail_{t-1}} + \\
& \color{olive}{W*\, Website\,density_{t-1}} + \\
& \color{violet}{year_{t}}
\end{aligned}
\]
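
The spatial lag term in this specification, W multiplied by last year's website density, is the average density among an area's neighbours when W is row-standardised. The sketch below assumes a row-standardised contiguity matrix stored as neighbour lists; the slides do not state how W was defined, and the area names and values are hypothetical.

```python
def spatial_lag(values, neighbours):
    """Row-standardised spatial lag: mean value among each area's neighbours."""
    lag = {}
    for area, linked in neighbours.items():
        if linked:
            lag[area] = sum(values[other] for other in linked) / len(linked)
        else:
            lag[area] = 0.0  # isolated area: no neighbours to average over
    return lag


# Hypothetical neighbour lists and website densities in year t-1.
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
density_prev = {"A": 0.2, "B": 0.4, "C": 0.6}

lagged = spatial_lag(density_prev, neighbours)
print(lagged)
```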

Modelling framework

  • Random forests to predict \(Website\,Density_{i,t}\)

  • 2 spatial resolutions:

    • Local Authorities (websites per firm)
    • Output Areas (websites)
  • 2 sets of models:

    • All the data ⟹ variable importance
    • Train on all but one region, test on the holdout region ⟹ spatial differences and similarities of diffusion mechanisms
  • Space-time sensitive 10-fold CV (CAST)
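
The second set of models (train on all but one region, test on the holdout region) relies on a split like the one sketched below. Only the splitting step is shown, not the random forest itself; the row layout and region labels are illustrative assumptions.

```python
def leave_one_region_out(rows, region_of):
    """Yield (held_out_region, train_rows, test_rows) for every region."""
    regions = sorted({region_of(row) for row in rows})
    for held_out in regions:
        train = [row for row in rows if region_of(row) != held_out]
        test = [row for row in rows if region_of(row) == held_out]
        yield held_out, train, test


# Hypothetical area-year rows tagged with their region.
rows = [
    {"area": "a1", "region": "Wales"},
    {"area": "a2", "region": "Wales"},
    {"area": "b1", "region": "London"},
    {"area": "c1", "region": "Scotland"},
]

splits = list(leave_one_region_out(rows, lambda row: row["region"]))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Because no area from the holdout region is seen during training, a high test score suggests that the diffusion mechanisms learned elsewhere also hold in that region.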

Models trained on all data


Scale               RMSE   RSquared  MAE
Local Authorities*  0.032  0.810     0.019
Output Areas        5.000  0.205     1.047

*No retail centre predictors for the Local Authority models
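
The three accuracy metrics in the table can be computed as sketched below. RSquared is implemented here as the squared Pearson correlation between observed and predicted values, which is one common convention; the slides do not say whether this or 1 - SSE/SST was used, so treat that choice as an assumption. The toy values are made up.

```python
import math


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    o_bar, p_bar = sum(obs) / n, sum(pred) / n
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


# Hypothetical predictions that are consistently 0.5 too high.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

scores = (rmse(observed, predicted), r_squared(observed, predicted), mae(observed, predicted))
print(scores)
```

The toy case shows why the metrics are reported together: a constant bias leaves the correlation-based RSquared at 1 while RMSE and MAE still register the error.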

Variable importance for Local Authorities

Variable importance for Output Areas

Regional similarities

Region                    RSquared LAD  Rank LAD  RSquared OA  Rank OA
South East                0.947         1         0.134        2
Wales                     0.916         2         0.131        3
Yorkshire and The Humber  0.906         3         0.144        1
North East                0.895         4         0.128        4
West Midlands             0.883         5         0.070        9
East Midlands             0.882         6         0.088        8
East of England           0.876         7         0.106        6
South West                0.864         8         0.117        5
London                    0.805         9         0.055        10
Scotland                  0.770         10        0.035        11
North West                0.664         11        0.017        12
Northern Ireland          0.576         12        0.101        7

Alternative specification: growth, trained on all data


\[
\begin{aligned}
Website\,Density\,Growth_{i,t} \sim{} & \color{orange}{Distance\,London_{i}} + \color{orange}{Website\,density\,London_{t-1}} + \\
& \color{orange}{Distance\,Nearest\,City_{i}} + \color{orange}{Website\,density\,Nearest\,City_{t-1}} + \\
& \color{orange}{Distance\,Nearest\,Retail_{i}} + \color{orange}{Website\,density\,Nearest\,Retail_{t-1}} + \\
& \color{olive}{W*\, Website\,density_{t-1}} + \\
& \color{violet}{year_{t}}
\end{aligned}
\]


Scale               RMSE   RSquared  MAE
Local Authorities*  0.200  0.634     0.148

*No retail centre predictors for the Local Authority models

Alternative specification: growth, trained on all data

Region                    RSquared
South East                0.915
West Midlands             0.903
Wales                     0.892
East of England           0.885
Yorkshire and The Humber  0.885
East Midlands             0.885
North East                0.866
London                    0.852
South West                0.833
North West                0.768
Scotland                  0.697
Northern Ireland          0.473

Conclusions

  • Established technological diffusion spatial processes still apply

    • for a digital technology
    • at local scales
  • Geography matters: spatial dependency, urban gravitation

  • Hierarchical diffusion

  • Granular analysis reveals patterns otherwise not visible

  • Stability and volatility: leapfrogging, early adopters dropping, but also stable positions

  • Spatially consistent mechanisms at local scale

  • Regional heterogeneity

References

Grubler, Arnulf. 1990. The Rise and Fall of Infrastructures: Dynamics of Evolution and Technological Change in Transport. Physica-Verlag.
Hägerstrand, Torsten. 1965. “A Monte Carlo Approach to Diffusion.” European Journal of Sociology/Archives Européennes de Sociologie 6 (1): 43–67.
Hägerstrand, Torsten, et al. 1968. Innovation Diffusion as a Spatial Process.
Meade, Nigel, and Towhidul Islam. 2021. “Modelling and Forecasting National Introduction Times for Successive Generations of Mobile Telephony.” Telecommunications Policy 45 (3): 102088.
Zook, Matthew, and Michael McCanless. 2022. “Mapping the Uneven Geographies of Digital Phenomena: The Case of Blockchain.” The Canadian Geographer/Le Géographe Canadien 66 (1): 23–36.