A multi-scale story of the diffusion of a new technology: the web


Emmanouil Tranos

University of Bristol, Alan Turing Institute, @EmmanouilTranos, etranos.info

Contents

  • Introduction

  • Web data

  • Methods

  • Results

    • S-shaped diffusion curves
    • Rank dynamics
    • Exploratory spatial analysis
    • Modelling
  • Conclusions

Introduction

Aim

  • Diffusion of a new technology: the web
  • Geographers used to be interested in diffusion
  • Hägerstrand et al. (1968)
  • Passed the torch to economists and sociologists
  • Why? Lack of granular data:

Because new digital activities are rarely—if ever—captured in official state data, researchers must rely on information gathered from alternative sources (Zook and McCanless 2022).

Importance

  • Guide policies for deployment of new technologies

  • Predictions of introduction times for future technologies (Meade and Islam 2021):

    • Network operators

    • Suppliers of network equipment

    • Regulatory authorities

Technological diffusion


Spatial diffusion processes

  • As in temporal diffusion models, an S-shaped pattern in the cumulative level of adoption

  • A hierarchy effect: from main centres to secondary ones – central places

  • A neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)

Hägerstrand (1965): from innovative centres (core) through a hierarchy of sub-centres, to the periphery

Diffusion of a new digital technology


  • Diffusion of an intangible, digital technology [web]

  • Map the active engagement with the digital

  • Over time, early stages of the internet [1996-2012]

  • Granular and multi-scale spatial processes

Web data

Long story short

  • Data from the Internet Archive, the oldest web archive

  • Observe commercial websites 1996-2012 in the UK (.co.uk)

  • Geolocation: postcode references in the text

  • Timestamp: archival year

  • Counts

Web data: The Internet Archive

  • The largest archive of webpages in the world
  • 273 billion webpages from over 361 million websites, 15 petabytes of storage (1996 onwards)
  • A web crawler starts with a list of URLs (a seed list) to crawl and downloads a copy of their content
  • Using the hyperlinks included in the crawled URLs, new URLs are identified and crawled (snowball sampling)
  • Time-stamp
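
The crawl logic in the bullets above can be sketched as a breadth-first traversal. This is a minimal illustration over a hypothetical in-memory link graph (`TOY_WEB`), not the Internet Archive's crawler; the function name `crawl` and its parameters are assumptions made for the example.

```python
from collections import deque

def crawl(seed_urls, fetch_links, max_pages=1000):
    """Breadth-first crawl: visit the seeds, then follow newly found links."""
    queue = deque(dict.fromkeys(seed_urls))  # de-duplicated seed list
    seen = set(queue)
    archived = []
    while queue and len(archived) < max_pages:
        url = queue.popleft()
        archived.append(url)  # stand-in for "download and time-stamp a copy"
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return archived

# Hypothetical toy web: each key links to the URLs in its list.
TOY_WEB = {
    "a.co.uk": ["b.co.uk", "c.co.uk"],
    "b.co.uk": ["c.co.uk", "d.co.uk"],
    "c.co.uk": [],
    "d.co.uk": ["a.co.uk"],
    "e.co.uk": ["a.co.uk"],  # nothing links to it, so the crawl never finds it
}

pages = crawl(["a.co.uk"], lambda url: TOY_WEB.get(url, []))
```

Pages that no crawled page links to (here `e.co.uk`) stay invisible, which is why coverage depends on the seed list and the link structure.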

Our web data

  • JISC UK Web Domain Dataset: all archived webpages from the .uk domain 1996-2012

  • Curated by the British Library

Our web data

  • All .uk archived webpages which contain a UK postcode in the web text

  • Circa 0.5 billion URLs with valid UK postcodes



20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD
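
A record like the one above could be split into the three pieces of information used later (archival year, website, postcode) roughly as follows. This is a sketch under stated assumptions: the ` | ` delimiter is taken from the example line, and `POSTCODE_RE` is a deliberately simplified pattern that does not cover every valid UK postcode format.

```python
import re
from urllib.parse import urlsplit

# Simplified postcode pattern (an assumption): outward code, space, inward code.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}$")

def parse_record(line):
    """Split 'timestamp | URL | postcode' into the fields used for counting."""
    timestamp, url, postcode = (part.strip() for part in line.split(" | "))
    if not POSTCODE_RE.match(postcode):
        raise ValueError(f"unexpected postcode format: {postcode!r}")
    return {
        "year": int(timestamp[:4]),       # archival year from YYYYMMDDhhmmss
        "host": urlsplit(url).hostname,   # website the page belongs to
        "postcode": postcode,
        "district": postcode.split()[0],  # outward code, e.g. 'IG8'
    }

record = parse_record(
    "20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD"
)
```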

Data cleaning

Unique postcodes frequencies, 2000

level             freq     perc    cumfreq   cumperc
(0,1]             41,596   0.718   41,596    0.718
(1,2]             6,451    0.111   48,047    0.830
(2,10]            6,163    0.106   54,210    0.936
(10,100]          2,975    0.051   57,185    0.988
(100,1000]        646      0.011   57,831    0.999
(1000,10000]      62       0.001   57,893    1.000
(10000,100000]    4        0.000   57,897    1.000


  • Websites with a large number of postcodes: e.g. directories, real estate websites

  • Focus on websites with one unique postcode per year
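
The filtering rule above can be illustrated with a short sketch. The record layout `(year, host, postcode)` and the function name `single_postcode_sites` are assumptions for this example, and the rows are made up.

```python
from collections import defaultdict

def single_postcode_sites(records):
    """Keep (year, host) pairs that reference exactly one unique postcode."""
    postcodes = defaultdict(set)
    for year, host, postcode in records:
        postcodes[(year, host)].add(postcode)
    return {
        key: next(iter(found))
        for key, found in postcodes.items()
        if len(found) == 1
    }

# Made-up rows: one small firm, one directory-style site with many postcodes.
toy_records = [
    (2000, "smallfirm.co.uk", "IG8 8HD"),
    (2000, "smallfirm.co.uk", "IG8 8HD"),   # same postcode on another page
    (2000, "directory.co.uk", "BS8 1TH"),
    (2000, "directory.co.uk", "M1 1AA"),
    (2001, "directory.co.uk", "M1 1AA"),
]

kept = single_postcode_sites(toy_records)
```

Because the rule is applied per year, a website can be kept in one year and dropped in another, as `directory.co.uk` is here.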

Directory website with a lot of postcodes

Website with a unique postcode in London

Methods

Reminder: diffusion mechanisms

  • S-shaped pattern in the cumulative level of adoption

  • A hierarchy effect: from main centres to secondary ones

  • A neighbourhood effect: first “hitting” nearby locations

Methods

  • Cumulative adoption: Self-starting logistic growth model [nls and SSlogis]

  • Descriptive statistics & ESDA

  • Machine learning framework [random forests]

  • Two scales:

    • websites per firm in a Local Authority
    • websites in an Output Area
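
For orientation, the logistic curve behind R's `SSlogis` has the form Asym / (1 + exp((xmid - t) / scal)). The sketch below is not `nls`: it assumes the asymptote is already known and recovers `xmid` and `scal` with a straight-line fit on the logit scale, which is only a rough stand-in for a proper non-linear least squares fit. Function names and the synthetic data are illustrative.

```python
import math

def sslogis(t, asym, xmid, scal):
    """Logistic curve in the SSlogis parameterisation."""
    return asym / (1.0 + math.exp((xmid - t) / scal))

def fit_logistic_given_asym(times, values, asym):
    """Estimate xmid and scal by OLS on logit(y / asym) = (t - xmid) / scal.

    Assumes every value lies strictly between 0 and asym.
    """
    z = [math.log(v / (asym - v)) for v in values]
    n = len(times)
    t_bar = sum(times) / n
    z_bar = sum(z) / n
    slope = sum((t - t_bar) * (zi - z_bar) for t, zi in zip(times, z)) / sum(
        (t - t_bar) ** 2 for t in times
    )
    scal = 1.0 / slope                # slope of the line is 1 / scal
    xmid = t_bar - z_bar / slope      # time at which adoption reaches asym / 2
    return xmid, scal

# Synthetic adoption curve for 1996-2012 (made-up parameters).
years = list(range(1996, 2013))
adoption = [sslogis(t, asym=100.0, xmid=2003.0, scal=2.0) for t in years]
xmid_hat, scal_hat = fit_logistic_given_asym(years, adoption, asym=100.0)
```

A smaller `scal` means a steeper curve, so it can be read as a measure of diffusion speed, while `xmid` marks when half of the eventual adoption level is reached.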

S-shaped diffusion curves

Diffusion speed


  • Spatial heterogeneity

  • No clear, easily explained pattern

Rank dynamics: stability vs. volatility


  • Adoption heterogeneity

  • Different perceptions of risk and economic returns from new technologies

  • Early adopters vs. laggards, leapfrogging
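
One simple way to look at stability versus volatility is to rank areas by adoption in two years and compare positions. This is a hedged sketch, not necessarily the measure behind the slides; the place names and numbers are made up, and ties are broken by input order.

```python
def ranks(adoption):
    """Rank areas from highest (1) to lowest adoption."""
    ordered = sorted(adoption, key=adoption.get, reverse=True)
    return {area: position for position, area in enumerate(ordered, start=1)}

def rank_shift(early, late):
    """Positive values: the area climbed; negative: it dropped; 0: stable."""
    r_early, r_late = ranks(early), ranks(late)
    return {area: r_early[area] - r_late[area] for area in r_early if area in r_late}

# Made-up adoption levels for three areas in an early and a late year.
early_year = {"Area A": 50, "Area B": 20, "Area C": 10}
late_year = {"Area A": 60, "Area B": 25, "Area C": 40}

shifts = rank_shift(early_year, late_year)
```

In this toy case Area A keeps its position, Area C leapfrogs Area B, and Area B drops one place.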

Rank dynamics and diffusion speed

  • Spatial heterogeneity

  • Expected volatility

Spatial mechanisms

Neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)

  • Spatial dependency (Moran’s I & LISA maps)

  • Websites per firm in Local authorities (c. 400)

  • Websites in Output Areas (c. 230,000)
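
Global Moran's I can be written in a few lines. The sketch below assumes a dense binary contiguity matrix and a toy chain of four areas; a real analysis over c. 230,000 Output Areas would need sparse weights and a dedicated spatial statistics library.

```python
def morans_i(values, weights):
    """Global Moran's I for values x and a spatial weights matrix w.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)

# Toy layout: four areas in a row, each neighbouring the next one.
CHAIN_W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1, 2, 3, 4], CHAIN_W)      # similar values sit together
alternating = morans_i([1, -1, 1, -1], CHAIN_W)  # neighbours always differ
```

Positive values indicate that similar adoption levels cluster in space; negative values indicate a checkerboard-like pattern.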

Neighbourhood effect

  • Spatial dependency
    • Relatively small, consistent over time / scales
    • London hot spot early on
    • At local scale, consistent hotspots over time
    • Granular analysis reveals other hotspots


Hierarchy effect: from main centres to secondary ones – central places

  • Gini coefficient
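
As a reminder of what the Gini coefficient measures, here is a minimal sketch using the standard rank-based formula for non-negative values. The function name and the toy counts are assumptions for illustration.

```python
def gini(values):
    """Gini coefficient: 0 = evenly spread, (n - 1) / n = all in one unit."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

even = gini([5, 5, 5, 5])           # websites spread equally over four areas
concentrated = gini([0, 0, 0, 10])  # all websites in a single area
```

With many small units where most have no websites, the coefficient approaches 1, which corresponds to near-perfect polarisation.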

Hierarchy

  • Almost perfect polarisation of web adoption in the early stages at a granular level

  • Polarisation decreases over time

  • More equally diffused at the Local Authority level

Putting all of these together

Modelling framework

  • A hierarchy effect: from main centres to secondary ones

  • A neighbourhood effect: first “hitting” nearby locations

  • S-shaped pattern in the cumulative level of adoption


\[
\begin{aligned}
Website\,Density_{i,t} \sim{} & \color{orange}{Distance\,London_{i}} + \color{orange}{Website\,density\,London_{t-1}} + \\
& \color{orange}{Distance\,Nearest\,City_{i}} + \color{orange}{Website\,density\,Nearest\,City_{t-1}} + \\
& \color{orange}{Distance\,Nearest\,Retail_{i}} + \color{orange}{Website\,density\,Nearest\,Retail_{t-1}} + \\
& \color{olive}{W*\,Website\,density_{t-1}} + \\
& \color{violet}{year_{t}}
\end{aligned}
\]
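
The spatially lagged term in the formula above can be read as "the previous year's website density among an area's neighbours". The sketch below assumes a row-standardised W (a plain average over neighbours) and a hypothetical neighbour list; the actual weights specification is not stated on the slide.

```python
def spatial_lag(values, neighbours):
    """Row-standardised spatial lag: mean of the neighbours' values.

    Areas without neighbours get 0.0, a common convention for islands.
    """
    lag = {}
    for area, nbrs in neighbours.items():
        lag[area] = sum(values[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
    return lag

# Made-up website densities at t-1 and a hypothetical neighbour list.
density_prev = {"A": 1.0, "B": 3.0, "C": 5.0, "D": 2.0}
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}

lagged = spatial_lag(density_prev, neighbours)
```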

Modelling framework

  • Random forests to predict \(Website\,Density_{i,t}\)

  • 2 spatial resolutions:

    • Local Authorities (websites per firm)
    • Output Areas (websites)
  • 2 sets of models:

    • All the data ⟹ variable importance
    • Train on all but one region, test on the holdout region ⟹ spatial differences and similarities of diffusion mechanisms
  • Space-time sensitive 10-fold CV (CAST)
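
The second set of models relies on holding out one region at a time. A minimal sketch of that split follows; the row layout and the `region` key are assumptions, and the model fitting itself (random forests, the CAST cross-validation folds) is left out.

```python
def leave_one_region_out(rows, region_key="region"):
    """Yield (held_out_region, train_rows, test_rows) for every region."""
    regions = sorted({row[region_key] for row in rows})
    for held_out in regions:
        train = [row for row in rows if row[region_key] != held_out]
        test = [row for row in rows if row[region_key] == held_out]
        yield held_out, train, test

# Made-up rows: one observation per area and year.
toy_rows = [
    {"area": "a1", "region": "Wales", "density": 0.2},
    {"area": "a2", "region": "Wales", "density": 0.3},
    {"area": "b1", "region": "London", "density": 0.9},
    {"area": "c1", "region": "Scotland", "density": 0.1},
]

splits = {name: (train, test) for name, train, test in leave_one_region_out(toy_rows)}
```

A model that predicts the held-out region well suggests that its diffusion mechanisms resemble those of the rest of the country.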

Models trained on all data


                     RMSE    \(R^{2}\)   MAE
Local Authorities*   0.028   0.850       0.019
Output Areas         3.284   0.320       1.034

*No retail centre predictors for the Local Authority models
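
For reference, the three metrics in the table can be computed as below. This sketch assumes R² is defined as 1 minus the ratio of residual to total sum of squares; some modelling toolkits report the squared correlation between observed and predicted values instead, so the figures above may follow that definition. The numbers in the example are made up.

```python
import math

def regression_metrics(observed, predicted):
    """Return RMSE, R^2 (1 - SS_res / SS_tot) and MAE."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
    }

metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

RMSE and MAE are in the units of the outcome, so they are not directly comparable between the Local Authority models (websites per firm) and the Output Area models (website counts).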

Variable importance

Regional similarities

Region                     RSquared LAD   Rank LAD   RSquared OA   Rank OA
South East                 0.945          1          0.183         7
West Midlands              0.922          2          0.190         5
Wales                      0.915          3          0.193         4
Yorkshire and The Humber   0.909          4          0.200         2
East Midlands              0.903          5          0.187         6
North East                 0.896          6          0.156         9
South West                 0.894          7          0.221         1
London                     0.881          8          0.104         10
East of England            0.874          9          0.170         8
North West                 0.837          10         0.194         3
Scotland                   0.780          11         0.091         12
Northern Ireland           0.579          12         0.099         11

Conclusions

  • Established technological diffusion spatial processes still apply

    • for a digital technology
    • at local scales
  • Geography matters: spatial dependency, urban gravitation

  • Hierarchical diffusion

  • Granular analysis reveals patterns otherwise not visible

  • Stability and volatility: leapfrogging, early adopters dropping, but also stable positions

  • Spatially consistent mechanisms at local scale

  • Regional heterogeneity

References

Grubler, Arnulf. 1990. The Rise and Fall of Infrastructures: Dynamics of Evolution and Technological Change in Transport. Physica-Verlag.
Hägerstrand, Torsten. 1965. “A Monte Carlo Approach to Diffusion.” European Journal of Sociology/Archives Européennes de Sociologie 6 (1): 43–67.
Hägerstrand, Torsten, et al. 1968. Innovation Diffusion as a Spatial Process.
Meade, Nigel, and Towhidul Islam. 2021. “Modelling and Forecasting National Introduction Times for Successive Generations of Mobile Telephony.” Telecommunications Policy 45 (3): 102088.
Zook, Matthew, and Michael McCanless. 2022. “Mapping the Uneven Geographies of Digital Phenomena: The Case of Blockchain.” The Canadian Geographer/Le Géographe Canadien 66 (1): 23–36.