A multi-scale story of the diffusion of a new technology: the web


Emmanouil Tranos

University of Bristol, Alan Turing Institute
@EmmanouilTranos, etranos.info

Contents

  • Introduction

  • Web data

  • Methods

  • Results

    • S-shaped diffusion curves
    • Rank dynamics
    • Exploratory spatial analysis
    • Modelling
  • Conclusions

Introduction

Aim

  • Diffusion of a new technology: the web
  • Geographers used to be interested in diffusion
  • Hägerstrand et al. (1968)
  • Passed the torch to economists and sociologists
  • Why? Lack of granular data:

Because new digital activities are rarely—if ever—captured in official state data, researchers must rely on information gathered from alternative sources (Zook and McCanless 2022).

Importance

  • Guide policies for deployment of new technologies

  • Predictions of introduction times for future technologies (Meade and Islam 2021):

    • Network operators

    • Suppliers of network equipment

    • Regulatory authorities

Technological diffusion


Spatial diffusion processes

  • As in temporal diffusion models, an S-shaped pattern in the cumulative level of adoption

  • A hierarchy effect: from main centres to secondary ones – central places

  • A neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)
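If the neighbourhood effect holds, adoption levels in adjacent areas should resemble each other, i.e. show positive spatial autocorrelation. Global Moran's I is one standard way to measure this. The sketch below is illustrative only: the deck does not say this statistic was used, and it assumes binary, symmetric contiguity weights with hypothetical areas and values.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary, symmetric contiguity weights (w_ij = 1 for neighbours).

    values: adoption level per area (hypothetical input).
    neighbours: dict mapping an area index to the indices of its adjacent areas.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of w_ij * (x_i - mean) * (x_j - mean) over all directed neighbour links
    cross = sum(dev[i] * dev[j] for i, js in neighbours.items() for j in js)
    # Total weight W: the number of directed neighbour links when w_ij = 1
    w_total = sum(len(js) for js in neighbours.values())
    variance_sum = sum(d * d for d in dev)
    return (n / w_total) * (cross / variance_sum)


# Four hypothetical areas in a row: 0 - 1 - 2 - 3
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1, 2, 3, 4], line))  # smooth gradient: positive autocorrelation
print(morans_i([1, 0, 1, 0], line))  # alternating pattern: negative autocorrelation
```

Values near +1 indicate clustering of similar adoption levels; values near 0 or below suggest no neighbourhood effect at that scale.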

Hägerstrand (1965): from innovative centres (core) through a hierarchy of sub-centres, to the periphery

Diffusion of a new digital technology


  • Diffusion of an intangible, digital technology [web]

  • Map the active engagement with the digital

  • Over time, early stages of the internet [1996-2012]

  • Granular and multi-scale spatial processes

Web data

Long story short

  • Data from the Internet Archive, the oldest web archive

  • Observe commercial websites in the UK (.co.uk), 1996-2012

  • Geolocation: postcode references in the text

  • Timestamp: archival year

  • Counts
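Geolocating a page means finding postcode references in its text. A minimal sketch, assuming a simplified regular expression for UK postcodes; the study's actual matching and validation rules are not described here, and a fuller version would check each match against an official postcode list.

```python
import re

# Simplified UK postcode pattern (an assumption, not the study's rule):
# outward code such as "IG8", "SW1A" or "M1", an optional space, then an inward code such as "8HD".
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s?([0-9][A-Z]{2})\b", re.IGNORECASE)


def extract_postcodes(text):
    """Return the distinct postcodes found in a page's text, normalised as 'OUTWARD INWARD'."""
    found = {f"{outward.upper()} {inward.upper()}" for outward, inward in POSTCODE_RE.findall(text)}
    return sorted(found)


page = "Visit our shop: 1 High Road, IG8 8HD (open daily)"  # hypothetical page text
print(extract_postcodes(page))  # ['IG8 8HD']
```

Normalising case and spacing means "ig88hd" and "IG8 8HD" count as the same postcode.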


Web data: The Internet Archive

  • The largest archive of webpages in the world
  • 273 billion webpages from over 361 million websites, 15 petabytes of storage (1996 onwards)
  • A web crawler starts with a list of URLs (a seed list) to crawl and downloads a copy of their content
  • New URLs are identified from the hyperlinks in the crawled pages and are crawled in turn (snowball sampling)
  • Time-stamp: each archived copy records when it was crawled
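The seed-list-plus-hyperlinks procedure is a breadth-first traversal of the link graph. The sketch below assumes a hypothetical in-memory graph (`links`) in place of real downloads, so it only illustrates the crawl order, not how the Internet Archive's crawler is implemented.

```python
from collections import deque


def snowball_crawl(seeds, links, max_pages=100):
    """Breadth-first crawl from a seed list over a link graph.

    links: dict mapping a URL to the URLs it links to; a hypothetical stand-in for
    downloading each page and parsing its hyperlinks.
    Returns URLs in the order they would be archived.
    """
    queue = deque(dict.fromkeys(seeds))  # drop duplicate seeds, keep their order
    seen = set(queue)
    archived = []
    while queue and len(archived) < max_pages:
        url = queue.popleft()
        archived.append(url)  # a real crawler would store a time-stamped copy here
        for target in links.get(url, []):
            if target not in seen:  # each URL is queued at most once
                seen.add(target)
                queue.append(target)
    return archived


graph = {  # hypothetical link structure
    "a.co.uk": ["b.co.uk", "c.co.uk"],
    "b.co.uk": ["d.co.uk"],
    "c.co.uk": ["a.co.uk", "d.co.uk"],
}
print(snowball_crawl(["a.co.uk"], graph))  # ['a.co.uk', 'b.co.uk', 'c.co.uk', 'd.co.uk']
```

Because discovery depends on incoming links, poorly linked sites are reached late or not at all, which is one reason snowball-sampled archives are incomplete.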

Our web data

  • JISC UK Web Domain Dataset: all archived webpages from the .uk domain 1996-2012

  • Curated by the British Library

Our web data

  • All archived .uk webpages that contain a UK postcode in their text

  • Circa 0.5 billion URLs with valid UK postcodes



Example record (archival timestamp | URL | postcode):

20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD
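Such a record can be split into its three fields. A minimal sketch, assuming the 14-digit timestamp follows the YYYYMMDDhhmmss layout (2008-05-09 16:21:38 here) and using the host name as the website identifier, which is a hypothetical choice.

```python
from urllib.parse import urlsplit


def parse_record(line):
    """Split a 'timestamp | URL | postcode' line into (year, host, postcode).

    Assumes a YYYYMMDDhhmmss timestamp, so its first four digits give the archival year.
    """
    timestamp, url, postcode = (field.strip() for field in line.split("|"))
    year = int(timestamp[:4])
    host = urlsplit(url).hostname  # assumed website identifier, e.g. www.website1.co.uk
    return year, host, postcode


record = "20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD"
print(parse_record(record))  # (2008, 'www.website1.co.uk', 'IG8 8HD')
```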

Data cleaning

Frequencies of unique postcodes, 2000

Unique postcodes  Frequency  Share  Cumulative frequency  Cumulative share
(0,1]                41,596  0.718                41,596             0.718
(1,2]                 6,451  0.111                48,047             0.830
(2,10]                6,163  0.106                54,210             0.936
(10,100]              2,975  0.051                57,185             0.988
(100,1000]              646  0.011                57,831             0.999
(1000,10000]             62  0.001                57,893             1.000
(10000,100000]            4  0.000                57,897             1.000
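A table like this can be produced by binning one count per observation into right-closed intervals and accumulating the shares. The sketch assumes the input is, for example, the number of unique postcodes found on each website in a year (one possible reading of the table); the function name and sample data are hypothetical.

```python
from bisect import bisect_left

# Right-closed bin edges matching the table: (0,1], (1,2], (2,10], ..., (10000,100000]
EDGES = [0, 1, 2, 10, 100, 1000, 10000, 100000]


def frequency_table(counts_per_unit):
    """Tabulate counts into bins with frequency, share and cumulative columns."""
    freqs = [0] * (len(EDGES) - 1)
    for n in counts_per_unit:
        if not EDGES[0] < n <= EDGES[-1]:
            raise ValueError(f"count {n} is outside ({EDGES[0]}, {EDGES[-1]}]")
        # bisect_left gives i with EDGES[i-1] < n <= EDGES[i], i.e. bin (EDGES[i-1], EDGES[i]]
        freqs[bisect_left(EDGES, n) - 1] += 1
    total = sum(freqs)
    rows, cum = [], 0
    for lo, hi, freq in zip(EDGES, EDGES[1:], freqs):
        cum += freq
        rows.append((f"({lo},{hi}]", freq, round(freq / total, 3), cum, round(cum / total, 3)))
    return rows


for row in frequency_table([1, 1, 1, 2, 5, 40]):  # hypothetical counts
    print(row)
```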


  • Websites with a large number of postcodes: e.g. directories, real estate websites

  • Focus on websites with one unique postcode per year
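The one-postcode-per-year filter can be sketched as grouping records by website and year and keeping only groups with exactly one distinct postcode. The (year, host, postcode) tuples and host-based website identifiers are hypothetical illustrations, not the study's exact data model.

```python
from collections import defaultdict


def single_postcode_sites(records):
    """Keep (year, host) pairs whose pages mention exactly one distinct postcode.

    records: iterable of (year, host, postcode) tuples, e.g. one per archived URL.
    Returns a dict {(year, host): postcode}.
    """
    postcodes = defaultdict(set)
    for year, host, postcode in records:
        postcodes[(year, host)].add(postcode)
    return {key: next(iter(pcs)) for key, pcs in postcodes.items() if len(pcs) == 1}


records = [  # hypothetical records
    (2008, "www.website1.co.uk", "IG8 8HD"),
    (2008, "www.website1.co.uk", "IG8 8HD"),   # same postcode on another page: kept
    (2008, "www.directory.co.uk", "IG8 8HD"),
    (2008, "www.directory.co.uk", "M1 1AE"),   # several postcodes: dropped
]
print(single_postcode_sites(records))  # {(2008, 'www.website1.co.uk'): 'IG8 8HD'}
```

Grouping by year means a website can pass the filter in one year and fail it in another.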

Example: a directory website with many postcodes

Example: a website with a single unique postcode in London

Methods

Reminder: diffusion mechanisms

  • S-shaped pattern in the cumulative level of adoption

  • A hierarchy effect: from main centres to secondary ones

  • A neighbourhood effect: first “hitting” nearby locations
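One way to examine the hierarchy effect through rank dynamics is to ask whether areas that led early stay ahead later, by comparing area rankings between two years with Spearman's rank correlation. This is an illustrative choice, not necessarily the measure used in the study, and the area counts are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


websites_2000 = [120, 40, 15, 5]    # hypothetical counts for four areas
websites_2005 = [900, 300, 200, 60]
print(spearman(websites_2000, websites_2005))  # same ordering: close to 1
```

A correlation close to 1 across years would indicate a stable hierarchy; lower values point to areas overtaking one another.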

Methods

  • Cumulative adoption: self-starting logistic growth model [R: nls and SSlogis]

  • Descriptive statistics and exploratory spatial data analysis (ESDA)

  • Machine learning framework [random forests]

  • Two scales:

    • websites per firm in a Local Authority
    • websites in an Output Area
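R's SSlogis describes cumulative adoption as Asym / (1 + exp((xmid - t) / scal)), where Asym is the saturation level, xmid the year of half-saturation and scal the time scale. The Python sketch below shows that curve and one simple way to obtain starting values by linearising it with a logit transform. It only approximates the idea of a self-starting model and is not R's exact algorithm; a nonlinear least-squares fit (nls in R) would then refine the values. The data are synthetic.

```python
import math


def logistic(t, asym, xmid, scal):
    """Logistic curve in the SSlogis parameterisation: asym / (1 + exp((xmid - t) / scal))."""
    return asym / (1.0 + math.exp((xmid - t) / scal))


def logistic_start(ts, ys, asym=None):
    """Rough starting values (asym, xmid, scal) from a logit-linear regression.

    If asym is not given it is set slightly above the largest observation
    (a heuristic assumption) so that every y / (asym - y) is positive.
    """
    if asym is None:
        asym = 1.05 * max(ys)
    # log(y / (asym - y)) = (t - xmid) / scal, which is linear in t
    zs = [math.log(y / (asym - y)) for y in ys]
    n = len(ts)
    t_mean, z_mean = sum(ts) / n, sum(zs) / n
    slope = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    scal = 1.0 / slope
    xmid = t_mean - z_mean / slope  # the year at which z = 0, i.e. y = asym / 2
    return asym, xmid, scal


years = list(range(1996, 2013))
adoption = [logistic(t, 100.0, 2004.0, 1.5) for t in years]  # synthetic, noise-free
print(logistic_start(years, adoption, asym=100.0))  # recovers (100.0, 2004.0, 1.5) up to rounding
print(logistic_start(years, adoption))  # cruder guesses when asym must be estimated
```

Estimating Asym from the data biases the other two parameters, which is why these values serve only as starting points for the nonlinear fit.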

S-shaped diffusion curves