A multi-scale story of the diffusion of a new technology: the web

Emmanouil Tranos

University of Bristol, Alan Turing Institute
e.tranos@bristol.ac.uk, @EmmanouilTranos, etranos.info

Introduction
Web data
Methods
Results
- S-shaped diffusion curves
- Rank dynamics
- Exploratory spatial analysis
- Modelling
Conclusions

Introduction

Aim

Diffusion of a new technology: the web
Geographers used to be interested in diffusion
Hägerstrand et al. (1968)
Passed the torch to economists and sociologists
Why? Lack of granular data:

Because new digital activities are rarely—if ever—captured in official state data, researchers must rely on information gathered from alternative sources (Zook and McCanless 2022).

Importance

Guide policies for deployment of new technologies
Predictions of introduction times for future technologies (Meade and Islam 2021):
- Network operators
- Suppliers of network equipment
- Regulatory authorities

Technological diffusion

Spatial diffusion processes

As in temporal diffusion models, an S-shaped pattern in the cumulative level of adoption
A hierarchy effect: from main centres to secondary ones – central places
A neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)

Hägerstrand (1965): from innovative centres (core) through a hierarchy of sub-centres, to the periphery

Diffusion of a new digital technology

Diffusion of an intangible, digital technology [web]
Map the active engagement with the digital
Over time, early stages of the internet [1996-2012]
Granular and multi-scale spatial processes

Web data

Long story short

Data from the Internet Archive, the oldest web archive
Observe commercial websites 1996 - 2012 in the UK (.co.uk)
Geolocation: postcode references in the text
Timestamp: archival year
Counts

Web data: The Internet Archive

The largest archive of webpages in the world
273 billion webpages from over 361 million websites, 15 petabytes of storage (1996 onwards)
A web crawler starts with a list of URLs (a seed list) to crawl and downloads a copy of their content
Using the hyperlinks included in the crawled URLs, new URLs are identified and crawled (snowball sampling)
Time-stamp
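
The crawl logic described above can be sketched as a breadth-first traversal. This is only an illustration on a hypothetical in-memory link graph (`LINKS`), not the Internet Archive's actual crawler, and it makes no network requests.

```python
from collections import deque

# Hypothetical link graph standing in for the live web:
# each URL maps to the hyperlinks found on that page.
LINKS = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
}


def crawl(seed_list, links, max_pages=100):
    """Visit the seeds, then follow hyperlinks to discover new URLs (snowball sampling)."""
    seeds = list(dict.fromkeys(seed_list))  # drop duplicate seeds, keep order
    queue = deque(seeds)
    seen = set(seeds)
    archived = []  # order in which pages were "downloaded"
    while queue and len(archived) < max_pages:
        url = queue.popleft()
        archived.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return archived


pages = crawl(["http://a.example/"], LINKS)
```

Pages that no crawled page links to are never discovered, which is one reason archive coverage depends on the seed list.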

Our web data

JISC UK Web Domain Dataset: all archived webpages from the .uk domain 1996-2012
Curated by the British Library

All .uk archived webpages which contain a UK postcode in the web text
Circa 0.5 billion URLs with valid UK postcodes

20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD
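
A record in this form can be split into the three pieces used later: archival year, website and postcode. The sketch below is an assumption about how such a line might be parsed; `parse_record` and the simplified postcode pattern are illustrative, not the pipeline actually used.

```python
import re
from urllib.parse import urlparse

# Simplified UK postcode pattern (assumption: the official format has more special cases).
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}$")


def parse_record(line):
    """Split a 'timestamp | url | postcode' record into year, website host and postcode."""
    timestamp, url, postcode = [part.strip() for part in line.split("|")]
    postcode = postcode.upper()
    if not POSTCODE_RE.match(postcode):
        raise ValueError(f"not a valid-looking postcode: {postcode!r}")
    return {
        "year": int(timestamp[:4]),       # archival year from YYYYMMDDhhmmss
        "website": urlparse(url).netloc,  # host part of the URL
        "postcode": postcode,
    }


record = parse_record("20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD")
```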

Data cleaning

All the archived .uk webpages
Archived during 1996-2012
Commercial webpages (.co.uk)
From webpages to websites:
- http://www.website1.co.uk/webpage1 and http://www.website1.co.uk/webpage2 are both part of http://www.website1.co.uk
One vs. multiple postcodes in a website

Unique postcodes frequencies, 2000

Unique postcodes per website	Websites	Share	Cumulative websites	Cumulative share
(0,1]	41,596	0.718	41,596	0.718
(1,2]	6,451	0.111	48,047	0.830
(2,10]	6,163	0.106	54,210	0.936
(10,100]	2,975	0.051	57,185	0.988
(100,1000]	646	0.011	57,831	0.999
(1000,10000]	62	0.001	57,893	1.000
(10000,100000]	4	0.000	57,897	1.000

Websites with a large number of postcodes: e.g. directories, real estate websites
Focus on websites with one unique postcode per year
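
The filtering rule can be illustrated as follows: group webpages by website and year, collect the distinct postcodes, and keep only websites with exactly one. The input tuples and the function name are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical (year, url, postcode) observations.
pages = [
    (2000, "http://www.website1.co.uk/webpage1", "IG8 8HD"),
    (2000, "http://www.website1.co.uk/webpage2", "IG8 8HD"),
    (2000, "http://www.directory.co.uk/london", "EC1A 1BB"),
    (2000, "http://www.directory.co.uk/bristol", "BS8 1TH"),
    (2001, "http://www.website1.co.uk/webpage1", "IG8 8HD"),
]


def single_postcode_websites(pages):
    """Map (year, website) to its postcode, keeping websites with one unique postcode that year."""
    postcodes = defaultdict(set)
    for year, url, postcode in pages:
        postcodes[(year, urlparse(url).netloc)].add(postcode)
    return {key: next(iter(pcs)) for key, pcs in postcodes.items() if len(pcs) == 1}


kept = single_postcode_websites(pages)
```

In this toy input the directory-style website carries two postcodes in 2000 and is dropped, while `www.website1.co.uk` is kept for both years.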

Directory website with a lot of postcodes

Website with a unique postcode in London

Methods

Reminder: diffusion mechanisms

S-shaped pattern in the cumulative level of adoption
A hierarchy effect: from main centres to secondary ones
A neighbourhood effect: first “hitting” nearby locations

Methods

Cumulative adoption: Self-starting logistic growth model [nls and SSlogis]
Descriptive statistics & ESDA
Machine learning framework [random forests]
Two scales:
- websites per firm in a Local Authority
- websites in an Output Area
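
R's `SSlogis` parameterises the logistic curve as Asym / (1 + exp((xmid - t) / scal)). The sketch below fits that curve in plain Python as a stand-in for `nls`: it tries each asymptote in a user-supplied grid and, for each, estimates `xmid` and `scal` by ordinary least squares on the logit. The grid search, function names and synthetic data are assumptions for illustration, not the estimation routine used in the analysis.

```python
import math


def logistic(t, asym, xmid, scal):
    """Logistic curve in the parameterisation used by R's SSlogis."""
    return asym / (1.0 + math.exp((xmid - t) / scal))


def fit_logistic(ts, ys, asym_grid):
    """For each candidate asymptote, fit xmid and scal by OLS on the logit; keep the best fit."""
    best = None
    t_bar = sum(ts) / len(ts)
    s_tt = sum((t - t_bar) ** 2 for t in ts)
    for asym in asym_grid:
        if any(y <= 0 or y >= asym for y in ys):
            continue  # logit undefined for this candidate
        zs = [math.log(y / (asym - y)) for y in ys]  # equals (t - xmid) / scal on the curve
        z_bar = sum(zs) / len(zs)
        slope = sum((t - t_bar) * (z - z_bar) for t, z in zip(ts, zs)) / s_tt
        if slope <= 0:
            continue  # not a growth curve
        scal = 1.0 / slope
        xmid = t_bar - z_bar * scal
        sse = sum((y - logistic(t, asym, xmid, scal)) ** 2 for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, asym, xmid, scal)
    if best is None:
        raise ValueError("no admissible asymptote in asym_grid")
    return {"asym": best[1], "xmid": best[2], "scal": best[3]}


# Synthetic cumulative adoption for 1996-2012 with known parameters.
years = list(range(1996, 2013))
adoption = [logistic(t, 100.0, 2004.0, 1.5) for t in years]
fit = fit_logistic(years, adoption, asym_grid=[100.0, 120.0, 150.0, 200.0])
```

In this parameterisation `xmid` is the year at which half of the saturation level is reached, and a smaller `scal` means a steeper curve.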

S-shaped diffusion curves

Diffusion speed

Spatial heterogeneity
Not a clear, easy to explain pattern

Rank dynamics: stability vs. volatility

Adoption heterogeneity
Different perceptions of risk and economic returns from new technologies
Early adopters vs. laggards, leapfrogging
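
One simple way to make rank dynamics concrete is to rank areas by adoption in two years and take the difference. The values and function names below are hypothetical and only illustrate the idea.

```python
def ranks(values):
    """Rank areas by adoption level, 1 = highest; ties keep insertion order."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {area: position for position, area in enumerate(ordered, start=1)}


def rank_changes(earlier, later):
    """Positive values mean the area moved up the ranking between the two years."""
    r0, r1 = ranks(earlier), ranks(later)
    return {area: r0[area] - r1[area] for area in r0}


# Hypothetical websites-per-firm values for three areas in two years.
adoption_1998 = {"A": 0.30, "B": 0.20, "C": 0.10}
adoption_2008 = {"A": 0.60, "B": 0.50, "C": 0.70}
changes = rank_changes(adoption_1998, adoption_2008)
```

Here area C moves from last to first place (leapfrogging), while A and B each drop one position.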

Rank dynamics and diffusion speed

Spatial heterogeneity
Expected volatility

Spatial mechanisms

Neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)

Spatial dependency (Moran’s I & LISA maps)
Websites per firm in Local authorities (c. 400)
Websites in Output Areas (c. 230,000)
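
Global Moran's I summarises spatial dependency as a weighted cross-product of deviations from the mean. The sketch below is a plain-Python version for a small, hypothetical weights matrix. The binary contiguity weights and the toy values are assumptions, and a dedicated spatial statistics library would be used for real data.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Four hypothetical areas along a line; adjacent areas are neighbours.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 2, 3, 4], W)      # similar values sit next to each other
alternating = morans_i([1, -1, 1, -1], W)  # neighbours take opposite values
```

Positive values indicate that similar adoption levels cluster in space; negative values indicate that neighbours tend to differ.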

Neighbourhood effect

Spatial dependency
- Relatively small, consistent over time / scales
- London hot spot early on
- At local scale, consistent hotspots over time
- Granular analysis reveals other hotspots

Hierarchy effect: from main centres to secondary ones – central places

Gini coefficient

Hierarchy

Almost perfect polarisation of web adoption in the early stages at a granular level
Polarisation decreases over time
More equally diffused at the Local Authority level
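
The Gini coefficient behind these statements can be computed from sorted counts. This is a minimal sketch with hypothetical website counts, using the standard rank-weighted formula.

```python
def gini(counts):
    """Gini coefficient of non-negative counts: 0 = evenly spread, close to 1 = highly concentrated."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if total == 0:
        raise ValueError("Gini is undefined when all counts are zero")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


even = gini([5, 5, 5, 5])        # every area hosts the same number of websites
polarised = gini([0, 0, 0, 10])  # all websites sit in a single area
```

With n areas, the maximum attainable value is (n - 1) / n, so the coefficient approaches 1 only when there are many areas and nearly all websites sit in a handful of them.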

Putting all of these together

Modelling framework

A hierarchy effect: from main centres to secondary ones
A neighbourhood effect: first “hitting” nearby locations
S-shaped pattern in the cumulative level of adoption

\[Website\,Density_{i,t} \sim \color{orange}{Distance\,London_{i}} + \color{orange}{Website\,Density\,London_{t-1}} + \\ \color{orange}{Distance\,Nearest\,City_{i}} + \color{orange}{Website\,Density\,Nearest\,City_{i,t-1}} + \\ \color{orange}{Distance\,Nearest\,Retail_{i}} + \color{orange}{Website\,Density\,Nearest\,Retail_{i,t-1}} +\\ \color{olive}{W*\, Website\,Density_{t-1}} +\\ \color{violet}{year_{t}}\]
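
The spatial lag term (W times last year's website density) can be read as a weighted average of each area's neighbours. The sketch below assumes a row-standardised contiguity matrix; the slides do not specify how W is built, so both the matrix and the values are hypothetical.

```python
def spatial_lag(weights, lagged_density):
    """Row-standardise W and return each area's weighted mean of neighbours' lagged density."""
    lag = []
    for row in weights:
        total = sum(row)
        if total == 0:
            lag.append(0.0)  # area without neighbours (assumption: lag set to zero)
        else:
            lag.append(sum(w * x for w, x in zip(row, lagged_density)) / total)
    return lag


# Three hypothetical areas: area 0 borders areas 1 and 2.
W = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
density_prev_year = [2.0, 4.0, 6.0]
lag = spatial_lag(W, density_prev_year)
```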

Modelling framework

Random forests to predict \(Website\,Density_{i,t}\)
2 spatial resolutions:
- Local Authorities (websites per firm)
- Output Areas (websites)
2 sets of models:
- All the data ⟹ variable importance
- Train on all but one region, test on the holdout region ⟹ spatial differences and similarities of diffusion mechanisms
Space-time sensitive 10-fold CV (CAST)
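
The second set of models relies on regional holdouts. The splitting logic alone, without the random forest and without CAST (an R package), might look like the sketch below; the region labels and function name are hypothetical.

```python
def leave_one_region_out(regions):
    """Yield (holdout, train_idx, test_idx): train on all regions but one, test on the holdout."""
    for holdout in sorted(set(regions)):
        train = [i for i, r in enumerate(regions) if r != holdout]
        test = [i for i, r in enumerate(regions) if r == holdout]
        yield holdout, train, test


# Hypothetical region label for each observation (one row per area and year).
regions = ["London", "Wales", "London", "Scotland", "Wales"]
splits = list(leave_one_region_out(regions))
```

A model that predicts a held-out region well was trained on regions with similar diffusion mechanisms, which is how the holdout scores are read as regional similarity.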

Models trained on all data

Model	RMSE	\(R^{2}\)	MAE
Local Authorities*	0.028	0.850	0.019
Output Areas	3.284	0.320	1.034

*No retail centre predictors for the Local Authority models
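
For reference, the three metrics can be written out as below. This is a generic sketch: \(R^{2}\) is computed here as 1 - SSE/SST, whereas some toolkits report the squared correlation between observed and predicted values, and the slides do not say which definition applies.

```python
import math


def rmse(obs, pred):
    """Root mean squared error, in the units of the response."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error, in the units of the response."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SSE/SST (assumed definition)."""
    mean = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical observed and predicted values.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
scores = (rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

RMSE and MAE are expressed in the units of the response, so the Local Authority values (websites per firm) and the Output Area values (website counts) are not directly comparable; \(R^{2}\) is unit-free.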

Variable importance

Regional similarities

Region	RSquared LAD	Rank LAD	RSquared OA	Rank OA
South East	0.945	1	0.183	7
West Midlands	0.922	2	0.190	5
Wales	0.915	3	0.193	4
Yorkshire and The Humber	0.909	4	0.200	2
East Midlands	0.903	5	0.187	6
North East	0.896	6	0.156	9
South West	0.894	7	0.221	1
London	0.881	8	0.104	10
East of England	0.874	9	0.170	8
North West	0.837	10	0.194	3
Scotland	0.780	11	0.091	12
Northern Ireland	0.579	12	0.099	11

Conclusions

Established technological diffusion spatial processes still apply
- for a digital technology
- at local scales
Geography matters: spatial dependency, urban gravitation
Hierarchical diffusion
Granular analysis reveals patterns otherwise not visible
Stability and volatility: leapfrogging, early adopters dropping, but also stable positions
Spatially consistent mechanisms at local scale
Regional heterogeneity

References

Grubler, Arnulf. 1990. The Rise and Fall of Infrastructures: Dynamics of Evolution and Technological Change in Transport. Physica-Verlag.

Hägerstrand, Torsten. 1965. “A Monte Carlo Approach to Diffusion.” European Journal of Sociology/Archives Européennes de Sociologie 6 (1): 43–67.

Hägerstrand, Torsten et al. 1968. Innovation Diffusion as a Spatial Process. University of Chicago Press.

Meade, Nigel, and Towhidul Islam. 2021. “Modelling and Forecasting National Introduction Times for Successive Generations of Mobile Telephony.” Telecommunications Policy 45 (3): 102088.

Zook, Matthew, and Michael McCanless. 2022. “Mapping the Uneven Geographies of Digital Phenomena: The Case of Blockchain.” The Canadian Geographer/Le Géographe Canadien 66 (1): 23–36.
