Introduction
Web data
Methods
Results
Conclusions
Because new digital activities are rarely—if ever—captured in official state data, researchers must rely on information gathered from alternative sources (Zook and McCanless 2022).
Guide policies for the deployment of new technologies
Predictions of introduction times for future technologies (Meade and Islam 2021):
Network operators
Suppliers of network equipment
Regulatory authorities
As in temporal diffusion models, an S-shaped pattern in the cumulative level of adoption
A hierarchy effect: from main centres to secondary ones – central places
A neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)
Hägerstrand (1965): from innovative centres (core) through a hierarchy of sub-centres, to the periphery
Diffusion of an intangible, digital technology [web]
Map the active engagement with the digital
Over time, early stages of the internet [1996-2012]
Granular and multi-scale spatial processes
Data from the Internet Archive, the oldest web archive
Observe commercial websites 1996 - 2012 in the UK (.co.uk)
Geolocation: postcode references in the text
Timestamp: archival year
Counts
JISC UK Web Domain Dataset: all archived webpages from the .uk domain 1996-2012
Curated by the British Library
All .uk archived webpages which contain a UK postcode in the web text
Circa 0.5 billion URLs with valid UK postcodes
Example record (archival timestamp | URL | postcode): 20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD
All the archived .uk webpages
Archived during 1996-2012
Commercial webpages (.co.uk)
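The selection steps above can be sketched in Python, assuming each record is a pipe-delimited line like the example record (a 14-digit YYYYMMDDhhmmss archival timestamp, the URL, the postcode). The function names are hypothetical, and the second and third records are invented purely to show the two filters rejecting something.

```python
from urllib.parse import urlsplit

def parse_record(line):
    """Split 'timestamp | url | postcode' into (year, host, url, postcode)."""
    timestamp, url, postcode = (field.strip() for field in line.split("|"))
    year = int(timestamp[:4])                      # YYYYMMDDhhmmss -> YYYY
    host = (urlsplit(url).hostname or "").lower()
    return year, host, url, postcode.upper()

def keep_record(year, host, first_year=1996, last_year=2012):
    """Keep commercial (.co.uk) pages archived within the study period."""
    return first_year <= year <= last_year and host.endswith(".co.uk")

records = [
    "20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD",
    "20130101120000 | http://www.website2.co.uk/about | IG8 8HD",  # invented: after 2012
    "20050101120000 | http://www.example.org.uk/ | IG8 8HD",       # invented: not .co.uk
]

kept = []
for line in records:
    year, host, url, postcode = parse_record(line)
    if keep_record(year, host):
        kept.append((year, host, url, postcode))

print(kept)
```

Only the first record survives: the second falls outside 1996-2012 and the third is not under .co.uk.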
From webpages to websites:
- http://www.website1.co.uk/webpage1 and
- http://www.website1.co.uk/webpage2 are part of the same website, www.website1.co.uk
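Aggregating webpages to websites can be sketched by keying each page on its host name and collecting the distinct postcodes found per website and year. Using the host as the website identifier is an assumption of this sketch (it keeps www.website1.co.uk and website1.co.uk apart, for example), and the helper names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def website_of(url):
    """Hypothetical rule: a website is identified by the URL's host name."""
    return (urlsplit(url).hostname or "").lower()

def postcodes_by_website_year(pages):
    """Map (website, year) -> set of distinct postcodes found on its pages."""
    grouped = defaultdict(set)
    for year, url, postcode in pages:
        grouped[(website_of(url), year)].add(postcode)
    return dict(grouped)

# Postcode reused from the example record, for illustration only
pages = [
    (2008, "http://www.website1.co.uk/webpage1", "IG8 8HD"),
    (2008, "http://www.website1.co.uk/webpage2", "IG8 8HD"),
]
sites = postcodes_by_website_year(pages)
print(sites)  # {('www.website1.co.uk', 2008): {'IG8 8HD'}}
```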
1 vs. multiple postcodes per website
Postcodes per website | Websites | Share | Cumulative websites | Cumulative share |
---|---|---|---|---|
(0,1] | 41,596 | 0.718 | 41,596 | 0.718 |
(1,2] | 6,451 | 0.111 | 48,047 | 0.830 |
(2,10] | 6,163 | 0.106 | 54,210 | 0.936 |
(10,100] | 2,975 | 0.051 | 57,185 | 0.988 |
(100,1000] | 646 | 0.011 | 57,831 | 0.999 |
(1000,10000] | 62 | 0.001 | 57,893 | 1.000 |
(10000,100000] | 4 | 0.000 | 57,897 | 1.000 |
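A frequency table of this kind can be produced from per-website postcode counts by binning them into the same right-closed intervals and accumulating. This is a minimal sketch with a hypothetical function name; the input counts are illustrative, not the study data.

```python
import bisect

# Upper bounds of the right-closed bins used in the table: (0,1], (1,2], (2,10], ...
BREAKS = [1, 2, 10, 100, 1000, 10000, 100000]
LABELS = ["(0,1]", "(1,2]", "(2,10]", "(10,100]",
          "(100,1000]", "(1000,10000]", "(10000,100000]"]

def postcode_count_table(counts):
    """Bin per-website postcode counts (1..100000) into rows of
    (bin, websites, share, cumulative websites, cumulative share)."""
    freq = [0] * len(BREAKS)
    for c in counts:
        # The first upper bound >= c identifies the right-closed bin containing c
        freq[bisect.bisect_left(BREAKS, c)] += 1
    total = sum(freq)
    rows, cum = [], 0
    for label, f in zip(LABELS, freq):
        cum += f
        rows.append((label, f, round(f / total, 3), cum, round(cum / total, 3)))
    return rows

# Illustrative counts only, not the study data
for row in postcode_count_table([1, 1, 1, 2, 5, 12]):
    print(row)
```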
Websites with a large number of postcodes: e.g. directories, real estate websites
Focus on websites with one unique postcode per year
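Keeping only website-years with exactly one distinct postcode, and then counting websites per postcode and year, might look like the sketch below. It assumes postcodes have already been grouped by (website, year); the function name is hypothetical and every website except www.website1.co.uk is invented.

```python
from collections import Counter

def single_postcode_counts(sites):
    """Count websites per (postcode, year), keeping only website-years
    with exactly one distinct postcode."""
    counts = Counter()
    for (_website, year), postcodes in sites.items():
        if len(postcodes) == 1:          # drops directories, property portals, etc.
            (postcode,) = postcodes      # unpack the only element of the set
            counts[(postcode, year)] += 1
    return counts

sites = {
    ("www.website1.co.uk", 2008): {"IG8 8HD"},
    ("www.website3.co.uk", 2008): {"IG8 8HD"},                        # invented
    ("www.directory.co.uk", 2008): {"IG8 8HD", "EH1 1AA", "M1 1AA"},  # invented directory
}
print(single_postcode_counts(sites))  # Counter({('IG8 8HD', 2008): 2})
```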
S-shaped pattern in the cumulative level of adoption
A hierarchy effect: from main centres to secondary ones
A neighbourhood effect: first “hitting” nearby locations
Cumulative adoption: Self-starting logistic growth model [nls and SSlogis]
Descriptive statistics & ESDA
Machine learning framework [random forests]
Two scales: Local Authorities and Output Areas
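For the cumulative-adoption step, the slides name R's nls with the SSlogis self-starting model, whose curve is Asym / (1 + exp((xmid - t) / scal)). The Python sketch below fits that same three-parameter logistic without nls: it searches over candidate asymptotes just above the observed maximum and, for each, gets xmid and scal from ordinary least squares on the linearised form log(Asym / y - 1) = xmid / scal - t / scal, keeping the candidate with the lowest squared error on the original scale. It is a simplified stand-in, not a reimplementation of nls; it assumes strictly positive cumulative counts, and the function names are hypothetical.

```python
import math

def logistic(t, asym, xmid, scal):
    """Three-parameter logistic in the SSlogis parameterisation."""
    return asym / (1.0 + math.exp((xmid - t) / scal))

def fit_logistic(ts, ys, grid_size=2000, max_factor=3.0):
    """Return (asym, xmid, scal) minimising squared error on the original scale.

    Candidate asymptotes run from just above max(ys) to max_factor * max(ys);
    for each, xmid and scal come from least squares on the linearised form
    log(asym / y - 1) = xmid / scal - t / scal. Requires all ys > 0.
    """
    top, n = max(ys), len(ts)
    mt = sum(ts) / n
    stt = sum((t - mt) ** 2 for t in ts)
    best = None
    for k in range(1, grid_size + 1):
        asym = top * (1.0 + (max_factor - 1.0) * k / grid_size)
        zs = [math.log(asym / y - 1.0) for y in ys]
        mz = sum(zs) / n
        slope = sum((t - mt) * (z - mz) for t, z in zip(ts, zs)) / stt
        if slope >= 0:                   # adoption must grow over time
            continue
        scal = -1.0 / slope
        xmid = (mz - slope * mt) * scal  # the intercept equals xmid / scal
        sse = sum((y - logistic(t, asym, xmid, scal)) ** 2 for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, asym, xmid, scal)
    if best is None:
        raise ValueError("no increasing logistic curve fits these data")
    return best[1:]

# Synthetic cumulative adoption curve, not study data
years = list(range(1996, 2013))
adoption = [logistic(t, 100.0, 2004.0, 2.0) for t in years]
asym, xmid, scal = fit_logistic(years, adoption)
print(round(asym, 1), round(xmid, 2), round(scal, 2))
```

In R the equivalent fit would use nls with SSlogis directly, which also refines all three parameters jointly instead of on a grid.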
Spatial heterogeneity
No clear, easily explained pattern
Adoption heterogeneity
Different perceptions of risk and economic returns from new technologies
Early adopters vs. laggards, leapfrogging
Spatial heterogeneity
Expected volatility
Neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)
Spatial dependency (Moran’s I & LISA maps)
Websites per firm in Local Authorities (c. 400)
Websites in Output Areas (c. 230,000)
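Global Moran's I and its local counterpart (the statistic behind LISA maps) can be computed as below. This is a minimal sketch assuming binary contiguity weights supplied as a symmetric neighbour list; it omits the permutation-based significance tests that LISA maps normally rely on (libraries such as PySAL's esda provide these). The function names and the four-area example are hypothetical.

```python
def _deviations(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def morans_i(values, neighbours):
    """Global Moran's I with binary weights: w_ij = 1 if j is in neighbours[i]."""
    n = len(values)
    z = _deviations(values)
    s0 = sum(len(neighbours.get(i, [])) for i in range(n))   # sum of all weights
    cross = sum(z[i] * z[j] for i in range(n) for j in neighbours.get(i, []))
    return (n / s0) * cross / sum(zi * zi for zi in z)

def local_morans_i(values, neighbours):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum(z^2) / n."""
    n = len(values)
    z = _deviations(values)
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] / m2 * sum(z[j] for j in neighbours.get(i, [])) for i in range(n)]

# Four areas in a row, each adjacent to the next (illustrative densities)
density = [1.0, 2.0, 3.0, 4.0]
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i(density, nbrs))        # about 0.333: positive spatial autocorrelation
print(local_morans_i(density, nbrs))
```

With these weights the local values sum to S0 times the global I, which is a quick consistency check on any implementation.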
Hierarchy effect: from main centres to secondary ones – central places
Almost perfect polarisation of web adoption in the early stages at a granular level
Polarisation decreases over time
More equally diffused at the Local Authority level
A hierarchy effect: from main centres to secondary ones
A neighbourhood effect: first “hitting” nearby locations
S-shaped pattern in the cumulative level of adoption
\[
\begin{aligned}
Website\,Density_{i,t} \sim\; & \color{orange}{Distance\,London} +
\color{orange}{Website\,density\,London_{t-1}} + \\
& \color{orange}{Distance\,Nearest\,City} +
\color{orange}{Website\,density\,Nearest\,City_{t-1}} + \\
& \color{orange}{Distance\,Nearest\,Retail_{i}} +
\color{orange}{Website\,density\,Nearest\,Retail_{t-1}} + \\
& \color{olive}{W*\, Website\,density_{t-1}} + \\
& \color{violet}{year_{t}}
\end{aligned}
\]
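Two of the predictor types in this model can be sketched as follows: the spatial lag term W * Website density_{t-1}, assuming a row-standardised weights matrix (each area receives the mean previous-year density of its neighbours), and the distance to the nearest city, assuming projected coordinates in metres such as a national grid. The slides state neither the weighting scheme nor the coordinate system; all names and values here are hypothetical.

```python
import math

def spatial_lag(prev_density, neighbours):
    """Row-standardised spatial lag: mean of the neighbours' previous-year density.
    Areas without neighbours get 0.0 (a choice made for this sketch)."""
    return {
        area: (sum(prev_density[j] for j in nbrs) / len(nbrs)) if nbrs else 0.0
        for area, nbrs in neighbours.items()
    }

def nearest_city(point, cities):
    """Return (name, distance) of the closest city by straight-line distance.
    Coordinates are assumed to be projected, e.g. metres on a national grid."""
    name = min(cities, key=lambda c: math.dist(point, cities[c]))
    return name, math.dist(point, cities[name])

# Hypothetical areas, previous-year densities and city locations
prev = {"A": 0.2, "B": 0.4, "C": 0.9}
nbrs = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
cities = {"CityX": (0.0, 0.0), "CityY": (3000.0, 4000.0)}

print(spatial_lag(prev, nbrs))
print(nearest_city((2500.0, 4000.0), cities))  # ('CityY', 500.0)
```

The same nearest-feature lookup would apply to retail centres, and the density of the matched city or centre in year t-1 can then be joined on its name.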
Random forests to predict \(Website\,Density_{i,t}\)
2 spatial resolutions: Local Authorities and Output Areas
2 sets of models:
Space-time sensitive 10-fold CV (CAST)
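The slides use CAST, an R package, for space-time sensitive cross-validation. The Python sketch below illustrates the leave-location-and-time-out idea behind such folds; it is not CAST's implementation. Areas and years are each shuffled into k groups; fold f tests on rows whose area and year both fall in group f and trains only on rows whose area and year both fall outside it, so the model is never scored on a place or a year it saw in training. The function name, seed and grouping rule are assumptions.

```python
import random

def space_time_folds(areas, years, k=10, seed=1):
    """Leave-location-and-time-out folds as a list of (train_idx, test_idx).

    areas, years: parallel lists with the area and year of each observation.
    Fold f tests on rows whose area AND year are in group f, and trains on rows
    whose area AND year are both outside group f; other rows sit out that fold.
    """
    rng = random.Random(seed)

    def groups(values):
        uniq = sorted(set(values))
        rng.shuffle(uniq)
        return {v: i % k for i, v in enumerate(uniq)}

    area_group, year_group = groups(areas), groups(years)
    folds = []
    for f in range(k):
        test = [i for i, (a, y) in enumerate(zip(areas, years))
                if area_group[a] == f and year_group[y] == f]
        train = [i for i, (a, y) in enumerate(zip(areas, years))
                 if area_group[a] != f and year_group[y] != f]
        folds.append((train, test))
    return folds

# Hypothetical panel: 20 areas observed every year 1996-2012
areas = [f"area{a}" for a in range(20) for _ in range(1996, 2013)]
years = [y for _ in range(20) for y in range(1996, 2013)]
for train, test in space_time_folds(areas, years, k=10):
    print(len(train), len(test))
```

The price of this strictness is that each fold discards the rows that share only an area or only a year with the test set, so training sets are noticeably smaller than in random 10-fold CV.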
Resolution | RMSE | \(R^{2}\) | MAE |
---|---|---|---|
Local Authorities* | 0.028 | 0.850 | 0.019 |
Output Areas | 3.284 | 0.320 | 1.034 |
*No retail centre predictors for the Local Authority models
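The three accuracy measures in the table can be computed as below. Here R² is taken as the squared Pearson correlation between observed and predicted values, one common convention (it is what R's caret reports via postResample); which definition the table uses is an assumption. The function name is hypothetical.

```python
import math

def regression_metrics(observed, predicted):
    """RMSE, R^2 (squared Pearson correlation) and MAE for paired values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    r2 = cov * cov / (var_o * var_p)
    return {"RMSE": rmse, "R2": r2, "MAE": mae}

# Toy values, not model output: RMSE 0.5 and MAE 0.25 here
print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

RMSE and MAE carry the units of the target, so if the Local Authority target is websites per firm and the Output Area target is websites per area, those two columns are not directly comparable across rows; R² is unit-free.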
Region | \(R^{2}\) LAD | Rank LAD | \(R^{2}\) OA | Rank OA |
---|---|---|---|---|
South East | 0.945 | 1 | 0.183 | 7 |
West Midlands | 0.922 | 2 | 0.190 | 5 |
Wales | 0.915 | 3 | 0.193 | 4 |
Yorkshire and The Humber | 0.909 | 4 | 0.200 | 2 |
East Midlands | 0.903 | 5 | 0.187 | 6 |
North East | 0.896 | 6 | 0.156 | 9 |
South West | 0.894 | 7 | 0.221 | 1 |
London | 0.881 | 8 | 0.104 | 10 |
East of England | 0.874 | 9 | 0.170 | 8 |
North West | 0.837 | 10 | 0.194 | 3 |
Scotland | 0.780 | 11 | 0.091 | 12 |
Northern Ireland | 0.579 | 12 | 0.099 | 11 |
Established technological diffusion spatial processes still apply
Geography matters: spatial dependency, urban gravitation
Hierarchical diffusion
Granular analysis reveals patterns otherwise not visible
Stability and volatility: leapfrogging, early adopters dropping, but also stable positions
Spatially consistent mechanisms at local scale
Regional heterogeneity