All .uk archived webpages which contain a UK postcode in the web text
- circa 0.5 billion URLs with valid UK postcodes
- 20080509162138/http://www.example_website_1.co.uk/contact_us IG8 8HD
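A record like the one above can be split into its capture timestamp, URL and postcode. The sketch below is illustrative only: the helper names `parse_record` and `find_postcodes` are hypothetical, and the regular expression is a simplified postcode pattern that does not check matches against a list of valid UK postcodes.

```python
import re
from datetime import datetime

# Simplified UK postcode pattern (an assumption): outward code, optional
# space, inward code. It does not validate against a postcode lookup.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?) ?([0-9][A-Z]{2})\b")


def parse_record(line):
    """Split 'timestamp/url postcode' into (datetime, url, postcode)."""
    timestamp, rest = line.split("/", 1)
    url, postcode = rest.split(" ", 1)
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return captured, url, postcode.strip()


def find_postcodes(text):
    """Return postcode-like strings found in free web text."""
    matches = POSTCODE_RE.findall(text.upper())
    return [f"{outward} {inward}" for outward, inward in matches]


record = "20080509162138/http://www.example_website_1.co.uk/contact_us IG8 8HD"
captured, url, postcode = parse_record(record)
```

Normalising to upper case before matching is a design choice made here so that postcodes typed in lower case are still picked up.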
Hyperlinks
- http://www.example_website_1.co.uk | http://www.example_website_2.co.uk | 3
- much larger pool, only part is geolocated
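Because only part of the hyperlink pool is geolocated, one plausible preprocessing step is to keep only the links whose two endpoints both have a known postcode. The sketch below assumes the `source | target | weight` layout shown above. The function names and the `site_postcodes` lookup are hypothetical, and requiring both endpoints to be geolocated is an assumption rather than a documented rule.

```python
def parse_edge(line):
    """Parse 'source | target | weight' into (source, target, int weight)."""
    source, target, weight = (field.strip() for field in line.split("|"))
    return source, target, int(weight)


def geolocated_edges(edges, site_postcodes):
    """Keep edges whose source and target both appear in the postcode lookup."""
    return [
        (source, target, weight)
        for source, target, weight in edges
        if source in site_postcodes and target in site_postcodes
    ]


edge = parse_edge(
    "http://www.example_website_1.co.uk | http://www.example_website_2.co.uk | 3"
)
# Hypothetical lookup: only the first website has a postcode.
lookup = {"http://www.example_website_1.co.uk": "IG8 8HD"}
kept = geolocated_edges([edge], lookup)
```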
\[trade_{ij,t} \sim hyperlinks_{ij,t} + distance_{ij} + pop.density_{i,t} + pop.density_{j,t} + empl_{i,t} + empl_{j,t} \]
\[\begin{align} R^2 = 1 - \frac{\sum_{k} (y_{k} - \hat{y}_{k})^2} {\sum_{k} (y_{k} - \bar{y})^2} \label{eq:rsquared} \end{align}\]
\[\begin{align} MAE = \frac{1}{N} \sum_{k = 1}^{N} |\hat{y}_{k} - y_{k}| \label{eq:mae} \end{align}\]
\[\begin{align} RMSE = \sqrt{\frac{\sum_{k = 1}^{N} (\hat{y}_{k} - y_{k})^2} {N}} \label{eq:rmse} \end{align}\]
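The three accuracy metrics can be computed directly from their definitions. The following is a minimal sketch using only the standard library. The function names are chosen for illustration, `y` holds observed values and `y_hat` the corresponding predictions, and the toy numbers are not taken from the trade data.

```python
import math


def r_squared(y, y_hat):
    """1 minus the ratio of residual to total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(pred - obs) for obs, pred in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((pred - obs) ** 2 for obs, pred in zip(y, y_hat)) / len(y))


# Toy example: three perfect predictions and one that is off by 2.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
```

RMSE squares the errors before averaging, so a few large misses inflate it far more than they inflate MAE.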
All the archived .uk webpages
Archived during 2000-2010
Commercial webpages (.co.uk)
From webpages to websites:
- http://www.website1.co.uk/webpage1 and
- http://www.website1.co.uk/webpage2 are part of the same website
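Moving from webpages to websites can be sketched as grouping page URLs by host and collecting the distinct postcodes seen on each site. The function names are hypothetical, using the host name as the website identifier is an assumption, and the second postcode in the example is purely illustrative.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def website_of(url):
    """Use the host part of the URL as the website identifier (an assumption)."""
    return urlsplit(url).netloc


def postcodes_per_website(records):
    """Map each website to the set of distinct postcodes found on its pages."""
    sites = defaultdict(set)
    for url, postcode in records:
        sites[website_of(url)].add(postcode)
    return dict(sites)


pages = [
    ("http://www.website1.co.uk/webpage1", "IG8 8HD"),
    ("http://www.website1.co.uk/webpage2", "IG8 8HD"),
    ("http://www.website1.co.uk/webpage2", "SW1A 1AA"),  # illustrative postcode
]
by_site = postcodes_per_website(pages)
```

The size of each set then gives the number of unique postcodes per website.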
1 vs. multiple postcodes in a website
level | freq | perc | cumfreq | cumperc |
---|---|---|---|---|
(0,1] | 41596 | 0.718 | 41596 | 0.718 |
(1,2] | 6451 | 0.111 | 48047 | 0.830 |
(2,10] | 6163 | 0.106 | 54210 | 0.936 |
(10,100] | 2975 | 0.051 | 57185 | 0.988 |
(100,1000] | 646 | 0.011 | 57831 | 0.999 |
(1000,10000] | 62 | 0.001 | 57893 | 1.000 |
(10000,100000] | 4 | 0.000 | 57897 | 1.000 |
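A frequency table with these right-closed intervals can be rebuilt from a list of per-website postcode counts. The sketch below is one possible way to do so. The function name and the toy counts are hypothetical, and the bin edges are copied from the table above.

```python
from bisect import bisect_left
from itertools import accumulate

# Bin edges taken from the table; each interval is (lower, upper].
EDGES = [0, 1, 2, 10, 100, 1000, 10000, 100000]


def frequency_table(counts):
    """Return rows of (level, freq, perc, cumfreq, cumperc)."""
    freq = [0] * (len(EDGES) - 1)
    for count in counts:
        if not EDGES[0] < count <= EDGES[-1]:
            raise ValueError(f"count {count} falls outside the bins")
        # bisect_left finds the first edge >= count, i.e. the upper bound.
        freq[bisect_left(EDGES, count) - 1] += 1
    total = sum(freq)
    cumfreq = list(accumulate(freq))
    return [
        (f"({lo},{hi}]", f, round(f / total, 3), cf, round(cf / total, 3))
        for lo, hi, f, cf in zip(EDGES, EDGES[1:], freq, cumfreq)
    ]


# Toy counts of postcodes per website (not the real data).
rows = frequency_table([1, 1, 1, 2, 5, 150])
```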
year | hyperlinks | distance |
---|---|---|
2000 | 0.539 | -0.219 |
2001 | 0.578 | -0.221 |
2002 | 0.793 | -0.221 |
2003 | 0.483 | -0.220 |
2004 | 0.807 | -0.223 |
2005 | 0.643 | -0.219 |
2006 | 0.585 | -0.219 |
2007 | 0.598 | -0.214 |
2008 | 0.491 | -0.205 |
2009 | 0.922 | -0.207 |
2010 | 0.674 | -0.205 |
\[trade_{ij,t} \sim hyperlinks_{ij,t} + distance_{ij} + pop.density_{i,t} + pop.density_{j,t} + empl_{i,t} + empl_{j,t} \]
year | RMSE | Rsquared | MAE |
---|---|---|---|
2002 | 937.93 | 0.96 | 159.87 |
2003 | 1360.28 | 0.94 | 244.75 |
2004 | 1014.83 | 0.95 | 179.15 |
2005 | 1790.07 | 0.89 | 304.86 |
2006 | 1706.73 | 0.92 | 309.16 |
2007 | 1920.11 | 0.91 | 210.23 |
2008 | 1558.92 | 0.92 | 233.35 |
2009 | 1353.12 | 0.93 | 202.70 |
2010 | 3170.16 | 0.63 | 303.68 |
Code | Industry name |
---|---|
s1 | Agriculture |
s2 | Mining, quarrying and energy supply |
s3 | Food, beverages and tobacco |
s4 | Textiles and leather etc. |
s5 | Coke, refined petroleum, nuclear fuel and chemicals etc. |
s6 | Electrical and optical equipment and transport equipment |
s8 | Other manufacturing |
s9 | Construction |
s10 | Distribution |
s11 | Hotels and restaurants |
s12 | Transport, storage and communication |
s13 | Financial intermediation |
s14 | Real estate, renting and business activities |
s15 | Non-Market Services |
year | RMSE | Rsquared | MAE |
---|---|---|---|
2002 | 1181.91 | 0.94 | 244.27 |
2003 | 1428.99 | 0.93 | 282.77 |
2004 | 1011.14 | 0.95 | 173.31 |
2005 | 1414.77 | 0.94 | 232.25 |
2006 | 1433.92 | 0.94 | 208.32 |
2007 | 1894.59 | 0.91 | 227.77 |
2008 | 1206.30 | 0.95 | 249.66 |
2009 | 2008.83 | 0.81 | 238.38 |
2010 | 2500.10 | 0.78 | 298.27 |