All .uk archived webpages which contain a UK postcode in the web text
- circa 0.5 billion URLs with valid UK postcodes
- 20080509162138/http://www.example_website_1.co.uk/contact_us IG8 8HD
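A minimal sketch of how records like the one above might be produced and read back. The regular expression is a simplified assumption about the UK postcode format (it ignores special cases such as GIR 0AA and does not check that a postcode actually exists), and the function names `find_postcodes` and `parse_record` are hypothetical, not part of the original pipeline.

```python
import re

# Simplified UK postcode pattern: outward code, optional whitespace, inward code.
# This is an illustrative assumption, not a full validity check.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b")


def find_postcodes(text):
    """Return postcode-like strings found in text, normalised to 'OUT INW'."""
    return [f"{out} {inw}" for out, inw in POSTCODE_RE.findall(text.upper())]


def parse_record(line):
    """Split a 'timestamp/url postcode' record into its three parts."""
    stamp_url, postcode = line.split(" ", 1)
    timestamp, url = stamp_url.split("/", 1)
    return timestamp, url, postcode.strip()


record = "20080509162138/http://www.example_website_1.co.uk/contact_us IG8 8HD"
timestamp, url, postcode = parse_record(record)
```

Matching on format alone would still need a lookup against a postcode directory to keep only valid postcodes.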
Hyperlinks
- http://www.example_website_1.co.uk | http://www.example_website_2.co.uk | 3
- much larger pool, only part is geolocated
\[trade_{ij,t} \sim hyperlinks_{ij,t} + distance_{ij} + \\ pop.density_{i,t} + pop.density_{j,t} + empl_{i,t} + empl_{j,t} \]
\[\begin{align} R^2 = 1 - \frac{\sum_{k} (y_{k} - \hat{y_{k}})^2} {\sum_{k} (y_{k} - \overline{y})^2} \label{eq:rsquared} \end{align}\]
\[\begin{align} MAE = \frac{1}{N} \sum_{k = 1}^{N} |\hat{y_{k}} - y_{k}| \label{eq:mae} \end{align}\]
\[\begin{align} RMSE = \sqrt{\frac{\sum_{k = 1}^{N} (\hat{y_{k}} - y_{k})^2} {N}} \label{eq:rmse} \end{align}\]
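The three metrics above can be sketched in plain Python as follows. This is an illustration of the formulas only, not the evaluation code behind the reported results; `y` holds the observed values and `y_hat` the predictions, and both names are assumptions.

```python
import math


def r_squared(y, y_hat):
    """1 minus residual sum of squares over total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(pred - obs) for obs, pred in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((pred - obs) ** 2 for obs, pred in zip(y, y_hat)) / len(y))
```

Because RMSE squares the errors before averaging, a few large misses raise it far more than they raise MAE, which is why the two are reported side by side.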
All the archived .uk webpages
Archived during 2000-2010
Commercial webpages (.co.uk)
From webpages to websites:
- http://www.website1.co.uk/webpage1 and
- http://www.website1.co.uk/webpage2 are part of the same website
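The step from webpages to websites can be sketched as collapsing each URL to its scheme and host, then counting page-level hyperlinks between pairs of websites. The helper names `website_of` and `aggregate_links` are hypothetical, and dropping links that stay inside one website is an assumption rather than something the slides state.

```python
from collections import Counter
from urllib.parse import urlsplit


def website_of(url):
    """Reduce a webpage URL to its website (scheme plus host)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def aggregate_links(page_links):
    """Count page-level links between each ordered pair of distinct websites."""
    counts = Counter()
    for source, target in page_links:
        a, b = website_of(source), website_of(target)
        if a != b:  # assumption: links internal to one website are ignored
            counts[(a, b)] += 1
    return counts


page_links = [
    ("http://www.website1.co.uk/webpage1", "http://www.website2.co.uk/home"),
    ("http://www.website1.co.uk/webpage2", "http://www.website2.co.uk/about"),
    ("http://www.website1.co.uk/webpage1", "http://www.website1.co.uk/webpage2"),
]
edges = aggregate_links(page_links)
```

Each key in `edges` is a (source website, target website) pair and each value is a link count, which has the same shape as the weighted hyperlink records listed earlier.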
1 vs. multiple postcodes in a website
| postcodes per website | frequency | share | cumulative frequency | cumulative share | 
|---|---|---|---|---|
| (0,1] | 41596 | 0.718 | 41596 | 0.718 | 
| (1,2] | 6451 | 0.111 | 48047 | 0.830 | 
| (2,10] | 6163 | 0.106 | 54210 | 0.936 | 
| (10,100] | 2975 | 0.051 | 57185 | 0.988 | 
| (100,1000] | 646 | 0.011 | 57831 | 0.999 | 
| (1000,10000] | 62 | 0.001 | 57893 | 1.000 | 
| (10000,100000] | 4 | 0.000 | 57897 | 1.000 | 
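As a quick check, the share and cumulative columns of the table above can be recomputed from the frequency column alone. The variable names are illustrative.

```python
from itertools import accumulate

# Frequency column of the table, one entry per postcode-count bin.
freq = [41596, 6451, 6163, 2975, 646, 62, 4]

total = sum(freq)
cumfreq = list(accumulate(freq))
perc = [round(f / total, 3) for f in freq]
cumperc = [round(c / total, 3) for c in cumfreq]
```

The first bin alone accounts for about 72% of the 57,897 websites, and the first two bins together for about 83%.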
| year | hyperlinks | distance | 
|---|---|---|
| 2000 | 0.539 | -0.219 | 
| 2001 | 0.578 | -0.221 | 
| 2002 | 0.793 | -0.221 | 
| 2003 | 0.483 | -0.220 | 
| 2004 | 0.807 | -0.223 | 
| 2005 | 0.643 | -0.219 | 
| 2006 | 0.585 | -0.219 | 
| 2007 | 0.598 | -0.214 | 
| 2008 | 0.491 | -0.205 | 
| 2009 | 0.922 | -0.207 | 
| 2010 | 0.674 | -0.205 | 
\[trade_{ij,t} \sim hyperlinks_{ij,t} + distance_{ij} + \\ pop.density_{i,t} + pop.density_{j,t} + empl_{i,t} + empl_{j,t} \]
| year | RMSE | Rsquared | MAE | 
|---|---|---|---|
| 2002 | 937.93 | 0.96 | 159.87 | 
| 2003 | 1360.28 | 0.94 | 244.75 | 
| 2004 | 1014.83 | 0.95 | 179.15 | 
| 2005 | 1790.07 | 0.89 | 304.86 | 
| 2006 | 1706.73 | 0.92 | 309.16 | 
| 2007 | 1920.11 | 0.91 | 210.23 | 
| 2008 | 1558.92 | 0.92 | 233.35 | 
| 2009 | 1353.12 | 0.93 | 202.70 | 
| 2010 | 3170.16 | 0.63 | 303.68 | 
| Code | Industry name | 
|---|---|
| s1 | Agriculture | 
| s2 | Mining, quarrying and energy supply | 
| s3 | Food, beverages and tobacco | 
| s4 | Textiles and leather etc. | 
| s5 | Coke, refined petroleum, nuclear fuel and chemicals etc. | 
| s6 | Electrical and optical equipment and transport equipment | 
| s8 | Other manufacturing | 
| s9 | Construction | 
| s10 | Distribution | 
| s11 | Hotels and restaurants | 
| s12 | Transport, storage and communication | 
| s13 | Financial intermediation | 
| s14 | Real estate, renting and business activities | 
| s15 | Non-Market Services | 
| year | RMSE | Rsquared | MAE | 
|---|---|---|---|
| 2002 | 1181.91 | 0.94 | 244.27 | 
| 2003 | 1428.99 | 0.93 | 282.77 | 
| 2004 | 1011.14 | 0.95 | 173.31 | 
| 2005 | 1414.77 | 0.94 | 232.25 | 
| 2006 | 1433.92 | 0.94 | 208.32 | 
| 2007 | 1894.59 | 0.91 | 227.77 | 
| 2008 | 1206.30 | 0.95 | 249.66 | 
| 2009 | 2008.83 | 0.81 | 238.38 | 
| 2010 | 2500.10 | 0.78 | 298.27 |