Diversity in cities


Emmanouil Tranos

University of Bristol, Alan Turing Institute
e.tranos@bristol.ac.uk, @EmmanouilTranos, etranos.info

Economic diversity

  • Production, i.e. firms

  • Consumption, i.e. product variety

  • Labour pool, i.e. skills in labour market

In general, diversity is a good thing for:

  • urban economies

  • productivity

  • urban and industrial agglomeration

Opposing forces

  • Within-sector or Marshall–Arrow–Romer (MAR) spillovers,

  • Between-sector or Jacobs spillovers

  • Large empirical literature trying to identify the right ratio (e.g. Saviotti and Frenken (2008); Caragliu, de Dominicis, and de Groot (2016))

  • MAR externalities (or spillovers): good for productivity and short-term growth

  • Jacobean externalities: good for innovation and long-term growth

Opposing forces

Using clearer economic terminology (Fujita et al. 1989):

  • Diverse cities (heterogeneous agglomerations) enjoy economies of scope

  • Homogeneous agglomerations enjoy increasing returns from economies of scale

On the ground

  • Ambiguous concepts

  • Abundance, difference or number of categories, but also degrees of richness, concentration or evenness (Yuo and Tseng 2021)

  • Different ways to measure (Bettencourt 2021)

Species richness…

  • … aka variety

  • \(\sum_{i: p_i > 0}p_{i}^0\)

  • \(p_i\) is the proportion of data points in the \(i\)th category

Interpretation:

  • Plurality

  • Availability of options
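As a minimal sketch (toy shares, not the London data), richness can be computed directly from a vector of proportions:

```r
# Toy shares across five categories; the last category is absent (share 0)
p <- c(0.5, 0.3, 0.1, 0.1, 0)

# Species richness: count the categories that are actually present.
# Note: in R, 0^0 evaluates to 1, so a literal sum(p^0) would also count
# absent categories -- restrict to p > 0 first.
richness <- sum(p[p > 0]^0)
richness
```

Equivalently, `sum(p > 0)` gives the same count without the exponent.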

Shannon entropy

  • \(H = -\sum_{i=1}^n p_{i} \ln{p_{i}}\)

  • \(n\) is the number of total categories

  • \(p_i\) is the proportion of data points in the \(i\)th category

  • Probably the most common diversity index.

  • Interpretation:

    • If one category dominates ➔ less surprise ➔ low entropy

    • No category dominates ➔ more surprise ➔ high entropy
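A toy illustration of the two scenarios (assumed shares, not the London data): a dominated distribution yields low entropy, while a uniform one yields the maximum, \(\ln n\).

```r
# Dominated: one category holds 80% of the observations
p_dom <- c(0.8, 0.05, 0.05, 0.1)
# Uniform: no category dominates
p_uni <- rep(0.25, 4)

shannon <- function(p) -sum(p * log(p))  # natural log, as in the formula

shannon(p_dom)  # low entropy: little surprise
shannon(p_uni)  # high entropy: equals log(4)
```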

Herfindahl-Hirschman index

  • \(HHI = \sum_{i}(p_{i}^2)\)

  • \(p_i\) is the proportion of data points in the \(i\)th category

  • Concentration of the market.

  • Interpretation:

    • \(1/n \leq HHI \leq 1\)

    • Two scenarios:

HHI_1 = .8^2 + .05^2 + .05^2 + .1^2
HHI_1
[1] 0.655
HHI_2 = .25^2 + .25^2 + .25^2 + .25^2
HHI_2
[1] 0.25

Herfindahl-Hirschman index

  • Caution: alternative specification

  • \(HHI = 1- \sum_{i}(p_{i}^2)\)
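Applied to the two scenarios above, the 1 − HHI specification flips the interpretation: higher values now mean more diversity, not more concentration. A quick check:

```r
p1 <- c(0.8, 0.05, 0.05, 0.1)  # one dominant category
p2 <- rep(0.25, 4)             # perfectly even shares

1 - sum(p1^2)  # 0.345: low diversity under the alternative specification
1 - sum(p2^2)  # 0.75: high diversity
```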

Examples

Relatedness

  • Relatedness spans the continuum between MAR and Jacobs (Hidalgo 2021)

  • Related activities are neither exactly the same nor completely different (Frenken, Van Oort, and Verburg 2007; Boschma et al. 2012)

  • Why? Because:

    • identical activities compete for customers and resources,

    • no learning between very dissimilar economic activities

Relatedness

  • Absorptive capacity: a firm’s capacity to absorb new knowledge depends on its prior level of related knowledge (Cohen and Levinthal 1990)

Economic complexity

  • Large scale fine-grained data on economic activities

  • Learn about abstract factors of production and the way they combine into outputs

  • Dimensionality reduction techniques to data on the geography of activities, e.g. employment by industry or patents by technology.

  • Machine learning and network techniques to predict and explain the economic trajectories of countries, cities and regions.

For a review, check Hidalgo (2021) and Balland et al. (2022).

Measuring diversity

Measuring diversity

  • Go to data.london.gov.uk

  • Download and save locally the Businesses-in-London.csv

  • Make sure you know the file location!

  • We will use the REAT and entropy packages. Check what these packages do here and here.

  • Install them if needed with install.packages("packagename")

Measuring diversity

library(tidyverse)  #for data wrangling
library(rprojroot)  #for relative paths
library(REAT)       #for diversity measures
library(entropy)    #for entropy
library(cluster)    #for cluster analysis
library(factoextra) #help functions for clustering 
library(kableExtra) #for nice html tables
library(dbscan)     #for HDBSCAN

# This is the project path
path <- find_rstudio_root_file()
path.data <- paste0(path, "/data/businesses-in-london.csv")

london.firms <- read_csv(path.data) 

london.firms %>% 
  filter(SICCode.SicText_1!="None Supplied") %>% 
  group_by(oslaua, SICCode.SicText_1) %>% 
  summarise(n = n()) %>% 
  mutate(total = sum(n),
         freq = n / total) %>% 
  group_by(oslaua) %>% 
  summarise(richness = n_distinct(SICCode.SicText_1),
            entropy = entropy(freq, method = "ML"),
            herf = herf(n)) %>% 
  arrange(-herf) 
# A tibble: 33 x 4
   oslaua    richness entropy   herf
   <chr>        <int>   <dbl>  <dbl>
 1 E09000013      602    4.64 0.0301
 2 E09000001      744    4.52 0.0288
 3 E09000033      827    4.61 0.0258
 4 E09000003      723    4.67 0.0233
 5 E09000008      678    4.79 0.0221
 6 E09000027      558    4.71 0.0220
 7 E09000015      633    4.73 0.0205
 8 E09000010      667    4.79 0.0205
 9 E09000007      827    4.83 0.0204
10 E09000020      577    4.77 0.0196
# ... with 23 more rows

Measuring diversity

Tip

You don’t know what local authorities these codes refer to. You should download the codes and names and join them with your data from here.

Tip

Discuss what we can learn from this exercise.

Can you think of a way to understand how these indices behave?

Measuring diversity

TO ADD: CHOROPLETH MAPS

Clustering

  • Reducing the dimensions of the observation space

  • Classification of observations into (exclusive) groups

  • Distance or (dis)similarity between each pair of observations to create a distance or dissimilarity matrix

  • Observations within the same group are as similar as possible

  • Based on Boehmke and Greenwell (2019) available here

  • Plenty of other resources online and in textbooks

Source: medium.com

Clustering

  1. K-means

  2. Hierarchical clustering

K-means

  1. k is the number of clusters and is pre-defined

  2. The algorithm selects k random observations (starting centres)

  3. The remaining observations are assigned to the nearest centre

  4. Recalculates the new centres

  5. Re-check cluster assignment

  6. Iterative process to minimise within-cluster variation until convergence

\(SS_{within} = \sum_{k=1}^K W(C_{k}) = \sum_{k=1}^K \sum_{x_i\in C_k}(x_i-\mu_k)^2\)
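To see what kmeans() minimises, the within-cluster sum of squares can be recomputed by hand on toy simulated data (not the London firms) and compared against the tot.withinss the function reports:

```r
set.seed(1)
x <- matrix(rnorm(100), ncol = 2)          # 50 toy observations in 2-D
km <- kmeans(x, centers = 3, nstart = 10)

# Recompute: for each cluster, squared distances to the cluster centre
ss <- sum(sapply(1:3, function(k) {
  xk <- x[km$cluster == k, , drop = FALSE]
  sum(sweep(xk, 2, km$centers[k, ])^2)     # (x_i - mu_k)^2, summed
}))

all.equal(ss, km$tot.withinss)             # TRUE
```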

K-means

First, create an appropriate data frame

la.sic <- london.firms %>% 
  filter(SICCode.SicText_1!="None Supplied") %>% # Dropping firms which haven't declared a SIC code
  group_by(oslaua, SICCode.SicText_1) %>% 
  summarise(n = n()) %>% 
  mutate(total = sum(n),
         freq = n / total) %>% 
  arrange(oslaua,-n) %>%
  select(-n, -total) %>% 
  pivot_wider(names_from = SICCode.SicText_1, values_from = freq) %>% 
  replace(is.na(.), 0)

la.sic %>%  
  select(1:20) %>%  #Select the first 20 columns as there are 1,037 in total
  kbl() %>%
  kable_styling()   #Nice(r) table
oslaua 82990 - Other business support service activities n.e.c. 64209 - Activities of other holding companies n.e.c. 70229 - Management consultancy activities other than financial management 64999 - Financial intermediation not elsewhere classified 99999 - Dormant Company 74990 - Non-trading company 70100 - Activities of head offices 68209 - Other letting and operating of own or leased real estate 62020 - Information technology consultancy activities 68100 - Buying and selling of own real estate 65120 - Non-life insurance 96090 - Other service activities n.e.c. 41100 - Development of building projects 62012 - Business and domestic software development 62090 - Other information technology service activities 35110 - Production of electricity 64205 - Activities of financial services holding companies 74909 - Other professional, scientific and technical activities n.e.c. 65110 - Life insurance
E09000001 0.1108145 0.0606884 0.0401823 0.0379038 0.0363252 0.0361787 0.0271462 0.0261046 0.0247376 0.0239889 0.0239238 0.0232403 0.0208642 0.0198714 0.0193018 0.0183579 0.0161282 0.0155586 0.0117829
E09000002 0.0270344 0.0027363 0.0241340 0.0024626 0.0061840 0.0026816 0.0013681 0.0173480 0.0254474 0.0313030 0.0001095 0.0348602 0.0163629 0.0068954 0.0107262 0.0000547 0.0004378 0.0078258 0.0000547
E09000003 0.0600694 0.0131911 0.0403222 0.0081328 0.0314536 0.0152407 0.0037313 0.0658109 0.0258041 0.0686751 0.0003942 0.0442374 0.0313223 0.0093678 0.0109181 0.0010511 0.0012876 0.0106816 0.0003022
E09000004 0.0714510 0.0073915 0.0429860 0.0039316 0.0150975 0.0074963 0.0025163 0.0291466 0.0347033 0.0404173 0.0004718 0.0300377 0.0214406 0.0071818 0.0114280 0.0004194 0.0004194 0.0112183 0.0001573
E09000005 0.0357725 0.0060626 0.0302829 0.0057610 0.0234964 0.0061229 0.0027749 0.0451228 0.0273572 0.0451529 0.0002111 0.0339326 0.0193039 0.0080533 0.0110394 0.0003318 0.0006937 0.0120951 0.0000603
E09000006 0.0475336 0.0079285 0.0482441 0.0065073 0.0258798 0.0078537 0.0035529 0.0407644 0.0376977 0.0391937 0.0012716 0.0269270 0.0253562 0.0119675 0.0129773 0.0002618 0.0007854 0.0157448 0.0001870
E09000007 0.0541925 0.0147704 0.0451042 0.0091985 0.0176188 0.0095582 0.0063354 0.0248425 0.0352450 0.0353404 0.0002790 0.0431734 0.0260612 0.0269274 0.0163561 0.0019454 0.0036706 0.0121643 0.0001688
E09000008 0.0358566 0.0042869 0.0328511 0.0057082 0.0886745 0.0097621 0.0027492 0.0308940 0.0316861 0.0352275 0.0002563 0.0260246 0.0205261 0.0105543 0.0156800 0.0003728 0.0027026 0.0091098 0.0002097
E09000009 0.0325698 0.0062051 0.0333419 0.0044036 0.0177290 0.0084928 0.0036602 0.0394327 0.0277087 0.0445226 0.0002002 0.0352864 0.0220468 0.0095222 0.0106374 0.0002860 0.0011152 0.0100369 0.0000858
E09000010 0.0361647 0.0106993 0.0303793 0.0056159 0.0855462 0.0089807 0.0024207 0.0462347 0.0218102 0.0379076 0.0004115 0.0357532 0.0213987 0.0127811 0.0115466 0.0002421 0.0017913 0.0091501 0.0002421
E09000011 0.0352018 0.0036909 0.0420300 0.0038754 0.0134717 0.0044752 0.0021223 0.0251903 0.0373702 0.0333564 0.0003691 0.0352480 0.0183622 0.0120877 0.0130104 0.0004614 0.0014302 0.0120415 0.0000923
E09000012 0.0512285 0.0097657 0.0496767 0.0063622 0.0329695 0.0031242 0.0022862 0.0412559 0.0384524 0.0476801 0.0002897 0.0205038 0.0148865 0.0235245 0.0167486 0.0015621 0.0025449 0.0118450 0.0000621
E09000013 0.0423722 0.0123893 0.0448190 0.0076899 0.0190306 0.0101755 0.0087774 0.1304179 0.0213220 0.0310704 0.0001554 0.0332453 0.0167780 0.0116126 0.0129719 0.0013982 0.0015147 0.0148361 0.0000388
E09000014 0.0339895 0.0056534 0.0407460 0.0045848 0.0256127 0.0055500 0.0031370 0.0527423 0.0272674 0.0578096 0.0001379 0.0438829 0.0167879 0.0113758 0.0085491 0.0003792 0.0009307 0.0091696 0.0000345
E09000015 0.0486010 0.0124162 0.0466257 0.0083570 0.0189498 0.0074019 0.0031692 0.0576744 0.0423495 0.0634049 0.0002605 0.0342313 0.0266123 0.0127418 0.0140876 0.0002171 0.0012373 0.0184940 0.0001954
E09000016 0.0449873 0.0058253 0.0335324 0.0070981 0.0186019 0.0052379 0.0017623 0.0327981 0.0270217 0.0414137 0.0008811 0.0315743 0.0251126 0.0092520 0.0097415 0.0000979 0.0009301 0.0105248 0.0000979
E09000017 0.0340671 0.0094125 0.0317595 0.0047062 0.0160012 0.0068316 0.0052528 0.0389252 0.0357674 0.0526188 0.0003036 0.0352513 0.0235616 0.0123273 0.0135722 0.0002429 0.0015181 0.0123577 0.0002429
E09000018 0.0348997 0.0088552 0.0347881 0.0054322 0.0130223 0.0102690 0.0069576 0.0298396 0.0549540 0.0416341 0.0002232 0.0361275 0.0193474 0.0161848 0.0141385 0.0004093 0.0013766 0.0104178 0.0001488
E09000019 0.0509646 0.0133515 0.0490896 0.0090981 0.0198825 0.0094630 0.0062919 0.0248783 0.0370846 0.0317616 0.0005537 0.0470006 0.0223615 0.0277725 0.0190771 0.0017617 0.0040646 0.0131753 0.0002139
E09000020 0.0499206 0.0187149 0.0547281 0.0133494 0.0227926 0.0141649 0.0104735 0.0411641 0.0188866 0.0365712 0.0002575 0.0403056 0.0202601 0.0148517 0.0121475 0.0031335 0.0041207 0.0146371 0.0000000
E09000021 0.0478516 0.0081613 0.0512695 0.0039063 0.0151367 0.0095564 0.0043248 0.0353655 0.0399693 0.0358538 0.0006975 0.0272740 0.0177176 0.0142997 0.0145787 0.0002790 0.0014648 0.0150670 0.0002790
E09000022 0.0335485 0.0086577 0.0420129 0.0044448 0.0180111 0.0090055 0.0052951 0.0372976 0.0252773 0.0306497 0.0001546 0.0331620 0.0143006 0.0117497 0.0112472 0.0028215 0.0009276 0.0134890 0.0000773
E09000023 0.0320543 0.0035557 0.0378921 0.0041925 0.0113570 0.0044579 0.0016982 0.0268535 0.0312583 0.0294008 0.0000531 0.0311522 0.0145412 0.0110917 0.0118877 0.0001061 0.0013268 0.0110917 0.0000531
E09000024 0.0354735 0.0083443 0.0490793 0.0051381 0.0251973 0.0069878 0.0037817 0.0349803 0.0517511 0.0415571 0.0005344 0.0318563 0.0206347 0.0218267 0.0152088 0.0011098 0.0009865 0.0107284 0.0001644
E09000025 0.0443489 0.0029330 0.0243489 0.0040202 0.0653097 0.0060430 0.0018710 0.0201770 0.0310999 0.0305689 0.0001517 0.0287737 0.0136030 0.0141087 0.0113780 0.0001264 0.0010619 0.0068015 0.0001517
E09000026 0.0483713 0.0060341 0.0313726 0.0041454 0.0177836 0.0118475 0.0019133 0.0443240 0.0351011 0.0608320 0.0003434 0.0384615 0.0226403 0.0095418 0.0106211 0.0001962 0.0008830 0.0094192 0.0001472
E09000027 0.0492061 0.0080081 0.0828589 0.0082396 0.0301810 0.0121742 0.0049067 0.0392075 0.0431422 0.0453641 0.0007406 0.0244410 0.0208767 0.0152294 0.0154145 0.0006018 0.0010647 0.0200435 0.0002314
E09000028 0.0530189 0.0184414 0.0437391 0.0094275 0.0362030 0.0226084 0.0137424 0.0285782 0.0310016 0.0226084 0.0004433 0.0282235 0.0125307 0.0165204 0.0143630 0.0043148 0.0039306 0.0142152 0.0001773
E09000029 0.0425879 0.0045226 0.0459799 0.0059673 0.0157663 0.0069724 0.0035176 0.0339824 0.0466080 0.0415829 0.0005025 0.0298995 0.0246859 0.0138191 0.0140704 0.0002513 0.0020101 0.0120603 0.0000628
E09000030 0.0652078 0.0159285 0.0484445 0.0199271 0.0342956 0.0172467 0.0089199 0.0253977 0.0509271 0.0293084 0.0007250 0.0367343 0.0160603 0.0187187 0.0393927 0.0019554 0.0097109 0.0105457 0.0001977
E09000031 0.0310170 0.0067606 0.0308121 0.0036876 0.0120052 0.0036057 0.0014750 0.0304024 0.0251168 0.0375727 0.0002049 0.0402360 0.0259772 0.0095059 0.0092190 0.0006556 0.0009424 0.0093420 0.0000410
E09000032 0.0381731 0.0072472 0.0655832 0.0074624 0.0180461 0.0091845 0.0041976 0.0402540 0.0311771 0.0355900 0.0003946 0.0311771 0.0202346 0.0123417 0.0119470 0.0008969 0.0018297 0.0158935 0.0000359
E09000033 0.0871492 0.0369990 0.0475242 0.0180250 0.0727706 0.0228702 0.0137720 0.0385974 0.0192951 0.0375056 0.0004424 0.0307623 0.0292781 0.0150850 0.0143358 0.0047239 0.0074997 0.0153490 0.0001570

K-means

kclust = kmeans(la.sic[,-1], centers = 10, nstart = 10) # [,-1] drops the oslaua id column
str(kclust)
List of 9
 $ cluster     : int [1:33] 5 8 6 3 8 7 10 1 8 1 ...
 $ centers     : num [1:10, 1:1036] 0.036 0.0443 0.0582 0.0424 0.099 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:10] "1" "2" "3" "4" ...
  .. ..$ : chr [1:1036] "82990 - Other business support service activities n.e.c." "64209 - Activities of other holding companies n.e.c." "70229 - Management consultancy activities other than financial management" "64999 - Financial intermediation not elsewhere classified" ...
 $ totss       : num 0.0945
 $ withinss    : num [1:10] 0.001816 0 0.000671 0 0.002706 ...
 $ tot.withinss: num 0.0244
 $ betweenss   : num 0.0702
 $ size        : int [1:10] 2 1 2 1 2 3 6 7 4 5
 $ iter        : int 2
 $ ifault      : int 0
 - attr(*, "class")= chr "kmeans"

centers is 10 x 1036: 1036 is the number of SIC codes.

Choosing K

  1. Rule of thumb: \(k = \sqrt{n/2}\)

  2. The elbow method

    • Compute k-means clustering for different values of k

    • Calculate \(SS_{within}\)

    • Plot and spot the location of a bend
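The elbow curve can also be sketched by hand (toy simulated data, not the London table): loop over candidate values of k, store \(SS_{within}\), and plot.

```r
set.seed(1)
x <- matrix(rnorm(100), ncol = 2)  # toy data

# Total within-cluster SS for k = 1..10
wss <- sapply(1:10, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

plot(1:10, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

fviz_nbclust() with method = "wss" (next slide) wraps exactly this computation.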

Choosing K

fviz_nbclust(
  la.sic[,-1], 
  kmeans, 
  k.max = 20,
  method = "wss"
)

Hierarchical clustering

  1. Agglomerative clustering (AGNES – AGglomerative NESting)

  2. Divisive hierarchical clustering (DIANA – DIvisive ANAlysis)

Dissimilarity (distance) of observations

Hierarchical clustering

# distances between observations
d <- dist(la.sic[,-1]) # [,-1] drops the oslaua id column

# creates labels for the dendrogram
l <- london.firms %>% distinct(oslaua) %>% arrange(oslaua)

hclust = hclust(d)

plot(hclust, hang=-1, labels=l$oslaua, main='Default from hclust') 
#hang: the fraction of the plot height by which labels should hang below the rest of the plot. A negative value will cause the labels to hang down from 0.

Optimal number of clusters

Optimal number of clusters

Tip

Explore what the two-cluster solution tells us about London.

Clusters in space

  • Create a SIC frequency table
# This will build an SIC frequency table
london.firms %>% 
  group_by(SICCode.SicText_1) %>% 
  summarise(n=n()) %>% 
  arrange(-n) %>% 
  glimpse()
Rows: 1,037
Columns: 2
$ SICCode.SicText_1 <chr> "82990 - Other business support service activities n~
$ n                 <int> 72756, 58526, 55309, 52058, 46572, 45312, 44270, 439~

Clusters in space

  • Focus on, let’s say “70221 - Financial management”
london.firms.sample <- london.firms %>% 
  #filter(SICCode.SicText_1=="70229 - Management consultancy activities other than financial management") %>%
  #filter(SICCode.SicText_1=="59111 - Motion picture production activities") %>% 
  filter(SICCode.SicText_1=="70221 - Financial management") %>% 
  select(oseast1m, osnrth1m) %>% 
  drop_na() 

Financial management in London

plot(london.firms.sample)

Clusters in space, k-means

fviz_nbclust(
  london.firms.sample, 
  kmeans, 
  k.max = 10,
  method = "wss"
)

Clusters in space, k-means

sp.cluster = kmeans(london.firms.sample, 6) 

plot(london.firms.sample, col = sp.cluster$cluster)
#points(sp.cluster$centers, col = 1:4, pch = 8, cex = 2)

Clusters in space, hdbscan

  1. Transform the space according to the density/sparsity

  2. Build the minimum spanning tree of the distance weighted graph

  3. Construct a cluster hierarchy of connected components

  4. Condense the cluster hierarchy based on minimum cluster size

  5. Extract the stable clusters from the condensed tree.

Resources: SciKit-learn docs and dbscan package

Clusters in space, hdbscan

cl <- hdbscan(london.firms.sample, 
              minPts = 10)         #minimum size of clusters

plot(london.firms.sample, col=cl$cluster+1, pch=20)

References

Balland, Pierre-Alexandre, Tom Broekel, Dario Diodato, Elisa Giuliani, Ricardo Hausmann, Neave O’Clery, and David Rigby. 2022. “The New Paradigm of Economic Complexity.” Research Policy 51 (3): 104450.
Bettencourt, Luís MA. 2021. “Introduction to Urban Science: Evidence and Theory of Cities as Complex Systems.” Cambridge, MA: MIT Press.
Boehmke, Brad, and Brandon Greenwell. 2019. Hands-on Machine Learning with R. Chapman and Hall/CRC.
Boschma, Ron, Koen Frenken, H Bathelt, M Feldman, D Kogler, et al. 2012. “Technological Relatedness and Regional Branching.” Beyond Territory. Dynamic Geographies of Knowledge Creation, Diffusion and Innovation 29: 64–68.
Caragliu, Andrea, Laura de Dominicis, and Henri LF de Groot. 2016. “Both Marshall and Jacobs Were Right!” Economic Geography 92 (1): 87–111.
Cohen, Wesley M, and Daniel A Levinthal. 1990. “Absorptive Capacity: A New Perspective on Learning and Innovation.” Administrative Science Quarterly, 128–52.
Frenken, Koen, Frank Van Oort, and Thijs Verburg. 2007. “Related Variety, Unrelated Variety and Regional Economic Growth.” Regional Studies 41 (5): 685–97.
Fujita, Masahisa et al. 1989. “Urban Economic Theory.” Cambridge Books.
Hidalgo, César A. 2021. “Economic Complexity Theory and Applications.” Nature Reviews Physics 3 (2): 92–113.
Saviotti, Pier Paolo, and Koen Frenken. 2008. “Export Variety and the Economic Performance of Countries.” Journal of Evolutionary Economics 18 (2): 201–18.
Yuo, Tony Shun-Te, and Tzuhui Angie Tseng. 2021. “The Environmental Product Variety and Retail Rents on Central Urban Shopping Areas: A Multi-Stage Spatial Data Mining Method.” Environment and Planning B: Urban Analytics and City Science 48 (8): 2167–87.