library(sfnetworks) #tidy geospatial networks
library(sf) #simple features, spatial data
library(tidygraph) #tidyverse's version of igraph
library(tidyverse) #you should know this by now
library(igraph) #as above
library(osmdata) #access OSM data
library(dbscan) #this is the dbscan clustering algorithm
::opts_chunk$set(echo = TRUE) knitr
Street network analysis practical: networks within cities
Street netorks
Until this point the focus has been on networks between cities. Now we are turning the focus on networks within cities and more specifically on street networks. These are spatial networks whose edges and not only their nodes contain spatial information. To summarise, in spatial or, in other words, geospatial networks:
Nodes always contain geographical coordinates.
However, edges can:
either contain geographical information (e.g. road networks),
or not, and instead just link two nodes (e.g. social networks whose node’s geograpical location is known to us)
Street network analysis is not new. For instance, Leonhard Euler proved that the Seven Bridges of Königsberg problem had no solution. This type of problems and analysis – routing – later on became integral part of the standard GIS toolkit.
Why is it important to study street networks:
To understand the form of cities. There are quite a few recent studies (e.g. Boeing 2019, 2017), but this was also part of an older strand of research called Space Syntax (Hillier et al. 1976).
To understand the structure of transportation systems and support transportation planning.
Some key characteristics of street networks:
They might have multiple edges between two nodes.
Street networks are approximately planar (Boeing 2017).
Edges contain geographical information.
Topological measures are important to understand and (maybe compare different) urban street networks.
Try to think what the below topological measures might mean for a street network:
Node density
Node eccentricity
Network diameter
Node degree
Clustering coefficient
Betweenness centrality and its maximum value in a network
Closeness centrality
Key development: OpenStreetMap. Known as the Wikipedia for maps, OSM is a collaborative, citizen science project, which allows individuals to contribute in building openly distributed geographic data (Muki Haklay et al. 2021; Mordechai Haklay and Weber 2008). Through OSM urban analysts have access to street network data from around the world, something which wasn’t the case, let’s say, 20 years ago.
The below are based on the very helpful vignettes of the sfnetworks
and osmdata
packages. Please make sure to revisit these materials on your own time.
In geospatial networks:
nodes always contain geographical coordinates (
sf
objects),edges can:
either contain geographical information (e.g. road networks)
or not, and instead just link two nodes (e.g. geospatial social networks)
A first ‘toy’ example
The below code chunks create and manipulate a toy geospatial network.
= st_point(c(7, 51))
p1 = st_point(c(7, 52))
p2 = st_point(c(8, 52))
p3 = st_point(c(8, 51.5))
p4
= st_sfc(st_linestring(c(p1, p2)))
l1 = st_sfc(st_linestring(c(p1, p4, p3)))
l2 = st_sfc(st_linestring(c(p3, p2)))
l3
= st_as_sf(c(l1, l2, l3), crs = 4326)
edges = st_as_sf(c(st_sfc(p1), st_sfc(p2), st_sfc(p3)), crs = 4326)
nodes
$from = c(1, 1, 3)
edges$to = c(2, 3, 2) edges
As you can see below, net
has a valid spatial network structure
= sfnetwork(nodes, edges) net
Checking if spatial network structure is valid...
Spatial network structure is valid
We can check it’s class:
class(net)
[1] "sfnetwork" "tbl_graph" "igraph"
And check it characteristics.
net
# A sfnetwork with 3 nodes and 3 edges
#
# CRS: EPSG:4326
#
# A directed acyclic simple graph with 1 component with spatially explicit edges
#
# A tibble: 3 × 1
x
<POINT [°]>
1 (7 51)
2 (7 52)
3 (8 52)
#
# A tibble: 3 × 3
from to x
<int> <int> <LINESTRING [°]>
1 1 2 (7 51, 7 52)
2 1 3 (7 51, 8 51.5, 8 52)
3 3 2 (8 52, 7 52)
Just like igraph
we can refer to nodes using the names
we can assign to them.
$name = c("city", "village", "farm")
nodes$from = c("city", "city", "farm")
edges$to = c("village", "farm", "village")
edges
# check the from and to columns edges
Simple feature collection with 3 features and 2 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 7 ymin: 51 xmax: 8 ymax: 52
Geodetic CRS: WGS 84
x from to
1 LINESTRING (7 51, 7 52) city village
2 LINESTRING (7 51, 8 51.5, 8... city farm
3 LINESTRING (8 52, 7 52) farm village
= sfnetwork(nodes, edges, node_key = "name") net
Checking if spatial network structure is valid...
Spatial network structure is valid
net
’s edges contain geographical information. So, let’s create an other_net
whose edges do not contain geographical information. This could have been a geospatial social network, for example. We know where each node is located in space and then the edges illustrate the existence of a connection between two nodes. To do so, we simply set the st_geometry
equal to NULL
. The plots below will make these differences clear.
st_geometry(edges) = NULL
= sfnetwork(nodes, edges, edges_as_lines = TRUE) other_net
Checking if spatial network structure is valid...
Spatial network structure is valid
We can actually plot both example.
plot(net, cex = 2, lwd = 2, main = "Original geometries")
plot(other_net, cex = 2, lwd = 2, main = "Geospatial social network example")
A more ‘real’ street network
Let’s increase the complexity now. We can use the roxel
data from the sfnetworks
package. Roxel is a small town just outside Munster in Germany.
The as_sfnetwork()
function converts the spatial object to a network. So, this is the spatial object and this is how it’s geometry looks like.
roxel
Simple feature collection with 851 features and 2 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 7.522594 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
Geodetic CRS: WGS 84
# A tibble: 851 × 3
name type geometry
* <chr> <fct> <LINESTRING [°]>
1 Havixbecker Strasse residential (7.533722 51.95556, 7.533461 51.95576)
2 Pienersallee secondary (7.532442 51.95422, 7.53236 51.95377, 7.53…
3 Schulte-Bernd-Strasse residential (7.532709 51.95209, 7.532823 51.95239, 7.5…
4 <NA> path (7.540063 51.94468, 7.539696 51.94479, 7.5…
5 Welsingheide residential (7.537673 51.9475, 7.537614 51.94562)
6 <NA> footway (7.543791 51.94733, 7.54369 51.94686, 7.54…
7 <NA> footway (7.54012 51.94478, 7.539931 51.94514)
8 <NA> path (7.53822 51.94546, 7.538131 51.94549, 7.53…
9 <NA> track (7.540063 51.94468, 7.540338 51.94468, 7.5…
10 <NA> track (7.5424 51.94599, 7.54205 51.94629, 7.5419…
# ℹ 841 more rows
plot(st_geometry(roxel))
Now, we can convert it to a geospatial network and plot it.
<- as_sfnetwork(roxel)
net plot(net)
Instead of plot()
we can also use ggplot2
functions. autoplot()
does what it says.
autoplot(net) +
ggtitle("Road network, Münster Roxel")
We have a network, so let’s do some network analysis.
First thing we need to do is to activate the network component we are interested in. In the below example we want to calculate the betweenness centrality of nodes and, therefore, we need to first activate the nodes.
So, the net
object now has a bc
column in the node attributes
= net %>%
net activate("nodes") %>%
mutate(bc = centrality_betweenness()) %>%
print() # print() is equal to just run the object, in this case net.
# A sfnetwork with 701 nodes and 851 edges
#
# CRS: EPSG:4326
#
# A directed multigraph with 14 components with spatially explicit edges
#
# A tibble: 701 × 2
geometry bc
<POINT [°]> <dbl>
1 (7.533722 51.95556) 12936.
2 (7.533461 51.95576) 11824
3 (7.532442 51.95422) 11926.
4 (7.53209 51.95328) 7259.
5 (7.532709 51.95209) 5668
6 (7.532869 51.95257) 2374
# ℹ 695 more rows
#
# A tibble: 851 × 5
from to name type geometry
<int> <int> <chr> <fct> <LINESTRING [°]>
1 1 2 Havixbecker Strasse residential (7.533722 51.95556, 7.533461 51…
2 3 4 Pienersallee secondary (7.532442 51.95422, 7.53236 51.…
3 5 6 Schulte-Bernd-Strasse residential (7.532709 51.95209, 7.532823 51…
# ℹ 848 more rows
We can now create a map of this spatial network, the size and the colour of its nodes are based on their betweenness centrality.
ggplot() +
geom_sf(data = st_as_sf(net, "edges"), col = "grey50") +
geom_sf(data = st_as_sf(net, "nodes"), aes(col = bc), size = 1.5) +
ggtitle("Betweenness centrality")
To better understand the structure of the above ggplot()
look up geom_sf()
and st_as_sf
. If you only run net
and st_as_sf(net, "edges")
you can see what was the type of net
originally and what it was transformed to with the st_as_sf
.
Question: Can you interpret betweenness centrality in this context?
If we want to get rid of geometries, all we need to do is to remove the geometry. As per base R
, setting something equal to NULL
means that we remove it.
%>%
net activate("edges") %>%
st_set_geometry(NULL) %>%
plot(draw_lines = T, main = "Edges without geometries")
%>%
net activate("nodes") %>%
st_set_geometry(NULL) %>%
plot(vertex.color = "black", main = "Nodes without geometries")
A real world example from Open Street Map
Let’s now try to use some real-world data from OSM. The below code extracts the Bristol street data from the OSM API.
To understand the key
and value
options, go to OSM’s wiki page.
Keep an eye on the below. Depending on your internet connections and the OSM’s API response time, it might take a couple of minutes. If you get an error message from running the opq()
try again, it might work the second time as something might have gone wrong the first one. This is not uncommon when we extract data over the internet through an API. Usually such API requests are wrapped up in a if_else()
function to re-try if the first attempt was unsuccessful.
<- opq(bbox = 'Bristol') %>%
bristol_data # getbb ("bristol uk") %>%
# opq () %>%
add_osm_feature(
key = 'highway',
value = c("motorway", "trunk", "primary", "secondary", "tertiary", "residential", "unclassified", "motorway_link", "trunk_link", "primary_link", "secondary_link","tertiary_link", "residential_link")
%>%
) osmdata_sf(quiet = FALSE)
Issuing query to Overpass API ...
Announced endpoint: none
Query complete!
converting OSM data to sf format
# extract the roads
<- st_geometry(bristol_data$osm_lines)
bristol_roads
# build the sfnetwork object
<- as_sfnetwork(bristol_roads, directed = FALSE, length_as_weight = TRUE)
bristol_sfn
ggplot() +
geom_sf(data = st_as_sf(bristol_sfn, "edges"), col = "grey50") +
geom_sf(data = st_as_sf(bristol_sfn, "nodes"), size = .01)
Network data wrangling. We need to:
- Remove multiple edges and loops. AKA simplifying the network.
- Remove nodes that are interior points in a line and define the shape of the line. AKA smoothing the network.
To keep track of the changes, let’s keep a note of the original network size.
# size of original network
gsize(bristol_sfn) # number of edges
[1] 15618
gorder(bristol_sfn) # number of nodes
[1] 21414
Now we can simplify the network.
<- bristol_sfn %>%
bristol_wrangled activate("edges") %>%
filter(!edge_is_multiple()) %>% # Simplify the network
filter(!edge_is_loop()) # Simplify the network
# size of simplified network
gsize(bristol_wrangled) # number of edges
[1] 15444
gorder(bristol_wrangled) # number of nodes
[1] 21414
Remove the interior points…
<- convert(bristol_wrangled, to_spatial_smooth)
bristol_wrangled
# size of simplified network
gsize(bristol_wrangled) # number of edges
[1] 12082
gorder(bristol_wrangled) # number of nodes
[1] 18052
… and plot the network again.
ggplot() +
geom_sf(data = st_as_sf(bristol_wrangled, "edges"), col = "grey50") +
geom_sf(data = st_as_sf(bristol_wrangled, "nodes"), size = .01)
Network measures
Now we can calculate betweenness centrality for Bristol, using the OSM street network we acquired.
<- bristol_wrangled %>%
bristol_wrangled activate("nodes") %>%
mutate(bc = centrality_betweenness())
ggplot() +
geom_sf(data = st_as_sf(bristol_wrangled, "edges"), col = "grey50") +
geom_sf(data = st_as_sf(bristol_wrangled, "nodes"), aes(col = bc,
alpha = bc), # Increase opacity based on betweenness...
size = 1) +
guides(alpha=FALSE) + # ...but exclude this from the legend. Remove this line to see what will happen
ggtitle("Betweenness centrality in Bristol")
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
Question: How can we interpret this map?
Remember the community detections algorithms? We can detect such communities within the street network. These are neighbourhoods of streets which are better connected internally than externally with the whole network. There is a contradiction here, no? Community detection is about nodes, not edges? However, here we are interested in clustering edges. So, we will do a little trick. We will first ‘inverse’ our spatial network, which means that the nodes become edges and vice versa. We will do this with morph(to_linegraph)
. Read the Wikipedia entry of line graps and see the below image for an example. The unmorgh()
brings the changes – that is the new group
column back to the original network, that we now call grouped_net
. For more details, have a look here.
Source: mathworld.wolfram.com
set.seed(123) # The Louvain method is a randomised algorithm,
# which aims to always maximise modularity. If the communities
# are week, every time we run the algorithm the results might
# might be different. To avoid such inconsistency, we can add
# the set.seed() function. See what is does here:
# https://www.rdocumentation.org/packages/simEd/versions/2.0.1/topics/set.seed
= bristol_wrangled %>%
grouped_net morph(to_linegraph) %>%
mutate(group = group_louvain()) %>%
unmorph()
# A new group column has been added with the community membership
grouped_net
# A sfnetwork with 18052 nodes and 12082 edges
#
# CRS: EPSG:4326
#
# An undirected multigraph with 6701 components with spatially explicit edges
#
# A tibble: 18,052 × 2
x bc
<POINT [°]> <dbl>
1 (-2.460661 51.5015) 0
2 (-2.520817 51.51683) 4258.
3 (-2.546005 51.54411) 7039
4 (-2.552245 51.40075) 7039
5 (-2.516866 51.51245) 9353.
6 (-2.51898 51.51272) 27603.
# ℹ 18,046 more rows
#
# A tibble: 12,082 × 5
from to weight x group
<int> <int> <dbl> <LINESTRING [°]> <int>
1 6 7 219. (-2.51898 51.51272, -2.518938 51.51293, -2.518937 5… 14
2 15 16 2713. (-2.573424 51.54525, -2.575283 51.54459, -2.582798 … 21
3 18 19 5.23 (-2.528247 51.50576, -2.528179 51.50578) 14
# ℹ 12,079 more rows
# There are quite a few communities overall
%>%
grouped_net activate("edges") %>%
pull(group) %>%
unique() %>%
length()
[1] 6747
%>%
grouped_net activate("edges") %>%
st_as_sf() %>% # If you remember, we used this above.
transmute(group = as.factor(group)) %>% # We turn community membership to a factor.
filter(group %in% c(1:15)) %>% # Only plotting the first 15 communities.
ggplot() +
geom_sf(aes(col = group), lwd = 4) + # This plots the communities ...
geom_sf(data = st_as_sf(bristol_wrangled, "edges"),# ... and this all the network.
col = "grey", lwd = 0.5, alpha = 0.4) +
theme(legend.position="none") + # No legend please.
ggtitle("Bristol's street communities")
Task: Read the routing vigneette of `sfnetworks and apply these techniques in the Bristol street network.