This is part 1 of the two-part course from CDRC on the Internet User Classification (IUC) and K-Means Clustering. The video in this part introduces the IUC dataset, and the practical session shows you how to work with it in R. If you are completely new to RStudio, please check out our Short Course on Using R as a GIS.
After completing this material, you will:
Our first step is to download the IUC dataset:
Open a web browser and go to https://data.cdrc.ac.uk
Register if you need to, or if you are already registered, make sure you are logged in.
Search for IUC
Open the Internet User Classification page.
Download the iuc2018.csv file to your working directory.
You can download the shapefile with the data already joined to the LSOA boundaries, but this is the national dataset and is quite large (75MB). R will work with this, but might be a bit slow. The steps below will only get the shapefile for Liverpool, which will be a much smaller file.
Open the file in Excel - what data do we have?
Check out the User Guide if you want to.
Start a new Script in RStudio.
Set your working directory.
Use this code to read in the file:
iuc <- read.csv("iuc2018.csv")
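If read.csv() reports that it cannot open the file, the usual cause is that the working directory or file name is not what you expect. The sketch below is illustrative only: it writes a tiny stand-in CSV (two rows copied from the head() output in this practical, saved under the made-up name iuc_demo.csv) to a temporary folder so that it runs anywhere, then checks the file exists before reading it. In the practical you would apply the same check to iuc2018.csv in your own working directory.

```r
# Illustrative stand-in for iuc2018.csv (file name and location are assumptions)
demo_file <- file.path(tempdir(), "iuc_demo.csv")
demo_rows <- data.frame(
  SHP_ID = 1:2,
  LSOA11_CD = c("E01020179", "E01033289"),
  LSOA11_NM = c("South Hams 012C", "Cornwall 007E"),
  GRP_CD = c(5L, 9L),
  GRP_LABEL = c("e-Rational Utilitarians", "Settled Offline Communities")
)
write.csv(demo_rows, demo_file, row.names = FALSE)

# Check the file is where we think it is before reading it
if (!file.exists(demo_file)) {
  stop("File not found: check getwd() and the file name")
}
iuc_demo <- read.csv(demo_file)

# Number of rows and columns that were read in
dim(iuc_demo)
```

Running getwd() shows the folder R is currently using, which is the first thing to check if the file is not found.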
Use head() to check what the data are:
head(iuc)
SHP_ID LSOA11_CD LSOA11_NM GRP_CD GRP_LABEL
1 1 E01020179 South Hams 012C 5 e-Rational Utilitarians
2 2 E01033289 Cornwall 007E 9 Settled Offline Communities
3 3 W01000189 Conwy 015F 5 e-Rational Utilitarians
4 4 W01001022 Bridgend 014B 7 Passive and Uncommitted Users
5 5 W01000532 Ceredigion 007B 9 Settled Offline Communities
6 6 E01018888 Cornwall 071G 9 Settled Offline Communities
Is this the data we expect to see?
Use View() to look at the data.
Use str() to see whether they are character or numeric variables.
str(iuc)
'data.frame': 41729 obs. of 5 variables:
$ SHP_ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ LSOA11_CD: chr "E01020179" "E01033289" "W01000189" "W01001022" ...
$ LSOA11_NM: chr "South Hams 012C" "Cornwall 007E" "Conwy 015F" "Bridgend 014B" ...
$ GRP_CD : int 5 9 5 7 9 9 9 9 5 6 ...
$ GRP_LABEL: chr "e-Rational Utilitarians" "Settled Offline Communities" "e-Rational Utilitarians" "Passive and Uncommitted Users" ...
Data frames can hold character (chr), integer (int) and numeric (num) values; this data frame contains only chr and int columns. Make sure you can identify which is which, and that you know what the differences are.
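To make the difference between the three types concrete, here is a small sketch using a made-up data frame (the column names and values are illustrative, and the score column is hypothetical since the IUC file has no num column). It shows one way to list the type of every column at once.

```r
# Toy data frame with one column of each type (values are illustrative)
types_demo <- data.frame(
  label = c("e-Veterans", "e-Withdrawn"),  # character (chr): text
  code  = c(3L, 10L),                      # integer (int): whole numbers
  score = c(0.25, 1.5),                    # numeric (num): can hold decimals
  stringsAsFactors = FALSE
)

# Report the class of every column
col_types <- sapply(types_demo, class)
print(col_types)

# Arithmetic is meaningful for int and num columns, but not for chr
mean_code <- mean(types_demo$code)
```

Note that a column such as GRP_CD can be stored as an integer while still being a category code, so the storage type alone does not tell you whether arithmetic on it is sensible.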
Try exploring the data further (for example, with hist()).
To create any maps, we need some spatial data.
Download BoundaryData.zip, extract the files, and move all the files starting with the name england_lsoa_2011 to your working folder.
We will also need some spatial libraries:
#load libraries
library(sf)
library(tmap)
Read in the spatial data we downloaded from Edina:
#read in shapefile
LSOA <- st_read("england_lsoa_2011.shp")
Check the data using head(), class() and str().
Let’s do a quick map:
qtm(LSOA)
The next step is to join the attribute data (iuc) to the spatial data (LSOA).
Use head() to check which columns we are using for the join:
#check which columns we are joining
head(iuc)
head(LSOA)
#join attribute data to LSOA
LSOA <- merge(LSOA, iuc, by.x="code", by.y="LSOA11_CD")
#check output
head(LSOA)
Simple feature collection with 6 features and 7 fields
geometry type: POLYGON
dimension: XY
bbox: xmin: 334715 ymin: 385417 xmax: 339020.8 ymax: 390548
projected CRS: OSGB 1936 / British National Grid
code label name SHP_ID LSOA11_NM
1 E01006512 E08000012E02001377E01006512 Liverpool 031A 26586 Liverpool 031A
2 E01006513 E08000012E02006932E01006513 Liverpool 060A 24660 Liverpool 060A
3 E01006514 E08000012E02001383E01006514 Liverpool 037A 27675 Liverpool 037A
4 E01006515 E08000012E02001383E01006515 Liverpool 037B 26856 Liverpool 037B
5 E01006518 E08000012E02001390E01006518 Liverpool 044A 28180 Liverpool 044A
6 E01006519 E08000012E02001402E01006519 Liverpool 056A 27474 Liverpool 056A
GRP_CD GRP_LABEL geometry
1 1 e-Cultural Creators POLYGON ((336203 390010, 33...
2 1 e-Cultural Creators POLYGON ((335402.8 390317.5...
3 1 e-Cultural Creators POLYGON ((335651.3 389926.8...
4 10 e-Withdrawn POLYGON ((335186 389604, 33...
5 10 e-Withdrawn POLYGON ((335537.2 389034.5...
6 3 e-Veterans POLYGON ((338014.6 386447.2...
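It is worth checking what merge() does with codes that appear in only one of the two tables. The sketch below uses plain data frames as stand-ins for the boundary attributes and the IUC table (the names boundaries and iuc_toy, and the code E09999999, are made up for illustration). By default merge() keeps only the rows whose codes appear in both tables, which is why joining the national IUC table to Liverpool boundaries leaves only Liverpool rows.

```r
# Toy stand-ins for the boundary attributes and the IUC table
boundaries <- data.frame(
  code = c("E01006512", "E01006513", "E09999999"),  # last code is made up
  name = c("Liverpool 031A", "Liverpool 060A", "No match"),
  stringsAsFactors = FALSE
)
iuc_toy <- data.frame(
  LSOA11_CD = c("E01006512", "E01006513", "E01020179"),
  GRP_CD = c(1L, 1L, 5L),
  stringsAsFactors = FALSE
)

# Default merge keeps only codes present in both tables (an inner join)
joined <- merge(boundaries, iuc_toy, by.x = "code", by.y = "LSOA11_CD")

# Boundary codes with no match in the IUC table (these rows are dropped above)
unmatched <- setdiff(boundaries$code, iuc_toy$LSOA11_CD)

# all.x = TRUE keeps every boundary row and fills the gaps with NA
joined_all <- merge(boundaries, iuc_toy,
                    by.x = "code", by.y = "LSOA11_CD", all.x = TRUE)
```

Comparing the number of rows before and after the join with nrow() is a quick way to spot boundaries that silently lost their attribute data.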
Finally, we can plot the maps quickly using qtm() from the tmap library:
#map the IUC group codes
qtm(LSOA, "GRP_CD")
This works well. However, we don’t get many options with this. We can use a different function, tm_shape(), which will give us more options.
tm_shape(LSOA) +
tm_polygons("GRP_CD")
tm_shape(LSOA) +
tm_polygons("GRP_CD", palette = "Set3", n = 10) +
tm_layout(legend.title.size = 0.8)
This allows us to change the colours and the legend title size. We are now using a qualitative palette as these data have no inherent order (i.e. group 1 is not more or less than group 2).
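Because GRP_CD is stored as an integer, R will treat it as a number unless told otherwise. One option, sketched here in base R on a few made-up values, is to convert the codes to a factor so they are handled as unordered categories. The three code and label pairs are taken from the output shown earlier; the vector names are illustrative, and whether this changes a particular map depends on how the plotting function handles factors.

```r
# Illustrative group codes (not the full dataset)
grp_cd <- c(1L, 10L, 3L, 1L)

# Convert the integer codes to labelled, unordered categories
grp_factor <- factor(
  grp_cd,
  levels = c(1L, 3L, 10L),
  labels = c("e-Cultural Creators", "e-Veterans", "e-Withdrawn")
)

# The categories, and how many areas fall in each
levels(grp_factor)
grp_counts <- table(grp_factor)
print(grp_counts)
```

An alternative that avoids the conversion is to map the GRP_LABEL column, which is already text.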
We can customise the colours. Try running this code to see the different palette options:
library(RColorBrewer)
display.brewer.all()
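Once you have picked a palette, you can also pull out its colours as hex codes to inspect or reuse them. The sketch below asks for 10 colours from Set3 using brewer.pal(). As an assumption for portability, it falls back to the base R function hcl.colors() with its own "Set 3" palette if RColorBrewer is not installed; that fallback gives similar but not identical colours. The names n_groups and group_cols are illustrative.

```r
# Number of IUC groups to colour
n_groups <- 10

if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  # Set3 is a qualitative ColorBrewer palette with up to 12 colours
  group_cols <- RColorBrewer::brewer.pal(n_groups, "Set3")
} else {
  # Fallback (assumption): base R qualitative palette, not the same colours
  group_cols <- hcl.colors(n_groups, "Set 3")
}

# Hex codes, one per group
print(group_cols)
```

Qualitative ColorBrewer palettes have a limited number of colours, so it is worth checking that the palette you choose has at least as many colours as you have groups.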