Data on the volume of local internet content (LIC). The data are based on the UK Web Archive (https://www.webarchive.org.uk/), which draws on the Internet Archive, and measure the volume of archived online content of local interest at the MSOA/IZ level for the period 2002-2012.
Please see the PDF data item listed here for full details of the dataset and references. The following is a summary of the PDF:
The JISC UK Web Domain Dataset (https://data.webarchive.org.uk/opendata/ukwa.ds.2/) (Jackson, 2017) is a subset of the Internet Archive curated by the British Library. It includes all archived webpages under the .uk top-level domain. Each entry contains a timestamp, the URL of the archived website and the British postcode found on that site.
We can use this dataset to derive a measure of how much local internet content (hereafter LIC; Tranos & Stich, 2019) exists across the UK. However, the dataset includes websites with differing geographic reach: some websites may refer to a single postcode, while others may refer to several postcodes all over the UK. As described in Tranos and Stich (2019), this poses a problem when trying to ascertain whether localised internet content is a driver of online behaviour.
We thus need a way to discount websites that have less of a local focus. The underlying idea is that websites with a high geographic dispersion are less “local”. To compute the geographic dispersion of a website’s set of postcodes p, we calculate the Radius of Gyration (RG) of p in kilometres. A website with a high RG will be of national interest, while a website with a low RG will have a very local geographic presence.
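As an illustration of the dispersion measure, the following is a minimal sketch of a Radius of Gyration calculation. It assumes the common definition of RG as the root-mean-square distance of the points from their centroid, that each postcode has already been geocoded to a (latitude, longitude) pair, and that a simple mean of coordinates is an acceptable centroid at UK scale. The function names and the example coordinates are hypothetical; the notebook listed under Data and Resources remains the reference for the actual calculation.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, used for great-circle distances


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_of_gyration_km(points):
    """Root-mean-square distance (km) of the points from their centroid.

    `points` is a non-empty list of (lat, lon) tuples, one per postcode
    found on a website. The centroid is the plain mean of the coordinates,
    which is an assumption that is reasonable only for a small region.
    """
    if not points:
        raise ValueError("at least one postcode location is required")
    n = len(points)
    c_lat = sum(lat for lat, _ in points) / n
    c_lon = sum(lon for _, lon in points) / n
    mean_sq = sum(haversine_km(lat, lon, c_lat, c_lon) ** 2 for lat, lon in points) / n
    return math.sqrt(mean_sq)


# Hypothetical websites: one mentions a single location, the other two distant cities.
local_site = [(51.5074, -0.1278)]
national_site = [(51.5074, -0.1278), (55.9533, -3.1883)]
print(radius_of_gyration_km(local_site))     # a single location has no dispersion
print(radius_of_gyration_km(national_site))  # widely separated locations give a large RG
```

A website that mentions only one postcode has an RG of zero under this definition, while a website listing postcodes in distant cities has an RG of hundreds of kilometres.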
As local geographical units we utilise the Middle Layer Super Output Areas (MSOA) for England and the Intermediate Zones (IZ) for Scotland. For each MSOA/IZ with a set W of archived websites we calculate yearly measures of the volume of LIC.
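The aggregation step can be sketched as a grouped sum, given one RG value per archived website. The exact discounting used for the published data is set out in the PDF and is not reproduced in this summary, so the weight 1 / (1 + RG) below is purely an illustrative assumption, as are the function name, the record layout and the area codes.

```python
from collections import defaultdict


def lic_by_area_year(records, weight=lambda rg_km: 1.0 / (1.0 + rg_km)):
    """Sum discounted website counts per (area, year).

    `records` is an iterable of (area_code, year, rg_km) tuples, one per
    archived website observed in an MSOA/IZ in a given year.
    `weight` maps a website's RG in km to its contribution; the default
    is a hypothetical discount, not the formula used for the dataset.
    """
    totals = defaultdict(float)
    for area_code, year, rg_km in records:
        totals[(area_code, year)] += weight(rg_km)
    return dict(totals)


# Hypothetical input: two websites in one area in 2005, one in another area in 2006.
example = [
    ("E02000001", 2005, 0.0),
    ("E02000001", 2005, 1.0),
    ("S02000001", 2006, 3.0),
]
print(lic_by_area_year(example))
```

Passing the weight as a parameter keeps the grouping logic separate from the choice of discount, so the documented formula can be substituted without changing the rest of the sketch.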
Tranos, E. & Stich, C. (2019). Individual internet usage and the availability of online content of local interest: a multilevel approach. Computers, Environment and Urban Systems, in press.
Created by Emmanouil Tranos and Christoph Stich.
JISC UK Web Domain Dataset
Data and Resources
- Calculating Local Internet Content (Python Function / Jupyter Notebook)
This notebook details how to calculate the Local Internet Content (LIC) for...