Creating a Geodemographic Classification Using K-means Clustering in R

Guy Lansley and James Cheshire


Geodemographic classifications group neighbourhoods (or sometimes even individual households) into types with similar characteristics, based on a range of variables. They are a useful means of segmenting the population into distinctive groups so that resources can be channelled effectively. Such classifications have been effective deductive tools for the marketing, retail and service planning industries due to the assumed association between geodemographics and behaviour. For instance, at the broadest level a classification may typically distinguish cosmopolitan neighbourhoods with high proportions of young and newly qualified workers from suburban neighbourhoods with high proportions of settled families. Such classifications work because people with similar characteristics tend to cluster within cities. Whilst most geodemographic products are built within the commercial sector and sold by vendors, open source alternatives are available.

The following tutorial will provide you with the basic skills to build your own geodemographic classification using R. All data and resources for this exercise are freely available. Those who are unfamiliar with R may find it useful to work through the CDRC tutorial series "Introduction to Spatial Data Analysis and Visualisation in R" first. The tutorial is free, but users will need to register on this website to access the materials.