Ethnicity Estimator software

Ethnicity Estimator is an online service which allows users to produce an estimated ethnicity distribution for a set of names supplied to it, based on the standard UK ONS ethnicity category groups. When supplied with a CSV file of names, it returns an indicative population count, split by category. The online service is secure, and supplied name lists are automatically discarded once the categorisation is complete.

The Ethnicity Estimator (EE) classifier is based on research which uses names data assembled by the Consumer Data Research Centre (CDRC). The data are taken from consumer sources and from the Office for National Statistics (ONS), which securely hosts data from England & Wales.

The research enables the ethnic distribution of datasets that contain names to be estimated. Users can now apply to access the Ethnicity Estimator software online. This software provides aggregate classifications, reporting the estimated population for each of the standard ONS ethnicity groups.

Applications will be accepted from users who intend to use the software for the public good; applicants may come from academia, government or industry. Please read our full Terms and Conditions (see document below) before making an application. Please note that the application review process takes a number of weeks. Once your application has been approved, a new link on this page will be available to you when you are logged in.

The category groups are:

  • ABD: Asian/Asian British - Bangladeshi
  • ACN: Asian/Asian British - Chinese
  • AIN: Asian/Asian British - Indian
  • APK: Asian/Asian British - Pakistani
  • AAO: Asian/Asian British - Any Other
  • BAF: Black/Black British - African
  • BCA: Black/Black British - Caribbean
  • WBR: White - English/Welsh/Scottish/Northern Irish/British
  • WIR: White - Irish
  • WAO: White - Any Other (including Gypsy or Irish Traveller)
  • OXX: Any Other Ethnic Group (including Arab)
  • Unclassified: Names that could not be classified into one of the above.

A minimum of 100 distinct (unique) names must be supplied in your input file. The application's server will time out if more than approximately 8000 names (including duplicate names) are supplied, so if your list of names is longer than this, you will need to prepare multiple input files and run each one in turn.
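The file-size limits above can be handled with a small preprocessing step. The following is a minimal sketch only, not part of the Ethnicity Estimator software: the function names, the output file naming and the one-name-per-row CSV layout are assumptions made for illustration, and the chunk size of 7999 is simply a conservative reading of "approximately 8000".

```python
import csv
from pathlib import Path

MIN_DISTINCT = 100  # minimum number of distinct names per input file
MAX_NAMES = 7999    # assumed safe size, just under the approximate 8000 limit


def split_names(names, max_names=MAX_NAMES, min_distinct=MIN_DISTINCT):
    """Split a list of names into consecutive chunks of at most max_names."""
    chunks = [names[i:i + max_names] for i in range(0, len(names), max_names)]
    # A short final chunk may hold too few distinct names to be accepted,
    # so share the last two chunks out evenly in that case.
    if len(chunks) > 1 and len(set(chunks[-1])) < min_distinct:
        merged = chunks[-2] + chunks[-1]
        half = len(merged) // 2
        chunks[-2:] = [merged[:half], merged[half:]]
    for chunk in chunks:
        if len(set(chunk)) < min_distinct:
            raise ValueError(
                f"chunk has fewer than {min_distinct} distinct names"
            )
    return chunks


def write_chunks(chunks, out_dir, stem="names_part"):
    """Write each chunk to its own CSV file (assumed layout: one name per row)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(chunks, start=1):
        path = out_dir / f"{stem}_{n:02d}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows([name] for name in chunk)
        paths.append(path)
    return paths
```

Each resulting file would then be run through the service in turn and the per-category counts added together afterwards. Check the expected CSV layout against the service's own instructions before uploading.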

Results Perturbation (Noise)

Due to a stipulation from one of the upstream data suppliers, the software adds some "noise" to the results, perturbing the count values by a small amount and mimicking the inherent uncertainty and inaccuracy in predicting an ethnicity solely from a name. This means that running the software repeatedly on the same set of names will produce slightly different numbers each time. The size of the perturbation for each name is drawn from a normal distribution. The Coefficient of Variation (CV) of the "noise" perturbation diminishes for larger datasets. Only rarely will the perturbation significantly change the result.
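To see why the CV shrinks as the dataset grows, consider a toy model. This is an illustrative sketch under stated assumptions, not the software's actual noise procedure: it assumes each name contributes 1 to a count plus independent zero-mean normal noise, and the standard deviation `sigma` is a hypothetical value chosen for the demonstration. Under that assumption the SD of the total grows with the square root of the number of names while the total itself grows linearly, so their ratio falls.

```python
import random
import statistics


def perturbed_count(n_names, sigma, rng):
    """Total for n_names, each contributing 1 plus normal noise (assumed model)."""
    return sum(1.0 + rng.gauss(0.0, sigma) for _ in range(n_names))


def coefficient_of_variation(n_names, sigma=0.5, runs=200, seed=0):
    """Estimate the CV (SD divided by mean) of the total over repeated runs."""
    rng = random.Random(seed)  # seeded so the demonstration is repeatable
    totals = [perturbed_count(n_names, sigma, rng) for _ in range(runs)]
    return statistics.stdev(totals) / statistics.mean(totals)


if __name__ == "__main__":
    for n in (100, 2500):
        print(n, round(coefficient_of_variation(n), 4))
```

In this toy model the CV is expected to be close to sigma divided by the square root of the number of names, so a list 25 times longer gives a CV roughly 5 times smaller.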

Here, two name lists, a small one and a large one, are each run five times, and the average and standard deviation (SD) are calculated. Low-count results (<10) are masked with an asterisk; a value of 3 is assumed for each masked result in the SUM row, but masked results are excluded from the average and SD calculations. The unclassified count is not subject to perturbation.

152 names     Run 1   Run 2   Run 3   Run 4   Run 5   Average  SD
WBR           47.7    51.6    49.6    49.4    49.6    49.58    1.4
WIR           *       *       *       *       *
WAO           27.3    25.2    26.3    25.9    28.2    26.58    1.2
AIN           12.8    10.6    12.9    12.4    *       12.18    1.1
APK           *       *       *       *       *
ABD           *       *       *       *       *
ACN           28.4    28.2    24.1    26      27.8    26.9     1.8
AAO           *       *       *       *       *
BAF           *       *       *       *       *
BCA           *       *       *       *       *
OXX           *       12.7    12.6    12.6    10.5    12.1     1.1
unclassified  11      11      11      11      11      11       0
SUM           148.2   157.3   154.5   155.3   148.1   152.68   4.3

7999 names    Run 1   Run 2   Run 3   Run 4   Run 5   Average  SD
WBR           4668.3  4717.3  4702.7  4702.8  4699.8  4698.18  18.0
WIR           227.5   245.6   218.7   225.3   225     228.42   10.1
WAO           853.2   815.1   829.8   805.2   821.9   825.04   18.2
AIN           817.9   823.7   818.4   824.6   822.8   821.48   3.1
APK           125.8   125.7   125.2   126     138.7   128.28   5.8
ABD           17.1    25.5    31.8    30.2    27.1    26.34    5.7
ACN           347.6   341.9   350.7   360.8   354.1   351.02   7.1
AAO           58.3    67.2    64      63.9    73.7    65.42    5.6
BAF           95.2    98.4    105.1   102.4   114.1   103.04   7.2
BCA           114.5   128     118.6   119.4   104.4   116.98   8.6
OXX           528.1   553.5   547.7   553.9   526     541.84   13.7
unclassified  104     104     104     104     104     104      0
SUM           7957.5  8045.9  8016.7  8018.5  8011.6  8010.04  32.3
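The summary columns in the tables above can be reproduced with a short calculation. The following is a minimal sketch; the function names are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an inference from the published figures rather than a documented choice.

```python
import statistics

MASK = "*"            # low counts (<10) are shown as an asterisk
MASKED_SUM_VALUE = 3  # value assumed for a masked cell in the SUM row


def row_summary(cells):
    """Average and sample SD of one category across runs, skipping masked cells."""
    values = [float(c) for c in cells if c != MASK]
    if not values:
        return None, None  # every run was masked, so nothing to report
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if len(values) >= 2 else None
    return mean, sd


def run_sum(column):
    """SUM for one run: masked cells count as 3, all other cells as reported."""
    return sum(MASKED_SUM_VALUE if c == MASK else float(c) for c in column)


if __name__ == "__main__":
    # AIN row of the 152-name table; Run 5 is masked.
    print(row_summary(["12.8", "10.6", "12.9", "12.4", "*"]))
    # Run 1 column of the 152-name table, WBR down to unclassified.
    print(run_sum(["47.7", "*", "27.3", "12.8", "*", "*",
                   "28.4", "*", "*", "*", "*", "11"]))
```

For the AIN row this gives an average of about 12.18 and an SD of about 1.07, and for Run 1 of the small list a SUM of 148.2, in line with the table.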
Controller: University College London (UCL)

Additional Info:

Source: ONS
Modified: 2024-09-06
Release Date: 2019-11-18
Spatial / Geographical Coverage Location: England and Wales
Granularity: Ethnic Group
Author: ONS, CDRC
Contact Name: Oliver O'Brien
Contact Email:

Data Extent

POLYGON ((-6.7036926746 49.7133142846, 1.9288146496 49.7133142846, 1.9288146496 55.9920568951, -6.7036926746 55.9920568951, -6.7036926746 49.7133142846))

Apply for the data:

To apply for the data, please log in or register.

License

Other (Not Open)