Ethnicity Estimator

Ethnicity Estimator is an online service which allows users to produce an estimated ethnicity distribution of a set of names supplied to it, based on the standard UK ONS ethnicity category groups. Upon supplying a CSV of names, it will return an indicative population count, split by the categories. The online service is secure and supplied names lists are automatically discarded after the categorisation is complete.

The Ethnicity Estimator (EE) classifier is based on research which uses names data assembled by GeoDS. The data are taken from consumer sources and from the Office for National Statistics (ONS), which securely hosts data from England & Wales.

The research enables estimates of the ethnic distribution from datasets which contain names, using the best methodology. Users can now apply to access the Ethnicity Estimator software online. This software provides aggregate classifications, reporting the estimated population for each of the standard ONS ethnicity groups.

Applications will be accepted from users who will use the software for the public good, and applicants can be drawn from the academic, government or industry sectors. Please read our full Terms and Conditions (see document below) prior to making an application. Please note that the application review process takes a number of weeks. Once your application has been approved, a new link on this page will be available to you when you are logged in.

The category groups are:

  • ABD: Asian/Asian British - Bangladeshi
  • ACN: Asian/Asian British - Chinese
  • AIN: Asian/Asian British - Indian
  • APK: Asian/Asian British - Pakistani
  • AAO: Asian/Asian British - Any Other
  • BAF: Black/Black British - African
  • BCA: Black/Black British - Caribbean
  • WBR: White - English/Welsh/Scottish/Northern Irish/British
  • WIR: White - Irish
  • WAO: White - Any Other (including Gypsy or Irish Traveller)
  • OXX: Any Other Ethnic Group (including Arab)
  • Unclassified: Names that could not be classified into one of the above.

Content

Access to an online tool.

A minimum of 100 distinct (unique) names must be supplied in your input file. The application's server will time out if more than approximately 8000 names (including duplicate names) are supplied, so if your names list is longer than this, you will need to prepare multiple input files and run each one in turn. Input files should be less than 10MB.
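
As an illustration of the limits above, the following Python sketch splits a long names list into evenly sized batches before upload. It is not part of the Ethnicity Estimator software: the function name split_names and the default batch size of 7500 (a safety margin under the approximate 8000-name limit) are assumptions made for this example, and the layout of the input CSV is not covered here.

```python
import math

MIN_DISTINCT = 100   # minimum distinct names per input file (stated above)
MAX_NAMES = 7500     # assumed safety margin below the approximate 8000-name limit


def split_names(names, max_names=MAX_NAMES, min_distinct=MIN_DISTINCT):
    """Split a list of names into evenly sized batches of at most max_names.

    Duplicates are kept, because the limit counts duplicate names too.
    Raises ValueError if any batch has fewer than min_distinct unique names.
    """
    if not names:
        raise ValueError("no names supplied")
    n_batches = math.ceil(len(names) / max_names)
    # Even batch sizes avoid a very small final batch.
    size = math.ceil(len(names) / n_batches)
    batches = [names[i:i + size] for i in range(0, len(names), size)]
    for number, batch in enumerate(batches, start=1):
        if len(set(batch)) < min_distinct:
            raise ValueError(
                f"batch {number} has fewer than {min_distinct} distinct names"
            )
    return batches
```

Each batch would then be saved as its own input file (under 10MB) and run in turn.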

Quality, Representation and Bias

Due to a stipulation from one of the upstream data suppliers, the software adds some "noise" to the results, perturbing the count values by a small amount, mimicking the inherent uncertainty and inaccuracy in predicting an ethnicity solely from a name. This means that running the software repeatedly on the same set of names will produce slightly different numbers each time. A normal distribution is applied to the size of the perturbation, for each name. The Coefficient of Variation (CV) of the "noise" perturbation diminishes for larger datasets. Only rarely will the perturbation significantly change the result.
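
The behaviour described above can be illustrated with a small simulation. This is a sketch only and not the software's actual noise model, whose parameters are not given here: it assumes each name contributes 1 plus a normally distributed error with a hypothetical standard deviation sigma, so that the spread of the total grows with the square root of the count while the CV shrinks.

```python
import random
import statistics


def perturbed_count(true_count, sigma, rng):
    """Sum a per-name contribution of 1 plus normal noise (sigma is hypothetical)."""
    noisy = sum(1.0 + rng.gauss(0.0, sigma) for _ in range(true_count))
    return max(0, round(noisy))


def coefficient_of_variation(true_count, runs=100, sigma=0.5, seed=0):
    """Estimate the CV (standard deviation / mean) of the count over repeated runs."""
    rng = random.Random(seed)  # seeded so the illustration is repeatable
    results = [perturbed_count(true_count, sigma, rng) for _ in range(runs)]
    return statistics.stdev(results) / statistics.mean(results)


if __name__ == "__main__":
    for size in (50, 500, 5000):
        print(size, round(coefficient_of_variation(size), 4))
```

Under these assumptions the CV falls roughly in proportion to one over the square root of the category count, which is consistent with the noise mattering less for larger datasets.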

The "Perturbation Examples" technical report below shows the results of runs on two names lists, a small one and a large one. Each is run 5 times, and the average and standard deviation are calculated. For low-count results (less than 10), which are masked with an asterisk, a value of 3 is assumed for the SUM, but no value is assumed for the average and SD calculations. The unclassified count is not subject to perturbation.
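
The treatment of masked values can be sketched in Python as follows. This is one reading of the rule above rather than the report's own code: it assumes SUM means the total across categories within a single run, that masked values are simply left out of the average and SD, and that SD is the sample standard deviation. The function names are hypothetical.

```python
import statistics

MASK = "*"                # how low counts (less than 10) appear in the output
ASSUMED_MASKED_VALUE = 3  # value assumed for masked counts in the SUM


def run_total(run):
    """Total of one run's category counts, treating each masked count as 3."""
    return sum(ASSUMED_MASKED_VALUE if v == MASK else v for v in run.values())


def category_stats(runs, category):
    """Average and sample SD of one category across runs, skipping masked values."""
    values = [run[category] for run in runs if run[category] != MASK]
    if not values:
        return None, None
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if len(values) > 1 else None
    return mean, sd
```

For example, with made-up runs in which a category is reported as 120, 124 and 122, category_stats returns an average of 122 and an SD of 2.0, while a category masked in every run returns no average or SD.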

Safeguarded

Data and Resources

This dataset is categorised as Safeguarded and therefore access to the underlying data is only available upon application. You can view metadata here to determine whether the data will be of use to you. You can apply for access to it by requesting it here.

Additional Info

Field              Value
Source             Office for National Statistics
Author             Longley, Paul
Maintainer         Oliver O'Brien
Last Updated       May 14, 2025, 11:58 (UTC)
Created            May 6, 2025, 12:41 (UTC)
Controller         UCL
Frequency          Snapshot
Granularity        Ethnic Group
Spatial Coverage   England and Wales
Temporal Coverage  April 2011