cdrcR is an R wrapper developed to enable access to the CDRC API
endpoints and to retrieve open and freely available CDRC data
programmatically. The package is designed around one main function,
getCDRC, which allows you to get data from all the available CDRC API
endpoints. You can access a list of these available endpoints and their
metadata by running listCDRC(). This list will provide you with a
dataset identifier, the dataCode, which you will need to use to request
the correct endpoint.
You can also run ?getCDRC() to access the function’s full
documentation.
You can install the cdrcR package from CRAN, or the development version
from GitHub using devtools.
The cdrcR package relies on an external service, so please check the
GitHub repository periodically for version updates.
# Install from GitHub
install.packages("devtools")
devtools::install_github("aelissa/cdrcR")
# OR install from CRAN
install.packages("cdrcR")
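If you rerun this tutorial often, you may prefer to install only the packages that are missing. The sketch below is one way to check; missing_packages() is a hypothetical helper written for this example and is not part of cdrcR. The install call itself is left commented out.

```r
# Hypothetical helper (not part of cdrcR): return the packages in `pkgs`
# that cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used at some point in this tutorial
to_install <- missing_packages(c("cdrcR", "ggplot2", "tidyr", "tmap"))

# Install only what is missing (commented out so that nothing is
# installed unless you choose to run it)
# if (length(to_install) > 0) install.packages(to_install)
```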
To use the CDRC APIs, you will need to register for the CDRC API HERE.
Please be aware that the CDRC API registration is separate from the data.cdrc.ac.uk account. If you already have an account there, you will still need to register at the above link. Below is a screenshot of the correct registration website.
First of all, you need to load the library.
library(cdrcR)
Then, to get started, you need to log in with the username and
password that you used (or just created) when registering with the CDRC
API. Please note that you will need to call loginCDRC() each time you
start working with the API again.
Your CDRC login details for data.cdrc.ac.uk will not work here, so be sure to use your CDRC API account details.
loginCDRC(username="your-username",password="your-password")
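If you would rather not type your password into a script, one option is to read the credentials from environment variables. This is only a sketch: the helper cdrc_credentials() and the variable names CDRC_USERNAME and CDRC_PASSWORD are assumptions made for this example, and cdrcR does not read them itself.

```r
# Hypothetical helper: read CDRC API credentials from environment variables.
# CDRC_USERNAME and CDRC_PASSWORD are names invented for this sketch.
cdrc_credentials <- function() {
  creds <- list(
    username = Sys.getenv("CDRC_USERNAME", unset = ""),
    password = Sys.getenv("CDRC_PASSWORD", unset = "")
  )
  # Stop early if either value is missing or empty
  if (!all(nzchar(unlist(creds)))) {
    stop("Set CDRC_USERNAME and CDRC_PASSWORD before logging in.")
  }
  creds
}

# Usage (requires cdrcR and a registered CDRC API account):
# creds <- cdrc_credentials()
# loginCDRC(username = creds$username, password = creds$password)
```

You could set the two variables in your .Renviron file so that they are available in every session.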
You can now list the open access datasets that are available via the API, alongside the corresponding dataCode, which identifies the API endpoint. You will need this dataCode to specify the dataset that you wish to get later on.
listCDRC()
This function will result in a data frame that should look something
like this:
Title | DataCode | dataSetURL | GeographicalCoverage | GeographyLevel |
---|---|---|---|---|
Access to Healthy Assets & Hazards (AHAH) 2019 | AHAHInputs, AHAHOverallIndexDomain | https://data.cdrc.ac.uk/dataset/access-healthy-assets-hazards-ahah | Great Britain | LSOA |
Classification of Workplace Zones (COWZ) 2011 | COWZUK2011 | https://data.cdrc.ac.uk/dataset/classification-workplace-zones-cowz | United Kingdom | WZ |
Index of Multiple Deprivation (IMD) 2019 | IMD2019 | https://data.cdrc.ac.uk/dataset/index-multiple-deprivation-imd | United Kingdom | LSOA |
Internet User Classification (IUC) 2018 | IUC2018 | https://data.cdrc.ac.uk/dataset/internet-user-classification | Great Britain | LSOA |
London Output Area Classification (OAC) 2011 | LOACClassification2011, LOACInputData2011 | https://data.cdrc.ac.uk/dataset/london-oac-2011 | London | OA |
London Workplace Zone Classification 2017 | LWZCClassification2017, LOACInputData2011 | https://data.cdrc.ac.uk/dataset/london-workplace-zone-classification | Greater London | WZ |
Classification of Multidimensional Open Data of Urban Morphology (MODUM) 2016 | MODUMClassificationEW2016 | https://data.cdrc.ac.uk/dataset/classification-multidimensional-open-data-urban-morphology-modum | England | OA |
Once you have decided which dataset you would like to access, pick
its corresponding DataCode. This will be the input for the dataCode
parameter in the getCDRC function.
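Because a single row of the list can hold several codes separated by commas, it can help to split them before choosing one. The sketch below uses a small stand-in data frame copied from three rows of the table above rather than a live listCDRC() call; the column names are assumed to match the table headers.

```r
# Illustrative stand-in for the listCDRC() output, copied from three rows of
# the table above; the real data frame has more rows and columns.
catalogue <- data.frame(
  Title = c("Access to Healthy Assets & Hazards (AHAH) 2019",
            "Classification of Workplace Zones (COWZ) 2011",
            "Internet User Classification (IUC) 2018"),
  DataCode = c("AHAHInputs, AHAHOverallIndexDomain", "COWZUK2011", "IUC2018"),
  GeographyLevel = c("LSOA", "WZ", "LSOA"),
  stringsAsFactors = FALSE
)

# Keep the datasets built at LSOA level
lsoa_sets <- catalogue[catalogue$GeographyLevel == "LSOA", ]

# Split comma-separated entries into individual dataCode values
lsoa_codes <- trimws(unlist(strsplit(lsoa_sets$DataCode, ",")))
print(lsoa_codes)
```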
The getCDRC() function obtains the CDRC data, and it takes 4
parameters to fulfil your request. These are:
dataCode - The API identifier for the specific dataset, available from the table above.
geography - The geographical level at which to retrieve the data. Choose from c("postcode", "MSOA", "LSOA", "LADcode", "LADname").
geographyCode - A character vector of one or more postcodes, LSOA codes, MSOA codes, LAD codes or LAD names.
boundaries - If FALSE (the default), returns a data frame of the desired data. If TRUE, the Open Geography Portal API is used to return an sf object with a ‘geometry’ column.
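Since only a handful of geography values are accepted, you may want to check the value before sending a request. The sketch below is one way to do that. check_geography() is a hypothetical helper, and the list of valid values is taken from the parameter description and the examples in this tutorial, so check ?getCDRC for the authoritative list.

```r
# Geography values used in this tutorial (an assumption; see ?getCDRC)
valid_geographies <- c("postcode", "MSOA", "LSOA", "LADcode", "LADname")

# Hypothetical helper: stop with a readable message when the geography is
# not one the API can be queried by (for example "OA" or "WZ")
check_geography <- function(geography) {
  if (length(geography) != 1 || !geography %in% valid_geographies) {
    stop("geography must be one of: ",
         paste(valid_geographies, collapse = ", "))
  }
  geography
}

check_geography("LSOA")    # passes and returns the value unchanged
# check_geography("WZ")    # would stop with an error
```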
Please be aware that not every API endpoint can be queried at the
geography for which its dataset was built (you can find the original
geography level of each dataset in the GeographyLevel attribute
returned by listCDRC()). The API endpoints can be queried by postcode,
LSOA, MSOA, LAD code or LAD name. This means that datasets built at OA
or WZ level cannot be retrieved by passing OA or WZ to the geography
parameter of the getCDRC() function. Instead, you must specify one of
the supported geographies (postcodes, LSOAs, MSOAs, LAD codes/LAD names),
and the areas that overlap that geography will be returned. For
example, the Workplace Zone Classification cannot be requested with WZ.
However, if you wish to retrieve the WZC for a specific area, you can
do so like this.
wz <- getCDRC("COWZUK2011", geography = "LADname", geographyCode = "Manchester")
This will retrieve all of the workplace zones in Manchester. Or, you can state a specific postcode, or LSOA, and the workplace zone that overlaps that specific postcode or LSOA will be returned.
getCDRC("COWZUK2011", geography = "postcode", geographyCode = "M139PR")
This returns the workplace zone for the University of
Manchester.
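The postcodes in this tutorial are written in upper case without spaces. If your own postcodes come in mixed formats, a small clean-up step like the sketch below may help. normalise_postcode() is a hypothetical helper, and whether the API also accepts spaced postcodes is not assumed here.

```r
# Hypothetical helper: upper-case postcodes and strip all whitespace so they
# match the compact form used in this tutorial (for example "M139PR")
normalise_postcode <- function(x) {
  gsub("[[:space:]]+", "", toupper(x))
}

postcodes <- normalise_postcode(c("m13 9pr", " L1 3AY", "L8 2TJ "))
print(postcodes)

# The cleaned vector could then be passed on, for example:
# getCDRC("COWZUK2011", geography = "postcode", geographyCode = postcodes)
```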
The Access to Healthy Assets and Hazards dataset (AHAH) is a
multidimensional (composite) index developed by the CDRC to measure how
‘healthy’ neighbourhoods are in Great Britain. It combines indicators
under 4 different domains: retail, health, blue/green space, and air quality.
In this example, to highlight the usability of getCDRC(), we will
access the overall AHAH domain index (via the dataCode
AHAHOverallIndexDomain) for several postcodes: L13AY, L82TJ, L83UL.
Recall that the AHAH is a composite index, so the overall score is a combination of all four domains. Also, although we want to rank postcodes by their level of access to healthy assets and hazards, the data is at LSOA level. Therefore, the LSOAs that overlap the requested postcodes will be returned.
# Login
loginCDRC(username="your-username",password="your-password")
# Check the data and their relevant dataCode
listCDRC()
# Get the AHAH index for the postcodes
ahah <- getCDRC("AHAHOverallIndexDomain",
                geography = "postcode",
                geographyCode = c("L13AY","L82TJ","L83UL"))
# Inspect ahah to understand what was returned
dim(ahah)
head(ahah)
names(ahah)
The ahah data frame created above consists of the following variables:
Variable | Description |
---|---|
lsoa11 | Lower Super Output Area code (2011) |
r_rank | Retail domain - Ranks |
h_rank | Health domain - Ranks |
g_rank | Blue/Green space domain - Ranks |
e_rank | Air Quality domain - Ranks |
r_exp | Retail domain - Value (after exponential transformation) |
h_exp | Health domain - Value (after exponential transformation) |
g_exp | Blue/Green space domain - Value (after exponential transformation) |
e_exp | Air Quality domain - Value (after exponential transformation) |
ahah | Access to Healthy Assets and Hazards - Value |
r_ahah | Access to Healthy Assets and Hazards - Ranks |
d_ahah | Access to Healthy Assets and Hazards - Deciles |
r_dec | Retail domain - Deciles |
h_dec | Health domain - Deciles |
g_dec | Blue/Green space domain - Deciles |
e_dec | Air Quality domain - Deciles |
# Here we will rank the postcodes by the AHAH index from the best performing to the worst performing, using the ahah variable - the overall Access to Healthy Assets and Hazards score.
ahah[order(ahah$ahah),c("postCode","ahah")]
# postCode ahah
#3 L8 3UL 20.01734
#2 L8 2TJ 23.04482
#1 L1 3AY 45.91745
In this example, we can see that L8 3UL is the healthiest in terms of access to healthy assets and hazards, with an overall AHAH score of 20.02, and L1 3AY is the least healthy, with an overall AHAH score of 45.92. To gain a better understanding of these scores, we can visualise their breakdown by domain with the following code.
# Load libraries
# install.packages(c("ggplot2","tidyr"))
library(ggplot2)
library(tidyr)
# Transform data into long format
ahahLong <- pivot_longer(data = ahah,
                         # The columns we want to make long - the domain scores
                         cols = c("rExp","hExp","gExp","eExp"),
                         # Variable for the variable names to go into
                         names_to = "domains",
                         # Variable for the variable values to go into
                         values_to = "scores")
# Inspect the new dataframe
names(ahahLong)
head(ahahLong[, c("postCode", "domains", "scores")])
## Create graph
# Global mappings
ggplot(ahahLong, aes(domains, scores, fill = domains)) +
  # Create the bar charts
  geom_col(show.legend = FALSE) +
  # For each postcode
  facet_wrap(vars(postCode))
You should now have a bar chart of the domain scores, with one panel per postcode. As we can see, the postcode L1 3AY has a score of over 80 for rExp. This variable is the score for the retail domain, which suggests that poor access to retail is a large factor in its overall score. It also has larger scores in eExp and gExp, which are the air quality and blue/green space domains respectively. However, this postcode also has a very low hExp score, suggesting better accessibility to health services than the other two postcodes.
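The same reading can be made without a chart by asking which domain has the largest score for each postcode, since a larger score counts against a postcode in the ranking above. The sketch below uses made-up numbers and placeholder postcode labels, not real AHAH values; only the column names follow the code above.

```r
# Made-up domain scores for illustration only; they are NOT real AHAH values
scores <- data.frame(
  postCode = c("A", "B", "C"),
  rExp = c(80, 10, 20),
  hExp = c(5, 30, 25),
  gExp = c(40, 15, 60),
  eExp = c(50, 20, 35),
  stringsAsFactors = FALSE
)
domain_cols <- c("rExp", "hExp", "gExp", "eExp")

# For each row, find the domain column holding the largest score
top_index <- max.col(as.matrix(scores[domain_cols]), ties.method = "first")
scores$topDomain <- domain_cols[top_index]

scores[, c("postCode", "topDomain")]
```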
Now, we will access the same data, but for the City of Doncaster in
the Yorkshire and the Humber region. Here, we will need to specify the
Local Authority District name (LADname): Doncaster. We also want to
return the boundaries so that we can map the data.
# Check the data and their relevant dataCode
listCDRC()
# Get the AHAH index for Doncaster
ahahDN <- getCDRC("AHAHOverallIndexDomain",
                  geography = "LADname",
                  geographyCode = "Doncaster",
                  boundaries = TRUE)
## Inspect the output
# Check the names of the variables
names(ahahDN)
# Check out the first 6 observations
head(ahahDN)
## OK, let's map the deciles of the overall AHAH score - the dAhah variable!
## Map the deciles of AHAH throughout Doncaster
# Load in the tmap package
# install.packages("tmap", dependencies = TRUE)
library(tmap)
# Create the map
tm_shape(ahahDN) +
  # Fill it with the AHAH deciles, giving appropriate title
  tm_fill("dAhah", style = "cat", title = "Decile of AHAH") +
  # Specify legend outside of map window, with text size
  tm_layout(legend.outside = TRUE,
            legend.text.size = 1)
You should now have a map of the AHAH deciles across Doncaster. We can see some clear patterns of access to healthy assets and hazards. Doncaster’s urban centre, towards the centre of the map, has clusters of areas with poor access to healthy assets and hazards. As we move further from the urban centre, access improves, until we reach the rural areas towards the periphery of the borough, where access is again typically poor. There are exceptions to this, however, with pockets of good access in the rural north, northwest and southwest.
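To back up a visual reading of the map with numbers, you could count how many areas fall into each decile. The sketch below uses a made-up vector in place of ahahDN$dAhah, so the counts are illustrative only.

```r
# Made-up decile values standing in for ahahDN$dAhah; not real data
dAhah <- c(1, 1, 2, 4, 4, 4, 7, 9, 10, 10)

# Treat deciles as a factor with all ten levels so that empty deciles
# still appear in the table with a count of zero
decile_counts <- table(factor(dAhah, levels = 1:10))
print(decile_counts)

# Share of areas in each decile, as percentages
decile_share <- round(100 * prop.table(decile_counts), 1)
print(decile_share)
```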
The Internet User Classification (IUC) 2018 is a bespoke classification that describes how people living in Great Britain interact with the internet. It is developed at the LSOA and Data Zone (DZ) level, and creates clusters of internet use and engagement. It is an update of the 2014 Internet User Classification. You can view the user guide with methodology and the IUC profiles HERE.
There are a number of metadata variables that can be ignored in this return (id, createdDate, modifiedDate, isDeleted, rowVersion, and lastUpdatedBy). The remaining variables are described below.
Variable | Description |
---|---|
LSOA11CD | LSOA code |
grpCD | The group code (1-10) |
grpLabel | The IUC group label |
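If the metadata columns get in the way, you can drop them before working with the data. The sketch below shows one way, using a tiny made-up data frame in place of a real getCDRC("IUC2018", ...) result; drop_metadata() is a hypothetical helper, not a cdrcR function.

```r
# Metadata columns named in the text above
meta_cols <- c("id", "createdDate", "modifiedDate", "isDeleted",
               "rowVersion", "lastUpdatedBy")

# Hypothetical helper: drop any of those columns that are present
drop_metadata <- function(df) {
  df[, !(names(df) %in% meta_cols), drop = FALSE]
}

# Tiny made-up data frame standing in for an IUC2018 result
example <- data.frame(id = 1:2,
                      LSOA11CD = c("X1", "X2"),
                      grpCD = c(2, 10),
                      isDeleted = c(FALSE, FALSE))

names(drop_metadata(example))
```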
The grpCD and the grpLabel are the variables of interest here. The
groups are as follows:
Group | Label | Description |
---|---|---|
1 | e-Cultural Creators | High levels of engagement, particularly social media, streaming, and gaming. New but active users. |
2 | e-Professionals | High levels of engagement, fairly young urban professionals, who are experienced, daily users. |
3 | e-Veterans | High levels of engagement, affluent families in low density suburbs, middle aged, qualified professionals, who are frequent and experienced users. |
4 | Youthful Urban Fringe | Average levels of engagement, young and ethnic minorities, typically students and young urbanites at the edges of deprived communities. |
5 | e-Rational Utilitarians | High demand constrained by poor infrastructure, engagement consists of e-commerce from middle aged or older residents, with personal computers at home. |
6 | e-Mainstream | Average levels of engagement from wide range of social echelons, located on the edge of urban areas or in transitional neighbourhoods. |
7 | Passive and Uncommitted Users | Limited or low levels of engagement, typically located outside of city centres and close to the rural-urban fringe. Individuals are rarely online, with weekly access or less. |
8 | Digital Seniors | White British, wealthy and retired in semi-rural or coastal regions, infrequent but adept users (less so for social media, streaming, and gaming). |
9 | Settled Offline Communities | Very limited engagement with the internet, accessing rarely or not at all. Most are elderly and tend to reside in semi-rural areas. Any online behaviour is via computers and mostly information seeking. |
10 | e-Withdrawn | The least engaged group, typically located in deprived urban regions, in areas with less affluent White British or areas of high ethnic diversity, and the greatest levels of unemployment. Potentially opt out of engagement for economic reasons. |
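If you only have the numeric group codes to hand, a lookup vector can turn them back into labels. The sketch below follows the table above; label_iuc() is a hypothetical helper, and the labels returned by the API itself may be spelled slightly differently.

```r
# Group labels in code order (1-10), following the table above
iuc_labels <- c("e-Cultural Creators", "e-Professionals", "e-Veterans",
                "Youthful Urban Fringe", "e-Rational Utilitarians",
                "e-Mainstream", "Passive and Uncommitted Users",
                "Digital Seniors", "Settled Offline Communities",
                "e-Withdrawn")

# Hypothetical helper: turn group codes into a factor whose levels keep
# the 1-10 group order (useful for ordered legends)
label_iuc <- function(grpCD) {
  factor(iuc_labels[as.integer(grpCD)], levels = iuc_labels)
}

label_iuc(c(10, 2, 7))
```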
For this example, we will get the Internet User Classification for LSOAs across Liverpool. Again, we want to return the geographies so that we can map the results.
# Check dataCode
listCDRC()
# Get Liverpool LSOAs
liverpool <- sf::st_as_sf(liverpool)
# Get the IUC data with geographical boundaries, using the Liverpool$LSOA11CD as the geographyCode input
iuc <- getCDRC("IUC2018",
               geography = "LSOA",
               geographyCode = liverpool$LSOA11CD,
               boundaries = TRUE)
## Inspect the output
# Check the names
names(iuc)
# Check the first 6 observations
head(iuc)
## OK - let's map the group names - the grpLabel variable!
# Map the IUC throughout Liverpool
tm_shape(iuc) +
  # Fill with the group labels, and give appropriate title
  tm_fill("grpLabel", style = "cat", title = "IUC Groups") +
  # Specify legend outside of map window and legend text size
  tm_layout(legend.outside = TRUE,
            legend.text.size = 1)
You should now have a map of the IUC groups across Liverpool. We can
see that the north of Liverpool is dominated by clusters of
e-Withdrawn and Passive and Uncommitted Users, some of the groups with
the least interaction with the internet. The e-Professionals are
clustered towards the centre-south of Liverpool, particularly near the
Docks, further eastwards around the universities, and at St Michael’s.
The e-Cultural Creators, the group that interacts the most, particularly
with social media, are located within the student neighbourhoods in
Liverpool.
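The Index of Multiple Deprivation (IMD) is a composite indicator that measures relative levels of deprivation via 39 indicators, separated into 7 domains of deprivation. These are: income; employment; education, skills and training; health deprivation and disability; crime; barriers to housing and services; and living environment.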
These 7 distinct domains of deprivation are combined and weighted to
calculate the overall measure of multiple deprivation experienced by
people living in a neighbourhood, measured by LSOA. The IMD has a number
of variables, described in the table below.
Variable | Description |
---|---|
ladCode | The code of the Local Authority |
LSOA11CD | LSOA code |
LSOA11NM | LSOA name |
imd2010Adjusted | IMD score |
nationalQuintile1 | Quintile of deprivation from most deprived (1) to least deprived (5) |
nationalDecile2 | Decile of deprivation from most deprived (1) to least deprived (10) |
imdRank | The ranking of the LSOA from most deprived (1) to least deprived (32,844) |
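The rank, decile and quintile variables are closely related: a decile or quintile groups the ranks into equal-sized bins. The sketch below shows that arithmetic, assuming 32,844 ranked LSOAs as stated in the table. rank_to_bin() is a hypothetical helper, and its output is not guaranteed to match the published decile and quintile variables exactly.

```r
# Number of ranked LSOAs, taken from the imdRank description above
n_lsoa <- 32844

# Hypothetical helper: assign a rank to one of `n_bins` equal-sized groups,
# where group 1 holds the most deprived ranks
rank_to_bin <- function(imdRank, n_bins, n = n_lsoa) {
  ceiling(imdRank * n_bins / n)
}

# Deciles and quintiles for a few ranks either side of the bin boundaries
deciles   <- rank_to_bin(c(1, 3284, 3285, 32844), n_bins = 10)
quintiles <- rank_to_bin(c(1, 6568, 6569, 32844), n_bins = 5)
print(deciles)
print(quintiles)
```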
By now, you should be acquainted with the process of accessing CDRC
endpoints via the getCDRC() function. If you need more examples, feel
free to follow the exercises below. If not, thank you for following
this tutorial and good luck with your analyses!
Additional Exercises:
Access the classification of Workplace Zones, and retrieve data for the county that you live in, using the LADnames to retrieve the Local Authority Districts within it. Acquaint yourself with the variables (check the metadata), and try to find the following:
Access the IMD for a city that you are interested in. Using the deciles of deprivation, find the following:
Using the imdRank variable, find how these areas rank in relation to
the country.
Access the AHAHInputs for your postcode. Acquaint yourself with these variables (check the metadata) and consider how all of these factors gave your neighbourhood its overall score.
NB: You will need to call loginCDRC() each time you start working
with the API.