In this hands-on tutorial, we show how to use Google’s AlphaEarth Foundations embeddings to analyse buildings at scale, from exploratory clustering through to predictive modelling. Using the Liverpool City Region as a case study, you will learn how to extract building-level embeddings, uncover meaningful structure in high-dimensional data, link embeddings to administrative datasets, and evaluate the strengths and limitations of embedding-based approaches for real-world urban analysis. By the end of the tutorial, you will have a complete, reproducible workflow for working with AlphaEarth embeddings in Python, from raw access to advanced analytics.
All data required for the exercises can be downloaded from this page and should be placed in a data folder in your working directory.
The key steps covered in the notebook are:
Data Access and Extraction
- Describe satellite embeddings as numerical “fingerprints” of place.
- Understand the structure of Google’s AlphaEarth Foundations 64-dimensional embeddings at 10m resolution.
- Set up, authenticate, and initialise the Google Earth Engine API in Python.
- Access AlphaEarth embedding collections (2017–2024) for large regions and specific locations.
- Extract sampled pixel-level embeddings and consolidate them into efficient Parquet files using DuckDB.
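To make the "fingerprint" idea concrete, the sketch below builds a single pixel record with 64 named dimensions. It assumes the bands are named A00 to A63 and that each embedding is a unit-length vector, as we understand the public dataset description to state. The values are made up for illustration and the helper names are hypothetical.

```python
import math

# Assumption: the 64 embedding bands are named A00 ... A63.
BAND_NAMES = [f"A{i:02d}" for i in range(64)]


def l2_norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in vector))


def to_unit_length(vector):
    """Scale a vector so that its Euclidean length is 1."""
    norm = l2_norm(vector)
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [v / norm for v in vector]


# Made-up raw values standing in for one 10 m pixel (not real data).
raw_values = [float(i % 7 - 3) for i in range(64)]

# One pixel "fingerprint": band name -> embedding value.
pixel = dict(zip(BAND_NAMES, to_unit_length(raw_values)))

print(len(pixel), "dimensions, length =", round(l2_norm(list(pixel.values())), 6))
```

If the embeddings are indeed unit length, the dot product of two pixels equals their cosine similarity, which simplifies later similarity searches.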
Setup and Pixel-Level Analysis:
- Define a study region (e.g. Liverpool City Region) and quantify its embedding footprint.
- Explore the scale and structure of 64-dimensional embeddings using sampled pixels.
- Understand when to work server-side in Earth Engine versus downloading local subsets.
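A rough way to quantify an embedding footprint is to count the 10 m pixels in the study region and multiply by the 64 dimensions. The sketch below does this arithmetic for a hypothetical 100 km² area. The area and the 4 bytes per value (32-bit floats) are illustrative assumptions, not figures for the Liverpool City Region or for how the data is actually stored.

```python
def embedding_footprint(area_km2, pixel_size_m=10, n_dims=64, bytes_per_value=4):
    """Estimate pixel count, value count and raw size for a region.

    bytes_per_value=4 assumes 32-bit floats; adjust for other encodings.
    """
    square_metres = area_km2 * 1_000_000
    n_pixels = int(square_metres / (pixel_size_m ** 2))
    n_values = n_pixels * n_dims
    return {
        "pixels": n_pixels,
        "values": n_values,
        "megabytes": n_values * bytes_per_value / 1_000_000,
    }


# Hypothetical 100 km^2 region (illustrative only).
footprint = embedding_footprint(area_km2=100)
print(footprint)
```

Estimates like this help decide between the two modes in the list above: small samples can be downloaded and analysed locally, while full-region pixel stacks are usually better left server-side.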
Building Cluster Analysis:
- Append embeddings to building geometries (e.g. TOID-based locations) using batch extraction.
- Fit K-means clustering to identify building typologies from building-level embeddings.
- Export clustered outputs (GeoPackage, Parquet) for mapping and further spatial analysis.
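The clustering step can be illustrated with a dependency-free sketch of Lloyd's algorithm, the iteration behind K-means. This is not the notebook's implementation; in practice a library such as scikit-learn would normally be used. For reproducibility it uses a deterministic farthest-point initialisation instead of random starts, and the 3-dimensional toy vectors stand in for 64-dimensional building embeddings.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def farthest_point_init(vectors, k):
    """Start from the first vector, then repeatedly add the vector
    that is farthest from all centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_point_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every vector.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


# Toy "building embeddings": two well-separated groups (made-up values).
toy_embeddings = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.1],
    [5.0, 5.1, 5.0], [5.1, 5.0, 5.0], [5.0, 5.0, 5.1],
]
labels, centroids = kmeans(toy_embeddings, k=2)
print(labels)
```

The resulting labels can be joined back to building identifiers and written out alongside the geometries for mapping.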
Explore the Structure of the Embeddings and Clusters using UMAP:
- Use UMAP to project 64-dimensional embeddings into 2D for exploratory visualisation.
- Assess cluster separation, stability, and overlap in embedding space.
- Summarise cluster characteristics using counts, proportions, and centroids.
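The cluster summaries mentioned above (counts, proportions, and centroids) can be computed along the following lines. This is a minimal sketch using only the standard library. The function name is hypothetical, and the 2-dimensional vectors are made-up stand-ins for real embeddings.

```python
from collections import Counter, defaultdict


def summarise_clusters(labels, vectors):
    """Return count, proportion and centroid for each cluster label."""
    counts = Counter(labels)
    total = len(labels)

    # Group vectors by their cluster label.
    grouped = defaultdict(list)
    for label, vector in zip(labels, vectors):
        grouped[label].append(vector)

    summary = {}
    for label in sorted(counts):
        members = grouped[label]
        centroid = [sum(column) / len(members) for column in zip(*members)]
        summary[label] = {
            "count": counts[label],
            "proportion": counts[label] / total,
            "centroid": centroid,
        }
    return summary


# Made-up cluster labels and 2-dimensional vectors for illustration.
example_labels = [0, 0, 1, 1, 1]
example_vectors = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [0.0, 5.0], [0.0, 6.0]]

summary = summarise_clusters(example_labels, example_vectors)
for label, stats in summary.items():
    print(label, stats)
```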
Matching Building Characteristics to Describe the Clusters:
- Integrate embeddings with administrative sources such as Energy Performance Certificates.
- Derive and recode key attributes (e.g. built form, property type, age bands).
- Construct propensity indices to understand how different building attributes concentrate within clusters.
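This overview does not define the propensity index, so the sketch below assumes one common formulation: the share of an attribute within a cluster divided by its share across all buildings, scaled so that 100 means "as common as average". The cluster labels and built-form categories are made up for illustration.

```python
from collections import Counter


def propensity_index(clusters, attributes):
    """Index = 100 * (share of attribute in cluster) / (share overall).

    Values above 100 mean the attribute is over-represented in the
    cluster; values below 100 mean it is under-represented.
    """
    total = len(clusters)
    cluster_sizes = Counter(clusters)
    attribute_totals = Counter(attributes)
    pair_counts = Counter(zip(clusters, attributes))

    index = {}
    for (cluster, attribute), n in pair_counts.items():
        share_in_cluster = n / cluster_sizes[cluster]
        share_overall = attribute_totals[attribute] / total
        index[(cluster, attribute)] = 100 * share_in_cluster / share_overall
    return index


# Made-up example: eight buildings, two clusters, two built forms.
example_clusters = [0, 0, 0, 0, 1, 1, 1, 1]
example_built_form = [
    "terraced", "terraced", "terraced", "detached",
    "detached", "detached", "detached", "terraced",
]

index = propensity_index(example_clusters, example_built_form)
for key in sorted(index):
    print(key, round(index[key], 1))
```

In this toy example terraced buildings make up half of all buildings but three quarters of cluster 0, giving an index of 150 for that pair.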
The Descriptive and Predictive Potential of Embeddings:
- Compute cosine similarity measures to identify buildings most similar to chosen reference profiles (e.g. pre-1930 stock).
- Build and evaluate a Random Forest model to predict construction age bands from embeddings.
- Interpret precision, recall, F1, confusion matrices, and feature importance for multi-class prediction.
- Critically assess what embeddings can and cannot reliably predict at 10m resolution.
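Two of the calculations listed above are simple enough to sketch without any libraries: ranking buildings by cosine similarity to a reference profile, and deriving per-class precision, recall, and F1 from a confusion matrix. The building IDs, vectors, age-band labels other than pre-1930, and confusion counts below are all made up. They are not results from the tutorial's Random Forest model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def most_similar(reference, candidates, top_n=3):
    """Rank candidate vectors by similarity to a reference profile."""
    scored = [
        (building_id, cosine_similarity(reference, vector))
        for building_id, vector in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


def per_class_metrics(confusion, class_names):
    """Precision, recall and F1 per class.

    confusion[i][j] = number of items of true class i predicted as class j.
    """
    metrics = {}
    for i, name in enumerate(class_names):
        true_positive = confusion[i][i]
        predicted = sum(row[i] for row in confusion)  # column total
        actual = sum(confusion[i])                    # row total
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Made-up reference profile and candidate buildings (2-D for readability).
reference_profile = [1.0, 0.0]
candidate_buildings = {"b1": [2.0, 0.0], "b2": [1.0, 1.0], "b3": [0.0, 3.0]}
ranking = most_similar(reference_profile, candidate_buildings, top_n=2)
print(ranking)

# Made-up confusion matrix for three hypothetical age bands.
age_bands = ["pre-1930", "1930-1979", "1980+"]
confusion_matrix = [
    [8, 2, 0],
    [1, 6, 3],
    [1, 2, 7],
]
metrics = per_class_metrics(confusion_matrix, age_bands)
print(metrics["pre-1930"])
```

Reading the metrics per class rather than as a single accuracy figure makes it easier to see which age bands the embeddings separate well and which they confuse.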
Visualisation and Communication:
- Produce publication-ready plots, clustergrams, UMAP visualisations, and heatmaps.
- Create interactive maps (e.g. Kepler.gl, GIS-ready GeoPackages) to explore spatial patterns.
- Develop transparent, reproducible workflows suitable for policy, research, and applied urban analytics.