STAC: Organizing and Accessing Geospatial Data in the Cloud
The SpatioTemporal Asset Catalog (STAC) family of specifications standardizes the structure and querying of geospatial asset metadata. A "spatiotemporal asset" refers to any file representing Earth-related information at specific locations and times. Originally focused on satellite imagery, STAC now covers a wide range of data sources and types: imagery from aircraft and drones, hyperspectral optical data, synthetic aperture radar (SAR), video, point clouds, lidar, digital elevation models (DEM), vector data, machine learning labels, and derived products such as NDVI and mosaics.
STAC is designed with a minimal core and flexible extensions to support diverse use cases. It has matured over the years and is widely deployed in production. For geospatial data providers, STAC offers a standard format and API, eliminating the need to create proprietary systems. For consumers, it allows the use of existing tools and libraries to access metadata, avoiding the need to develop custom code for different data providers' formats and APIs.
The STAC specifications define interconnected JSON object types that support a HATEOAS-style interface and a RESTful API for browsing and searching metadata. The Item, Catalog, and Collection specifications form the minimal core. These can be implemented statically as hyperlinked JSON files, allowing data to be published as browsable sets of files. For more advanced querying capabilities, such as spatial or temporal searches, the STAC API specification can be implemented as a web service interface, typically using a database to manage STAC objects.
STAC: A Primer
The STAC specification is designed to create a common language for describing geospatial information, making it easier to leverage, index, and discover. Its modular structure includes several key components: Catalogs, Collections, Items, and Extensions, each serving a distinct purpose in the ecosystem.
Catalogs
At the highest level, a STAC Catalog is a JSON file that organizes and links various geospatial datasets. It acts as a directory, enabling users to navigate through different collections and items efficiently. Think of it as a library catalog that helps you find the book you need among thousands of others.
Collections
A STAC Collection is a specialized type of catalog that provides additional metadata about a group of related datasets. This metadata includes information such as the spatial and temporal extent of the data, licensing terms, and keywords that aid in searching. Collections help group related items, making it easier to manage and discover datasets that cover similar themes or regions.
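The temporal extent described above can be checked programmatically. The following is a minimal sketch, assuming a Collection already loaded into a Python dict; the helper names `parse_utc` and `in_temporal_extent` are hypothetical, and the sample values are illustrative. It relies on the convention that the first interval describes the overall extent and that a null end means the Collection is still open-ended.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    """Parse an RFC 3339 timestamp; 'Z' is rewritten so older Pythons accept it."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def in_temporal_extent(collection: dict, timestamp: str) -> bool:
    """Return True if `timestamp` falls inside the Collection's overall interval."""
    # The first interval covers the whole Collection; None marks an open end.
    start, end = collection["extent"]["temporal"]["interval"][0]
    moment = parse_utc(timestamp)
    if start is not None and moment < parse_utc(start):
        return False
    if end is not None and moment > parse_utc(end):
        return False
    return True


# Illustrative Collection fragment (only the fields this sketch reads)
collection = {
    "id": "example-collection",
    "extent": {"temporal": {"interval": [["2020-01-01T00:00:00Z", None]]}},
}

print(in_temporal_extent(collection, "2023-06-13T00:00:00Z"))  # True
print(in_temporal_extent(collection, "2019-12-31T23:59:59Z"))  # False
```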
Items
The core unit of the STAC specification is the STAC Item, a GeoJSON Feature that describes one or more spatiotemporal assets covering the same place and time. Each Item includes critical metadata such as a unique identifier, spatial footprint (geometry), timestamp (datetime), and links to related resources. The assets section within an Item contains references to the actual data files, such as URLs to Cloud Optimized GeoTIFFs (COGs).
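As a rough illustration of the fields an Item carries, the sketch below reports which required top-level fields are absent from an Item dict. The helper name `missing_item_fields` and the sample Item are hypothetical, and this is only a quick sanity check under those assumptions, not a replacement for validating against the official JSON Schemas.

```python
# Top-level fields that the STAC Item specification (1.0.0) marks as required
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry", "properties", "links", "assets",
)


def missing_item_fields(item: dict) -> list:
    """Return the names of required fields that are absent from `item`."""
    missing = [field for field in REQUIRED_ITEM_FIELDS if field not in item]
    # bbox becomes required whenever the geometry is not null
    if item.get("geometry") is not None and "bbox" not in item:
        missing.append("bbox")
    # properties must carry a datetime key (null only with start/end datetimes)
    if "datetime" not in item.get("properties", {}):
        missing.append("properties.datetime")
    return missing


# Hypothetical Item that forgot its links and bbox
incomplete_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "geometry": {"type": "Point", "coordinates": [10.0, 50.0]},
    "properties": {"datetime": "2023-06-13T00:00:00Z"},
    "assets": {},
}

print(missing_item_fields(incomplete_item))  # ['links', 'bbox']
```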
Extensions
STAC's design is inherently flexible, allowing for the addition of extensions to meet specific needs. Common extensions include the Electro-Optical (EO) extension for remote sensing imagery, the Synthetic Aperture Radar (SAR) extension for radar data, and the Label extension for machine learning training data. These extensions provide additional metadata fields tailored to specific types of geospatial data.
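Extension fields are conventionally namespaced with a prefix, such as eo: for the EO extension, which makes them easy to pick out of an Item's properties. The sketch below groups properties by prefix; the helper name `extension_fields`, the schema URL version, and the cloud cover value are illustrative assumptions rather than values from a real dataset.

```python
def extension_fields(item: dict) -> dict:
    """Group an Item's prefixed properties by their extension prefix."""
    grouped = {}
    for name, value in item.get("properties", {}).items():
        if ":" in name:
            prefix = name.split(":", 1)[0]
            grouped.setdefault(prefix, {})[name] = value
    return grouped


# Hypothetical Item fragment that declares the EO extension
item = {
    "stac_extensions": ["https://stac-extensions.github.io/eo/v1.0.0/schema.json"],
    "properties": {
        "datetime": "2023-06-13T00:00:00Z",  # core field, no prefix
        "eo:cloud_cover": 12.5,              # EO extension field
    },
}

print(extension_fields(item))  # {'eo': {'eo:cloud_cover': 12.5}}
```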
The Role of Cloud Optimized GeoTIFFs (COGs)
COGs are a critical element in the STAC ecosystem, designed to make geospatial data more accessible and efficient to use, especially in cloud environments. Unlike traditional GeoTIFFs, COGs are optimized for HTTP range requests, allowing for partial downloads of the file. This means users can access specific parts of the data without needing to download the entire file, significantly improving performance for large datasets.
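A partial read of this kind comes down to an HTTP Range header. The following sketch builds, but does not send, such a request using only the Python standard library. The helper name `build_range_request` is hypothetical, the URL is a placeholder, and the 16 KiB size is an arbitrary guess at how much of the file is needed to read its header.

```python
import urllib.request


def build_range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build (without sending) a GET request for `length` bytes from `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP byte ranges are inclusive at both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


# Placeholder URL; the first 16 KiB is an assumed size for the header region
request = build_range_request(
    "https://my-cloud-storage.com/path/to/cog.tif", start=0, length=16384
)
print(request.get_header("Range"))  # bytes=0-16383
```

Passing the request to urllib.request.urlopen would perform the fetch; a server that honors range requests answers with status 206 (Partial Content) and only the requested bytes. In practice, libraries such as GDAL or rasterio issue these requests on the reader's behalf.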
Storage and Access
COGs are typically stored in cloud storage solutions such as Amazon S3, Google Cloud Storage, or Azure Blob Storage. The actual data files remain in these cloud storage locations, while the STAC Items provide metadata and references to these files. This separation of metadata and data storage is a key feature of STAC, enabling efficient data discovery and access without duplicating storage.
For example, a STAC Item referencing a COG might look like this:
{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "example-item",
  "properties": {
    "datetime": "2023-06-13T00:00:00Z"
  },
  "geometry": {
    "type": "Polygon",
    "coordinates": [[
      [-180, -90],
      [180, -90],
      [180, 90],
      [-180, 90],
      [-180, -90]
    ]]
  },
  "bbox": [-180, -90, 180, 90],
  "links": [],
  "assets": {
    "image": {
      "href": "https://my-cloud-storage.com/path/to/cog.tif",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized"
    }
  }
}
In this example, the href field in the assets section points to the location of the COG file in cloud storage. The STAC Item itself is a JSON document containing metadata about the COG, such as its datetime and geometry.
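To show how a client might use this, the sketch below parses an abbreviated copy of the Item above and collects the href of every asset whose media type marks it as a COG. The helper name `cog_hrefs` is hypothetical; only the standard library is used.

```python
import json

# Abbreviated copy of the example Item (geometry omitted for brevity)
ITEM_JSON = """
{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "example-item",
  "properties": {"datetime": "2023-06-13T00:00:00Z"},
  "assets": {
    "image": {
      "href": "https://my-cloud-storage.com/path/to/cog.tif",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized"
    }
  }
}
"""

COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def cog_hrefs(item: dict) -> dict:
    """Return {asset_key: href} for assets whose media type identifies a COG."""
    return {
        key: asset["href"]
        for key, asset in item.get("assets", {}).items()
        if asset.get("type") == COG_MEDIA_TYPE
    }


item = json.loads(ITEM_JSON)
print(cog_hrefs(item))
# {'image': 'https://my-cloud-storage.com/path/to/cog.tif'}
```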
Practical Application and Integration
STAC is not just about defining standards; it also provides practical tools for implementation and integration. The STAC API, for example, offers a RESTful interface for querying and interacting with STAC catalogs and items. This API supports operations such as searching for Items by spatial and temporal criteria and browsing catalogs and collections. The API returns metadata only; the data files themselves are then fetched from the asset links in the returned Items.
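A spatial and temporal search is typically expressed as a JSON body sent by POST to the API's /search endpoint. The sketch below only assembles that body; the helper name `build_search_body` and the sample values are hypothetical. The bbox, datetime, collections, and limit parameters follow the STAC API item search specification.

```python
import json


def build_search_body(bbox, start, end, collections=None, limit=10):
    """Assemble a JSON-serializable body for a STAC API item search."""
    west, south, east, north = bbox
    if south > north:
        raise ValueError("bbox must be ordered [west, south, east, north]")
    body = {
        "bbox": [west, south, east, north],
        # Closed interval; the spec also accepts ".." for an open end
        "datetime": f"{start}/{end}",
        "limit": limit,
    }
    if collections:
        body["collections"] = list(collections)
    return body


# Illustrative query: June 2023 over a small area, one hypothetical collection
body = build_search_body(
    bbox=(5.0, 45.0, 10.0, 50.0),
    start="2023-06-01T00:00:00Z",
    end="2023-06-30T23:59:59Z",
    collections=["example-collection"],
)
print(json.dumps(body, indent=2))
```

The response to such a request is a GeoJSON FeatureCollection whose features are STAC Items.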
Example of a STAC Catalog and Collection
To illustrate, let's consider a STAC Catalog that includes a collection of remote sensing images:
Catalog:
{
  "type": "Catalog",
  "stac_version": "1.0.0",
  "id": "example-catalog",
  "description": "A simple catalog example",
  "links": [
    {
      "rel": "child",
      "href": "collection.json"
    }
  ]
}
Collection:
{
  "type": "Collection",
  "stac_version": "1.0.0",
  "id": "example-collection",
  "description": "A collection of sample items",
  "extent": {
    "spatial": {
      "bbox": [[-180, -90, 180, 90]]
    },
    "temporal": {
      "interval": [["2020-01-01T00:00:00Z", null]]
    }
  },
  "license": "CC-BY-4.0",
  "links": [
    {
      "rel": "item",
      "href": "item.json"
    }
  ]
}
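The child and item links in the Catalog and Collection above are relative, so a client walking a static catalog has to resolve them against the URL of the file they came from. The following is a minimal sketch of that step; the helper name `resolve_links` and the base URL are hypothetical.

```python
from urllib.parse import urljoin


def resolve_links(stac_object: dict, base_url: str, rel: str) -> list:
    """Return absolute URLs for every link with the given relation type."""
    return [
        urljoin(base_url, link["href"])
        for link in stac_object.get("links", [])
        if link.get("rel") == rel
    ]


# The example Catalog's links, with an assumed location for the catalog file
catalog = {
    "id": "example-catalog",
    "links": [{"rel": "child", "href": "collection.json"}],
}
catalog_url = "https://example.com/stac/catalog.json"  # hypothetical

print(resolve_links(catalog, catalog_url, "child"))
# ['https://example.com/stac/collection.json']
```

Repeating the same step on each fetched Collection with rel set to "item" yields the Item URLs, which is how a static catalog can be crawled without any API.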
Conclusion
The STAC specification represents a significant advancement in the organization and accessibility of geospatial data. By separating metadata from data storage and providing a flexible, extensible framework, STAC enables efficient discovery, access, and use of a wide range of geospatial datasets. Whether you are managing satellite imagery, radar data, or labeled datasets for machine learning, STAC provides the tools and standards to make your data more accessible and usable.
For more detailed information, refer to the STAC Specification and resources from USGS.gov and NASA Earthdata.