The Data Catalog Vocabulary Application Profile (DCAT-AP) has become the de facto standard for interoperable open data publishing across European public sector portals. When applied to geospatial infrastructure, DCAT-AP bridges the gap between traditional GIS metadata models and modern, machine-readable web catalogs. Implementing DCAT-AP for Spatial Data Portals requires careful mapping of coordinate reference systems, bounding geometries, and OGC service endpoints to the W3C DCAT ontology (now at version 3). This guide provides a production-tested workflow for GIS platform engineers, Python backend developers, and agency technical teams deploying spatial catalogs at scale.
As a foundational component of broader Spatial Metadata & Catalog Integration architectures, DCAT-AP enables cross-jurisdictional discovery, automated harvesting, and semantic search without requiring consumers to parse legacy XML schemas. By standardizing how geospatial assets are described, agencies can seamlessly federate catalogs across regional, national, and EU-level portals while maintaining strict compliance with INSPIRE and Open Data directives.
Before implementing DCAT-AP serialization, ensure your environment and data pipeline meet the following baseline requirements:
- A Python 3.9+ environment with rdflib>=6.3.0, pyshacl>=0.20.0, pyproj>=3.4.0, and requests installed.
- Reachable OGC service endpoints (WMS, WFS, or OGC API) to expose as dcat:Distribution access points.
- Familiarity with the locn:geometry and dcat:bbox specifications.
- Namespace bindings for dcat:, dct:, foaf:, vcard:, locn:, gsp:, and schema:.
If your organization currently relies on legacy geospatial metadata, review Implementing ISO 19115 Metadata Standards to establish a reliable crosswalk before DCAT-AP transformation. Proper field mapping at this stage prevents downstream validation failures and ensures semantic consistency across catalog nodes.
Query your spatial repository or CSW endpoint to extract dataset-level metadata. Normalize titles, abstracts, keywords, and responsible organizations into a structured Python dictionary or Pydantic model. Extract persistent identifiers early; DCAT-AP requires stable, resolvable URIs for dcat:Dataset resources. Avoid generating ephemeral or session-based identifiers, as they break long-term catalog federation.
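A minimal sketch of that normalization step follows. It uses a standard-library dataclass as a stand-in for a Pydantic model. The DatasetRecord class, the PORTAL_NAMESPACE seed, and the URI pattern are hypothetical. The sketch uses uuid.uuid5 because it derives the same identifier from the same source ID on every run, which is what stable dcat:Dataset URIs require.

```python
import uuid
from dataclasses import dataclass

# Hypothetical seed for minting IDs; any fixed UUID works as long as it never changes
PORTAL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://data.example.org/")


@dataclass(frozen=True)
class DatasetRecord:
    source_id: str  # identifier from the source repository (e.g. an ISO fileIdentifier)
    title: str
    abstract: str
    keywords: tuple = ()
    publisher: str = ""

    def __post_init__(self):
        # Fail fast on missing required fields instead of emitting incomplete RDF
        for name in ("source_id", "title", "abstract"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing required field: {name}")

    @property
    def dataset_id(self) -> str:
        # uuid5 is deterministic: the same source_id always yields the same ID
        return str(uuid.uuid5(PORTAL_NAMESPACE, self.source_id))

    @property
    def uri(self) -> str:
        return f"https://data.example.org/dataset/{self.dataset_id}"
```

Because the ID depends only on source_id, editing a title or abstract does not change the dataset URI.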
When parsing legacy ISO records, map the citation title and abstract under gmd:identificationInfo to dct:title and dct:description. Translate responsible-party blocks (gmd:contact, gmd:pointOfContact) into dct:publisher, modeled as a foaf:Agent, or into dcat:contactPoint, modeled as a vcard:Kind such as vcard:Organization. Validate required fields before proceeding to spatial transformation.
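The following sketch shows one way to pull those fields out of an ISO 19139 XML record with the standard library. It assumes the common gmd/gco encoding with gco:CharacterString values; records that use gmx:Anchor or PT_FreeText need extra handling. The output dictionary keys are illustrative and not part of any standard.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
ID_INFO = "gmd:identificationInfo/gmd:MD_DataIdentification"
PARTY = f"{ID_INFO}/gmd:pointOfContact/gmd:CI_ResponsibleParty"


def _text(root: ET.Element, path: str) -> str:
    # Return the stripped text of the first match, or "" when the element is absent
    node = root.find(path, NS)
    return node.text.strip() if node is not None and node.text else ""


def iso_to_dcat_fields(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)  # root element is gmd:MD_Metadata
    return {
        # -> dct:title
        "title": _text(root, f"{ID_INFO}/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString"),
        # -> dct:description
        "abstract": _text(root, f"{ID_INFO}/gmd:abstract/gco:CharacterString"),
        # -> dct:publisher (foaf:Agent) or dcat:contactPoint (vcard:Organization)
        "organisation": _text(root, f"{PARTY}/gmd:organisationName/gco:CharacterString"),
        # -> vcard:hasEmail on the contact point
        "email": _text(
            root,
            f"{PARTY}/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address"
            "/gmd:electronicMailAddress/gco:CharacterString",
        ),
    }
```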
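Convert native CRS bounding boxes to WGS 84 longitude/latitude. In DCAT-AP, spatial coverage is attached to the dataset through dct:spatial, which points to a dct:Location. That location's dcat:bbox is a literal, most commonly a GeoSPARQL WKT polygon typed as gsp:wktLiteral; detailed footprints go in locn:geometry. Normalizing each extent first to a minLon,minLat,maxLon,maxLat string keeps the later encoding step simple. Transform the densified box edges rather than only the two corners, because straight edges in a projected CRS become curves in geographic coordinates. Use pyproj for these transformations, as hand-written conversion formulas introduce precision drift and topology errors.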
```python
from pyproj import Transformer


def normalize_bbox(min_x, min_y, max_x, max_y, src_crs="EPSG:3035"):
    # always_xy=True keeps (lon, lat) order regardless of the CRS axis definition
    transformer = Transformer.from_crs(src_crs, "EPSG:4326", always_xy=True)
    # transform_bounds densifies the edges, so curved projected edges are not
    # under-covered as they would be when transforming only two corners
    min_lon, min_lat, max_lon, max_lat = transformer.transform_bounds(
        min_x, min_y, max_x, max_y, densify_pts=21
    )
    # Clamp to valid geographic bounds
    min_lon = max(-180.0, min(180.0, min_lon))
    max_lon = max(-180.0, min(180.0, max_lon))
    min_lat = max(-90.0, min(90.0, min_lat))
    max_lat = max(-90.0, min(90.0, max_lat))
    return f"{min_lon:.6f},{min_lat:.6f},{max_lon:.6f},{max_lat:.6f}"
```
Always verify that transformed coordinates fall within valid ranges (±180, ±90). For complex polygonal extents, serialize the geometry as a GeoJSON string and attach it via locn:geometry using the gsp:Geometry class. Refer to the official EPSG Geodetic Parameter Registry when resolving ambiguous or deprecated CRS codes in legacy datasets.
Initialize an rdflib.Graph and bind required prefixes. Create a dcat:Dataset resource with a persistent URI, preferably minted via your portal’s identifier scheme (e.g., https://data.example.org/dataset/{uuid}). Attach dcat:Distribution nodes only after confirming service availability.
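```python
from rdflib import BNode, Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCAT, DCTERMS, FOAF, RDF, XSD

DCAT_AP = Namespace("http://data.europa.eu/r5r/")
LOCN = Namespace("http://www.w3.org/ns/locn#")
GSP = Namespace("http://www.opengis.net/ont/geosparql#")


def build_dataset_graph(dataset_id: str, metadata: dict) -> Graph:
    g = Graph()
    # Bind prefixes explicitly so serializations stay stable across harvesters
    g.bind("dcat", DCAT)
    g.bind("dct", DCTERMS)
    g.bind("locn", LOCN)
    g.bind("gsp", GSP)

    ds_uri = URIRef(f"https://data.example.org/dataset/{dataset_id}")
    g.add((ds_uri, RDF.type, DCAT.Dataset))
    g.add((ds_uri, DCTERMS.title, Literal(metadata["title"], lang="en")))
    g.add((ds_uri, DCTERMS.description, Literal(metadata["abstract"], lang="en")))

    # dcat:bbox belongs on a dct:Location linked via dct:spatial; encode the
    # normalized "minLon,minLat,maxLon,maxLat" string as a closed WKT polygon
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in metadata["bbox"].split(","))
    wkt = (
        f"POLYGON(({min_lon} {min_lat}, {max_lon} {min_lat}, "
        f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))"
    )
    location = BNode()
    g.add((ds_uri, DCTERMS.spatial, location))
    g.add((location, RDF.type, DCTERMS.Location))
    g.add((location, DCAT.bbox, Literal(wkt, datatype=GSP.wktLiteral)))
    return g
```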
Maintain strict namespace hygiene. Mixing unprefixed URIs or relying on rdflib’s auto-prefixing can cause serialization inconsistencies across harvesters.
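Map each discoverable endpoint to a dcat:Distribution resource. Attach dcat:accessURL for service endpoints and dcat:downloadURL for direct file downloads. DCAT-AP makes dcat:accessURL mandatory on every distribution, so direct downloads carry both properties. Declare dcat:mediaType as an IANA media type IRI (e.g., application/json or application/geo+json for OGC API responses). Take dct:format values from the EU Publications Office file-type authority list. Legacy WMS 1.1.1 capabilities use the OGC-defined type application/vnd.ogc.wms_xml.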
For WMS and WFS layers, include dcat:mediaType, dct:license, and dcat:accessService where applicable. When generating service-specific payloads, consult Generating DCAT-AP Compliant JSON-LD for WMS Layers to ensure layer capabilities, styling parameters, and CRS declarations align with DCAT-AP v3 expectations.
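```python
import hashlib
from rdflib import Graph, URIRef
from rdflib.namespace import DCAT, RDF

def add_distribution(graph: Graph, dataset_uri: URIRef, service_url: str, media_type: str) -> URIRef:
    # sha256 is stable across runs; the built-in hash() is salted per process
    suffix = hashlib.sha256(service_url.encode("utf-8")).hexdigest()[:16]
    dist_uri = URIRef(f"{dataset_uri}/distribution/{suffix}")
    graph.add((dataset_uri, DCAT.distribution, dist_uri))
    graph.add((dist_uri, RDF.type, DCAT.Distribution))
    graph.add((dist_uri, DCAT.accessURL, URIRef(service_url)))
    # Media type as an IANA register IRI rather than a plain string literal
    graph.add((dist_uri, DCAT.mediaType, URIRef(f"https://www.iana.org/assignments/media-types/{media_type}")))
    return dist_uri
```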
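Before attaching OGC endpoints to the catalog graph, confirm that each one returns HTTP 200 and answers GetCapabilities or OpenAPI document requests. Broken distributions lower metadata quality scores. For example, data.europa.eu's Metadata Quality Assessment checks whether access and download URLs resolve. Broken links can also lead aggregators to flag or drop the affected records.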
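A minimal availability probe for WMS/WFS endpoints might look like the sketch below. It uses only the standard library and treats any network error, non-200 status, or exception-report body as a failure. ogc_capabilities_ok is a hypothetical name, and the substring checks are a heuristic rather than a full capabilities parser. OGC API endpoints would need a different check, such as fetching the landing page or OpenAPI document.

```python
import http.client
from urllib.parse import urlencode
from urllib.request import urlopen


def ogc_capabilities_ok(base_url: str, service: str = "WMS", timeout: float = 10.0) -> bool:
    # Append GetCapabilities parameters, respecting any existing query string
    sep = "&" if "?" in base_url else "?"
    url = f"{base_url}{sep}{urlencode({'SERVICE': service, 'REQUEST': 'GetCapabilities'})}"
    try:
        with urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            # Read a bounded prefix; the root element appears near the start
            body = resp.read(65536).decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError (e.g. 404, 500), refused connections, timeouts;
        # ValueError covers malformed URLs
        return False
    # A 200 response can still carry a (Service)ExceptionReport instead of capabilities
    return "Capabilities" in body and "ExceptionReport" not in body
```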
Before publishing, validate the RDF graph against the official DCAT-AP SHACL shapes. Use pyshacl to catch missing mandatory properties, datatype mismatches, and cardinality violations.
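```python
from functools import lru_cache

from pyshacl import validate
from rdflib import Graph

# DCAT-AP 2.0.0 shapes; point this at the shapes for the DCAT-AP release you target
SHACL_URL = "https://raw.githubusercontent.com/SEMICeu/DCAT-AP/master/releases/2.0.0/dcat-ap_2.0.0_shacl_shapes.ttl"

@lru_cache(maxsize=1)
def load_shapes() -> Graph:
    # Download and parse the shapes once per process, not on every validation
    return Graph().parse(SHACL_URL, format="turtle")

def validate_graph(graph: Graph) -> tuple[bool, str]:
    conforms, _results_graph, results_text = validate(
        graph, shacl_graph=load_shapes(), inference="rdfs", debug=False
    )
    return conforms, results_text
```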
Once validated, serialize to JSON-LD or Turtle. JSON-LD is preferred for modern web portals because it is plain JSON that browsers and JavaScript clients parse natively, and it interoperates with schema.org markup. Use compact serialization with an @context mapping to reduce payload size. For large catalogs, serialize in chunks (for example, one graph per dataset) or stream the output to avoid memory exhaustion.
Deploy the serialization pipeline as a scheduled job or event-driven microservice. Trigger metadata regeneration on dataset updates, service endpoint changes, or license modifications. Implement idempotent writes to prevent duplicate URIs and stale graph states.
Integrate the pipeline with Automated Metadata Harvesting Workflows to synchronize with aggregators such as the EU-level data.europa.eu portal or national open data hubs. Configure incremental harvesting using dct:modified timestamps and ETag headers to reduce bandwidth and processing overhead.
Monitor catalog health with automated SHACL validation checks, endpoint uptime probes, and spatial extent consistency audits. Log validation failures to a centralized dashboard and route critical errors to GIS platform engineers for rapid remediation.
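A spatial extent audit can start as simply as the sketch below, which checks a normalized minLon,minLat,maxLon,maxLat string for range and ordering problems. audit_bbox is a hypothetical helper. It cannot detect swapped latitude and longitude values when both fall within ±90.

```python
def audit_bbox(bbox: str) -> list:
    """Return problems found in a 'minLon,minLat,maxLon,maxLat' string (empty list = OK)."""
    try:
        min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox.split(","))
    except ValueError:
        # Wrong number of values or non-numeric text
        return [f"not four numeric values: {bbox!r}"]
    problems = []
    if not (-180 <= min_lon <= 180 and -180 <= max_lon <= 180):
        problems.append("longitude outside [-180, 180]")
    if not (-90 <= min_lat <= 90 and -90 <= max_lat <= 90):
        problems.append("latitude outside [-90, 90]")
    if min_lat > max_lat:
        problems.append("minLat > maxLat (axis order swapped?)")
    if min_lon > max_lon:
        # Legitimate only for extents crossing the antimeridian
        problems.append("minLon > maxLon (antimeridian crossing or swapped axes)")
    return problems
```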
Common pitfalls to avoid:
- Serialize bounding boxes in minLon,minLat,maxLon,maxLat order and validate them against geographic constraints.
- Declare dct:license, at minimum on every dcat:Distribution. Omitting it can block inclusion in EU open data portals.
- Give every dataset dct:title and dct:description (mandatory in DCAT-AP) and dcat:theme (recommended). Run pyshacl during development and in CI to catch violations before production deployment.
- Check each dcat:accessURL dynamically to prevent stale catalog entries.
Once your DCAT-AP pipeline is stable, expand coverage by integrating spatial themes (dcat:theme), temporal coverage (dct:temporal), and provenance metadata (dct:provenance). Align your catalog with INSPIRE metadata profiles where applicable, and expose dcat:Catalog nodes to enable cross-portal federation.
For advanced use cases, implement spatial indexing with Elasticsearch or OpenSearch using geo_shape mappings derived from dcat:bbox. Combine DCAT-AP with schema.org Dataset extensions to improve SEO visibility and enable rich result rendering in search engines.
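For search indexing, a normalized bounding box maps directly onto an envelope shape. This sketch assumes a geo_shape field named spatial_extent, which is hypothetical. It only builds the mapping and the document fragment; sending them through an Elasticsearch or OpenSearch client is left out.

```python
# Index mapping with a geo_shape field (field name "spatial_extent" is hypothetical)
INDEX_MAPPING = {"mappings": {"properties": {"spatial_extent": {"type": "geo_shape"}}}}


def bbox_to_envelope(bbox: str) -> dict:
    """Convert 'minLon,minLat,maxLon,maxLat' to a geo_shape envelope.

    Envelope coordinates are [[minLon, maxLat], [maxLon, minLat]]:
    upper-left corner first, then lower-right.
    """
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox.split(","))
    return {"type": "envelope", "coordinates": [[min_lon, max_lat], [max_lon, min_lat]]}
```

A document such as {"title": ..., "spatial_extent": bbox_to_envelope("4.0,50.0,6.0,52.0")} can then be matched with geo_shape queries.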
By treating DCAT-AP as a living specification rather than a static export format, agencies can future-proof their spatial data infrastructure, reduce manual curation overhead, and participate seamlessly in pan-European data ecosystems.