Implementing ISO 19115 Metadata Standards is a foundational requirement for any organization publishing geospatial datasets to enterprise catalogs, open data portals, or OGC-compliant web services. The ISO 19115 family defines a comprehensive, internationally recognized schema for describing geographic information, covering identification, quality, spatial reference, distribution, and maintenance. For GIS platform engineers, spatial data publishers, and government technical teams, strict adherence to this standard ensures cross-system interoperability, regulatory compliance, and reliable catalog indexing across heterogeneous environments.
This guide outlines a production-ready workflow for generating, validating, and publishing ISO 19115 metadata using Python. The approach emphasizes namespace management, mandatory element mapping, and automated validation pipelines that integrate directly into broader Spatial Metadata & Catalog Integration architectures, enabling scalable, repeatable metadata lifecycle management.
Before implementing metadata generation, ensure your development environment meets the following technical requirements:
- Python with pip package management
- lxml (for XML construction and schema validation), pydantic (for structured metadata modeling and type enforcement), geopandas or osgeo (for spatial dataset introspection)
- XSD schema files (gmd and gco namespaces) downloaded from the ISO 19115-1:2014 standard page or mirrored via OGC reference repositories
Install dependencies:
pip install lxml pydantic geopandas
The first phase involves parsing the source dataset to capture mandatory metadata elements. ISO 19115 requires precise spatial and temporal bounds, authoritative CRS identifiers, and clear provenance. Using geopandas, you can programmatically extract these attributes without manual intervention:
import geopandas as gpd
from datetime import datetime
from pydantic import BaseModel

class DatasetProfile(BaseModel):
    title: str
    abstract: str
    publication_date: datetime
    bbox: tuple[float, float, float, float]  # minx, miny, maxx, maxy in decimal degrees
    crs_epsg: int
    format: str = "GeoPackage"
    language: str = "eng"

def extract_profile(path: str, title: str, abstract: str) -> DatasetProfile:
    gdf = gpd.read_file(path)
    # EX_GeographicBoundingBox expects decimal degrees, so compute the bounds
    # in EPSG:4326 whenever the source CRS is known
    bounds_gdf = gdf.to_crs(epsg=4326) if gdf.crs else gdf
    minx, miny, maxx, maxy = (float(v) for v in bounds_gdf.total_bounds)
    return DatasetProfile(
        title=title,
        abstract=abstract,
        publication_date=datetime.now(),
        bbox=(minx, miny, maxx, maxy),
        # Falls back to EPSG:4326 when the dataset declares no CRS
        crs_epsg=gdf.crs.to_epsg() if gdf.crs else 4326,
        format="GeoPackage",
    )
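This structured extraction enforces types before XML serialization. A CRS that cannot be resolved to an EPSG code (to_epsg() returns None) fails Pydantic validation instead of reaching the catalog. A dataset with no CRS at all falls back to EPSG:4326 in this example, so confirm that default suits your data.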
ISO 19115 organizes metadata into hierarchical blocks rooted at gmd:MD_Metadata. Understanding the mandatory vs. conditional elements is critical for compliance. Key child elements include:
- gmd:fileIdentifier (UUID for unique tracking)
- gmd:language (ISO 639-2/3 code)
- gmd:characterSet (typically UTF-8)
- gmd:hierarchyLevel (dataset, series, service)
- gmd:identificationInfo (title, abstract, dates, purpose)
- gmd:spatialRepresentationInfo (grid/vector, resolution)
- gmd:referenceSystemInfo (CRS EPSG/URN)
- gmd:distributionInfo (format, access URL)
When mapping internal data models to this structure, maintain a clear separation between business logic and XML serialization. Organizations frequently cross-walk ISO 19115 to DCAT-AP for Spatial Data Portals to satisfy both European open data mandates and enterprise GIS requirements. A well-documented mapping table prevents attribute drift during portal migrations.
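A mapping table of this kind can be kept as plain data next to the serialization code. The sketch below is only an illustration: the ISO_TO_DCAT dictionary and the crosswalk helper are hypothetical names, and the listed pairs are a small, unofficial subset that should be checked against the mapping your portal actually requires.

```python
# Hypothetical crosswalk: ISO 19139 element path -> DCAT-AP property.
# Illustrative subset only, not an official or complete mapping.
ISO_TO_DCAT = {
    "gmd:fileIdentifier": "dct:identifier",
    "gmd:identificationInfo//gmd:citation//gmd:title": "dct:title",
    "gmd:identificationInfo//gmd:abstract": "dct:description",
    "gmd:identificationInfo//gmd:extent//gmd:EX_GeographicBoundingBox": "dct:spatial",
    "gmd:distributionInfo": "dcat:distribution",
}


def crosswalk(iso_values: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Translate ISO-keyed values to DCAT-AP keys and report unmapped keys."""
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    for key, value in iso_values.items():
        target = ISO_TO_DCAT.get(key)
        if target is None:
            # Surface gaps instead of silently dropping attributes
            unmapped.append(key)
        else:
            mapped[target] = value
    return mapped, unmapped


mapped, unmapped = crosswalk({"gmd:fileIdentifier": "abc-123", "gmd:parentIdentifier": "xyz"})
print(mapped, unmapped)
```

Returning the unmapped keys makes attribute drift visible during a migration rather than leaving it to be discovered in the target portal.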
ISO 19115 relies heavily on XML namespaces (gmd, gco, gml, xsi). Mishandling prefixes or omitting required namespace declarations is the most common cause of validation failures. The following pattern uses lxml.etree to build a compliant document programmatically:
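from datetime import date
from lxml import etree
import uuid

NSMAP = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gml": "http://www.opengis.net/gml",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance"
}
# Commonly used code list catalogue; the schema requires codeList alongside codeListValue
CODELISTS = "http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml"

def build_iso19115_xml(profile: DatasetProfile) -> etree._Element:
    # Root element with schema location
    root = etree.Element(
        f"{{{NSMAP['gmd']}}}MD_Metadata",
        nsmap=NSMAP,
        attrib={f"{{{NSMAP['xsi']}}}schemaLocation": "http://www.isotc211.org/2005/gmd http://schemas.opengis.net/iso/19139/20060504/gmd/gmd.xsd"}
    )
    # File Identifier
    file_id = etree.SubElement(root, f"{{{NSMAP['gmd']}}}fileIdentifier")
    etree.SubElement(file_id, f"{{{NSMAP['gco']}}}CharacterString").text = str(uuid.uuid4())
    # Language
    lang = etree.SubElement(root, f"{{{NSMAP['gmd']}}}language")
    etree.SubElement(lang, f"{{{NSMAP['gco']}}}CharacterString").text = profile.language
    # Hierarchy Level
    hierarchy = etree.SubElement(root, f"{{{NSMAP['gmd']}}}hierarchyLevel")
    scope = etree.SubElement(hierarchy, f"{{{NSMAP['gmd']}}}MD_ScopeCode",
                             codeList=f"{CODELISTS}#MD_ScopeCode", codeListValue="dataset")
    scope.text = "dataset"
    # NOTE: the mandatory gmd:contact element belongs here. It is omitted because
    # DatasetProfile carries no contact details, so add it before validating.
    # Date Stamp (date this metadata record was generated)
    stamp = etree.SubElement(root, f"{{{NSMAP['gmd']}}}dateStamp")
    etree.SubElement(stamp, f"{{{NSMAP['gco']}}}Date").text = date.today().isoformat()
    # Reference System (CRS); the schema orders it before identificationInfo
    ref_sys = etree.SubElement(root, f"{{{NSMAP['gmd']}}}referenceSystemInfo")
    ref_el = etree.SubElement(ref_sys, f"{{{NSMAP['gmd']}}}MD_ReferenceSystem")
    ref_id = etree.SubElement(ref_el, f"{{{NSMAP['gmd']}}}referenceSystemIdentifier")
    rs_code = etree.SubElement(ref_id, f"{{{NSMAP['gmd']}}}RS_Identifier")
    code_val = etree.SubElement(rs_code, f"{{{NSMAP['gmd']}}}code")  # gmd:code, not gco:code
    etree.SubElement(code_val, f"{{{NSMAP['gco']}}}CharacterString").text = f"EPSG:{profile.crs_epsg}"
    # Identification Info
    ident = etree.SubElement(root, f"{{{NSMAP['gmd']}}}identificationInfo")
    md_id = etree.SubElement(ident, f"{{{NSMAP['gmd']}}}MD_DataIdentification")
    # Title
    title_el = etree.SubElement(md_id, f"{{{NSMAP['gmd']}}}citation")
    citation = etree.SubElement(title_el, f"{{{NSMAP['gmd']}}}CI_Citation")
    title_str = etree.SubElement(citation, f"{{{NSMAP['gmd']}}}title")
    etree.SubElement(title_str, f"{{{NSMAP['gco']}}}CharacterString").text = profile.title
    # Publication date (CI_Citation requires at least one date)
    cit_date = etree.SubElement(citation, f"{{{NSMAP['gmd']}}}date")
    ci_date = etree.SubElement(cit_date, f"{{{NSMAP['gmd']}}}CI_Date")
    date_el = etree.SubElement(ci_date, f"{{{NSMAP['gmd']}}}date")
    etree.SubElement(date_el, f"{{{NSMAP['gco']}}}Date").text = profile.publication_date.date().isoformat()
    date_type = etree.SubElement(ci_date, f"{{{NSMAP['gmd']}}}dateType")
    type_code = etree.SubElement(date_type, f"{{{NSMAP['gmd']}}}CI_DateTypeCode",
                                 codeList=f"{CODELISTS}#CI_DateTypeCode", codeListValue="publication")
    type_code.text = "publication"
    # Abstract
    abstract_el = etree.SubElement(md_id, f"{{{NSMAP['gmd']}}}abstract")
    etree.SubElement(abstract_el, f"{{{NSMAP['gco']}}}CharacterString").text = profile.abstract
    # Resource language (mandatory for MD_DataIdentification)
    res_lang = etree.SubElement(md_id, f"{{{NSMAP['gmd']}}}language")
    etree.SubElement(res_lang, f"{{{NSMAP['gco']}}}CharacterString").text = profile.language
    # Bounding Box: values wrapped in gco:Decimal, in schema order (west, east, south, north)
    extent = etree.SubElement(md_id, f"{{{NSMAP['gmd']}}}extent")
    ex_geo = etree.SubElement(extent, f"{{{NSMAP['gmd']}}}EX_Extent")
    geo_box = etree.SubElement(ex_geo, f"{{{NSMAP['gmd']}}}geographicElement")
    bbox = etree.SubElement(geo_box, f"{{{NSMAP['gmd']}}}EX_GeographicBoundingBox")
    for name, value in (
        ("westBoundLongitude", profile.bbox[0]),
        ("eastBoundLongitude", profile.bbox[2]),
        ("southBoundLatitude", profile.bbox[1]),
        ("northBoundLatitude", profile.bbox[3]),
    ):
        bound = etree.SubElement(bbox, f"{{{NSMAP['gmd']}}}{name}")
        etree.SubElement(bound, f"{{{NSMAP['gco']}}}Decimal").text = str(value)
    return root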
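This construction method keeps namespace scoping explicit. Qualifying every tag in Clark notation, f"{{{NSMAP['gmd']}}}ElementName", binds each element to its namespace, and passing nsmap to the root element makes lxml serialize the declared gmd and gco prefixes instead of auto-generated ns0, ns1 prefixes, which frequently break downstream harvesters.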
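To see the prefix handling in practice, the tree can be serialized to bytes before upload. The following is a minimal sketch that assumes a tiny stand-in record rather than a complete ISO document; the serialize helper is a hypothetical name, while etree.tostring and its xml_declaration, encoding, and pretty_print arguments are standard lxml.

```python
from lxml import etree

NSMAP = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def serialize(root: etree._Element) -> bytes:
    """Serialize with an XML declaration and UTF-8 encoding for catalog upload."""
    return etree.tostring(root, xml_declaration=True, encoding="UTF-8", pretty_print=True)


# Tiny stand-in record (not schema-valid) used only to show the prefixes
root = etree.Element(f"{{{NSMAP['gmd']}}}MD_Metadata", nsmap=NSMAP)
file_id = etree.SubElement(root, f"{{{NSMAP['gmd']}}}fileIdentifier")
etree.SubElement(file_id, f"{{{NSMAP['gco']}}}CharacterString").text = "example-id"

xml_bytes = serialize(root)
print(xml_bytes.decode("utf-8"))
```

The output should carry gmd: and gco: prefixes on every element, with both namespaces declared once on the root.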
Generating XML is only half the battle. Production systems must verify structural compliance against the official XSD before ingestion. Using lxml’s XMLSchema validator provides fast, deterministic checks:
def validate_xml(xml_root: etree._Element, xsd_path: str) -> tuple[bool, list[str]]:
    try:
        with open(xsd_path, "rb") as f:
            schema_doc = etree.parse(f)
        schema = etree.XMLSchema(schema_doc)
        schema.assertValid(xml_root)
        return True, []
    except etree.DocumentInvalid as e:
        # The document was parsed but violates the schema
        return False, [str(e)]
    except Exception as e:
        # Missing file, unparsable XSD, or schema compilation failure
        return False, [f"Validation infrastructure error: {e}"]
For teams building continuous validation into CI/CD pipelines, Validating ISO 19115 XML Against XSD Schemas with lxml provides extended patterns for caching schema trees, handling network-fallback XSD resolution, and generating human-readable error reports. Always reference the official lxml validation documentation when troubleshooting namespace binding or schema location resolution issues.
Once validated, metadata must be pushed to catalog endpoints. Most enterprise platforms support OGC Catalog Service for the Web (CSW) 2.0.2/3.0 or OGC API - Records. A robust publishing routine should:
- Send Content-Type: application/xml headers
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def publish_to_csw(endpoint: str, xml_bytes: bytes, api_key: str) -> dict:
    session = requests.Session()
    # POST is not retried on status codes by default, so allow it explicitly (urllib3 1.26+)
    retry = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503, 504], allowed_methods=["POST"])
    session.mount("https://", HTTPAdapter(max_retries=retry))
    headers = {
        "Content-Type": "application/xml",
        "Authorization": f"Bearer {api_key}"
    }
    response = session.post(endpoint, data=xml_bytes, headers=headers, timeout=30)
    response.raise_for_status()
    # CSW endpoints answer with XML, so return the raw body instead of calling response.json()
    return {"status_code": response.status_code, "body": response.text}
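CSW 2.0.2 endpoints that support transactions generally expect the record inside a csw:Transaction/csw:Insert envelope rather than as a bare metadata document. The sketch below shows one way to build that envelope; wrap_in_csw_insert is a hypothetical helper name, and it uses the standard library's xml.etree.ElementTree to stay dependency-free (lxml.etree offers the same calls). Check your catalog's documentation for the exact request it accepts.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
GMD_NS = "http://www.isotc211.org/2005/gmd"

# Register prefixes so the output uses csw:/gmd: instead of ns0:/ns1:
ET.register_namespace("csw", CSW_NS)
ET.register_namespace("gmd", GMD_NS)


def wrap_in_csw_insert(record_xml: bytes) -> bytes:
    """Wrap a serialized metadata record in a CSW 2.0.2 Transaction/Insert request."""
    transaction = ET.Element(
        f"{{{CSW_NS}}}Transaction",
        {"service": "CSW", "version": "2.0.2"},
    )
    insert = ET.SubElement(transaction, f"{{{CSW_NS}}}Insert")
    # Parse the record and attach it as the single child of csw:Insert
    insert.append(ET.fromstring(record_xml))
    return ET.tostring(transaction, encoding="utf-8", xml_declaration=True)


record = b'<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"/>'
payload = wrap_in_csw_insert(record)
print(payload.decode("utf-8"))
```

The resulting bytes can be passed as the request body in place of the bare record.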
After initial publication, metadata drift becomes inevitable as source datasets update. Integrating Automated Metadata Harvesting Workflows ensures that changes to spatial extents, CRS updates, or distribution URLs are detected and synchronized without manual intervention. Schedule harvesters using cron, Airflow, or GitHub Actions, and always compare checksums of generated XML before triggering catalog updates to reduce unnecessary API load.
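The checksum comparison can be sketched with the standard library alone. The function names and the JSON state file below are assumptions made for illustration; a database table or the catalog's own record would serve equally well. The comparison is only meaningful when generation is deterministic, so a fresh uuid.uuid4() identifier or a new timestamp on every run would change the digest each time.

```python
import hashlib
import json
from pathlib import Path


def xml_digest(xml_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of the serialized record."""
    return hashlib.sha256(xml_bytes).hexdigest()


def needs_update(record_id: str, xml_bytes: bytes, state_file: Path) -> bool:
    """Return True (and store the new digest) if the XML changed since the last run."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    digest = xml_digest(xml_bytes)
    if state.get(record_id) == digest:
        return False  # unchanged: skip the catalog call
    state[record_id] = digest
    state_file.write_text(json.dumps(state, indent=2))
    return True
```

A harvester would call needs_update before publishing and skip the request whenever it returns False.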
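ISO 19115-1:2014 remains the most widely deployed version, but ISO 19115-2 (current edition: 2019) introduces extensions for imagery, gridded data, and sensor systems. Note that the gmd and gco namespaces used in this guide belong to the ISO 19139 encoding of ISO 19115:2003; ISO 19115-1:2014 is encoded with the ISO 19115-3 schemas, which use different namespaces. Maintain a configuration flag in your pipeline to toggle between schema versions. Never hardcode XSD paths; resolve them via environment variables or a centralized schema registry.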
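One way to combine a version flag with environment-based path resolution is sketched below. The variable names (ISO_SCHEMA_VERSION, ISO19115_1_XSD_PATH, ISO19115_2_XSD_PATH) and the version keys are hypothetical and should be adapted to your deployment.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical environment variable names, one per supported schema version
SCHEMA_ENV_VARS = {
    "19115-1": "ISO19115_1_XSD_PATH",
    "19115-2": "ISO19115_2_XSD_PATH",
}


def resolve_xsd_path(version: Optional[str] = None) -> Path:
    """Resolve the XSD entry point for a schema version from the environment."""
    # Fall back to the pipeline-wide flag, then to a default version
    version = version or os.environ.get("ISO_SCHEMA_VERSION", "19115-1")
    env_var = SCHEMA_ENV_VARS.get(version)
    if env_var is None:
        raise ValueError(f"Unsupported schema version: {version!r}")
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(f"Set {env_var} to the XSD entry point for version {version}")
    return Path(raw)
```

Failing loudly on a missing variable keeps a misconfigured environment from silently validating against the wrong schema.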
Geospatial datasets frequently lack publication dates, authoritative abstracts, or precise bounding boxes. Implement fallback strategies:
- Use publicationDate when creation dates are unavailable
Generating thousands of ISO 19115 records concurrently can exhaust memory if lxml trees are kept alive after use. Call clear() on the root element after serialization, or stream output with etree.xmlfile() when writing to disk (etree.iterparse() is for incremental reading, not writing). For high-throughput environments, pre-compile XSD schemas once at application startup rather than parsing them per-request.
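Compiling the schema once can be as simple as caching the compiled object by path. This is a minimal sketch under that assumption; get_schema and is_valid are hypothetical helper names, while etree.XMLSchema and functools.lru_cache are used as documented.

```python
from functools import lru_cache

from lxml import etree


@lru_cache(maxsize=None)
def get_schema(xsd_path: str) -> etree.XMLSchema:
    """Parse and compile the XSD on first use; later calls reuse the compiled schema."""
    return etree.XMLSchema(etree.parse(xsd_path))


def is_valid(xml_root: etree._Element, xsd_path: str) -> bool:
    """Validate a record against the cached schema for the given path."""
    return get_schema(xsd_path).validate(xml_root)
```

Because the cache is keyed by path, calling get_schema at startup warms it, and each later validation skips the parse and compile step.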
Never trust raw dataset attributes. Strip control characters and normalize whitespace before injection. Do not pre-escape XML-reserved characters (<, >, &, ", '): lxml escapes them when it serializes text and attribute values, so escaping by hand produces double-escaped output such as &amp;amp;. Pydantic validators combined with lxml’s built-in escaping provide defense-in-depth against malformed input or injection attempts.
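A small cleaning step for free-text fields might look like the sketch below. The clean_text name is hypothetical, and the function deliberately leaves reserved characters such as & and < untouched, since lxml escapes those when it serializes text nodes. It only removes the control characters that XML 1.0 forbids and collapses whitespace.

```python
import re

# C0 control characters that XML 1.0 forbids (tab, newline, carriage return are allowed)
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")
_WHITESPACE_RUN = re.compile(r"\s+")


def clean_text(value: str) -> str:
    """Drop forbidden control characters, then collapse whitespace runs to one space."""
    without_controls = _INVALID_XML_CHARS.sub("", value)
    return _WHITESPACE_RUN.sub(" ", without_controls).strip()


print(clean_text("Roads\x00  of \n Oslo\t"))
```

The same function could be attached to the title and abstract fields as a Pydantic validator so that cleaning happens before any XML is built.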
Implementing ISO 19115 Metadata Standards requires disciplined namespace management, strict type validation, and automated pipeline integration. By extracting dataset characteristics programmatically, mapping them to the gmd:MD_Metadata hierarchy, constructing XML with explicit namespace declarations, and validating against authoritative XSDs, engineering teams can eliminate manual metadata bottlenecks. When paired with robust publishing routines and continuous harvesting, this workflow transforms metadata from a compliance burden into a scalable, machine-actionable asset that powers enterprise search, spatial discovery, and regulatory reporting.