Implementing ISO 19115 Metadata Standards

Implementing ISO 19115 Metadata Standards is a foundational requirement for any organization publishing geospatial datasets to enterprise catalogs, open data portals, or OGC-compliant web services. The ISO 19115 family defines a comprehensive, internationally recognized schema for describing geographic information, covering identification, quality, spatial reference, distribution, and maintenance. For GIS platform engineers, spatial data publishers, and government technical teams, strict adherence to this standard ensures cross-system interoperability, regulatory compliance, and reliable catalog indexing across heterogeneous environments.

This guide outlines a production-ready workflow for generating, validating, and publishing ISO 19115 metadata using Python. The approach emphasizes namespace management, mandatory element mapping, and automated validation pipelines that integrate directly into broader Spatial Metadata & Catalog Integration architectures, enabling scalable, repeatable metadata lifecycle management.

Prerequisites & Environment Setup

Before implementing metadata generation, ensure your development environment meets the following technical requirements:

  • Python 3.10+ with pip package management
  • Core libraries: lxml (for XML construction and schema validation), pydantic (for structured metadata modeling and type enforcement), geopandas or osgeo (for spatial dataset introspection)
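  • Schema assets: ISO 19139 XSD files for the gmd and gco namespaces, which are the XML encoding of ISO 19115:2003, mirrored in the OGC schema repository (for example under schemas.opengis.net/iso/19139/20060504/). ISO 19115-1:2014 is encoded separately through ISO 19115-3 and uses different namespaces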
  • Source data access: Read permissions to spatial datasets (GeoPackage, Shapefile, PostGIS, or WMS/WFS endpoints) to extract bounding boxes, coordinate reference systems (CRS), and dataset attributes
  • Catalog API credentials: Endpoints for GeoNetwork, CKAN, ArcGIS Enterprise, or custom OGC CSW/OGC API - Records services

Install dependencies:

pip install lxml pydantic geopandas requests

Step 1: Extract Spatial Dataset Characteristics

The first phase involves parsing the source dataset to capture mandatory metadata elements. ISO 19115 requires precise spatial and temporal bounds, authoritative CRS identifiers, and clear provenance. Using geopandas, you can programmatically extract these attributes without manual intervention:

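import geopandas as gpd
from datetime import datetime, timezone
from pydantic import BaseModel

class DatasetProfile(BaseModel):
    title: str
    abstract: str
    publication_date: datetime
    bbox: tuple[float, float, float, float]  # minx, miny, maxx, maxy in WGS84 degrees
    crs_epsg: int
    format: str = "GeoPackage"
    language: str = "eng"

def extract_profile(path: str, title: str, abstract: str) -> DatasetProfile:
    gdf = gpd.read_file(path)
    if gdf.crs is None:
        # No CRS recorded: assume the coordinates are already WGS84 and
        # flag the record for manual review downstream.
        epsg, bounds = 4326, gdf.total_bounds
    else:
        # to_epsg() returns None when no EPSG code matches; Pydantic then
        # rejects the profile because crs_epsg must be an int.
        epsg = gdf.crs.to_epsg()
        # EX_GeographicBoundingBox expects longitude/latitude values, so the
        # bounds are computed after reprojecting to EPSG:4326.
        bounds = gdf.to_crs(epsg=4326).total_bounds
    return DatasetProfile(
        title=title,
        abstract=abstract,
        # Placeholder: replace with the real publication date when it is known
        publication_date=datetime.now(timezone.utc),
        bbox=tuple(float(v) for v in bounds),
        crs_epsg=epsg,
        format="GeoPackage"
    )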

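This structured extraction enforces types before XML serialization. A CRS with no matching EPSG code makes to_epsg() return None, which Pydantic rejects as an invalid crs_epsg, preventing downstream catalog rejection. A dataset with no CRS at all falls back to EPSG:4326, so flag those records for review rather than trusting the default.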
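Type checks alone do not catch a bounding box that is numerically valid but geographically impossible, such as projected metre coordinates. The sketch below shows one way to add a range check, assuming Pydantic v2 (field_validator). GeographicExtent and is_valid_extent are hypothetical names used for illustration and are not part of the DatasetProfile model above.

```python
from pydantic import BaseModel, ValidationError, field_validator


class GeographicExtent(BaseModel):
    """Hypothetical helper model holding a WGS84 bounding box."""

    bbox: tuple[float, float, float, float]  # minx, miny, maxx, maxy in degrees

    @field_validator("bbox")
    @classmethod
    def check_lon_lat_ranges(cls, value):
        minx, miny, maxx, maxy = value
        # minx > maxx is tolerated because extents may cross the antimeridian
        if not (-180 <= minx <= 180 and -180 <= maxx <= 180):
            raise ValueError("longitude outside [-180, 180]; bounds may be projected")
        if not (-90 <= miny <= maxy <= 90):
            raise ValueError("latitude outside [-90, 90] or south above north")
        return value


def is_valid_extent(bbox) -> bool:
    """Return True when bbox passes the geographic range checks."""
    try:
        GeographicExtent(bbox=bbox)
        return True
    except ValidationError:
        return False
```

The same validator body could be attached to the bbox field of an existing model instead of a separate class.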

Step 2: Map to ISO 19115 Core Structure

ISO 19115 organizes metadata into hierarchical blocks rooted at gmd:MD_Metadata. Understanding which elements are mandatory and which are conditional is critical for compliance, and because the XSD defines children as an ordered sequence, they must also be emitted in schema order. Key child elements, listed in that order, include:

  • gmd:fileIdentifier (UUID for unique tracking)
  • gmd:language (ISO 639-2/3 code)
  • gmd:characterSet (typically UTF-8)
  • gmd:hierarchyLevel (dataset, series, service)
  • gmd:contact (responsible party and role; mandatory)
  • gmd:dateStamp (date the metadata record was created; mandatory)
  • gmd:spatialRepresentationInfo (grid/vector, resolution)
  • gmd:referenceSystemInfo (CRS EPSG/URN)
  • gmd:identificationInfo (title, abstract, dates, purpose; mandatory)
  • gmd:distributionInfo (format, access URL)

When mapping internal data models to this structure, maintain a clear separation between business logic and XML serialization. Organizations frequently cross-walk ISO 19115 to DCAT-AP for Spatial Data Portals to satisfy both European open data mandates and enterprise GIS requirements. A well-documented mapping table prevents attribute drift during portal migrations.
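A mapping table can live in code as a plain dictionary, which keeps the crosswalk reviewable and testable. The sketch below is an illustrative subset only, not the official GeoDCAT-AP mapping. The names ISO_TO_DCAT and crosswalk are hypothetical, and the three target properties are common Dublin Core Terms used by DCAT profiles.

```python
from lxml import etree

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

IDENT = "gmd:identificationInfo/gmd:MD_DataIdentification"

# XPath relative to gmd:MD_Metadata -> target property (illustrative subset)
ISO_TO_DCAT = {
    "gmd:fileIdentifier/gco:CharacterString": "dct:identifier",
    f"{IDENT}/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString": "dct:title",
    f"{IDENT}/gmd:abstract/gco:CharacterString": "dct:description",
}


def crosswalk(root: etree._Element) -> dict[str, str]:
    """Collect the first value found for each mapped ISO element."""
    result = {}
    for xpath, target in ISO_TO_DCAT.items():
        matches = root.xpath(xpath, namespaces=NS)
        if matches and matches[0].text:
            result[target] = matches[0].text
    return result
```

Elements that are absent from the record are simply skipped, so the output shows at a glance which mapped properties a record can supply.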

Step 3: Construct XML with Strict Namespace Compliance

ISO 19115 relies heavily on XML namespaces (gmd, gco, gml, xsi). Mishandled prefixes, omitted namespace declarations, and child elements emitted out of schema order are common causes of validation failures. The following pattern uses lxml.etree to build a document programmatically:

from datetime import date
from lxml import etree
import uuid

NSMAP = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gml": "http://www.opengis.net/gml",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance"
}
GMD, GCO, XSI = NSMAP["gmd"], NSMAP["gco"], NSMAP["xsi"]
# Commonly used codelist catalogue URL; confirm the one your catalog expects
CODELISTS = "http://standards.iso.org/iso/19139/resources/gmxCodelists.xml"

def _gmd(parent, tag):
    """Append a gmd-namespaced child element and return it."""
    return etree.SubElement(parent, f"{{{GMD}}}{tag}")

def _gco(parent, tag, value):
    """Append a gco leaf (CharacterString, Decimal, Date) holding value."""
    leaf = etree.SubElement(parent, f"{{{GCO}}}{tag}")
    leaf.text = str(value)
    return leaf

def _code(parent, tag, value):
    """Append a codelist element with its codeList and codeListValue attributes."""
    el = etree.SubElement(parent, f"{{{GMD}}}{tag}",
                          codeList=f"{CODELISTS}#{tag}", codeListValue=value)
    el.text = value
    return el

def build_iso19115_xml(profile: DatasetProfile,
                       contact_org: str = "Unknown organisation") -> etree._Element:
    # Children are appended in the sequence defined by gmd.xsd, because
    # XSD validation is order-sensitive.
    root = etree.Element(
        f"{{{GMD}}}MD_Metadata",
        nsmap=NSMAP,
        attrib={f"{{{XSI}}}schemaLocation": "http://www.isotc211.org/2005/gmd http://schemas.opengis.net/iso/19139/20060504/gmd/gmd.xsd"}
    )

    # File identifier, language, hierarchy level
    _gco(_gmd(root, "fileIdentifier"), "CharacterString", uuid.uuid4())
    _gco(_gmd(root, "language"), "CharacterString", profile.language)
    _code(_gmd(root, "hierarchyLevel"), "MD_ScopeCode", "dataset")

    # Contact and date stamp are mandatory children of MD_Metadata.
    # contact_org is a placeholder parameter; pass your organisation's name.
    party = _gmd(_gmd(root, "contact"), "CI_ResponsibleParty")
    _gco(_gmd(party, "organisationName"), "CharacterString", contact_org)
    _code(_gmd(party, "role"), "CI_RoleCode", "pointOfContact")
    _gco(_gmd(root, "dateStamp"), "Date", date.today().isoformat())

    # Reference system (CRS); must precede identificationInfo
    ref_el = _gmd(_gmd(root, "referenceSystemInfo"), "MD_ReferenceSystem")
    rs_id = _gmd(_gmd(ref_el, "referenceSystemIdentifier"), "RS_Identifier")
    _gco(_gmd(rs_id, "code"), "CharacterString", f"EPSG:{profile.crs_epsg}")

    # Identification info: citation (title plus at least one date), abstract, language
    md_id = _gmd(_gmd(root, "identificationInfo"), "MD_DataIdentification")
    citation = _gmd(_gmd(md_id, "citation"), "CI_Citation")
    _gco(_gmd(citation, "title"), "CharacterString", profile.title)
    ci_date = _gmd(_gmd(citation, "date"), "CI_Date")
    _gco(_gmd(ci_date, "date"), "Date", profile.publication_date.date().isoformat())
    _code(_gmd(ci_date, "dateType"), "CI_DateTypeCode", "publication")
    _gco(_gmd(md_id, "abstract"), "CharacterString", profile.abstract)
    _gco(_gmd(md_id, "language"), "CharacterString", profile.language)

    # Bounding box: schema order is west, east, south, north, each a gco:Decimal
    geo_el = _gmd(_gmd(_gmd(md_id, "extent"), "EX_Extent"), "geographicElement")
    bbox = _gmd(geo_el, "EX_GeographicBoundingBox")
    minx, miny, maxx, maxy = profile.bbox
    for tag, value in (("westBoundLongitude", minx), ("eastBoundLongitude", maxx),
                       ("southBoundLatitude", miny), ("northBoundLatitude", maxy)):
        # Fixed-point formatting avoids exponent notation, which xs:decimal rejects
        _gco(_gmd(bbox, tag), "Decimal", f"{value:.8f}")

    return root

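This construction method keeps namespace scoping explicit. The f"{{{NSMAP['gmd']}}}ElementName" syntax (Clark notation) binds each element to its namespace URI, and passing nsmap=NSMAP to the root element is what makes lxml serialize the declared gmd and gco prefixes instead of auto-generated ns0, ns1 prefixes, which frequently break downstream harvesters.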
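The effect of the prefix map is easy to verify in isolation. The following minimal sketch builds the same root element with and without an nsmap and serializes both; the variable names are illustrative.

```python
from lxml import etree

GMD = "http://www.isotc211.org/2005/gmd"

# Same qualified tag, with and without an explicit prefix mapping
with_map = etree.Element(f"{{{GMD}}}MD_Metadata", nsmap={"gmd": GMD})
without_map = etree.Element(f"{{{GMD}}}MD_Metadata")

# Typical serialization for publishing: declaration, UTF-8, indented output
xml_with = etree.tostring(
    with_map, xml_declaration=True, encoding="UTF-8", pretty_print=True
)
xml_without = etree.tostring(without_map)

print(xml_with.decode("utf-8"))
print(xml_without.decode("utf-8"))
```

Both documents are namespace-equivalent to an XML parser, but only the first uses the conventional gmd prefix that many harvesters and hand-written XPath expressions assume.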

Step 4: Automated Validation & Quality Assurance

Generating XML is only half the battle. Production systems must verify structural compliance against the official XSD before ingestion. Using lxml’s XMLSchema validator provides fast, deterministic checks:

def validate_xml(xml_root: etree._Element, xsd_path: str) -> tuple[bool, list[str]]:
    try:
        # Parsing by path keeps the base URL, so relative xs:include and
        # xs:import references inside the XSD can be resolved.
        schema = etree.XMLSchema(etree.parse(xsd_path))
    except (OSError, etree.XMLSyntaxError, etree.XMLSchemaParseError) as e:
        return False, [f"Validation infrastructure error: {e}"]
    if schema.validate(xml_root):
        return True, []
    # error_log lists every violation, not only the first one
    return False, [entry.message for entry in schema.error_log]

For teams building continuous validation into CI/CD pipelines, Validating ISO 19115 XML Against XSD Schemas with lxml provides extended patterns for caching schema trees, handling network-fallback XSD resolution, and generating human-readable error reports. Always reference the official lxml validation documentation when troubleshooting namespace binding or schema location resolution issues.
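Compiling the ISO schema set is much more expensive than validating a single record, so caching the compiled schema is a natural first optimization. The sketch below uses functools.lru_cache for that; load_schema and collect_errors are hypothetical helper names, and the cache key is simply the path string.

```python
from functools import lru_cache

from lxml import etree


@lru_cache(maxsize=None)
def load_schema(xsd_path: str) -> etree.XMLSchema:
    """Parse and compile the XSD once per distinct path."""
    return etree.XMLSchema(etree.parse(xsd_path))


def collect_errors(xml_root: etree._Element, xsd_path: str) -> list[str]:
    """Return all schema violations for xml_root, or an empty list if valid."""
    schema = load_schema(xsd_path)
    if schema.validate(xml_root):
        return []
    return [entry.message for entry in schema.error_log]
```

Because the cache is keyed on the path, a schema file replaced on disk is not picked up until load_schema.cache_clear() is called or the process restarts.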

Step 5: Publish & Synchronize to Enterprise Catalogs

Once validated, metadata must be pushed to catalog endpoints. Most enterprise platforms support OGC Catalog Service for the Web (CSW) 2.0.2/3.0 or OGC API - Records. A robust publishing routine should:

  1. Serialize the validated XML to bytes
  2. Attach appropriate Content-Type: application/xml headers
  3. Retry on transient HTTP 5xx errors with exponential backoff
  4. Log the catalog-assigned identifier for audit trails
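
import requests
from lxml import etree
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"

def publish_to_csw(endpoint: str, xml_bytes: bytes, api_key: str) -> str:
    # CSW transactions expect the record inside a csw:Transaction/csw:Insert envelope
    transaction = etree.Element(f"{{{CSW_NS}}}Transaction", nsmap={"csw": CSW_NS},
                                service="CSW", version="2.0.2")
    insert = etree.SubElement(transaction, f"{{{CSW_NS}}}Insert")
    insert.append(etree.fromstring(xml_bytes))
    payload = etree.tostring(transaction, xml_declaration=True, encoding="UTF-8")

    session = requests.Session()
    # urllib3 does not retry POST by default, so it is allowed explicitly
    # (allowed_methods requires urllib3 >= 1.26). Retrying an insert can
    # create duplicates if the first attempt actually succeeded.
    retry = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503, 504],
                  allowed_methods=["POST"])
    session.mount("https://", HTTPAdapter(max_retries=retry))

    headers = {
        "Content-Type": "application/xml",
        # Bearer auth is an assumption; check which scheme your catalog uses
        "Authorization": f"Bearer {api_key}"
    }

    response = session.post(endpoint, data=payload, headers=headers, timeout=30)
    response.raise_for_status()
    # CSW replies with a csw:TransactionResponse XML document, not JSON
    return response.text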
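
To log the catalog-assigned identifier, the response body has to be parsed. CSW 2.0.2 transaction responses are XML documents (csw:TransactionResponse) that report inserted records under csw:InsertResult. The sketch below extracts those identifiers; inserted_identifiers is a hypothetical helper and SAMPLE_RESPONSE is a hand-written illustration, not output captured from a real catalog.

```python
from lxml import etree

CSW_NAMESPACES = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def inserted_identifiers(response_xml: bytes) -> list[str]:
    """Return the dc:identifier values reported under csw:InsertResult."""
    # Entity resolution and network access are disabled for remote input
    parser = etree.XMLParser(resolve_entities=False, no_network=True)
    root = etree.fromstring(response_xml, parser)
    nodes = root.xpath(
        "//csw:InsertResult/csw:BriefRecord/dc:identifier",
        namespaces=CSW_NAMESPACES,
    )
    return [node.text for node in nodes]


# Hand-written sample for illustration only
SAMPLE_RESPONSE = b"""<csw:TransactionResponse
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0.2">
  <csw:TransactionSummary>
    <csw:totalInserted>1</csw:totalInserted>
  </csw:TransactionSummary>
  <csw:InsertResult>
    <csw:BriefRecord>
      <dc:identifier>example-record-id</dc:identifier>
    </csw:BriefRecord>
  </csw:InsertResult>
</csw:TransactionResponse>"""

print(inserted_identifiers(SAMPLE_RESPONSE))
```

Catalogs differ in how much of the record they echo back, so treat an empty list as "check the raw response" rather than as a failed insert.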

After initial publication, metadata drift becomes inevitable as source datasets update. Integrating Automated Metadata Harvesting Workflows ensures that changes to spatial extents, CRS updates, or distribution URLs are detected and synchronized without manual intervention. Schedule harvesters using cron, Airflow, or GitHub Actions, and always compare checksums of generated XML before triggering catalog updates to reduce unnecessary API load.
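
One way to implement the checksum comparison is to hash a canonical serialization, so that differences in attribute order or empty-element syntax do not register as changes. The sketch below uses lxml's C14N output with SHA-256; metadata_digest and needs_update are hypothetical names. It assumes the record's fileIdentifier is stable between runs, since a freshly generated UUID would change the digest every time.

```python
import hashlib

from lxml import etree


def metadata_digest(root: etree._Element) -> str:
    """Return a SHA-256 hex digest of the canonical (C14N) form of root."""
    canonical = etree.tostring(root, method="c14n")
    return hashlib.sha256(canonical).hexdigest()


def needs_update(root: etree._Element, previous_digest: str) -> bool:
    """True when the record differs from the previously published version."""
    return metadata_digest(root) != previous_digest
```

Storing the digest next to the catalog-assigned identifier lets a scheduled job skip the publish call for unchanged records.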

Production Considerations & Edge Cases

Schema Versioning & Backward Compatibility

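The gmd/gco encoding used in this guide is defined by ISO 19139 and implements ISO 19115:2003, which remains widely deployed in catalogs. ISO 19115-1:2014 revises the model and is encoded through ISO 19115-3 under different namespaces (such as mdb, cit, and mri), while ISO 19115-2 (current edition 2019) adds extensions for imagery, gridded data, and sensor systems. Maintain a configuration flag in your pipeline to toggle between schema versions. Never hardcode XSD paths; resolve them via environment variables or a centralized schema registry.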
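
A minimal sketch of environment-based schema resolution follows. Every name in it is an assumption made for illustration: the METADATA_SCHEMA_VERSION flag, the version keys, and the two path variables are not defined by any standard or library.

```python
import os
from pathlib import Path

# Hypothetical mapping: schema version key -> environment variable with the XSD path
SCHEMA_ENV_VARS = {
    "19139": "ISO19139_XSD_PATH",
    "19115-3": "ISO19115_3_XSD_PATH",
}


def resolve_xsd_path(version=None, environ=os.environ) -> Path:
    """Resolve the root XSD path for the requested or configured schema version."""
    version = version or environ.get("METADATA_SCHEMA_VERSION", "19139")
    if version not in SCHEMA_ENV_VARS:
        raise ValueError(f"Unsupported schema version: {version!r}")
    variable = SCHEMA_ENV_VARS[version]
    if variable not in environ:
        raise RuntimeError(f"Set {variable} to the root XSD for version {version}")
    return Path(environ[variable])
```

Passing the environment as a parameter keeps the function testable with a plain dictionary and avoids mutating os.environ in tests.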

Handling Missing or Partial Data

Geospatial datasets frequently lack publication dates, authoritative abstracts, or precise bounding boxes. Implement fallback strategies:

  • Use dataset modification timestamps as publicationDate when creation dates are unavailable
  • Generate machine-readable abstracts from layer names and attribute dictionaries
  • Reproject extents to WGS84 (EPSG:4326) for the geographic bounding box; when the source CRS is missing or reprojection fails, treat the raw bounds as a last-resort default and flag these records for manual review
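
The first fallback above can be sketched with the standard library alone. resolve_publication_date is a hypothetical helper; it returns the chosen date together with a flag so that records relying on the fallback can be queued for review.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def resolve_publication_date(
    path: str, declared: Optional[datetime] = None
) -> tuple[datetime, bool]:
    """Return (date, used_fallback) for the dataset stored at path."""
    if declared is not None:
        return declared, False
    # Fall back to the file's last-modification time, expressed in UTC
    modified = Path(path).stat().st_mtime
    return datetime.fromtimestamp(modified, tz=timezone.utc), True
```

File modification times change on copy or checkout, so the flag matters: a fallback date is a hint for the reviewer, not an authoritative publication date.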

Performance at Scale

Generating thousands of ISO 19115 records in one process can exhaust memory if completed lxml trees stay referenced. Drop references to each tree after serialization (calling clear() on an element releases its children), and write large batches incrementally with etree.xmlfile rather than assembling one large document in memory; etree.iterparse() is the counterpart for reading large files. For high-throughput environments, compile XSD schemas once at application startup rather than parsing them per request.
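
For batch exports, lxml's incremental writer (etree.xmlfile) serializes one record at a time. The sketch below assumes the records arrive from a generator and are wrapped in a container element; the "records" wrapper and the stream_records name are illustrative choices, not part of any catalog format.

```python
import io

from lxml import etree


def stream_records(records, sink) -> int:
    """Write each element from records into sink incrementally; return the count."""
    count = 0
    with etree.xmlfile(sink, encoding="utf-8") as writer:
        writer.write_declaration()
        with writer.element("records"):
            for record in records:
                writer.write(record)
                record.clear()  # release the children once serialized
                count += 1
    return count


def sample_records(n):
    """Yield n small placeholder elements (stand-ins for metadata trees)."""
    for index in range(n):
        element = etree.Element("record")
        element.text = str(index)
        yield element


buffer = io.BytesIO()
written = stream_records(sample_records(3), buffer)
print(written, len(buffer.getvalue()))
```

Because the input is a generator, only one record tree exists in memory at a time; the sink can equally be an open binary file.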

Security & Sanitization

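Never trust raw dataset attributes. Strip control characters and normalize whitespace before assigning values to elements; lxml rejects most control characters outright with a ValueError. Do not pre-escape XML-reserved characters (<, >, &, ", '): lxml escapes them during serialization, so manual escaping produces double-encoded output such as &amp;amp;. Pydantic validators combined with lxml’s built-in escaping provide defense-in-depth against malformed input or injection attempts.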
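
A minimal sanitization sketch follows, assuming that dropping control characters and collapsing whitespace is acceptable for the fields in question. sanitize_text and text_element are hypothetical helpers; escaping of reserved characters is left to lxml at serialization time.

```python
import re

from lxml import etree

# C0 control characters other than tab, newline, carriage return, plus DEL
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
_WHITESPACE = re.compile(r"\s+")


def sanitize_text(value: str) -> str:
    """Drop control characters, collapse whitespace runs, trim the ends."""
    cleaned = _CONTROL_CHARS.sub("", value)
    return _WHITESPACE.sub(" ", cleaned).strip()


def text_element(tag: str, raw: str) -> etree._Element:
    """Create an element whose text is the sanitized form of raw."""
    element = etree.Element(tag)
    element.text = sanitize_text(raw)  # reserved characters are escaped by lxml
    return element


title = text_element("title", "Roads\x00 &\t\n <2024>  ")
print(etree.tostring(title).decode("utf-8"))
```

The same sanitize_text function can be called from a Pydantic validator so that every string field is cleaned before it reaches the XML builder.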

Conclusion

Implementing ISO 19115 Metadata Standards requires disciplined namespace management, strict type validation, and automated pipeline integration. By extracting dataset characteristics programmatically, mapping them to the gmd:MD_Metadata hierarchy, constructing XML with explicit namespace declarations, and validating against authoritative XSDs, engineering teams can eliminate manual metadata bottlenecks. When paired with robust publishing routines and continuous harvesting, this workflow transforms metadata from a compliance burden into a scalable, machine-actionable asset that powers enterprise search, spatial discovery, and regulatory reporting.