Schema validation for spatial records is the foundational control plane for any OGC-compliant data publishing pipeline. When geographic datasets move from ingestion to catalog publication, structural integrity, coordinate reference system (CRS) compliance, and metadata alignment must be verified before indexing. Without deterministic validation, malformed geometries, missing mandatory fields, or non-conforming XML/JSON payloads corrupt downstream services, break spatial queries, and trigger compliance failures in government and enterprise environments.
This guide outlines a production-ready validation workflow for spatial records, focusing on OGC API Features, GeoJSON, and metadata schemas. It provides tested Python patterns, step-by-step execution logic, and troubleshooting strategies for platform engineers and spatial data publishers.
Before implementing validation pipelines, ensure your environment meets the following baseline requirements:
- pydantic>=2.0, jsonschema, shapely>=2.0, pyproj, lxml, and fastjsonschema (optional, for high-throughput validation)
- The PROJ database bundled with pyproj, for URN resolution

Familiarity with OGC API Features conformance classes and with metadata publishing standards such as ISO 19115 (see Implementing ISO 19115 Metadata Standards) significantly reduces integration friction. Engineers should also provision structured logging (JSON format) and a dead-letter queue (DLQ) for failed payloads before deploying validation logic to production.
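The structured JSON logging recommended above needs only a custom Formatter from Python's standard logging module. The following is a minimal sketch, not a required schema. It makes these assumptions:
- It emits one JSON object per line.
- The field names (ts, level, logger, message) are illustrative.
- Structured fields are passed through extra={"fields": {...}}.
- build_json_logger is a hypothetical helper.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    # Assumed convention: callers pass structured fields as extra={"fields": {...}}
    # so they cannot collide with built-in LogRecord attributes.
    FIELDS_ATTR = "fields"

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge caller-supplied structured fields (record_id, validation_stage, ...).
        entry.update(getattr(record, self.FIELDS_ATTR, None) or {})
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


def build_json_logger(name: str = "spatial.validation") -> logging.Logger:
    """Hypothetical helper: a logger that writes JSON lines to stderr."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate, non-JSON lines from the root logger
    return logger
```

A call such as `build_json_logger().warning("geometry repaired", extra={"fields": {"record_id": "r-42"}})` then produces a single JSON line. Log aggregators can index that line without any regex parsing.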
A robust spatial validation pipeline follows a deterministic sequence. Each stage isolates failure modes so that downstream systems receive only verified payloads.
Accept raw payloads (GeoJSON FeatureCollection, XML metadata, or OGC API JSON responses). Normalize encoding, strip BOMs, and enforce UTF-8. Parse into a structured intermediate representation. At this stage, validate content-type headers and reject payloads exceeding configured size thresholds to prevent memory exhaustion.
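The ingestion step described above can be sketched with the standard library alone. The size limit, the accepted media types, and the IngestionError class are illustrative assumptions. In production, the size limit is better enforced while streaming the request body or at the HTTP gateway, because this function only sees a payload that is already buffered. XML bodies are returned as decoded text for a later XSD stage.

```python
import codecs
import json
from typing import Any, Tuple

# Illustrative defaults; tune per deployment.
MAX_PAYLOAD_BYTES = 50 * 1024 * 1024
ACCEPTED_MEDIA_TYPES = {
    "application/geo+json",
    "application/json",
    "application/xml",
    "text/xml",
}


class IngestionError(ValueError):
    """Raised when a payload is rejected before any schema validation runs."""


def ingest_payload(
    raw: bytes, content_type: str, max_bytes: int = MAX_PAYLOAD_BYTES
) -> Tuple[str, Any]:
    """Return (media_type, parsed): a JSON value for JSON types, decoded text for XML."""
    if len(raw) > max_bytes:
        raise IngestionError(f"payload of {len(raw)} bytes exceeds limit of {max_bytes}")
    # Drop parameters such as "; charset=utf-8" before matching the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ACCEPTED_MEDIA_TYPES:
        raise IngestionError(f"unsupported content type: {media_type!r}")
    # Strip a UTF-8 byte-order mark if present.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        text = raw.decode("utf-8")  # strict decoding: invalid byte sequences are rejected
    except UnicodeDecodeError as e:
        raise IngestionError(f"payload is not valid UTF-8: {e}") from e
    if media_type.endswith("json"):
        try:
            return media_type, json.loads(text)
        except json.JSONDecodeError as e:
            raise IngestionError(f"malformed JSON at line {e.lineno}: {e.msg}") from e
    return media_type, text
```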
Validate against the appropriate JSON Schema or XSD. For vector features, enforce the GeoJSON structural rules of RFC 7946. For metadata, validate against ISO 19115 or DCAT-AP for Spatial Data Portals profiles. Reject early if mandatory members are missing: type and features on the collection, and type, geometry, and properties on each feature. Do the same if required XML namespaces are missing. Run structural validation before geometry parsing, so that no expensive geometry work is spent on documents that are already fundamentally broken.
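The early-rejection rule can be implemented as a cheap pre-check that runs before full JSON Schema or XSD validation. This standard-library sketch checks two things: the RFC 7946 FeatureCollection and Feature members, and the root namespace of an XML metadata record. The default namespace is the ISO 19139 gmd namespace, which is an assumption, since ISO 19115-3 documents use different namespaces. For untrusted XML, a hardened parser configuration is advisable.

```python
import xml.etree.ElementTree as ET
from typing import Any, List

# Assumed default: ISO 19139 (XML encoding of ISO 19115). ISO 19115-3 uses other
# namespaces, so pass the namespace for your metadata profile explicitly.
GMD_NS = "http://www.isotc211.org/2005/gmd"


def precheck_feature_collection(doc: Any) -> List[str]:
    """Cheap structural gate run before JSON Schema validation and geometry parsing."""
    if not isinstance(doc, dict):
        return ["top level is not a JSON object"]
    errors: List[str] = []
    if doc.get("type") != "FeatureCollection":
        errors.append("'type' must be 'FeatureCollection'")
    features = doc.get("features")
    if not isinstance(features, list):
        errors.append("'features' must be an array")
        return errors
    for idx, feat in enumerate(features):
        if not isinstance(feat, dict) or feat.get("type") != "Feature":
            errors.append(f"features[{idx}]: 'type' must be 'Feature'")
            continue
        # RFC 7946 requires both members to be present, although either may be null.
        for key in ("geometry", "properties"):
            if key not in feat:
                errors.append(f"features[{idx}]: missing '{key}' member")
    return errors


def precheck_metadata_xml(xml_text: str, expected_ns: str = GMD_NS) -> List[str]:
    """Reject metadata documents whose root element is outside the expected namespace."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        return [f"XML not well-formed: {e}"]
    # ElementTree represents qualified tags as '{namespace}localname'.
    if not root.tag.startswith("{" + expected_ns + "}"):
        return [f"root element {root.tag!r} is not in namespace {expected_ns}"]
    return []
```

Both functions return an error list instead of raising. A caller can therefore merge their output with the later stages and emit one report per record.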
Extract geometry objects and validate them with Shapely. Check for:
- null geometries, flagged explicitly rather than silently dropped

Verify that the declared CRS matches the coordinate values. Use pyproj to resolve EPSG codes, URNs (urn:ogc:def:crs:EPSG::4326), or WKT strings. Reject records with mismatched axis orders (e.g., latitude/longitude vs. longitude/latitude) or deprecated CRS identifiers. If your catalog requires a single canonical CRS, normalize all coordinates to WGS84. Note that RFC 7946 GeoJSON expects longitude/latitude order (OGC:CRS84), whereas the EPSG:4326 definition declares latitude first. Fix the axis order explicitly when normalizing.
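The axis-order check can be sketched with pyproj's axis metadata. The helper names is_lat_first, looks_axis_swapped, and to_lon_lat are hypothetical. The swap detection is a heuristic, not a proof: a latitude beyond ±90° is impossible, but data near the equator and prime meridian can be swapped without tripping it.

```python
from typing import Iterable, Sequence, Tuple

from pyproj import CRS, Transformer


def is_lat_first(crs: CRS) -> bool:
    """True when the CRS's first declared axis points north/south, as EPSG:4326 does."""
    return crs.axis_info[0].direction.lower() in ("north", "south")


def looks_axis_swapped(crs_input: str, positions: Iterable[Sequence[float]]) -> bool:
    """Heuristic: the CRS declares latitude first, yet a 'latitude' exceeds 90 degrees,
    so the data was most likely written in longitude/latitude order."""
    crs = CRS.from_user_input(crs_input)  # accepts EPSG codes, OGC URNs, and WKT
    if not (crs.is_geographic and is_lat_first(crs)):
        return False
    return any(abs(pos[0]) > 90 for pos in positions)


def to_lon_lat(crs_input: str, x: float, y: float) -> Tuple[float, float]:
    """Transform one position to WGS84. always_xy=True keeps longitude/easting first
    on both input and output, regardless of the CRS's declared axis order."""
    transformer = Transformer.from_crs(
        CRS.from_user_input(crs_input), "EPSG:4326", always_xy=True
    )
    return transformer.transform(x, y)
```

The example uses always_xy=True for a specific reason. It makes the transformer accept and return longitude first, which matches GeoJSON's coordinate order even when the EPSG definition lists latitude first.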
Cross-reference spatial bounds with declared metadata extents. Ensure temporal coverage, licensing, and attribution fields comply with organizational policies. Once validation passes, emit a structured validation report alongside the payload for catalog ingestion. Failed records route to quarantine with explicit error codes for remediation.
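The extent cross-check can be sketched with the standard library. The sketch computes the bounding box of all feature coordinates and confirms that it lies inside the extent declared in the metadata. It then confirms that the required policy fields are present. These parts are assumptions:
- the metadata dictionary layout
- the REQUIRED_METADATA tuple
- the EXTENT_MISMATCH and MISSING_METADATA codes

Extents that cross the antimeridian (west > east) are not handled.

```python
from typing import Any, Dict, Iterator, List, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north)

# Hypothetical organisational policy: metadata fields every record must carry.
REQUIRED_METADATA = ("temporal_extent", "license", "attribution")


def iter_positions(coords: Any) -> Iterator[Sequence[float]]:
    """Yield each [x, y, ...] position from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
        return
    for part in coords or []:
        yield from iter_positions(part)


def geometry_positions(geom: Dict[str, Any]) -> Iterator[Sequence[float]]:
    """Positions of one geometry, recursing into GeometryCollection members."""
    if geom.get("type") == "GeometryCollection":
        for member in geom.get("geometries") or []:
            yield from geometry_positions(member)
    else:
        yield from iter_positions(geom.get("coordinates"))


def collection_bbox(fc: Dict[str, Any]) -> Optional[BBox]:
    """Running min/max over every position; None when no feature has coordinates."""
    west = south = float("inf")
    east = north = float("-inf")
    for feat in fc.get("features") or []:
        if not feat.get("geometry"):
            continue  # null geometries are reported by the geometry stage instead
        for pos in geometry_positions(feat["geometry"]):
            west, east = min(west, pos[0]), max(east, pos[0])
            south, north = min(south, pos[1]), max(north, pos[1])
    if west == float("inf"):
        return None
    return (west, south, east, north)


def check_against_metadata(fc: Dict[str, Any], metadata: Dict[str, Any]) -> List[str]:
    """Return error codes for extent mismatches and missing policy fields."""
    errors: List[str] = []
    actual = collection_bbox(fc)
    declared = metadata.get("bbox")  # assumed [west, south, east, north]
    if actual and declared:
        w, s, e, n = declared
        if not (w <= actual[0] and s <= actual[1] and actual[2] <= e and actual[3] <= n):
            errors.append(f"EXTENT_MISMATCH: data bbox {actual} outside declared {tuple(declared)}")
    for field in REQUIRED_METADATA:
        if not metadata.get(field):
            errors.append(f"MISSING_METADATA: {field}")
    return errors
```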
The following implementation shows a type-annotated validation function built on jsonschema, Shapely, and pyproj. It separates the structural, geometric, and CRS checks and collects all errors before failing, so one report lists every problem in a record.
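```python
import copy
import logging
from typing import Any, Dict, List

from jsonschema.validators import validator_for
from pyproj import CRS, Transformer
from shapely.geometry import mapping, shape
from shapely.validation import make_valid

logger = logging.getLogger(__name__)


class SpatialValidationError(Exception):
    """Carries every error collected across all validation stages."""

    def __init__(self, errors: List[str]):
        self.errors = errors
        super().__init__(f"Validation failed: {'; '.join(errors)}")


def validate_spatial_record(
    payload: Dict[str, Any],
    json_schema: Dict[str, Any],
    target_crs: str = "EPSG:4326",
) -> Dict[str, Any]:
    errors: List[str] = []
    # Work on a deep copy so geometry repairs never mutate the caller's payload.
    record = copy.deepcopy(payload)

    # 1. Structural JSON Schema validation. iter_errors() reports every violation,
    #    whereas jsonschema.validate() stops at the first one. validator_for()
    #    honours the schema's "$schema" draft and defaults to the latest draft.
    validator = validator_for(json_schema)(json_schema)
    for err in validator.iter_errors(record):
        location = "/".join(str(p) for p in err.absolute_path) or "<root>"
        errors.append(f"Schema violation at {location}: {err.message}")

    # 2. CRS resolution and transformation readiness. "crs" is the legacy
    #    GeoJSON (2008) member; RFC 7946 documents omit it and imply WGS84 lon/lat.
    declared_crs = ((record.get("crs") or {}).get("properties") or {}).get("name")
    is_geographic = True
    if declared_crs:
        try:
            crs_obj = CRS.from_user_input(declared_crs)
            is_geographic = crs_obj.is_geographic
            if not crs_obj.equals(CRS.from_user_input(target_crs)):
                # Raises if no transformation path exists; coordinates are not changed.
                Transformer.from_crs(crs_obj, target_crs, always_xy=True)
        except Exception as e:  # pyproj raises CRSError / ProjError subclasses
            errors.append(f"CRS resolution failed: {declared_crs} - {e}")

    # 3. Geometry validation
    features = record.get("features")
    for idx, feat in enumerate(features if isinstance(features, list) else []):
        geom = feat.get("geometry") if isinstance(feat, dict) else None
        if not geom:
            errors.append(f"Feature {idx}: missing or null geometry")
            continue
        try:
            shp = shape(geom)
            if shp.is_empty:
                # Empty geometries have NaN bounds, so report them explicitly.
                errors.append(f"Feature {idx}: empty geometry")
                continue
            if not shp.is_valid:
                repaired = make_valid(shp)
                if not repaired.is_valid:
                    errors.append(f"Feature {idx}: irreparable invalid geometry")
                    continue
                # Log every auto-repair for the audit trail, then keep the repaired shape.
                logger.warning(
                    "Feature %d repaired by make_valid: %s -> %s",
                    idx, shp.geom_type, repaired.geom_type,
                )
                feat["geometry"] = mapping(repaired)
                shp = repaired
            # The lon/lat range check only applies to geographic CRSs and assumes x = longitude.
            if is_geographic:
                minx, miny, maxx, maxy = shp.bounds
                if not (-180 <= minx <= maxx <= 180 and -90 <= miny <= maxy <= 90):
                    errors.append(f"Feature {idx}: coordinates out of geographic bounds")
        except Exception as e:  # malformed coordinates, unknown geometry types, etc.
            errors.append(f"Feature {idx}: geometry parse error - {e}")

    if errors:
        raise SpatialValidationError(errors)
    return record
```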
- make_valid attempts to repair common topology problems (e.g., self-intersecting polygons) before a record is rejected. Use it cautiously, because a repair can change the geometry type: a bow-tie Polygon, for example, becomes a MultiPolygon. Log every auto-repair for the audit trail.
- Constructing a pyproj.Transformer confirms that a transformation path to the target CRS exists, without mutating the original payload. Actual reprojection should happen downstream, during catalog indexing.

Validation failures should never block the entire ingestion pipeline. Implement a tiered routing strategy:
- status: rejected. These records require manual intervention or correction at the upstream source.
- record_id, error_codes, validation_stage, and payload_hash. Integrate with observability stacks (OpenTelemetry, Prometheus, or ELK) to track validation success rates over time.

Government and enterprise environments often require immutable audit trails. Store validation reports alongside catalog entries in content-addressable storage (e.g., S3 objects keyed by their SHA-256 digest) to satisfy compliance audits and data-lineage requirements.
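The sketch below shows one way to produce the routing decision together with the structured fields named above (record_id, error_codes, validation_stage, payload_hash). Several parts are hypothetical:
- the status values other than rejected
- the quarantine and catalog destinations
- the report_key layout

The payload hash is SHA-256 over canonical JSON (sorted keys, compact separators), so the same logical payload always yields the same content address.

```python
import hashlib
import json
from typing import Any, Dict, List


def payload_hash(payload: Dict[str, Any]) -> str:
    """SHA-256 over canonical JSON, so key order and whitespace do not change the digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def route_record(
    record_id: str,
    payload: Dict[str, Any],
    error_codes: List[str],
    warnings: List[str],
    validation_stage: str,
) -> Dict[str, Any]:
    """Build the routing decision plus the structured report for one record.

    Tier names and destinations are hypothetical placeholders.
    """
    digest = payload_hash(payload)
    if error_codes:
        status, destination = "rejected", "quarantine"  # manual or upstream fix needed
    elif warnings:
        status, destination = "accepted_with_warnings", "catalog"
    else:
        status, destination = "accepted", "catalog"
    return {
        "record_id": record_id,
        "status": status,
        "destination": destination,
        "error_codes": list(error_codes),
        "warnings": list(warnings),
        "validation_stage": validation_stage,
        "payload_hash": digest,
        # Content-addressable object key for the immutable report (e.g. in S3).
        "report_key": f"validation-reports/{digest[:2]}/{digest}.json",
    }
```

The returned dictionary can be serialized with json.dumps and sent to two places: the structured log and, for rejected records, the dead-letter queue. Because the report key derives from the payload digest, re-validating an unchanged payload addresses the same report object.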
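Once records pass validation, they must be prepared for spatial indexing. Catalog systems expect consistent bounding boxes, normalized CRS identifiers, and machine-readable metadata. Validation outputs should therefore carry the computed bounding box, the resolved CRS, and the structured validation report alongside the payload.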
These artifacts feed directly into Spatial Metadata & Catalog Integration workflows, enabling automated search indexing, facet generation, and spatial query optimization. By decoupling validation from indexing, platform teams can scale horizontally: validation nodes handle CPU-intensive geometry checks, while catalog nodes focus on Elasticsearch/PostGIS ingestion.
Schema validation for spatial records is not a single gate but a continuous, multi-stage control process. By enforcing structural compliance, verifying geometry integrity, resolving CRS ambiguities, and routing failures deterministically, engineering teams prevent downstream corruption and maintain OGC compliance. Implementing the patterns outlined here reduces catalog downtime, accelerates metadata publishing, and ensures spatial datasets remain query-ready across enterprise and government platforms.