How to Parse OGC WMS GetCapabilities XML in Python

Use requests to fetch the GetCapabilities document and xml.etree.ElementTree to traverse it. Map OGC namespaces explicitly — WMS 1.3.0 declares http://www.opengis.net/wms as its default namespace while WMS 1.1.1 typically carries no namespace at all. A version-aware namespace map, combined with a bare-tag fallback, keeps one parser working across both generations of servers. The result is a structured dictionary containing service metadata, the full layer hierarchy, supported coordinate reference systems, and bounding box extents — ready for downstream ingestion into a metadata catalog or GIS platform.

The Core Challenge: Namespace Fragmentation Across WMS Versions

The GetCapabilities operation — the service discovery contract defined in Understanding OGC Web Map Service Specifications — returns an XML document whose element names, namespace declarations, and bounding box structures differ materially between the two active versions of the protocol.

Figure: WMS Version XML Structure Comparison. Side-by-side illustration of WMS 1.1.1 and WMS 1.3.0 XML document structures, highlighting differences in namespace declarations, CRS element names, and bounding box formats. The WMS 1.1.1 panel shows a <WMT_MS_Capabilities version="1.1.1"> root with no default namespace, <SRS> codes, a <LatLonBoundingBox> carrying the extent as XML attributes, longitude-first axis order, and bare-tag XPath. The WMS 1.3.0 panel shows a <WMS_Capabilities version="1.3.0"> root declaring the http://www.opengis.net/wms default namespace, <CRS> codes, an <EX_GeographicBoundingBox> carrying the extent as child text elements, CRS-defined axis order (latitude first for EPSG:4326), and XPath that needs the wms: prefix.

The key divergences that break naive parsers:

  • Element naming. WMS 1.1.1 uses <SRS> for coordinate reference system codes; WMS 1.3.0 renames it to <CRS>. Both may appear in a single capabilities document from servers that advertise backwards compatibility.
  • Bounding box structure. WMS 1.1.1 encodes the geographic extent in <LatLonBoundingBox> as XML attributes (minx, miny, maxx, maxy). WMS 1.3.0 replaces this with <EX_GeographicBoundingBox>, which holds four child text elements: <westBoundLongitude>, <eastBoundLongitude>, <southBoundLatitude>, <northBoundLatitude>.
  • Namespace declarations. WMS 1.3.0 documents declare http://www.opengis.net/wms as the default XML namespace. Every element is in that namespace, so XPath queries using bare tag names silently return nothing. WMS 1.1.1 documents typically carry no namespace at all.
  • Axis order for EPSG:4326. In WMS 1.3.0, the BBOX parameter for GetMap must follow the CRS-defined axis order. For EPSG:4326, the OGC-mandated order is latitude-first (minLat,minLon,maxLat,maxLon), the inverse of the longitude-first convention used in 1.1.1. This is the most common source of blank map responses. The SRS and Coordinate Reference System Handling guide covers the axis-swap logic required before issuing GetMap requests.
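The namespace divergence is the easiest of these to reproduce in isolation. The following is a minimal sketch using two hand-written fragments (illustrative only, not real server output) to show that a bare-tag lookup matches on a 1.1.1 document but quietly returns None on a 1.3.0 document.

```python
import xml.etree.ElementTree as ET

# Hand-written fragments for illustration; real documents are far larger.
DOC_111 = """<WMT_MS_Capabilities version="1.1.1">
  <Capability><Layer><Name>cities</Name><SRS>EPSG:4326</SRS></Layer></Capability>
</WMT_MS_Capabilities>"""

DOC_130 = """<WMS_Capabilities version="1.3.0" xmlns="http://www.opengis.net/wms">
  <Capability><Layer><Name>cities</Name><CRS>EPSG:4326</CRS></Layer></Capability>
</WMS_Capabilities>"""

NS = {"wms": "http://www.opengis.net/wms"}

root_111 = ET.fromstring(DOC_111)
root_130 = ET.fromstring(DOC_130)

# Bare tag names match on the namespace-free 1.1.1 document...
bare_111 = root_111.find("Capability/Layer/Name")
# ...but match nothing on 1.3.0, and no error is raised.
bare_130 = root_130.find("Capability/Layer/Name")
# The prefixed path, resolved through the namespace map, does match.
prefixed_130 = root_130.find("wms:Capability/wms:Layer/wms:Name", NS)

print(bare_111.text)      # cities
print(bare_130)           # None
print(prefixed_130.text)  # cities
print(root_130.tag)       # {http://www.opengis.net/wms}WMS_Capabilities
```

The last line shows why the bare lookup fails: ElementTree stores every 1.3.0 tag with its namespace URI in braces, so the literal tag name is no longer just WMS_Capabilities.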

Production-Ready Implementation

The following script handles HTTP retrieval, dynamic namespace resolution, and recursive layer extraction. It returns a structured dictionary suitable for downstream metadata catalog synchronization or GIS platform ingestion. The only runtime dependency beyond the standard library is requests.

import json
import requests
import xml.etree.ElementTree as ET
from typing import Any

# WMS 1.3.0 declares http://www.opengis.net/wms as its default namespace.
# WMS 1.1.1 typically uses no namespace; the fallback logic handles both.
WMS_130_NS: dict[str, str] = {
    "wms": "http://www.opengis.net/wms",
    "ows": "http://www.opengis.net/ows/1.1",
}
WMS_111_NS: dict[str, str] = {}


def _safe_text(elem: ET.Element | None) -> str | None:
    """Return stripped text content of an element, or None if absent."""
    return elem.text.strip() if elem is not None and elem.text else None


def _find(
    elem: ET.Element, path: str, ns: dict[str, str]
) -> ET.Element | None:
    """
    Namespace-aware find with automatic fallback to bare tag names.

    WMS 1.3.0 requires the 'wms:' prefix in every XPath step.
    WMS 1.1.1 documents have no namespace, so the namespace map is empty,
    the prefixed lookup is skipped, and the path is retried with the
    prefixes stripped. The same retry runs when a prefixed lookup misses.
    """
    result = elem.find(path, ns) if ns else None
    if result is None:
        bare = "/".join(part.split(":")[-1] for part in path.split("/"))
        result = elem.find(bare)
    return result


def _findall(
    elem: ET.Element, path: str, ns: dict[str, str]
) -> list[ET.Element]:
    """Namespace-aware findall with bare-tag fallback."""
    results = elem.findall(path, ns) if ns else []
    if not results:
        bare = "/".join(part.split(":")[-1] for part in path.split("/"))
        results = elem.findall(bare)
    return results


def _parse_layer(elem: ET.Element, ns: dict[str, str]) -> dict[str, Any]:
    """
    Recursively parse a WMS <Layer> element.

    Returns a dict with name, title, abstract, crs (list), bbox (dict),
    and a nested layers list for group layers.
    """
    layer: dict[str, Any] = {
        "name":     _safe_text(_find(elem, "wms:Name", ns)),
        "title":    _safe_text(_find(elem, "wms:Title", ns)),
        "abstract": _safe_text(_find(elem, "wms:Abstract", ns)),
        "crs":      [],
        "bbox":     {},
        "layers":   [],
    }

    # WMS 1.3.0 uses <CRS>; WMS 1.1.1 uses <SRS>. Collect both to be safe.
    for tag in ("wms:CRS", "wms:SRS"):
        for crs_elem in _findall(elem, tag, ns):
            text = _safe_text(crs_elem)
            if text:
                layer["crs"].append(text)

    # EX_GeographicBoundingBox (WMS 1.3.0): child text elements, always lon/lat
    geo_bb = _find(elem, "wms:EX_GeographicBoundingBox", ns)
    if geo_bb is not None:
        layer["bbox"]["geographic"] = {
            "west":  float(_safe_text(_find(geo_bb, "wms:westBoundLongitude", ns)) or 0),
            "east":  float(_safe_text(_find(geo_bb, "wms:eastBoundLongitude", ns)) or 0),
            "south": float(_safe_text(_find(geo_bb, "wms:southBoundLatitude", ns)) or 0),
            "north": float(_safe_text(_find(geo_bb, "wms:northBoundLatitude", ns)) or 0),
        }

    # LatLonBoundingBox (WMS 1.1.1): XML attributes, always lon/lat
    latlon_bb = _find(elem, "wms:LatLonBoundingBox", ns)
    if latlon_bb is not None:
        layer["bbox"]["latlon"] = {
            "minx": float(latlon_bb.get("minx", 0)),
            "maxx": float(latlon_bb.get("maxx", 0)),
            "miny": float(latlon_bb.get("miny", 0)),
            "maxy": float(latlon_bb.get("maxy", 0)),
        }

    # Recurse into nested group layers
    for child in _findall(elem, "wms:Layer", ns):
        layer["layers"].append(_parse_layer(child, ns))

    return layer


def parse_wms_capabilities(base_url: str, timeout: int = 30) -> dict[str, Any]:
    """
    Fetch a WMS GetCapabilities document and return structured metadata.

    VERSION is intentionally omitted from the initial request so the server
    responds with its highest supported version rather than being pinned to
    a specific version by the client.
    """
    params = {"SERVICE": "WMS", "REQUEST": "GetCapabilities"}
    response = requests.get(base_url, params=params, timeout=timeout)
    response.raise_for_status()

    root = ET.fromstring(response.content)
    version = root.get("version", "1.3.0")
    ns = WMS_130_NS if version.startswith("1.3") else WMS_111_NS

    service_info = {
        "title":    _safe_text(_find(root, "wms:Service/wms:Title", ns)),
        "abstract": _safe_text(_find(root, "wms:Service/wms:Abstract", ns)),
        "version":  version,
    }

    capability = _find(root, "wms:Capability", ns)
    root_layer = (
        _find(capability, "wms:Layer", ns) if capability is not None else None
    )

    return {
        "service":    service_info,
        "root_layer": _parse_layer(root_layer, ns) if root_layer is not None else {},
    }


if __name__ == "__main__":
    # Replace with any public or local WMS endpoint
    result = parse_wms_capabilities("https://demo.mapserver.org/cgi-bin/wms")
    print(json.dumps(result, indent=2))

Step-by-Step Walkthrough

HTTP retrieval. requests.get() appends SERVICE=WMS and REQUEST=GetCapabilities as query parameters. Omitting the VERSION parameter is deliberate — the server then returns a capabilities document for its highest supported version instead of being locked to a version the client guesses. The timeout argument prevents thread starvation when querying unresponsive servers, which is common with public geospatial infrastructure under heavy tile load.

Version detection and namespace selection. ET.fromstring() parses the raw response bytes. The version attribute on the root element (WMT_MS_Capabilities for 1.1.1, WMS_Capabilities for 1.3.0) determines which namespace dictionary to use. WMS_130_NS maps the wms prefix to http://www.opengis.net/wms; WMS_111_NS is intentionally empty, triggering the bare-tag fallback in every lookup.

The _find and _findall helpers. Element.find() requires namespace-qualified tag names when elements live in a namespace. Passing WMS_130_NS and writing the path as wms:Capability/wms:Layer works for 1.3.0. For 1.1.1 documents the namespace map is empty, so the helpers skip the prefixed lookup, strip the wms: prefix from each path segment, and search with bare tag names like Capability/Layer. The same fallback runs whenever a prefixed lookup finds nothing.
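The prefix-stripping step is easier to follow when pulled out on its own. This sketch lifts the expression used inside the helpers into a standalone function; the name strip_prefixes is hypothetical and does not appear in the script.

```python
def strip_prefixes(path: str) -> str:
    """Drop any 'prefix:' from each step of a simple slash-separated path."""
    return "/".join(part.split(":")[-1] for part in path.split("/"))


print(strip_prefixes("wms:Capability/wms:Layer"))  # Capability/Layer
print(strip_prefixes("wms:Service/wms:Title"))     # Service/Title
print(strip_prefixes(".//wms:Layer"))              # .//Layer
print(strip_prefixes("Name"))                      # Name
```

The split-and-join approach only suits simple paths like the ones this parser uses. A path with a predicate that itself contains a slash or a colon would be mangled, so it is worth keeping the lookups as plain child steps.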

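Safe text extraction. _safe_text() guards against AttributeError when a lookup returns None or when an element's .text property is None, which is common for optional fields like <Abstract> or <KeywordList>. Because _safe_text() returns None in those cases, the float(... or 0) pattern in the bounding box extraction avoids the TypeError that float(None) would raise when a malformed response contains empty bounding box child elements. The trade-off is that a missing coordinate is recorded as 0.0, and non-numeric text still raises ValueError.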
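Since 0.0 is a valid coordinate, a catalog may prefer to record a missing value as missing. One possible alternative, sketched here under the assumption that downstream code can handle None, is a small conversion helper; to_float is a hypothetical name and is not part of the script above.

```python
from typing import Optional


def to_float(value: Optional[str]) -> Optional[float]:
    """Convert text to float, returning None when it is missing or not numeric."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        # Covers empty strings and non-numeric text such as "n/a".
        return None


print(to_float("-179.9"))  # -179.9
print(to_float(" 83.6 "))  # 83.6
print(to_float(""))        # None
print(to_float("n/a"))     # None
print(to_float(None))      # None
```

With this helper a bounding box entry would read to_float(_safe_text(...)), and a consumer can tell an absent extent apart from one that really sits on the equator or prime meridian.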

Recursive layer traversal. WMS capabilities documents nest layers to arbitrary depth — a root <Layer> typically contains named child layers, which may themselves contain sub-layers for thematic groupings. _parse_layer() recurses by calling _findall(elem, "wms:Layer", ns) on each parent, building a tree that preserves parent-child relationships. Map clients and catalog indexers require this structure to reconstruct layer group hierarchies.
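Consumers often need the opposite view as well: a flat list of requestable layers with their group path. The sketch below walks a dictionary shaped like the parser's output; the function name iter_named_layers and the sample tree are illustrative assumptions. It treats only layers with a name as requestable, since a <Layer> without a <Name> is a category heading in WMS.

```python
from typing import Iterator


def iter_named_layers(layer: dict, parents: tuple = ()) -> Iterator[tuple]:
    """Yield (parent_titles, layer) for every layer that carries a name."""
    if layer.get("name"):
        yield parents, layer
    label = layer.get("title") or layer.get("name") or "(untitled)"
    for child in layer.get("layers", []):
        yield from iter_named_layers(child, parents + (label,))


# Hand-written tree in the same shape as the parser's "root_layer" value.
tree = {
    "name": None, "title": "WMS Demo", "layers": [
        {"name": "cities", "title": "World Cities", "layers": []},
        {"name": None, "title": "Basemaps", "layers": [
            {"name": "roads", "title": "Roads", "layers": []},
        ]},
    ],
}

for parents, layer in iter_named_layers(tree):
    print(" / ".join(parents), "->", layer["name"])
# WMS Demo -> cities
# WMS Demo / Basemaps -> roads
```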

Verification

Run the script against a known endpoint and inspect the output. The OGC Testbed operates several public WMS endpoints suitable for integration testing.

python parse_wms.py

Expected output shape (truncated):

{
  "service": {
    "title": "WMS Demo Server",
    "abstract": "Demonstration WMS service",
    "version": "1.3.0"
  },
  "root_layer": {
    "name": null,
    "title": "WMS Demo",
    "abstract": null,
    "crs": ["EPSG:4326", "EPSG:3857"],
    "bbox": {
      "geographic": {
        "west": -180.0,
        "east": 180.0,
        "south": -90.0,
        "north": 90.0
      }
    },
    "layers": [
      {
        "name": "cities",
        "title": "World Cities",
        "abstract": null,
        "crs": ["EPSG:4326", "EPSG:3857", "EPSG:32634"],
        "bbox": { "geographic": { "west": -179.9, "east": 179.9, "south": -55.0, "north": 83.6 } },
        "layers": []
      }
    ]
  }
}

Confirm that service.version matches the endpoint’s advertised version, that root_layer.crs lists at least one recognized EPSG code, and that bounding box coordinates fall within plausible geographic ranges.
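These three checks can be automated for use in a test suite. The following is a rough sketch, assuming the result dictionary has the shape shown above; check_capabilities is a hypothetical helper, the accepted version list covers only the two versions discussed here, and only the "geographic" bounding box is inspected.

```python
def check_capabilities(result: dict) -> list:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []

    version = result.get("service", {}).get("version")
    if version not in ("1.1.1", "1.3.0"):  # assumption: only these two are expected
        problems.append(f"unexpected version: {version!r}")

    root = result.get("root_layer") or {}
    if not any(code.upper().startswith("EPSG:") for code in root.get("crs", [])):
        problems.append("root layer lists no EPSG code")

    geo = root.get("bbox", {}).get("geographic")
    if geo is not None:
        if not (-180 <= geo["west"] <= 180 and -180 <= geo["east"] <= 180):
            problems.append("longitude out of range")
        if not (-90 <= geo["south"] <= geo["north"] <= 90):
            problems.append("latitude out of range or inverted")

    return problems


sample = {
    "service": {"title": "WMS Demo Server", "abstract": None, "version": "1.3.0"},
    "root_layer": {
        "name": None, "title": "WMS Demo", "abstract": None,
        "crs": ["EPSG:4326", "EPSG:3857"],
        "bbox": {"geographic": {"west": -180.0, "east": 180.0,
                                "south": -90.0, "north": 90.0}},
        "layers": [],
    },
}
print(check_capabilities(sample))  # []
```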

Gotchas and Edge Cases

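Empty namespace on WMS 1.3.0 documents from certain server stacks. A small number of GeoServer and MapServer builds emit WMS 1.3.0-conformant XML but omit the xmlns attribute on the root element, effectively producing a namespace-free 1.3.0 document. The version-based namespace selector will choose WMS_130_NS and every prefixed lookup will miss, but the bare-tag fallback in _find() and _findall() then matches the unqualified elements, so the parser still returns data at the cost of a second lookup per call. The reverse case is not covered: a document that reports version 1.1.1 yet declares a namespace gets the empty map, and bare-tag lookups find nothing. If the result dictionary comes back empty despite a valid HTTP 200 response, check root.tag. A namespaced root looks like {http://www.opengis.net/wms}WMS_Capabilities, and the safest fix is to choose the namespace map from that tag rather than from the version attribute.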
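One way to make the selection independent of the version attribute is to read the namespace from the root tag. This is a sketch, not the script's method: namespace_map_for is a hypothetical helper, and it assumes that whatever namespace the root element carries is the one to bind to the wms prefix.

```python
import xml.etree.ElementTree as ET


def namespace_map_for(root: ET.Element) -> dict:
    """Choose the namespace map from the root tag, not the version attribute."""
    # ElementTree stores qualified tags as '{namespace-uri}localname'.
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
        return {"wms": uri}
    return {}


with_ns = ET.fromstring(
    '<WMS_Capabilities version="1.3.0" xmlns="http://www.opengis.net/wms"/>'
)
without_ns = ET.fromstring('<WMS_Capabilities version="1.3.0"/>')

print(namespace_map_for(with_ns))     # {'wms': 'http://www.opengis.net/wms'}
print(namespace_map_for(without_ns))  # {}
```

In parse_wms_capabilities() this would replace the version.startswith("1.3") test, while the version string itself is still reported in the service metadata.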

WMS 1.3.0 axis order in GetMap requests. This parser extracts raw geographic bounding box coordinates, which are always expressed as longitude and latitude. If you use those coordinates to build a subsequent GetMap request targeting EPSG:4326 on a 1.3.0 server, you must swap each coordinate pair so the BBOX reads minLat,minLon,maxLat,maxLon. Sending longitude-first typically produces a blank image or a ServiceException with little diagnostic detail. The SRS and Coordinate Reference System Handling guide covers the detection logic for which CRS codes require swapping.
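A minimal sketch of the swap follows. The function name getmap_bbox is hypothetical, and the sketch treats EPSG:4326 as the only latitude-first CRS, which is an assumption: other geographic CRS codes may need the same treatment.

```python
def getmap_bbox(west, south, east, north, version, crs):
    """Format a GetMap BBOX value from longitude/latitude extents."""
    # Assumption: only EPSG:4326 on WMS 1.3.x is handled as latitude-first.
    lat_first = version.startswith("1.3") and crs.upper() == "EPSG:4326"
    values = (south, west, north, east) if lat_first else (west, south, east, north)
    return ",".join(str(v) for v in values)


print(getmap_bbox(-179.9, -55.0, 179.9, 83.6, "1.3.0", "EPSG:4326"))
# -55.0,-179.9,83.6,179.9
print(getmap_bbox(-179.9, -55.0, 179.9, 83.6, "1.1.1", "EPSG:4326"))
# -179.9,-55.0,179.9,83.6
print(getmap_bbox(-179.9, -55.0, 179.9, 83.6, "1.3.0", "CRS:84"))
# -179.9,-55.0,179.9,83.6
```

CRS:84 describes the same datum with longitude first, so requesting it on a 1.3.0 server is a common way to avoid the swap altogether when the layer advertises it.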

Large capabilities documents from servers with many layers. Some operational servers expose thousands of named layers, producing capabilities documents that exceed 10 MB. ET.fromstring() loads the entire document into memory. For very large payloads, pass stream=True to requests.get() and hand the response's file-like raw object to ET.iterparse(), yielding layers as they are parsed rather than accumulating the full tree.
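The streaming approach might look like the sketch below, which uses an in-memory byte stream in place of a network response so it runs on its own. The helper names are hypothetical. With requests, the source would presumably be response.raw from a stream=True call, with response.raw.decode_content set to True so compressed bodies are decoded.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def local_name(tag: str) -> str:
    """Strip a '{namespace-uri}' qualifier from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def iter_layer_names(source: BinaryIO) -> Iterator[str]:
    """Yield each layer's <Name> text as soon as its <Layer> element closes."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "Layer":
            continue
        # Only direct children are checked, so <Style><Name> is ignored.
        for child in elem:
            if local_name(child.tag) == "Name" and child.text:
                yield child.text.strip()
        # Drop the finished subtree so memory use stays roughly flat.
        elem.clear()


SAMPLE = b"""<WMS_Capabilities version="1.3.0" xmlns="http://www.opengis.net/wms">
  <Capability><Layer><Title>Root</Title>
    <Layer><Name>cities</Name><Style><Name>default</Name></Style></Layer>
    <Layer><Name>roads</Name></Layer>
  </Layer></Capability>
</WMS_Capabilities>"""

print(list(iter_layer_names(io.BytesIO(SAMPLE))))  # ['cities', 'roads']
```

Because "end" events fire when an element closes, child layers are reported before their parent group. This flat stream suits indexing by name; rebuilding the full hierarchy would need extra bookkeeping on "start" events.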

Servers that inject custom namespace prefixes. Some servers use non-standard prefixes (ms:, ogc:, or vendor-specific ones) in their capabilities documents. ElementTree matches on the namespace URI rather than the prefix text, so a document that binds a different prefix to http://www.opengis.net/wms still parses correctly with WMS_130_NS. Elements placed in a different namespace URI are another matter: neither the prefixed lookup nor the bare-tag fallback in _find() matches them, so vendor extensions are silently skipped. When the parsed output is missing expected sub-elements, print root.tag and inspect the raw XML with ET.tostring(root, encoding="unicode") to confirm the actual namespace declarations in use.
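The declarations can also be listed directly, which is often quicker than reading the raw XML. This sketch relies on the "start-ns" event of ET.iterparse(), which reports each declaration as a (prefix, uri) pair; declared_namespaces is a hypothetical helper and the sample document is hand-written for illustration.

```python
import io
import xml.etree.ElementTree as ET


def declared_namespaces(xml_bytes: bytes) -> dict:
    """Map each declared prefix to its URI; the default namespace has prefix ''."""
    pairs = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    # For "start-ns" events the second item is a (prefix, uri) tuple.
    return {prefix: uri for _event, (prefix, uri) in pairs}


SAMPLE = b"""<WMS_Capabilities version="1.3.0"
    xmlns="http://www.opengis.net/wms"
    xmlns:ms="http://mapserver.gis.umn.edu/mapserver">
  <Service><Title>Demo</Title></Service>
</WMS_Capabilities>"""

namespaces = declared_namespaces(SAMPLE)
print(namespaces[""])    # http://www.opengis.net/wms
print(namespaces["ms"])  # http://mapserver.gis.umn.edu/mapserver
```

Any URI that shows up here but is missing from the parser's namespace map marks elements the parser cannot reach until a prefix for it is added.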

