Syncing PostgreSQL PostGIS Layers with GeoServer via Python

TL;DR: Call GET /rest/workspaces/{ws}, then POST or PUT to register a PostGIS datastore, then POST /rest/workspaces/{ws}/datastores/{store}/featuretypes with nativeCRS, srs, and projectionPolicy: REPROJECT_TO_DECLARED. Wrap the entire flow in environment-variable-driven Python and the result is an idempotent sync script that runs cleanly on every CI/CD deployment without duplicating resources or leaking connection pools.

The Core Challenge

GeoServer’s REST API for PostGIS datasources uses an uncommon JSON format — connection parameters are not a plain object but an entry-list of {"@key": "...", "$": "..."} pairs. Send a standard dict and GeoServer silently accepts the request with a 201 but never actually connects to PostgreSQL. There is no validation error; the datastore appears registered yet no features are visible.

A second trap is CRS handling. PostGIS stores each table’s coordinate system as an integer SRID in geometry_columns. GeoServer must receive that SRID as a declared srs field AND a matching nativeCRS field in the feature type payload. If either is absent or contradicts the stored SRID, GeoServer either refuses to publish or silently mis-declares the bounding box — a problem that only surfaces when a WMS client requests tiles and receives spatially offset results. The SRS and Coordinate Reference System Handling guide explains on-the-fly reprojection and why a correctly declared projectionPolicy is the reliable fix.

The diagram below shows the three-phase call sequence the sync script follows:

PostGIS to GeoServer REST sync sequence Three-phase idempotent sync: phase 1 checks and creates the workspace, phase 2 registers or updates the PostGIS datastore, phase 3 publishes the feature type with CRS enforcement. Python Script GeoServer REST PostGIS (JDBC) PHASE 1 — WORKSPACE GET /rest/workspaces/{ws} 404 Not Found POST /rest/workspaces 201 Created PHASE 2 — DATASTORE GET /rest/…/datastores/{s} 404 → POST (create) JDBC connection test OK / driver error PHASE 3 — FEATURE TYPE POST featuretypes (nativeCRS) 201 layer published GET ?request=GetCapabilities WMS XML (layer present)

Prerequisites & Compatibility

Verify these constraints before running the sync to prevent silent connection drops, JDBC driver mismatches, or OGC non-compliance:

Component Minimum Version Critical Notes
Python 3.10+ Requires requests and urllib3
GeoServer 2.20.0+ REST API v2+ natively supports JSON payloads
PostGIS 3.0+ geometry_columns and spatial_ref_sys must be populated
JDBC Driver postgresql-42.5.0.jar+ Must reside in WEB-INF/lib or data_dir/lib
OGC Services WMS 1.3.0 / WFS 2.0 Validate via GetCapabilities post-sync

GeoServer’s REST interface requires exact parameter casing. The host, port, database, user, and passwd keys must align with the JDBC URL schema. Mismatched SRS codes between the declared srs and the stored SRID will trigger on-the-fly reprojection overhead during every WMS tile request. Declare both nativeCRS and srs explicitly when publishing to avoid this penalty.

Production-Ready Code

The script below is a self-contained, idempotent PostGIS-to-GeoServer sync routine. It reads all sensitive values from environment variables, uses requests for HTTP transport, and never duplicates resources on repeated runs:

import os
import requests
import logging
from requests.auth import HTTPBasicAuth

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger("postgis_sync")

# Inject via CI/CD secret manager or .env — never hard-code credentials
GEOSERVER_URL = os.getenv("GEOSERVER_URL", "http://localhost:8080/geoserver")
AUTH = HTTPBasicAuth(
    os.getenv("GEOSERVER_USER", "admin"),
    os.getenv("GEOSERVER_PASS", "geoserver"),
)
HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}


def _get(endpoint: str) -> requests.Response:
    """Raw GET — does NOT raise on 404; callers inspect status_code."""
    return requests.get(f"{GEOSERVER_URL}{endpoint}", auth=AUTH, headers=HEADERS)


def _post(endpoint: str, payload: dict) -> requests.Response:
    resp = requests.post(
        f"{GEOSERVER_URL}{endpoint}", auth=AUTH, headers=HEADERS, json=payload
    )
    resp.raise_for_status()
    return resp


def _put(endpoint: str, payload: dict) -> requests.Response:
    resp = requests.put(
        f"{GEOSERVER_URL}{endpoint}", auth=AUTH, headers=HEADERS, json=payload
    )
    resp.raise_for_status()
    return resp


def check_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Pre-flight connectivity check — prevents partial state commits."""
    import socket
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_workspace(ws: str) -> None:
    """Create workspace if it does not already exist."""
    resp = _get(f"/rest/workspaces/{ws}")
    if resp.status_code == 404:
        _post("/rest/workspaces", {"workspace": {"name": ws}})
        logger.info("Created workspace: %s", ws)
    elif resp.status_code == 200:
        logger.info("Workspace '%s' already exists.", ws)
    else:
        resp.raise_for_status()


def sync_postgis_store(ws: str, db_config: dict) -> None:
    """Register or update a PostGIS datastore idempotently.

    GeoServer REST requires connection parameters as an entry-list of
    {"@key": "...", "$": "..."} pairs — plain dicts are silently ignored.
    """
    store_name = f"{ws}_postgis"
    store_payload = {
        "dataStore": {
            "name": store_name,
            "type": "PostGIS",
            "enabled": True,
            "connectionParameters": {
                "entry": [
                    {"@key": "dbtype",               "$": "postgis"},
                    {"@key": "host",                 "$": db_config["host"]},
                    {"@key": "port",                 "$": str(db_config["port"])},
                    {"@key": "database",             "$": db_config["database"]},
                    {"@key": "user",                 "$": db_config["user"]},
                    {"@key": "passwd",               "$": db_config["password"]},
                    {"@key": "schema",               "$": db_config.get("schema", "public")},
                    {"@key": "Expose primary keys",  "$": "true"},
                ]
            },
        }
    }

    check = _get(f"/rest/workspaces/{ws}/datastores/{store_name}")
    if check.status_code == 404:
        _post(f"/rest/workspaces/{ws}/datastores", store_payload)
        logger.info("Created PostGIS datastore: %s", store_name)
    else:
        check.raise_for_status()
        _put(f"/rest/workspaces/{ws}/datastores/{store_name}", store_payload)
        logger.info("Updated PostGIS datastore: %s", store_name)


def publish_layer(ws: str, table: str, srs: str) -> None:
    """Publish a feature type with explicit CRS enforcement.

    projectionPolicy REPROJECT_TO_DECLARED treats the declared SRS as
    authoritative, eliminating bbox misalignment in WMS tile responses.
    """
    store_name = f"{ws}_postgis"
    ft_endpoint = f"/rest/workspaces/{ws}/datastores/{store_name}/featuretypes"
    layer_payload = {
        "featureType": {
            "name":             table,
            "nativeCRS":        srs,   # must match geometry_columns SRID
            "srs":              srs,   # OGC-advertised CRS in GetCapabilities
            "projectionPolicy": "REPROJECT_TO_DECLARED",
            "enabled":          True,
        }
    }

    check = _get(f"{ft_endpoint}/{table}")
    if check.status_code == 404:
        _post(ft_endpoint, layer_payload)
        logger.info("Published layer: %s", table)
    else:
        check.raise_for_status()
        _put(f"{ft_endpoint}/{table}", layer_payload)
        logger.info("Updated layer metadata: %s", table)


def main() -> None:
    db_config = {
        "host":     os.getenv("PG_HOST",     "localhost"),
        "port":     os.getenv("PG_PORT",     "5432"),
        "database": os.getenv("PG_DB",       "spatial_db"),
        "user":     os.getenv("PG_USER",     "gis_user"),
        "password": os.getenv("PG_PASS",     "secure_password"),
        "schema":   os.getenv("PG_SCHEMA",   "public"),
    }
    workspace  = os.getenv("WORKSPACE",   "agency_spatial")
    table_name = os.getenv("TABLE_NAME",  "parcels_2024")
    srs        = os.getenv("SRS",         "EPSG:4326")

    pg_host = db_config["host"]
    pg_port = int(db_config["port"])
    if not check_tcp(pg_host, pg_port):
        raise RuntimeError(
            f"Cannot reach PostgreSQL at {pg_host}:{pg_port} — "
            "aborting to prevent partial state commit."
        )

    logger.info("Starting PostGIS → GeoServer sync...")
    ensure_workspace(workspace)
    sync_postgis_store(workspace, db_config)
    publish_layer(workspace, table_name, srs)
    logger.info("Sync complete.")


if __name__ == "__main__":
    main()

Step-by-Step Walkthrough

Pre-flight TCP check (check_tcp). Before issuing any REST calls, the script opens a raw TCP socket to the PostgreSQL host and port. If the connection fails, the script raises immediately rather than registering a datastore that points at an unreachable database. Without this guard, the workspace and datastore may be created successfully while the underlying PostGIS instance is offline — leaving GeoServer in a partially configured state that is difficult to diagnose from logs alone.

_get does not call raise_for_status. The private _get helper intentionally returns the raw response without raising. If it raised on 404, the first check in ensure_workspace would throw an exception before the create branch could execute. Only _post and _put call raise_for_status, because any non-2xx response from a write operation is always an unrecoverable error.

Entry-list format in sync_postgis_store. The connectionParameters.entry list uses @key/$ pairs — GeoServer’s documented JSON encoding for Java property maps. This is not standard JSON convention; it mirrors the XML attribute syntax the REST API originally used. Sending a plain {"host": "localhost"} object is accepted without error but never applied.

nativeCRS vs srs in publish_layer. nativeCRS tells GeoServer the coordinate system actually stored in PostGIS (the value from geometry_columns.srid). srs is the system GeoServer advertises in WMS GetCapabilities responses and uses when clients specify a requested CRS. Setting projectionPolicy to REPROJECT_TO_DECLARED makes the declared srs authoritative: GeoServer reprojects incoming features on read if nativeCRS and srs differ, rather than failing silently or exposing misaligned bounding boxes to downstream WMS clients.

Idempotency. Every write path is preceded by a _get check. A 404 triggers POST (create); a 200 triggers PUT (update). Running the script twice against an already-configured GeoServer updates configuration without duplicating datastores or creating duplicate connection pools.

Verification

After the script completes, query the WMS capabilities endpoint to confirm the layer is published:

curl -s -u admin:geoserver \
  "http://localhost:8080/geoserver/wms?service=WMS&version=1.3.0&request=GetCapabilities" \
  | grep -A5 "parcels_2024"

Expected output (abbreviated):

<Layer queryable="1">
  <Name>agency_spatial:parcels_2024</Name>
  <Title>parcels_2024</Title>
  <CRS>EPSG:4326</CRS>
  <BoundingBox CRS="EPSG:4326" minx="-180.0" miny="-90.0" maxx="180.0" maxy="90.0"/>
</Layer>

If the layer is absent, check the GeoServer logs at data_dir/logs/geoserver.log. A JDBC driver error appears as org.postgresql.Driver not found; a missing schema appears as relation "public.parcels_2024" does not exist.

Gotchas & Edge Cases

JDBC driver placement. GeoServer loads the PostgreSQL JDBC driver at startup. If postgresql-42.x.x.jar is absent from WEB-INF/lib or data_dir/lib, the datastore registration call returns 201 but GeoServer cannot open any connections. Place the jar and restart the service — a hot reload is not sufficient.

Non-public schemas. By default GeoServer’s PostGIS datastore scans only the public schema. Pass "schema": "parcels_schema" in db_config and confirm the running PostgreSQL user has USAGE privilege on that schema: GRANT USAGE ON SCHEMA parcels_schema TO gis_user;

SRS mismatch without error. If nativeCRS does not match the actual SRID in geometry_columns, GeoServer may accept the payload and generate a WMS layer whose bounding box is expressed in the wrong coordinate space. The symptom is WMS tiles that render in the wrong location. Always cross-check the declared SRS against the PostGIS SRID: SELECT srid FROM geometry_columns WHERE f_table_name = 'parcels_2024';

WFS transaction differences. This workflow publishes layers for read-only WMS access. If you need transactional write access via WFS-T operations, enable the wfs service on the workspace and confirm the PostGIS user has INSERT, UPDATE, and DELETE privileges on the target table.


Back to Environment Parity for Spatial Servers

Related