Python Script to Auto-Publish Shapefiles to a GeoServer Workspace
TL;DR
To auto-publish a shapefile to GeoServer you need three REST API write calls, preceded where relevant by an existence check: a workspace POST, a binary PUT of the zipped shapefile to the file.shp endpoint, and a PUT that configures the featureType resource with coordinate reference system metadata. The script below handles all three steps, skips the workspace and feature type steps when those resources already exist so re-runs are safe, and validates the zip archive before touching the network.
Core Challenge
The non-trivial part is not the HTTP calls — it is the sequencing and the CRS contract. GeoServer’s REST API requires the workspace to exist before a datastore can be registered, the datastore to exist before a feature type can be created, and the feature type to carry an explicit nativeCRS and srs field even when a .prj file is present in the archive. Skip any of these steps and you get either a 404 or a layer that appears in GeoServer’s admin UI but exposes the wrong spatial reference in its WMS and WFS GetCapabilities documents.
A second hazard is the zip archive itself. GeoServer’s file.shp upload endpoint requires the .shp, .shx, .dbf, and .prj components to sit at the root of the archive — not inside a subdirectory. If you hand the endpoint a zip produced by right-clicking a folder in a file manager, the contents are typically one level deep and the upload silently succeeds while GeoServer registers an empty datastore.
The diagram below shows the three-phase ingestion flow and where each failure mode lives.
Compatibility and Prerequisites
- Python: 3.10+. The script uses PEP 604 union type hints (str | Path) and pathlib.Path throughout.
- GeoServer: 2.20 or later. The file.shp binary upload endpoint and the JSON featureType PUT response format stabilised at 2.20; older versions have unreliable behaviour on binary PUT.
- Dependencies: pip install requests. No other third-party packages are needed.
- Input format: A .zip archive containing .shp, .shx, .dbf, and .prj flat at the root. GeoServer reads the .prj file to detect the native coordinate reference system. Without it, the nativeCRS in the API payload becomes the authoritative source.
- Authentication: HTTP Basic Auth over TLS is shown below. GeoServer 2.24+ also supports bearer token auth: replace HTTPBasicAuth with {"Authorization": "Bearer <token>"} in session.headers.
Production-Ready Code
import zipfile
import requests
from requests.auth import HTTPBasicAuth
from pathlib import Path
from typing import Optional

REQUIRED_SHAPEFILE_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj"}


class GeoServerShapefilePublisher:
    """
    Publishes a zipped shapefile to a GeoServer workspace via the REST API.
    All operations are idempotent: re-running against an existing workspace,
    datastore, or layer is a no-op, not an error.
    """

    def __init__(self, geoserver_url: str, username: str, password: str) -> None:
        self.base_url = geoserver_url.rstrip("/")
        self.session = requests.Session()
        self.session.auth = HTTPBasicAuth(username, password)
        self.session.headers.update({"Accept": "application/json"})

    # ------------------------------------------------------------------
    # Internal helpers
    # ------------------------------------------------------------------
    def _url(self, endpoint: str) -> str:
        return f"{self.base_url}/rest/{endpoint.lstrip('/')}"

    def _exists(self, endpoint: str) -> bool:
        """Return True when the REST resource already exists (HTTP 200)."""
        return self.session.get(self._url(endpoint)).status_code == 200

    def _put(self, endpoint: str, **kwargs) -> requests.Response:
        resp = self.session.put(self._url(endpoint), **kwargs)
        if resp.status_code >= 400:
            raise RuntimeError(
                f"GeoServer PUT {endpoint} [{resp.status_code}]: {resp.text[:400]}"
            )
        return resp

    def _post(self, endpoint: str, **kwargs) -> requests.Response:
        resp = self.session.post(self._url(endpoint), **kwargs)
        if resp.status_code >= 400:
            raise RuntimeError(
                f"GeoServer POST {endpoint} [{resp.status_code}]: {resp.text[:400]}"
            )
        return resp

    # ------------------------------------------------------------------
    # Public API
    # ------------------------------------------------------------------
    def validate_archive(self, zip_path: Path) -> None:
        """
        Raise ValueError when mandatory shapefile components are absent
        or when components are nested inside a subdirectory in the archive.
        GeoServer requires them at the archive root.
        """
        if not zip_path.exists():
            raise FileNotFoundError(f"Archive not found: {zip_path}")
        with zipfile.ZipFile(zip_path, "r") as zf:
            names = zf.namelist()
        # Only entries with no directory separator are at the root
        root_exts = {Path(n).suffix.lower() for n in names if "/" not in n}
        missing = REQUIRED_SHAPEFILE_EXTENSIONS - root_exts
        if missing:
            raise ValueError(
                f"Archive {zip_path.name} is missing required components at root: "
                f"{missing}. Found root extensions: {root_exts}"
            )

    def ensure_workspace(self, workspace: str) -> None:
        """Create the workspace namespace when it does not yet exist."""
        if self._exists(f"workspaces/{workspace}"):
            return
        self._post(
            "workspaces",
            json={"workspace": {"name": workspace}},
            headers={"Content-Type": "application/json"},
        )

    def upload_shapefile(
        self, workspace: str, datastore: str, zip_path: Path
    ) -> None:
        """
        Stream the zipped shapefile to GeoServer's binary upload endpoint.
        GeoServer extracts the archive, registers the shapefile as a vector
        datastore, and reads the .prj file for the native CRS.
        """
        self._put(
            f"workspaces/{workspace}/datastores/{datastore}/file.shp",
            data=zip_path.read_bytes(),
            headers={"Content-Type": "application/zip"},
        )

    def publish_layer(
        self,
        workspace: str,
        datastore: str,
        layer_name: str,
        title: Optional[str] = None,
        abstract: Optional[str] = None,
        srs: str = "EPSG:4326",
    ) -> None:
        """
        Register the featureType resource so the layer appears in WMS/WFS
        GetCapabilities. GeoServer overrides nativeCRS when a .prj is present,
        but srs must be explicitly set for OGC compliance.
        """
        endpoint = (
            f"workspaces/{workspace}/datastores/{datastore}"
            f"/featuretypes/{layer_name}.json"
        )
        if self._exists(
            f"workspaces/{workspace}/datastores/{datastore}"
            f"/featuretypes/{layer_name}"
        ):
            return  # already published: idempotent no-op
        payload = {
            "featureType": {
                "name": layer_name,
                "title": title or layer_name.replace("_", " ").title(),
                "abstract": abstract or f"Auto-published layer: {layer_name}",
                "nativeCRS": srs,  # overridden by .prj when present
                "srs": srs,
                "metadataLinks": [
                    {
                        "type": "text/html",
                        "metadataType": "ISO19115:2003",
                        "content": (
                            f"https://example.org/metadata/{layer_name}"
                        ),
                    }
                ],
            }
        }
        self._put(
            endpoint,
            json=payload,
            headers={"Content-Type": "application/json"},
        )

    def publish(
        self,
        workspace: str,
        datastore: str,
        zip_path: str | Path,
        layer_name: Optional[str] = None,
        srs: str = "EPSG:4326",
    ) -> str:
        """
        Orchestrate the full ingestion pipeline.
        Returns the canonical layer name that was published.
        """
        zip_file = Path(zip_path)
        name = layer_name or zip_file.stem
        self.validate_archive(zip_file)
        self.ensure_workspace(workspace)
        self.upload_shapefile(workspace, datastore, zip_file)
        self.publish_layer(workspace, datastore, name, srs=srs)
        return name


# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    publisher = GeoServerShapefilePublisher(
        geoserver_url="https://geoserver.example.com/geoserver",
        username="admin",
        password="geoserver",  # rotate via environment variable in production
    )
    published = publisher.publish(
        workspace="agency_data",
        datastore="transport_networks",
        zip_path="./data/highways_2024.zip",
        srs="EPSG:27700",  # British National Grid; override when .prj absent
    )
    print(f"Published: {published}")
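The entry point hard-codes the password and notes that it should come from an environment variable in production. A minimal sketch of that idea follows. The variable names GEOSERVER_URL, GEOSERVER_USER, and GEOSERVER_PASSWORD and the load_credentials helper are assumptions made for illustration; GeoServer itself does not define them.

```python
import os
from typing import Mapping, NamedTuple


class GeoServerCredentials(NamedTuple):
    url: str
    username: str
    password: str


def load_credentials(env: Mapping[str, str] = os.environ) -> GeoServerCredentials:
    """Read connection settings from the environment, failing loudly if any is unset."""
    # These variable names are illustrative assumptions, not a GeoServer convention.
    required = ("GEOSERVER_URL", "GEOSERVER_USER", "GEOSERVER_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return GeoServerCredentials(*(env[name] for name in required))
```

The returned tuple could then be unpacked into the publisher constructor, for example GeoServerShapefilePublisher(*load_credentials()), which keeps secrets out of the source file.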
Step-by-Step Walkthrough
validate_archive
validate_archive opens the zip with Python’s standard zipfile module and builds the set of file extensions found at the archive root — entries with no / in their name. It computes the difference against REQUIRED_SHAPEFILE_EXTENSIONS and raises ValueError with an actionable message before a single byte is sent to GeoServer. This is the cheapest gate: catching a nested-directory archive here avoids a 201 response that registers an empty, broken datastore.
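To make the root-level check concrete, here is a small standalone sketch of the same set-difference logic applied to plain name lists, so it runs without a zip file on disk. The function name missing_at_root and the sample file names are hypothetical.

```python
from pathlib import PurePosixPath

REQUIRED = {".shp", ".shx", ".dbf", ".prj"}


def missing_at_root(names: list[str]) -> set[str]:
    """Return required extensions that have no entry at the archive root."""
    # Zip member names always use "/" as the separator, so its absence means "root".
    root_exts = {PurePosixPath(n).suffix.lower() for n in names if "/" not in n}
    return REQUIRED - root_exts


flat = ["roads.shp", "roads.shx", "roads.dbf", "roads.prj"]
nested = [f"roads/{n}" for n in flat]

print(missing_at_root(flat))            # set()
print(sorted(missing_at_root(nested)))  # ['.dbf', '.prj', '.shp', '.shx']
```

The nested list contains every component, yet all four are reported missing because none sits at the root, which is exactly the case the validation is meant to catch.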
ensure_workspace
_exists issues a GET to /rest/workspaces/{workspace} and checks for HTTP 200. When the workspace is absent (404), _post sends a minimal JSON body {"workspace": {"name": workspace}} to /rest/workspaces. The response is a 201 with a Location header, which the script intentionally discards — the workspace name itself is the stable identifier for subsequent calls. Re-running this step against an existing workspace skips the POST entirely, satisfying the idempotency requirement.
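The check-then-create pattern can be illustrated without a running server. The sketch below uses FakeWorkspaceApi, a hypothetical in-memory stand-in that only mimics the status codes described above; it is not a GeoServer client and exists purely to show which calls a second run makes.

```python
class FakeWorkspaceApi:
    """In-memory stand-in for the workspace endpoints; not a GeoServer client."""

    def __init__(self) -> None:
        self.workspaces: set[str] = set()
        self.calls: list[str] = []

    def get(self, name: str) -> int:
        self.calls.append(f"GET {name}")
        return 200 if name in self.workspaces else 404

    def post(self, name: str) -> int:
        self.calls.append(f"POST {name}")
        self.workspaces.add(name)
        return 201


def ensure_workspace(api: FakeWorkspaceApi, name: str) -> bool:
    """Create the workspace only when the GET reports it missing; return True if created."""
    if api.get(name) == 200:
        return False
    api.post(name)
    return True


api = FakeWorkspaceApi()
first = ensure_workspace(api, "agency_data")   # workspace absent, so it is created
second = ensure_workspace(api, "agency_data")  # workspace present, so the POST is skipped
print(first, second, api.calls)
```

The recorded calls show one GET and one POST on the first run and only a GET on the second, which is what makes the step safe to repeat.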
upload_shapefile
upload_shapefile reads the entire zip into memory with Path.read_bytes() and issues a PUT to /rest/workspaces/{ws}/datastores/{ds}/file.shp. The Content-Type: application/zip header is mandatory: without it GeoServer does not treat the body as a zip archive to unpack, and the request typically fails with a confusing 500. On success, GeoServer extracts the archive, stores the shapefile in its data directory, and creates the datastore entry. The .prj file is parsed at this point and stored as the native CRS in the internal catalogue.
For archives larger than ~200 MB, pass an open file handle instead of zip_path.read_bytes() so that requests streams the body rather than loading the entire file into memory:
with open(zip_path, "rb") as fh:
    self._put(
        f"workspaces/{workspace}/datastores/{datastore}/file.shp",
        data=fh,
        headers={"Content-Type": "application/zip"},
    )
publish_layer
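publish_layer sends the feature type definition with a PUT to /rest/workspaces/{ws}/datastores/{ds}/featuretypes/{name}.json, after first checking whether the resource already exists. Two details of GeoServer’s REST API matter here. First, a PUT on that resource updates an existing feature type, whereas creating a new one is a POST to the featuretypes collection. Second, a file.shp upload auto-configures a feature type by default (the configure query parameter controls this), so the existence check will often succeed and publish_layer will return without applying title, abstract, or srs. If you need those fields applied, confirm which case your deployment hits and adjust the existence check or the HTTP method accordingly. The srs field (the declared CRS exposed in WMS GetCapabilities and WFS DescribeFeatureType) must match a code in GeoServer’s EPSG registry. The nativeCRS field is what GeoServer uses internally for reprojection. It is overridden by the .prj content when a valid projection string is present, so setting it in the payload acts as a fallback rather than an override.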
The metadataLinks array adds an ISO 19115:2003 reference to the layer’s capabilities document. While optional for basic publication, it is expected by agency consumers relying on OGC-compliant metadata discovery. The SRS and Coordinate Reference System Handling guide explains how GeoServer resolves CRS mismatches when nativeCRS and srs differ.
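Because the srs value has to be a recognisable EPSG code, it can help to normalise it before building the payload. The sketch below is a format check only, with a hypothetical helper name normalize_srs; it cannot confirm that a code is actually present in GeoServer’s EPSG registry.

```python
import re

# Accepts "EPSG:" in any letter case followed by one or more digits.
_EPSG_PATTERN = re.compile(r"^EPSG:(\d+)$", re.IGNORECASE)


def normalize_srs(srs: str) -> str:
    """Return the code as 'EPSG:<digits>' or raise ValueError for anything else."""
    match = _EPSG_PATTERN.match(srs.strip())
    if match is None:
        raise ValueError(f"Not an EPSG code of the form 'EPSG:<number>': {srs!r}")
    return f"EPSG:{match.group(1)}"
```

Calling normalize_srs(" epsg:27700 ") yields "EPSG:27700", while a bare "27700" raises ValueError before any request is sent.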
publish (orchestrator)
publish sequences the four steps and returns the canonical layer name, which the caller can use to construct WMS/WFS request URLs or pass to downstream style-assignment scripts as described in Automating GeoServer with Python REST API.
Verification
After running the script, confirm the layer is live by issuing a WMS GetCapabilities request and grepping for the layer name:
curl -s "https://geoserver.example.com/geoserver/agency_data/wms?SERVICE=WMS&VERSION=1.3.0&REQUEST=GetCapabilities" \
| grep -A3 "highways_2024"
Expected output (abbreviated):
<Layer queryable="1">
  <Name>highways_2024</Name>
  <Title>Highways 2024</Title>
  <CRS>EPSG:27700</CRS>
If the layer name is absent, the feature type registration failed silently. Check /rest/workspaces/{ws}/datastores/{ds}/featuretypes.json to confirm the resource exists, then inspect the GeoServer log at $GEOSERVER_DATA_DIR/logs/geoserver.log for the actual cause.
To verify the WFS endpoint independently:
curl -s "https://geoserver.example.com/geoserver/agency_data/wfs?SERVICE=WFS&VERSION=2.0.0&REQUEST=GetCapabilities" \
| grep "highways_2024"
Both WMS and WFS must return the layer for a complete publication.
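As an alternative to grep, the WMS 1.3.0 capabilities document can be parsed with Python’s standard library, which avoids false matches in titles or abstracts. This is a sketch under assumptions: the helper name layer_crs_codes and the trimmed SAMPLE document are illustrative, and only CRS elements declared directly on the layer are returned, not ones inherited from parent layers.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union

WMS_NS = {"wms": "http://www.opengis.net/wms"}  # WMS 1.3.0 XML namespace


def layer_crs_codes(capabilities_xml: Union[str, bytes], layer_name: str) -> Optional[list[str]]:
    """Return the CRS codes declared on the named layer, or None when it is absent."""
    root = ET.fromstring(capabilities_xml)
    for layer in root.iter("{http://www.opengis.net/wms}Layer"):
        name = layer.find("wms:Name", WMS_NS)
        if name is not None and name.text == layer_name:
            # Direct children only; CRS values inherited from parents are not collected.
            return [crs.text or "" for crs in layer.findall("wms:CRS", WMS_NS)]
    return None


# Trimmed, hand-written sample in the shape of a WMS 1.3.0 response.
SAMPLE = """<WMS_Capabilities xmlns="http://www.opengis.net/wms" version="1.3.0">
  <Capability>
    <Layer>
      <Title>agency_data</Title>
      <Layer queryable="1">
        <Name>highways_2024</Name>
        <Title>Highways 2024</Title>
        <CRS>EPSG:27700</CRS>
      </Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""

print(layer_crs_codes(SAMPLE, "highways_2024"))  # ['EPSG:27700']
print(layer_crs_codes(SAMPLE, "missing_layer"))  # None
```

In practice the input would be the body of the GetCapabilities response rather than the embedded sample.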
Gotchas and Edge Cases
Nested archive structure. Packaging a shapefile by compressing a folder produces a zip where every member has a directory prefix (e.g., highways_2024/highways_2024.shp). GeoServer’s file.shp endpoint cannot locate the components and silently registers an empty datastore. Always verify with zipfile.ZipFile.namelist() before uploading, as validate_archive does.
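When a nested archive is all you have, it can be repackaged rather than rejected. The following is a minimal sketch, assuming the whole archive fits in memory; flatten_archive is a hypothetical helper, and it refuses to continue if two members would collide on the same base name.

```python
import io
import zipfile
from pathlib import PurePosixPath


def flatten_archive(src: bytes) -> bytes:
    """Return a new zip whose files all sit at the root, dropping directory prefixes."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as zin:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
            seen: set[str] = set()
            for info in zin.infolist():
                if info.is_dir():
                    continue  # directory entries carry no data
                base = PurePosixPath(info.filename).name
                if base in seen:
                    raise ValueError(f"Flattening would overwrite duplicate name: {base}")
                seen.add(base)
                zout.writestr(base, zin.read(info))
    return out.getvalue()
```

The result can be written to a new .zip file and passed through validate_archive before upload.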
Missing .prj and silent CRS drift. When the .prj file is absent, GeoServer assigns the nativeCRS from the API payload as both the storage and declared CRS. If that value (EPSG:4326 by default in this script) differs from the geometry’s actual projection, every WMS tile and WFS feature will be spatially misplaced without any error being raised. Keep .prj in REQUIRED_SHAPEFILE_EXTENSIONS, as the script does, and consider integrating pyproj to validate the projection string before upload.
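Short of a full pyproj validation, a cheap standard-library check can at least confirm that the .prj exists at the root and looks like a WKT coordinate system definition. This is a sketch only: prj_looks_valid is a hypothetical helper, the keyword list is an assumption covering common WKT 1 and WKT 2 openings, and a passing result does not prove the projection is correct for the data.

```python
import io
import zipfile

# Assumed set of leading keywords for WKT 1 (PROJCS, GEOGCS) and WKT 2 (PROJCRS, GEOGCRS).
_WKT_PREFIXES = ("PROJCS", "GEOGCS", "PROJCRS", "GEOGCRS")


def prj_looks_valid(archive: bytes) -> bool:
    """Cheap sanity check: a root-level .prj exists and starts with a WKT CRS keyword."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        prj_names = [
            n for n in zf.namelist() if "/" not in n and n.lower().endswith(".prj")
        ]
        if not prj_names:
            return False
        raw = zf.read(prj_names[0])
    # Tolerate a byte-order mark and leading whitespace before the keyword.
    text = raw.decode("utf-8", errors="replace").lstrip("\ufeff \t\r\n")
    return text.upper().startswith(_WKT_PREFIXES)
```

An empty or truncated .prj fails this check, which catches one way an archive can pass the extension test and still carry no usable projection.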
Datastore name collision (409 Conflict). A PUT to file.shp on an existing datastore name returns 409 Conflict. The script’s upload_shapefile method does not pre-check for datastore existence — it delegates that decision to the caller. If you are replacing an existing dataset, call DELETE /rest/workspaces/{ws}/datastores/{ds}?recurse=true before upload_shapefile. The environment parity workflows for staging-to-production promotion cover this replace-and-republish pattern in detail.
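If you do need the replace-and-republish path, the recursive DELETE mentioned above can be assembled as follows. This sketch uses the standard library’s urllib so that it is self-contained, and it only builds the request without sending it; build_delete_request is a hypothetical helper, and authentication is deliberately left out.

```python
import urllib.request
from urllib.parse import quote


def build_delete_request(
    base_url: str, workspace: str, datastore: str
) -> urllib.request.Request:
    """Build (but do not send) the recursive DELETE for a datastore."""
    url = (
        f"{base_url.rstrip('/')}/rest/workspaces/{quote(workspace, safe='')}"
        f"/datastores/{quote(datastore, safe='')}?recurse=true"
    )
    return urllib.request.Request(url, method="DELETE")


req = build_delete_request(
    "https://geoserver.example.com/geoserver", "agency_data", "transport_networks"
)
print(req.get_method(), req.full_url)
```

With the publisher class from this article, the equivalent call would go through its authenticated session, for example publisher.session.delete(publisher._url("workspaces/agency_data/datastores/transport_networks"), params={"recurse": "true"}). Because recurse=true also removes the layers built on the datastore, treat it as a destructive operation.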
Timeout on large archives. The default requests timeout is None (infinite). For archives above 100 MB add timeout=(5, 120) to the PUT call — a 5-second connect timeout and a 120-second read timeout. This prevents the script from hanging indefinitely when the GeoServer host is overloaded.
Related
- Python Automation for GeoServer & MapServer — the broader automation context covering raster publishing, style management, and cache warming
- Environment Parity for Spatial Servers — staging-to-production promotion patterns, including replace-and-republish workflows
- SRS and Coordinate Reference System Handling: how GeoServer resolves nativeCRS vs srs conflicts and on-the-fly reprojection