Python API reference
Client
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | None | Your vs_... key. Falls back to EOLAS_API_KEY env var. |
| base_url | str | "https://api.eolas.fyi" | Override for testing. |
| cache | bool | False | Cache responses in memory for the lifetime of the client. |
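To make the cache option concrete, here is a minimal sketch of per-client in-memory caching, assuming responses are keyed by dataset name plus request parameters. CachingFetcher and fake_fetch are hypothetical stand-ins written for this illustration, not part of eolas_data; the real client may key or evict entries differently.

```python
from typing import Any, Callable, Dict, Tuple


class CachingFetcher:
    """Hypothetical stand-in: memoises results for the lifetime of the object."""

    def __init__(self, fetch: Callable[..., Any], cache: bool = False) -> None:
        self._fetch = fetch              # function that performs the real request
        self._enabled = cache
        self._store: Dict[Tuple, Any] = {}

    def get(self, name: str, **params: Any) -> Any:
        if not self._enabled:
            return self._fetch(name, **params)
        # Sort parameters so the key does not depend on keyword order.
        key = (name, tuple(sorted(params.items())))
        if key not in self._store:
            self._store[key] = self._fetch(name, **params)
        return self._store[key]


calls = []


def fake_fetch(name, **params):
    calls.append(name)                   # record each simulated network request
    return {"name": name, "params": params}


fetcher = CachingFetcher(fake_fetch, cache=True)
fetcher.get("nz_cpi", start="2020-01-01")
fetcher.get("nz_cpi", start="2020-01-01")   # second call is served from memory
print(len(calls))
```

Because the store lives on the object, the cached data is released when the object is garbage-collected, which matches the "lifetime of the client" wording.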
Source-specific methods
All source methods accept the same parameters as client.get() and return a Dataset tagged with the source label.
| Method | Source |
|---|---|
| client.statsnz(name, **kwargs) | Stats NZ |
| client.oecd(name, **kwargs) | OECD |
| client.rbnz(name, **kwargs) | RBNZ |
| client.treasury(name, **kwargs) | NZ Treasury |
| client.linz(name, **kwargs) | LINZ |
| client.statsnz_geo(name, **kwargs) | Stats NZ Geospatial |
| client.mbie(name, **kwargs) | MBIE |
| client.nzta(name, **kwargs) | Waka Kotahi (NZTA) |
| client.msd(name, **kwargs) | MSD |
| client.police(name, **kwargs) | NZ Police / MoJ |
| client.immigration(name, **kwargs) | Immigration NZ |
| client.lris(name, **kwargs) | Manaaki Whenua / LRIS |
| client.geonet(name, **kwargs) | GeoNet |
| client.doc(name, **kwargs) | DOC (Department of Conservation) |
| client.akl_council(name, **kwargs) | Auckland Council |
| client.akl_transport(name, **kwargs) | Auckland Transport |
| client.bay_of_plenty(name, **kwargs) | Bay of Plenty Councils |
| client.charities(name, **kwargs) | Charities Services |
| client.colab_waikato(name, **kwargs) | Co-Lab Waikato |
| client.ecan_canterbury(name, **kwargs) | ECan / Canterbury |
| client.eeca(name, **kwargs) | EECA (energy use, EV chargers, regional heat demand) |
| client.hawkes_bay(name, **kwargs) | Hawke's Bay Councils |
| client.manawatu_whanganui(name, **kwargs) | Manawatū-Whanganui Councils |
| client.napier_whanganui(name, **kwargs) | Napier + Whanganui |
| client.northland(name, **kwargs) | Northland Councils |
| client.otago(name, **kwargs) | Otago Councils |
| client.pharmac(name, **kwargs) | PHARMAC |
| client.southland(name, **kwargs) | Southland Councils |
| client.taranaki(name, **kwargs) | Taranaki Councils |
| client.top_of_south(name, **kwargs) | Gisborne / Top of South Councils |
| client.wellington(name, **kwargs) | Wellington Region Councils |
| client.west_coast(name, **kwargs) | West Coast (Te Tai o Poutini) |
client.list(source=None)
Return metadata for all available series.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| source | str \| None | None | Filter by source label, e.g. "Stats NZ", "OECD". |
Returns: list[dict]
client.info(name)
Return metadata for a single series.
meta = client.info("nz_cpi")
# {"name": "nz_cpi", "title": "NZ Consumer Price Index", "source": "Stats NZ", ...}
Parameters
| Name | Type | Description |
|---|---|---|
| name | str | Series identifier, e.g. "nz_cpi" |
Returns: dict
Raises: NotFoundError if the series does not exist.
client.get(name, start=None, end=None, format="json", engine="pandas", limit=None, as_geo=None, as_arrow=False)
Fetch dataset rows as a DataFrame. For everyday use prefer the source-specific methods above — they call get() internally.
df = client.get("nz_cpi", start="2020-01-01", end="2024-12-31")
df = client.get("nz_cpi", engine="polars") # returns polars DataFrame
df = client.get("nz_addresses", limit=1000)
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| name | str | — | Dataset identifier |
| start | str \| None | None | ISO date lower bound, e.g. "2020-01-01". |
| end | str \| None | None | ISO date upper bound. |
| format | str | "json" | "json" or "csv". You don't need to set this for speed; the client transparently negotiates Apache Arrow over the wire for the DataFrame path (see Performance). |
| engine | str | "pandas" | "pandas" or "polars" |
| limit | int \| None | None | Max rows to return. None requests the full dataset. Free plan is capped server-side at 50,000 rows; Pro is unlimited. |
| as_geo | bool \| None | None | Return a geopandas.GeoDataFrame for geospatial datasets. None auto-converts when geometry is present and geopandas is importable. True forces conversion (errors if missing). False keeps the raw geometry_wkt string column. Install with pip install eolas-data[geo]. Mutually exclusive with as_arrow=True. |
| as_arrow | bool | False | Return a pyarrow.Table instead of a DataFrame or GeoDataFrame. Skips all shapely allocation; geometry stays as Arrow buffers. Works on all datasets and all source helpers. Mutually exclusive with as_geo=True. |
Returns: Dataset (a pandas DataFrame subclass) by default; polars.DataFrame when engine="polars"; geopandas.GeoDataFrame when geometry is present and as_geo allows conversion; pyarrow.Table when as_arrow=True.
Raises: NotFoundError, AuthenticationError, RateLimitError
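Of the three exceptions, RateLimitError is the one that usually merits a retry. The following is a minimal sketch of exponential backoff, not something the client is documented to do for you. The with_backoff helper is hypothetical, and the RateLimitError class here is a local stand-in so the snippet runs without the package installed; in real code you would import it from eolas_data.exceptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for eolas_data.exceptions.RateLimitError."""


def with_backoff(call: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == retries:
                raise                      # out of retries: surface the error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With a real client this would be used as with_backoff(lambda: client.get("nz_cpi")). The sleep argument is injectable so the waiting can be replaced in tests.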
Performance: Arrow & Parquet
The API serves datasets in four formats via ?format= — json (default), csv, arrow (Apache Arrow IPC stream), and parquet. Arrow and Parquet are columnar and typed, so they're dramatically faster for anything beyond a few hundred rows. Measured end-to-end on a 100,000-row × 71-column dataset:
| Format | Wire size | Total (download + parse) |
|---|---|---|
| JSON | 165 MB | 39.5 s |
| Arrow | 66 MB | 7.7 s (5× faster; ~80× faster parse) |
| Parquet | 6.3 MB | 4.3 s (9× faster; 26× smaller) |
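The speed and size ratios quoted in the table can be re-derived from its own figures. The short check below uses only the numbers in the table; the "~80× faster parse" figure cannot be reproduced this way because parse-only timings are not listed.

```python
# Figures copied from the table: format -> (wire size in MB, total seconds).
measured = {"json": (165.0, 39.5), "arrow": (66.0, 7.7), "parquet": (6.3, 4.3)}

json_mb, json_s = measured["json"]

# For each columnar format: (end-to-end speedup, wire-size reduction) vs JSON.
ratios = {
    fmt: (round(json_s / seconds, 1), round(json_mb / mb, 1))
    for fmt, (mb, seconds) in measured.items()
    if fmt != "json"
}

for fmt, (speedup, shrink) in ratios.items():
    print(f"{fmt}: {speedup}x faster end to end, {shrink}x smaller on the wire")
```

This gives roughly 5.1× and 9.2× end-to-end speedups for Arrow and Parquet, and a 26.2× size reduction for Parquet, consistent with the rounded values in the table.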
The Python client uses Arrow automatically — client.get("nz_cpi") returns the same DataFrame, just much faster on large pulls, with a transparent JSON fallback. Hitting the REST API directly:
curl -H "X-API-Key: $EOLAS_API_KEY" \
"https://api.eolas.fyi/v1/datasets/nzta_cas_crashes/data?format=parquet&limit=100000" \
-o crashes.parquet
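The same request can be built from Python's standard library when the client package is not available. This is a sketch of the curl call only: build_data_request is a hypothetical helper, and the snippet constructs the request without sending it so it can run offline.

```python
import os
import urllib.parse
import urllib.request

BASE_URL = "https://api.eolas.fyi"


def build_data_request(name: str, api_key: str, **query: object) -> urllib.request.Request:
    """Build (but do not send) a GET request for /v1/datasets/{name}/data."""
    path = f"/v1/datasets/{urllib.parse.quote(name)}/data"
    url = f"{BASE_URL}{path}?{urllib.parse.urlencode(query)}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


req = build_data_request(
    "nzta_cas_crashes",
    os.environ.get("EOLAS_API_KEY", ""),
    format="parquet",
    limit=100000,
)
print(req.full_url)

# To actually download, uncomment:
# with urllib.request.urlopen(req) as resp, open("crashes.parquet", "wb") as fh:
#     fh.write(resp.read())
```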
client.download_bulk(name, *, freshness="auto", format="parquet", path=None, progress=None)
Download a complete dataset as a single binary file via the /v1/bulk/{namespace}/{table} endpoint. Monthly snapshots are served from Cloudflare's edge cache; Pro current snapshots are lazy-generated on first request.
See Bulk downloads for the full narrative, tier comparison, and worked examples.
import io, pandas as pd
# Return raw bytes
raw = client.download_bulk("nz_cpi")
df = pd.read_parquet(io.BytesIO(raw))
# Write to a file, get the path back
path = client.download_bulk("nz_cpi", path="nz_cpi.parquet")
# Gzipped CSV
client.download_bulk("nz_cpi", format="csv_gz", path="nz_cpi.csv.gz")
# GeoParquet for a geospatial dataset
import geopandas as gpd
raw = client.download_bulk("territorial_authority_2023", format="geoparquet")
gdf = gpd.read_parquet(io.BytesIO(raw))
# Force monthly freshness (reproducibility across plan levels)
client.download_bulk("nz_cpi", freshness="monthly", path="nz_cpi.parquet")
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| name | str | — | Dataset identifier, e.g. "nz_cpi" |
| freshness | str | "auto" | "auto": server picks based on plan (Free→monthly, Pro→current). "monthly" or "current" to override. |
| format | str | "parquet" | "parquet", "csv_gz", or "geoparquet". GeoParquet only available on geospatial datasets. |
| path | str \| Path \| None | None | Write to this path and return the resolved Path. None returns raw bytes. Parent directories are created automatically. |
| progress | bool \| None | None | Control the download progress bar. None auto-detects: shown when sys.stdout.isatty() is True (terminal or VSCode notebook), hidden otherwise. True forces the bar on; False forces it off. Also suppressed by EOLAS_NO_PROGRESS=1 env var. When path=None (bytes mode) progress is always disabled. |
Returns: pathlib.Path when path is set; bytes when path is None.
Raises:
| Exception | When |
|---|---|
| BulkUpgradeRequired | HTTP 402: freshness="current" requires Pro plan |
| BulkLicenceRestricted | HTTP 403 (licence body): dataset excluded from bulk (e.g. OECD). Use client.get() instead. |
| BulkNotYetAvailable | HTTP 503: monthly snapshot not yet generated |
| NotFoundError | Dataset not found |
| AuthenticationError | Invalid or missing API key |
from eolas_data.exceptions import BulkUpgradeRequired, BulkLicenceRestricted
try:
    client.download_bulk("nz_cpi", freshness="current")
except BulkUpgradeRequired:
    print("Upgrade to Pro for current snapshots: https://eolas.fyi/pricing")
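The rules given for the progress parameter of download_bulk can be read as a small decision function. The sketch below is one interpretation, not the library's code: should_show_progress is a hypothetical name, and it assumes that EOLAS_NO_PROGRESS=1 takes priority even over progress=True, which the description does not state explicitly.

```python
import os
import sys
from typing import Mapping, Optional


def should_show_progress(progress: Optional[bool], path: Optional[str],
                         env: Mapping[str, str], isatty: bool) -> bool:
    """Decide whether a download progress bar should be drawn."""
    if path is None:                          # bytes mode: always disabled
        return False
    if env.get("EOLAS_NO_PROGRESS") == "1":   # assumption: env var beats progress=True
        return False
    if progress is None:                      # auto-detect from the terminal
        return isatty
    return progress                           # explicit True / False


print(should_show_progress(None, "nz_cpi.parquet", os.environ, sys.stdout.isatty()))
```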
client.sync_bulk(name, *, path, format="parquet", freshness="auto", progress=None)
Incrementally sync a bulk dataset file — only re-downloads when the snapshot changes.
Issues a lightweight HEAD request to read the server's X-Snapshot-Version header. If the local sidecar records the same snapshot id and the file exists, returns immediately with status="unchanged" and zero data I/O. Otherwise downloads the new snapshot and replaces the file atomically (os.replace()).
A sidecar file <path>.eolas-meta.json is written next to the data file on every download or update, recording the snapshot id, timestamp, and source URL.
from eolas_data import Client, SyncResult
client = Client("your_api_key")
# First call: full download
r = client.sync_bulk("nz_cpi", path="nz_cpi.parquet")
print(r.status) # "downloaded"
print(r.bytes_downloaded) # e.g. 2_100_000
# Subsequent calls: no-op when snapshot unchanged
r = client.sync_bulk("nz_cpi", path="nz_cpi.parquet")
print(r.status) # "unchanged"
print(r.bytes_downloaded) # 0
# After a new ETL run: file replaced in place
r = client.sync_bulk("nz_cpi", path="nz_cpi.parquet")
print(r.status) # "updated"
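The three statuses follow from comparing the sidecar's snapshot id with the server's. Here is a minimal standard-library sketch of that flow, with the HEAD request and the download replaced by injected callables so it runs offline. sync_file and SyncOutcome are hypothetical stand-ins, the "snapshot_id" sidecar key is an assumed name, and the real sidecar also records a timestamp and source URL. How the library reports a missing data file next to an existing sidecar is not documented; this sketch calls it "updated".

```python
import json
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class SyncOutcome:
    """Stand-in mirroring the documented SyncResult fields."""
    status: str
    previous_snapshot_id: Optional[str]
    current_snapshot_id: str
    path: Path
    bytes_downloaded: int


def sync_file(path: Path, head: Callable[[], str],
              download: Callable[[], bytes]) -> SyncOutcome:
    path = Path(path)
    sidecar = Path(f"{path}.eolas-meta.json")
    previous = None
    if sidecar.exists():
        previous = json.loads(sidecar.read_text()).get("snapshot_id")
    current = head()                      # stands in for the HEAD request
    if previous == current and path.exists():
        return SyncOutcome("unchanged", previous, current, path, 0)
    data = download()
    # Write to a temp file in the same directory, then swap it in atomically.
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    os.replace(tmp, path)
    sidecar.write_text(json.dumps({"snapshot_id": current}))
    status = "downloaded" if previous is None else "updated"
    return SyncOutcome(status, previous, current, path, len(data))
```

Writing the temporary file into the target's own directory matters: os.replace() is only atomic when source and destination are on the same filesystem.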
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| name | str | — | Dataset identifier, e.g. "nz_cpi" |
| path | str \| Path | — | Required. Where to write the data file. Sidecar lives at f"{path}.eolas-meta.json". |
| format | str | "parquet" | "parquet", "csv_gz", or "geoparquet". |
| freshness | str | "auto" | "auto", "monthly", or "current". |
| progress | bool \| None | None | Control the download progress bar. None auto-detects via sys.stdout.isatty(). True forces the bar on; False forces it off. EOLAS_NO_PROGRESS=1 env var globally suppresses. When status="unchanged" no bar is shown regardless (no data transferred). |
Returns: SyncResult — a dataclass with fields:
| Field | Type | Description |
|---|---|---|
| status | str | "downloaded", "updated", or "unchanged" |
| previous_snapshot_id | str \| None | Snapshot id from the sidecar before sync, or None if no sidecar existed |
| current_snapshot_id | str | Snapshot id from the server |
| path | pathlib.Path | Resolved path to the data file |
| bytes_downloaded | int | Bytes written (0 when unchanged) |
Raises: Same as download_bulk (BulkUpgradeRequired, BulkLicenceRestricted, BulkNotYetAvailable, NotFoundError, AuthenticationError). No sidecar is written on error.
client.get_local(name, *, cache_dir=None, format=None, freshness="auto", as_geo=None, as_arrow=False)
Fetch a dataset from the local cache, downloading it from the bulk endpoint if not already present or if the snapshot has changed. Use this when you want to avoid re-downloading data that hasn't changed.
On the first call it fetches the bulk file from the CDN and writes it to the cache directory (~/.cache/eolas/ unless overridden). On subsequent calls a lightweight HEAD request checks whether the file is still current; if so, the local copy is read directly with zero data transfer.
from eolas_data import Client
client = Client("your_api_key")
# Geospatial dataset — first call downloads from CDN; subsequent calls read locally
gdf = client.get_local("nz_parcels") # geopandas.GeoDataFrame (if geopandas installed)
# Non-geo dataset
df = client.get_local("nz_cpi") # pd.DataFrame
# Custom cache directory
df = client.get_local("nz_cpi", cache_dir="/data/eolas-cache")
# Force a specific format
df = client.get_local("nz_cpi", format="csv_gz")
# Keep the raw geometry column instead of converting to a GeoDataFrame
df = client.get_local("nz_parcels", as_geo=False)
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| name | str | — | Dataset identifier, e.g. "nz_parcels" |
| cache_dir | str \| Path \| None | None | Local directory for cached files. None (default) resolves via the library precedence chain (env → config → ~/.cache/eolas/). An explicit value wins outright. See Authentication → Library. |
| format | str \| None | None | "parquet", "csv_gz", or "geoparquet". None auto-detects from dataset metadata (geo → geoparquet, else parquet). |
| freshness | str | "auto" | "auto", "monthly", or "current". Passed verbatim to sync_bulk. |
| as_geo | bool \| None | None | When True and the file is GeoParquet and geopandas is installed, returns a GeoDataFrame. When False (or geopandas is missing), returns a plain DataFrame with the raw WKT column. None auto-converts when geometry is present and geopandas is importable (unless as_arrow=True). Mutually exclusive with as_arrow=True. |
| as_arrow | bool | False | Return a pyarrow.Table instead of a DataFrame or GeoDataFrame. Skips all shapely allocation. Mutually exclusive with as_geo=True. |
Returns: pd.DataFrame, geopandas.GeoDataFrame, or pyarrow.Table when as_arrow=True
Raises:
| Exception | When |
|---|---|
| BulkUpgradeRequired | HTTP 402: freshness="current" requires Pro plan |
| BulkLicenceRestricted | HTTP 403 (licence body): dataset excluded from bulk (e.g. OECD). Use client.get() instead. |
| BulkNotYetAvailable | HTTP 503: monthly snapshot not yet generated |
| NotFoundError | Dataset not found |
| AuthenticationError | Invalid or missing API key |
Dataset
A pandas.DataFrame subclass returned by client.get() and the source-specific methods on the default pandas path.
Extra attributes
| Attribute | Type | Description |
|---|---|---|
| eolas_name | str | Series identifier, e.g. "nz_cpi" |
| eolas_source | str | Source label, e.g. "Stats NZ" |
Plotting
Dataset subclasses DataFrame, so any matplotlib, seaborn, or plotly workflow works directly. plot_series() was removed in v1.3.0; use pandas .plot() instead.
CLI auth commands
Manages the API key from the terminal. Requires pip install 'eolas-data[cli]'. The OS-keyring commands also need the secure extra: pip install 'eolas-data[secure]'.
eolas auth save-key [KEY]
Save the API key to the OS keyring (macOS Keychain, Windows Credential Manager, Linux Secret Service). Requires pip install 'eolas-data[secure]'.
eolas auth save-key # interactive masked prompt
eolas auth save-key vs_mykey # non-interactive (e.g. piped from a script)
Stored under service="eolas", username="api-key" — the same slot the R client uses, so a key saved from Python is immediately visible in R.
eolas auth clear-key
Remove the API key from the OS keyring. Does not affect EOLAS_API_KEY or the config file.
eolas auth status
Show the resolved API key (masked to first 8 characters) and which source supplied it. Checks all sources in precedence order: env var → OS keyring → config file.
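The precedence order and the masking can be sketched in a few lines of standard-library Python. This is an illustration of the documented order, not the CLI's code: resolve_api_key and mask are hypothetical names, the keyring lookup is injected as a callable so the snippet runs without the keyring package, the "api_key" config field name is an assumption, and "masked to first 8 characters" is read here as "show the first 8, hide the rest".

```python
import json
from pathlib import Path
from typing import Callable, Mapping, Optional, Tuple


def resolve_api_key(env: Mapping[str, str],
                    keyring_lookup: Callable[[], Optional[str]],
                    config_path: Path) -> Tuple[Optional[str], str]:
    """Return (key, source) using the order env var -> OS keyring -> config file."""
    key = env.get("EOLAS_API_KEY")
    if key:
        return key, "env var"
    key = keyring_lookup()       # e.g. keyring.get_password("eolas", "api-key")
    if key:
        return key, "OS keyring"
    if config_path.exists():
        # Assumption: the key is stored under "api_key" in the JSON config.
        key = json.loads(config_path.read_text()).get("api_key")
        if key:
            return key, "config file"
    return None, "not set"


def mask(key: str, visible: int = 8) -> str:
    """Show the first `visible` characters and star out the rest."""
    return key[:visible] + "*" * max(len(key) - visible, 0)
```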
eolas auth set-key
Save the API key to ~/.eolas/config.json (chmod 600) as a plaintext fallback. No extra install required.
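For readers scripting the same thing without the CLI, the sketch below writes a JSON config with owner-only permissions. It is an illustration under assumptions rather than the command's implementation: save_config_value is a hypothetical helper and the "api_key" field name is assumed. Existing fields are merged rather than overwritten because the same file also holds library_dir.

```python
import json
import os
from pathlib import Path


def save_config_value(config_path: Path, field: str, value: str) -> None:
    """Merge one field into a JSON config file readable only by its owner."""
    config_path.parent.mkdir(parents=True, exist_ok=True)
    data = {}
    if config_path.exists():
        data = json.loads(config_path.read_text())
    data[field] = value
    # Create with mode 0o600 so the key is never briefly world-readable.
    fd = os.open(config_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh, indent=2)
    os.chmod(config_path, 0o600)          # also tighten a pre-existing file
```

Typical use would be save_config_value(Path.home() / ".eolas" / "config.json", "api_key", key). On Windows the mode bits have limited effect.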
eolas auth clear
Remove ~/.eolas/config.json. Does not affect the env var or keyring.
CLI library commands
Manage the directory where get_local() caches bulk data files. Requires pip install 'eolas-data[cli]'.
eolas library set [PATH]
Write library_dir to ~/.eolas/config.json. Future calls to get_local() will use this directory unless overridden by EOLAS_LIBRARY or an explicit cache_dir= argument.
eolas library set ~/eolas-library # user-wide persistent location
eolas library set /data/eolas # custom absolute path
eolas library set # interactive prompt if no arg given
The config file is shared with the R eolas client, so a path set here is immediately honoured in R.
eolas library status
Show the resolved library directory and which source is supplying it (env var, config file, or ~/.cache/eolas/ fallback).
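The resolution order described across this section (explicit cache_dir argument, then EOLAS_LIBRARY, then library_dir in the config file, then the ~/.cache/eolas/ fallback) can be sketched as one function. resolve_library_dir is a hypothetical name used for illustration; the environment and config path are passed in so the example runs without touching real settings.

```python
import json
from pathlib import Path
from typing import Mapping, Optional, Tuple, Union


def resolve_library_dir(cache_dir: Optional[Union[str, Path]],
                        env: Mapping[str, str],
                        config_path: Path) -> Tuple[Path, str]:
    """Return (directory, source) for the bulk-data cache location."""
    if cache_dir is not None:                      # explicit argument wins outright
        return Path(cache_dir).expanduser(), "argument"
    if env.get("EOLAS_LIBRARY"):
        return Path(env["EOLAS_LIBRARY"]).expanduser(), "env var"
    if config_path.exists():
        configured = json.loads(config_path.read_text()).get("library_dir")
        if configured:
            return Path(configured).expanduser(), "config file"
    return Path("~/.cache/eolas").expanduser(), "default"
```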
eolas library clear
Remove library_dir from ~/.eolas/config.json. After clearing, get_local() reverts to the default ~/.cache/eolas/ location (or EOLAS_LIBRARY if set).
Exceptions
All exceptions inherit from eolas_data.exceptions.EolasError.
from eolas_data.exceptions import (
EolasError, # base
AuthenticationError, # 401 / 403
RateLimitError, # 429
NotFoundError, # 404
APIError, # other HTTP errors — has .status_code attribute
# Bulk-download-specific (subclass APIError)
BulkUpgradeRequired, # 402 — current freshness requires Pro
BulkLicenceRestricted, # 403 (licence body) — dataset excluded from bulk
BulkNotYetAvailable, # 503 — monthly snapshot not yet generated
)
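Because the bulk exceptions subclass APIError, except clauses need to list the specific class before the general one, or the specific handler is never reached. The sketch below demonstrates the ordering with local stand-in classes so it runs without the package. The constructor signature shown for APIError is an assumption, and only the inheritance relationships stated here are modelled.

```python
class EolasError(Exception):
    """Stand-in for the documented base class."""


class APIError(EolasError):
    """Stand-in; the (message, status_code) constructor is an assumption."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class BulkLicenceRestricted(APIError):
    """Stand-in for the 403 (licence body) bulk exception."""


def describe(exc: Exception) -> str:
    """Route an exception through handlers ordered from specific to general."""
    try:
        raise exc
    except BulkLicenceRestricted:
        return "excluded from bulk; fall back to client.get()"
    except APIError as err:
        return f"API error with status {err.status_code}"
    except EolasError:
        return "other eolas error"


print(describe(BulkLicenceRestricted("licence", 403)))
```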