Python client

The eolas-data Python package provides a Client class that wraps the eolas REST API and returns Dataset objects — pandas DataFrames with source metadata.

Installation

pip install eolas-data

Optional extras:

pip install eolas-data[polars]      # polars output support
pip install matplotlib              # for plotting

Requires Python 3.10+ and pandas 1.5+.

Initialisation

from eolas_data import Client

# Pass key directly
client = Client("your_eolas_key")

# Read from EOLAS_API_KEY environment variable
client = Client()

# With in-memory cache (great for notebooks)
client = Client("your_eolas_key", cache=True)
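
The no-argument form relies on EOLAS_API_KEY being set. The sketch below fails early when it is not. resolve_api_key is a hypothetical helper, not part of eolas-data, and this page does not say how the client itself reacts to a missing key.

```python
import os


def resolve_api_key(explicit=None, environ=None):
    """Return the explicit key if given, else EOLAS_API_KEY from the environment.

    Hypothetical helper (not part of eolas-data): it mirrors the two
    initialisation styles shown above and raises early if neither applies.
    """
    environ = os.environ if environ is None else environ
    key = explicit or environ.get("EOLAS_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass one explicitly or set EOLAS_API_KEY")
    return key
```

It could then be used as client = Client(resolve_api_key()), so that a misconfigured environment surfaces before the first request.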

Source-specific helpers

The recommended way to fetch data — the source is encoded in the method name, making code self-documenting and autocomplete-friendly:

df  = client.statsnz("nz_cpi", start="2020-01-01")   # Stats NZ
df  = client.oecd("nz_gdp_growth")                    # OECD
df  = client.rbnz("rbnz_b2_wholesale_rates_monthly")  # RBNZ
df  = client.treasury("treasury_fiscal_spending")     # NZ Treasury
gdf = client.linz("nz_parcels")                       # LINZ (~3M rows; auto-bulks in seconds, no limit needed)

Source-specific helpers call client.get() internally and inherit its smart routing: large and geospatial datasets are routed automatically through the cache+sync path, so client.linz("nz_parcels") returns a GeoDataFrame in seconds rather than 15 minutes. The first call emits a one-line log explaining the route taken; subsequent calls are silent.

To be explicit, use get_local(), which takes the same path and adds options for cache_dir, format, and freshness. Alternatively, pass mode="live" to query the live Iceberg endpoint directly, which is useful for the freshest data, OECD-restricted sources, or sliced queries with limit=/start=/end=:

# Explicit cache+sync path with extra control
gdf = client.get_local("nz_parcels")
gdf = client.get_local("nz_parcels", cache_dir="/data/eolas", freshness="monthly")

# Force live scan — note: server returns 413 if the dataset is large/geo
# and no limit=/start=/end= filter is set; apply a filter or use mode="cached"
gdf = client.get("nz_parcels", mode="live")
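
The 413 rule in the comment above can be written as a small pre-flight check. This is a sketch of the rule as stated on this page, not the client's own routing code. live_or_cached and its arguments are hypothetical.

```python
def live_or_cached(large_or_geo, limit=None, start=None, end=None):
    """Pick a mode= value that avoids the 413 described above.

    Assumption: a live scan is acceptable when the dataset is small, or when
    at least one of limit/start/end narrows the query.
    """
    has_filter = any(value is not None for value in (limit, start, end))
    if large_or_geo and not has_filter:
        return "cached"
    return "live"
```

For example, live_or_cached(True) suggests "cached" for an unfiltered nz_parcels request, while live_or_cached(True, limit=1000) allows "live".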

See Bulk downloads for the full routing rules and tier comparison.

Each source-specific helper returns a Dataset tagged with its source label.

Discovery

client.list()                    # all datasets — DataFrame
client.list("Stats NZ")          # filter by source
client.list_wellington()         # Wellington Region Councils only

client.search("HLFS")            # labour-force datasets (alias expansion)
client.search("OCR", source="RBNZ")
client.search("kapiti")          # → kcdc_* council layers
client.search("cpi")             # ranks rbnz_m1_prices before nz_cpi; prints CPI guidance

nz_cpi is OECD year-on-year % change, not a CPI index level — use rbnz_m1_prices for quarterly index levels.
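
The search examples above mention alias expansion. The sketch below shows one way such expansion could work: the query is widened with alias terms before matching on dataset names. The alias table is invented for illustration, the function names are hypothetical, and the real search logic and ranking are not documented here.

```python
def expand_query(query, aliases):
    """Return the lower-cased query followed by any alias terms for it."""
    q = query.lower()
    return [q, *aliases.get(q, [])]


def search_names(query, names, aliases):
    """Return names containing the query or one of its aliases (input order kept)."""
    terms = expand_query(query, aliases)
    return [name for name in names if any(term in name.lower() for term in terms)]


# Illustrative inputs only: the alias table is made up; the names appear on this page.
ALIASES = {"cpi": ["prices"]}
NAMES = ["nz_gdp_growth", "rbnz_m1_prices", "nz_cpi"]

matches = search_names("CPI", NAMES, ALIASES)
```

With these inputs, both rbnz_m1_prices and nz_cpi match, even though only one of them contains the literal string "cpi".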

Dataset

All data-fetching methods return a Dataset — a pandas DataFrame subclass with extra metadata:

df = client.statsnz("nz_cpi", start="2020-01-01")

print(df)
# Dataset: nz_cpi [Stats NZ]
# 20 rows
#          date  period   value
# 0  2020-01-01  2020Q1  1010.0
# ...

df.eolas_name    # "nz_cpi"
df.eolas_source  # "Stats NZ"

Plotting

Dataset is a pandas DataFrame subclass, so any matplotlib, seaborn, or plotly workflow works directly. For a quick line chart:

ax = df.plot(x="date", y="value")
ax.set_ylabel("Index (base 1000)")

Requires matplotlib: pip install matplotlib.

Cache

Pass cache=True to avoid re-fetching the same series in a notebook session:

client = Client("your_eolas_key", cache=True)

df1 = client.statsnz("nz_cpi")   # hits the API
df2 = client.statsnz("nz_cpi")   # returned from cache
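
The behaviour above amounts to memoising fetches for the life of the client. The sketch below shows the idea with a plain dictionary. MemoCache is a hypothetical stand-in, and keying on the query parameters as well as the dataset name is an assumption, not something this page confirms about the real cache.

```python
class MemoCache:
    """Tiny in-memory cache keyed by dataset name and query parameters."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable(name, **params) that does the real work
        self._store = {}

    def get(self, name, **params):
        # Sort the parameters so that keyword order does not change the key.
        key = (name, tuple(sorted(params.items())))
        if key not in self._store:
            self._store[key] = self._fetch(name, **params)
        return self._store[key]
```

A second get() with the same name and parameters returns the stored object without calling fetch again. A different start= value is treated as a separate entry.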

Working with large geo datasets

The 5.4M-row linz.nz_parcels table allocates ~10 GB when materialised as a GeoDataFrame. Pass as_arrow=True to skip all shapely allocation and get a zero-copy pyarrow.Table instead — geometry stays as Arrow buffers until you need it:

import duckdb

# Zero-copy Arrow table: no shapely allocation
tbl = client.linz("nz_parcels", as_arrow=True)

# ST_Within and ST_GeomFromText come from DuckDB's spatial extension
duckdb.install_extension("spatial")
duckdb.load_extension("spatial")

# Filter before materialising: far cheaper than loading the full GeoDataFrame.
# DuckDB resolves `tbl` from the local Python variable.
result = duckdb.sql("""
    SELECT parcel_id, geometry_wkt
    FROM tbl
    WHERE ST_Within(ST_GeomFromText(geometry_wkt),
                    ST_GeomFromText('POLYGON((174.7 -41.3, 174.8 -41.3, 174.8 -41.4, 174.7 -41.4, 174.7 -41.3))'))
""").df()

as_arrow=True works on all datasets (geo or non-geo), all routing modes (live, cached, auto), and all source helpers. It cannot be combined with as_geo=True.
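
The filter polygon in the query above is a hand-written bounding box. The sketch below builds the same WKT string from its corners, which makes it easier to change the area of interest. bbox_to_wkt is a hypothetical helper, not part of eolas-data.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT POLYGON for a lon/lat bounding box."""
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bounding box must have min < max on both axes")
    ring = [
        (min_lon, max_lat),
        (max_lon, max_lat),
        (max_lon, min_lat),
        (min_lon, min_lat),
        (min_lon, max_lat),  # repeat the first corner to close the ring
    ]
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


wkt = bbox_to_wkt(174.7, -41.4, 174.8, -41.3)
```

The resulting string can be interpolated into the ST_GeomFromText(...) call in place of the literal polygon.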

Polars output

df = client.get("nz_cpi", engine="polars")
# returns a polars DataFrame

Requires polars: pip install polars, or install the eolas-data[polars] extra.

Exceptions

from eolas_data.exceptions import RateLimitError, AuthenticationError, NotFoundError

try:
    df = client.statsnz("nz_cpi")
except RateLimitError:
    print("Upgrade to Pro for unlimited requests")
except AuthenticationError:
    print("Check your API key")
except NotFoundError:
    print("Series does not exist")

Exception	HTTP status	When raised
`AuthenticationError`	401, 403	Invalid or inactive key
`RateLimitError`	429	Daily limit reached
`NotFoundError`	404	Series identifier not found
`APIError`	other	Unexpected API error

Attribution and provenance

Every /data response carries X-Eolas-Attribution, X-Eolas-Licence, and related headers. The client merges them into df.attrs["eolas_meta"] automatically (v1.3.3+).

df = client.get("rbnz_b1_exchange_rates_monthly", limit=5)
df.attrs["eolas_meta"]["attribution_text"]
df.attrs["eolas_meta"]["licence"]

For provenance in the JSON body (agents, pipelines), pass envelope=True — same as ?envelope=1 on the API:

df = client.get("nz_cpi", limit=5, envelope=True)
df.attrs["eolas_meta"]["data_sources"]  # licence block alongside rows
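
The metadata keys above can be turned into a credit line for a chart footer or report. The sketch below assumes only the attribution_text and licence keys shown above. format_attribution is a hypothetical helper, and the sample values are placeholders rather than real API output.

```python
def format_attribution(meta):
    """Build a one-line credit from an eolas_meta-style dict.

    Missing keys are skipped rather than raising.
    """
    text = meta.get("attribution_text")
    licence = meta.get("licence")
    parts = [text, f"Licence: {licence}" if licence else None]
    return " | ".join(part for part in parts if part)


# Placeholder values for illustration; real values come from df.attrs["eolas_meta"].
sample_meta = {"attribution_text": "Source: Example Agency", "licence": "CC BY 4.0"}
credit = format_attribution(sample_meta)
```

With a fetched Dataset, the call would be format_attribution(df.attrs["eolas_meta"]).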

See Getting started §5 for the raw HTTP shape and the Snowflake ATTRIBUTIONS table.

Source

github.com/phildonovan/eolas-data-python · PyPI