Load a GeoTIFF file from a non-searchable STAC catalog¶
A DeepESDL example notebook¶
This notebook demonstrates how to load a GeoTIFF file from a non-searchable STAC catalog via the xcube-stac data store. The data in this example is fetched from the EcoDataCube.eu STAC catalog.
A non-searchable catalog does not implement the STAC API - Item Search conformance class. To search such a catalog, it must be crawled and each item's properties matched against the search parameters. This process can be slow, especially for large catalogs.
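The crawl-and-match idea can be sketched with plain dictionaries standing in for STAC items; the `matches` helper and the item records below are illustrative only and are not part of xcube-stac, which performs this matching internally on real `pystac` objects.

```python
from datetime import date

# Hypothetical, simplified item records standing in for crawled STAC items.
items = [
    {"collection": "twi_edtm", "bbox": [-56.5, 24.3, 72.9, 72.6],
     "time_range": (date(2006, 1, 1), date(2015, 12, 31))},
    {"collection": "lcv_landcover", "bbox": [-10.0, 35.0, 30.0, 60.0],
     "time_range": (date(2020, 1, 1), date(2020, 12, 31))},
]

def matches(item, collections, bbox, time_range):
    """Return True if an item satisfies all three search parameters."""
    if item["collection"] not in collections:
        return False
    w, s, e, n = item["bbox"]
    qw, qs, qe, qn = bbox
    if e < qw or qe < w or n < qs or qn < s:  # no spatial overlap
        return False
    t0, t1 = item["time_range"]
    q0, q1 = time_range
    return not (t1 < q0 or q1 < t0)  # temporal overlap

hits = [i for i in items
        if matches(i, ["twi_edtm"], [-10, 40, 40, 70],
                   (date(2010, 1, 1), date(2010, 4, 1)))]
print([h["collection"] for h in hits])  # → ['twi_edtm']
```

Because every item must be visited and tested like this, search time grows with catalog size, which is why searchable catalogs delegate this work to a server-side API instead.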
For more example notebooks using xcube-stac, please head over to GitHub https://github.com/xcube-dev/xcube-stac/tree/main/examples/notebooks. There you can find examples such as loading GeoTIFFs and netCDFs from searchable STAC catalogs.
Please, also refer to the DeepESDL documentation and visit the platform's website for further information!
Brockmann Consult, 2025
This notebook runs with the Python environment users-deepesdl-xcube-1.11.0; please check out the documentation for help on changing the environment.
%%time
from xcube.core.store import new_data_store, get_data_store_params_schema
import itertools
CPU times: user 1.55 s, sys: 280 ms, total: 1.83 s Wall time: 2.36 s
First, we get the store parameters needed to initialize a STAC data store.
get_data_store_params_schema("stac")
<xcube.util.jsonschema.JsonObjectSchema at 0x7f246cb23310>
We determine the URL of the EcoDataCube.eu STAC catalog and initiate a STAC data store. The xcube-stac plugin is recognized by setting the first argument of the new_data_store function to "stac".
%%time
url = "https://s3.eu-central-1.wasabisys.com/stac/odse/catalog.json"
store = new_data_store("stac", url=url)
CPU times: user 29.8 ms, sys: 0 ns, total: 29.8 ms Wall time: 201 ms
/home/conda/users/703dc0e1-1755683629-83-deepesdl-xcube-1.11.0/lib/python3.13/site-packages/pystac_client/client.py:191: NoConformsTo: Server does not advertise any conformance classes. warnings.warn(NoConformsTo())
Each data ID points to a STAC item's JSON file and is given by the segment of the item's URL that follows the catalog's URL. The data IDs can be streamed lazily; below we show the first 10 as an example.
%%time
data_ids = store.get_data_ids()
list(itertools.islice(data_ids, 10))
CPU times: user 510 ms, sys: 97.8 ms, total: 608 ms Wall time: 14.1 s
['accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000101_20000131/accum.precipitation_chelsa.montlhy_20000101_20000131.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000201_20000228/accum.precipitation_chelsa.montlhy_20000201_20000228.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000301_20000331/accum.precipitation_chelsa.montlhy_20000301_20000331.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000401_20000430/accum.precipitation_chelsa.montlhy_20000401_20000430.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000501_20000531/accum.precipitation_chelsa.montlhy_20000501_20000531.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000601_20000630/accum.precipitation_chelsa.montlhy_20000601_20000630.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000701_20000731/accum.precipitation_chelsa.montlhy_20000701_20000731.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000801_20000831/accum.precipitation_chelsa.montlhy_20000801_20000831.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000901_20000930/accum.precipitation_chelsa.montlhy_20000901_20000930.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20001001_20001031/accum.precipitation_chelsa.montlhy_20001001_20001031.json']
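As the listing shows, each data ID encodes the collection name and the item's time span. A small helper, hypothetical and not provided by xcube-stac, can pull those pieces apart:

```python
def parse_data_id(data_id):
    """Split a data ID of the form '<collection>/<item>/<item>.json' into
    its collection name and the start/end dates (YYYYMMDD) in the item name.
    Illustrative only; assumes the naming scheme seen in this catalog."""
    collection, _, item_json = data_id.split("/")
    item = item_json.removesuffix(".json")
    start, end = item.rsplit("_", 2)[-2:]
    return collection, start, end

data_id = ("accum.precipitation_chelsa.montlhy/"
           "accum.precipitation_chelsa.montlhy_20000101_20000131/"
           "accum.precipitation_chelsa.montlhy_20000101_20000131.json")
print(parse_data_id(data_id))
# → ('accum.precipitation_chelsa.montlhy', '20000101', '20000131')
```

This kind of parsing is handy for filtering the streamed IDs by collection or month without opening any item JSON.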
In the next step, we can search for items using search parameters. The following code shows which search parameters are available.
%%time
search_params = store.get_search_params_schema()
search_params
CPU times: user 25 μs, sys: 5 μs, total: 30 μs Wall time: 34.3 μs
<xcube.util.jsonschema.JsonObjectSchema at 0x7f2464e9da90>
Now, let's search for the Topographic Wetness Index derived from the Ensemble Digital Terrain Model with Whitebox Workflows, for the European region during the first quarter of 2010.
%%time
descriptors = list(
store.search_data(
collections=["twi_edtm"],
bbox=[-10, 40, 40, 70],
time_range=["2010-01-01", "2010-04-01"],
)
)
[d.to_dict() for d in descriptors]
CPU times: user 21.5 ms, sys: 7.83 ms, total: 29.3 ms Wall time: 176 ms
[{'data_id': 'twi_edtm/twi_edtm_20060101_20151231/twi_edtm_20060101_20151231.json', 'data_type': 'dataset', 'bbox': [-56.51881139294227, 24.275788389340878, 72.93153043046625, 72.64259665547773], 'time_range': ('2006-01-01', '2015-12-31')}]
In the next step, we can open the data for each data ID. The following code shows which parameters are available for opening the data.
%%time
open_params = store.get_open_data_params_schema()
open_params
CPU times: user 38 μs, sys: 0 ns, total: 38 μs Wall time: 39.8 μs
<xcube.util.jsonschema.JsonObjectSchema at 0x7f2464e9e430>
Now, we open the data for a given data ID, selecting all available assets that the data store can open. Note that we set data_type to "dataset" to get an xarray.Dataset as the return value.
%%time
ds = store.open_data(descriptors[0].data_id, data_type="dataset")
ds
CPU times: user 298 ms, sys: 52.9 ms, total: 351 ms Wall time: 1.85 s
<xarray.Dataset> Size: 66GB Dimensions: (x: 216700, y: 153400) Coordinates: * x (x) float64 2MB 9e+05 9e+05 9.001e+05 ... 7.401e+06 7.401e+06 * y (y) float64 1MB 5.501e+06 5.501e+06 ... 8.99e+05 8.99e+05 spatial_ref int64 8B 0 Data variables: band_1 (y, x) int16 66GB dask.array<chunksize=(512, 512), meta=np.ndarray> Attributes: stac_catalog_url: https://s3.eu-central-1.wasabisys.com/stac/odse/cata... stac_item_id: twi_edtm_20060101_20151231 xcube_stac_version: 1.0.0
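The reported 66 GB follows directly from the dimensions and dtype shown in the repr: 216700 × 153400 pixels at 2 bytes per int16 value. Nothing has been downloaded yet; dask builds the array lazily and only reads chunks on demand.

```python
# Reproduce the dataset size from its dimensions and dtype.
nx, ny = 216_700, 153_400   # x and y sizes from the repr above
itemsize = 2                # bytes per int16 value
nbytes = nx * ny * itemsize
print(f"{nbytes / 1e9:.1f} GB")  # → 66.5 GB
```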
We plot a strided subset of the loaded data as an example below.
%%time
ds.band_1[100000:120000:20, 100000:120000:20].plot(vmin=0, vmax=800)
CPU times: user 4.02 s, sys: 1.15 s, total: 5.17 s Wall time: 17 s
<matplotlib.collections.QuadMesh at 0x7f2451a8a120>
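The step of 20 in the slice keeps the plot cheap: only every 20th pixel along each axis is read, so the number of values touched drops by a factor of 400 compared with the full 20000 × 20000 window.

```python
# Effect of the stride in ds.band_1[100000:120000:20, 100000:120000:20]:
full = 20_000 * 20_000            # pixels in the unstrided window
strided = (20_000 // 20) ** 2     # pixels actually plotted
print(full // strided)            # → 400 (reduction factor)
```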
We can also open a GeoTIFF as an xcube multi-resolution dataset, which lets us select the resolution level, as shown below.
%%time
mlds = store.open_data(descriptors[0].data_id, data_type="mldataset")
mlds.num_levels
CPU times: user 39.9 ms, sys: 2.75 ms, total: 42.6 ms Wall time: 274 ms
8
ds = mlds.get_dataset(5)
ds
<xarray.Dataset> Size: 65MB Dimensions: (x: 6771, y: 4793) Coordinates: * x (x) float64 54kB 9.005e+05 9.014e+05 ... 7.4e+06 7.401e+06 * y (y) float64 38kB 5.501e+06 5.5e+06 ... 9.004e+05 8.995e+05 spatial_ref int64 8B 0 Data variables: band_1 (y, x) int16 65MB dask.array<chunksize=(512, 512), meta=np.ndarray> Attributes: source: https://s3.ecodatacube.eu/arco/twi_edtm_m_30m_s_20000101_202212...
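As a rough sanity check: each pyramid level halves the spatial resolution of the previous one, so level L spans about full_size // 2**L pixels per axis. This reproduces the level-5 dimensions reported above (the exact rounding per level is an implementation detail of the overview pyramid).

```python
# Approximate level-5 shape from the full-resolution dimensions.
full_x, full_y = 216_700, 153_400
level = 5
print(full_x // 2**level, full_y // 2**level)  # → 6771 4793
```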
%%time
ds.band_1[3125:3750, 3125:3750].plot(vmax=800)
CPU times: user 80 ms, sys: 8.34 ms, total: 88.4 ms Wall time: 342 ms
<matplotlib.collections.QuadMesh at 0x7f2443502350>