Load a GeoTIFF file from a non-searchable STAC catalog
A DeepESDL example notebook
This notebook demonstrates how to load a GeoTIFF file from a non-searchable STAC catalog via the xcube-stac store. The data in this example is fetched from the EcoDataCube.eu STAC catalog.
A non-searchable catalog does not implement the STAC API - Item Search conformance class. When searching such a catalog, the catalog needs to be crawled and each item's properties need to be matched against the search parameters. This process can be slow, especially for large catalogs.
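The crawl-and-match idea can be sketched in plain Python. The item records and parameter names below are purely illustrative, not the actual xcube-stac internals:

```python
# Minimal sketch of querying a non-searchable catalog: every item must be
# visited and its properties compared against the search parameters.
# The item records here are made up for illustration.

def matches(item, collections=None, time_range=None):
    """Return True if an item satisfies the (simplified) search parameters."""
    if collections and item["collection"] not in collections:
        return False
    if time_range:
        start, end = time_range
        # Overlap test: the item's interval must intersect the query interval.
        # ISO date strings compare correctly as plain strings.
        if item["end"] < start or item["start"] > end:
            return False
    return True

items = [
    {"collection": "twi_edtm", "start": "2006-01-01", "end": "2015-12-31"},
    {"collection": "twi_edtm", "start": "2016-01-01", "end": "2020-12-31"},
    {"collection": "lcv", "start": "2010-01-01", "end": "2010-12-31"},
]

hits = [
    i for i in items
    if matches(i, collections=["twi_edtm"],
               time_range=("2010-01-01", "2010-04-01"))
]
print(len(hits))  # -> 1: only the first item overlaps the query
```

Because every item has to be visited, the cost grows linearly with the catalog size, which is why searchable catalogs are preferable for large archives.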
For more example notebooks using xcube-stac, please head over to GitHub: https://github.com/xcube-dev/xcube-stac/tree/main/examples/notebooks. There you can find examples such as loading GeoTIFFs and NetCDFs from searchable STAC catalogs.
Please, also refer to the DeepESDL documentation and visit the platform's website for further information!
Brockmann Consult, 2025
This notebook runs with the Python environment users-deepesdl-xcube-1.11.1; please check out the documentation for help on changing the environment.
%%time
from xcube.core.store import new_data_store, get_data_store_params_schema
import itertools
CPU times: user 1.5 s, sys: 519 ms, total: 2.02 s Wall time: 3.94 s
First, we get the store parameters needed to initialize a STAC data store.
get_data_store_params_schema("stac")
<xcube.util.jsonschema.JsonObjectSchema at 0x7f1afd143070>
We define the URL of the EcoDataCube.eu STAC catalog and initialize a STAC data store; the xcube-stac plugin is selected by passing "stac" as the first argument of the new_data_store function.
%%time
url = "https://s3.eu-central-1.wasabisys.com/stac/odse/catalog.json"
store = new_data_store("stac", url=url)
CPU times: user 29.1 ms, sys: 1.59 ms, total: 30.7 ms Wall time: 262 ms
/home/conda/users/60ba36d6-1759325445-137-deepesdl-xcube-1.11.1/lib/python3.13/site-packages/pystac_client/client.py:191: NoConformsTo: Server does not advertise any conformance classes. warnings.warn(NoConformsTo())
Each data ID points to a STAC item's JSON file and is given by the segment of the item's URL that follows the catalog's URL. The data IDs can be streamed using the following code; we show the first 10 data IDs as an example.
%%time
data_ids = store.get_data_ids()
list(itertools.islice(data_ids, 10))
CPU times: user 1.63 s, sys: 97.9 ms, total: 1.72 s Wall time: 25.6 s
['accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000101_20000131/accum.precipitation_chelsa.montlhy_20000101_20000131.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000201_20000228/accum.precipitation_chelsa.montlhy_20000201_20000228.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000301_20000331/accum.precipitation_chelsa.montlhy_20000301_20000331.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000401_20000430/accum.precipitation_chelsa.montlhy_20000401_20000430.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000501_20000531/accum.precipitation_chelsa.montlhy_20000501_20000531.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000601_20000630/accum.precipitation_chelsa.montlhy_20000601_20000630.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000701_20000731/accum.precipitation_chelsa.montlhy_20000701_20000731.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000801_20000831/accum.precipitation_chelsa.montlhy_20000801_20000831.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000901_20000930/accum.precipitation_chelsa.montlhy_20000901_20000930.json', 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20001001_20001031/accum.precipitation_chelsa.montlhy_20001001_20001031.json']
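Since the data IDs encode the collection name and the item's time range, they can be parsed directly with the standard library. The helper below is our own illustration, not part of the xcube-stac API (note that "montlhy" is spelled that way in the catalog itself):

```python
from datetime import datetime

def parse_data_id(data_id: str):
    """Split a data ID like the ones above into collection and time range."""
    collection, item_dir, _ = data_id.split("/")
    # The item directory ends with <start>_<end> as YYYYMMDD dates.
    start_str, end_str = item_dir.rsplit("_", 2)[-2:]
    fmt = "%Y%m%d"
    return collection, datetime.strptime(start_str, fmt), datetime.strptime(end_str, fmt)

data_id = ("accum.precipitation_chelsa.montlhy/"
           "accum.precipitation_chelsa.montlhy_20000101_20000131/"
           "accum.precipitation_chelsa.montlhy_20000101_20000131.json")
collection, start, end = parse_data_id(data_id)
print(collection, start.date(), end.date())
# -> accum.precipitation_chelsa.montlhy 2000-01-01 2000-01-31
```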
In the next step, we can search for items using search parameters. The following code shows which search parameters are available.
%%time
search_params = store.get_search_params_schema()
search_params
CPU times: user 23 μs, sys: 6 μs, total: 29 μs Wall time: 31.7 μs
<xcube.util.jsonschema.JsonObjectSchema at 0x7f1afceab930>
Now, let's search for the Topographic Wetness Index derived from the Ensemble Digital Terrain Model with Whitebox Workflows, for the European region during the first quarter of 2010.
%%time
descriptors = list(
    store.search_data(
        collections=["twi_edtm"],
        bbox=[-10, 40, 40, 70],
        time_range=["2010-01-01", "2010-04-01"],
    )
)
[d.to_dict() for d in descriptors]
CPU times: user 32.5 ms, sys: 0 ns, total: 32.5 ms Wall time: 183 ms
[{'data_id': 'twi_edtm/twi_edtm_20060101_20151231/twi_edtm_20060101_20151231.json',
'data_type': 'dataset',
'bbox': [-56.51881139294227,
24.275788389340878,
72.93153043046625,
72.64259665547773],
'time_range': ('2006-01-01', '2015-12-31')}]
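Note that the returned item covers 2006–2015 and a much larger bounding box than requested: the search matches on overlap, not containment. The two overlap tests can be sketched as follows (our own illustration, using the numbers from the query and result above):

```python
def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end

def bboxes_overlap(a, b):
    """True if two [west, south, east, north] boxes intersect."""
    return (intervals_overlap(a[0], a[2], b[0], b[2])       # longitude axis
            and intervals_overlap(a[1], a[3], b[1], b[3]))  # latitude axis

query_bbox = [-10, 40, 40, 70]
item_bbox = [-56.518, 24.275, 72.931, 72.642]
print(bboxes_overlap(query_bbox, item_bbox))        # -> True
print(intervals_overlap("2010-01-01", "2010-04-01",
                        "2006-01-01", "2015-12-31"))  # -> True
```

This sketch ignores antimeridian-crossing bounding boxes, which need special handling in a full implementation.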
In the next step, we can open the data for each data ID. The following code shows which parameters are available for opening the data.
%%time
open_params = store.get_open_data_params_schema()
open_params
CPU times: user 39 μs, sys: 0 ns, total: 39 μs Wall time: 42.2 μs
<xcube.util.jsonschema.JsonObjectSchema at 0x7f1afceabaf0>
Now, we open the data for a given data ID, selecting all available assets that can be opened by the data store. Note that we set data_type to "dataset" to get an xarray.Dataset as the return value.
%%time
ds = store.open_data(descriptors[0].data_id, data_type="dataset")
ds
CPU times: user 259 ms, sys: 71.6 ms, total: 331 ms Wall time: 975 ms
<xarray.Dataset> Size: 66GB
Dimensions: (x: 216700, y: 153400)
Coordinates:
* x (x) float64 2MB 9e+05 9e+05 9.001e+05 ... 7.401e+06 7.401e+06
* y (y) float64 1MB 5.501e+06 5.501e+06 ... 8.99e+05 8.99e+05
spatial_ref int64 8B 0
Data variables:
band_1 (y, x) int16 66GB dask.array<chunksize=(512, 512), meta=np.ndarray>
Attributes:
stac_catalog_url: https://s3.eu-central-1.wasabisys.com/stac/odse/cata...
stac_item_id: twi_edtm_20060101_20151231
xcube_stac_version: 1.1.1
We plot the loaded data as an example below.
%%time
ds.band_1[100000:120000:20, 100000:120000:20].plot(vmin=0, vmax=800)
CPU times: user 4.01 s, sys: 1.14 s, total: 5.16 s Wall time: 16 s
<matplotlib.collections.QuadMesh at 0x7f1ae9960830>
We can also open a GeoTIFF as an xcube multi-resolution dataset, which lets us select the resolution level, as shown below.
%%time
mlds = store.open_data(descriptors[0].data_id, data_type="mldataset")
mlds.num_levels
CPU times: user 37 ms, sys: 3.96 ms, total: 40.9 ms Wall time: 277 ms
8
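Each pyramid level halves the grid of the previous one, so the dimensions at level n are roughly the full-resolution dimensions divided by 2**n. A quick sanity check against the sizes reported in this notebook (full resolution 216700 × 153400); the integer-halving rule is our assumption about the overview layout, and the exact rounding may differ per dataset:

```python
# Sizes taken from the full-resolution dataset opened above.
full_x, full_y = 216700, 153400

def level_size(full, level):
    """Approximate grid size at a given pyramid level (halving per level)."""
    return full // 2 ** level

for level in range(8):
    print(level, level_size(full_x, level), level_size(full_y, level))

print(level_size(full_x, 5))  # -> 6771, matching the level-5 dataset below
```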
ds = mlds.get_dataset(5)
ds
<xarray.Dataset> Size: 65MB
Dimensions: (x: 6771, y: 4793)
Coordinates:
* x (x) float64 54kB 9.005e+05 9.014e+05 ... 7.4e+06 7.401e+06
* y (y) float64 38kB 5.501e+06 5.5e+06 ... 9.004e+05 8.995e+05
spatial_ref int64 8B 0
Data variables:
band_1 (y, x) int16 65MB dask.array<chunksize=(512, 512), meta=np.ndarray>
Attributes:
source: https://s3.ecodatacube.eu/arco/twi_edtm_m_30m_s_20000101_202212...
%%time
ds.band_1[3125:3750, 3125:3750].plot(vmax=800)
CPU times: user 85 ms, sys: 10.3 ms, total: 95.3 ms Wall time: 377 ms
<matplotlib.collections.QuadMesh at 0x7f1aed716d50>
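The window [3125:3750] plotted at level 5 corresponds to the full-resolution window plotted earlier: multiplying the level-5 indices by the downsampling factor 2**5 recovers the level-0 slice. A small arithmetic check (our own illustration, not an xcube API):

```python
level = 5
factor = 2 ** level          # 32: each level-5 pixel spans 32 level-0 pixels

lo, hi = 3125, 3750          # level-5 window plotted above
full_lo, full_hi = lo * factor, hi * factor
print(full_lo, full_hi)      # -> 100000 120000, the level-0 window plotted earlier
```

This makes it easy to preview a region cheaply at a coarse level before loading the same window at full resolution.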