Access data from nonsearchable stac catalog

Load a GeoTiff file from a non-seachable STAC catalog¶

A DeepESDL example notebook¶

This notebook demonstrates how to load a GeoTIFF file from a non-searchable STAC catalog via xcube-stac store. The data in the example is fetchedthe EcoDataCube.eu STAC catalog.

A non-searchable catalog does not implement the STAC API - Item Search conformance class. When searching in such type of catalog, the catalog needs to be crawled through and the items properties needs to be matched to the search parameters. This process can be slow, especially for large catalogs.

For more example notebooks using xcube-stac, please head over to GitHub https://github.com/xcube-dev/xcube-stac/tree/main/examples/notebooks. There you can find examples like leading GeoTiffs and netCDFs from searchable STAC-Catalogs.

Please, also refer to the DeepESDL documentation and visit the platform's website for further information!

Brockmann Consult, 2025

This notebook runs with the python environment users-deepesdl-xcube-1.11.0, please checkout the documentation for help on changing the environment.

In [1]:

Copied!

%%time
from xcube.core.store import new_data_store, get_data_store_params_schema
import itertools
%%time
from xcube.core.store import new_data_store, get_data_store_params_schema
import itertools

CPU times: user 1.91 s, sys: 325 ms, total: 2.23 s
Wall time: 2.65 s

First, we get the store parameters needed to initialize a STAC data store.

In [2]:

Copied!

get_data_store_params_schema("stac")
get_data_store_params_schema("stac")

Out[2]:

<xcube.util.jsonschema.JsonObjectSchema at 0x7f9535c0c3e0>

We determine the url of the EcoDataCube.eu STAC catalog and initiate a STAC data store where the xcube-stac plugin is recognized by setting the first argument to "stac" in the new_data_store function.

In [3]:

Copied!

%%time
url = "https://s3.eu-central-1.wasabisys.com/stac/odse/catalog.json"
store = new_data_store("stac", url=url)
%%time
url = "https://s3.eu-central-1.wasabisys.com/stac/odse/catalog.json"
store = new_data_store("stac", url=url)

CPU times: user 14.2 ms, sys: 8.12 ms, total: 22.4 ms
Wall time: 183 ms

/home/conda/users/19e33cc7-1749710901-48-deepesdl-xcube-1.11.0/lib/python3.12/site-packages/pystac_client/client.py:186: NoConformsTo: Server does not advertise any conformance classes.
  warnings.warn(NoConformsTo())

The data IDs point to a STAC item's JSON and are specified by the segment of the URL that follows the catalog's URL. The data IDs can be streamed using the following code where we show the first 10 data IDs as an example.

In [4]:

Copied!

%%time
data_ids = store.get_data_ids()
list(itertools.islice(data_ids, 10))
%%time
data_ids = store.get_data_ids()
list(itertools.islice(data_ids, 10))

CPU times: user 566 ms, sys: 84.3 ms, total: 650 ms
Wall time: 17.5 s

Out[4]:

['accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000101_20000131/accum.precipitation_chelsa.montlhy_20000101_20000131.json',
 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000201_20000228/accum.precipitation_chelsa.montlhy_20000201_20000228.json',
 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000301_20000331/accum.precipitation_chelsa.montlhy_20000301_20000331.json',
 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000401_20000430/accum.precipitation_chelsa.montlhy_20000401_20000430.json',
 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000501_20000531/accum.precipitation_chelsa.montlhy_20000501_20000531.json',
 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000601_20000630/accum.precipitation_chelsa.montlhy_20000601_20000630.json',
 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000701_20000731/accum.precipitation_chelsa.montlhy_20000701_20000731.json',
 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000801_20000831/accum.precipitation_chelsa.montlhy_20000801_20000831.json',
 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20000901_20000930/accum.precipitation_chelsa.montlhy_20000901_20000930.json',
 'accum.precipitation_chelsa.montlhy/accum.precipitation_chelsa.montlhy_20001001_20001031/accum.precipitation_chelsa.montlhy_20001001_20001031.json']

In the next step, we can search for items using search parameters. The following code shows which search parameters are available.

In [5]:

Copied!

%%time
search_params = store.get_search_params_schema()
search_params
%%time
search_params = store.get_search_params_schema()
search_params

CPU times: user 34 μs, sys: 0 ns, total: 34 μs
Wall time: 36.5 μs

Out[5]:

<xcube.util.jsonschema.JsonObjectSchema at 0x7f952a5e9460>

Now, let's search for Topographic Wetness Index generated by the Ensemble Digital Terrain Model by Whitbox Workflow for the European region during the first quarter of 2010.

In [6]:

Copied!





%%time
descriptors = list(
    store.search_data(
        collections=["twi_edtm"],
        bbox=[-10, 40, 40, 70],
        time_range=["2010-01-01", "2010-04-01"],
    )
)
[d.to_dict() for d in descriptors]
%%time
descriptors = list(
    store.search_data(
        collections=["twi_edtm"],
        bbox=[-10, 40, 40, 70],
        time_range=["2010-01-01", "2010-04-01"],
    )
)
[d.to_dict() for d in descriptors]

CPU times: user 17.6 ms, sys: 3.18 ms, total: 20.7 ms
Wall time: 249 ms

Out[6]:

[{'data_id': 'twi_edtm/twi_edtm_20060101_20151231/twi_edtm_20060101_20151231.json',
  'data_type': 'dataset',
  'bbox': [-56.51881139294227,
   24.275788389340878,
   72.93153043046625,
   72.64259665547773],
  'time_range': ('2006-01-01', '2015-12-31')}]

In the next step, we can open the data for each data ID. The following code shows which parameters are available for opening the data.

In [7]:

Copied!

%%time
open_params = store.get_open_data_params_schema()
open_params
%%time
open_params = store.get_open_data_params_schema()
open_params

CPU times: user 30 μs, sys: 6 μs, total: 36 μs
Wall time: 38.1 μs

Out[7]:

<xcube.util.jsonschema.JsonObjectSchema at 0x7f952a5e9d60>

Now, we open the data for a given data ID, where we select all available assets, which can be opened by the data store. Note that we assign data_type to get an xarray.Dataset as return value.

In [10]:

Copied!

%%time
ds = store.open_data(descriptors[0].data_id, data_type="dataset")
ds
%%time
ds = store.open_data(descriptors[0].data_id, data_type="dataset")
ds

CPU times: user 382 ms, sys: 48 ms, total: 430 ms
Wall time: 1.05 s

Out[10]:

<xarray.Dataset> Size: 66GB
Dimensions:           (x: 216700, y: 153400)
Coordinates:
  * x                 (x) float64 2MB 9e+05 9e+05 ... 7.401e+06 7.401e+06
  * y                 (y) float64 1MB 5.501e+06 5.501e+06 ... 8.99e+05 8.99e+05
    spatial_ref       int64 8B 0
Data variables:
    twi_edtm_m_30m_s  (y, x) int16 66GB dask.array<chunksize=(512, 512), meta=np.ndarray>
Attributes:
    stac_catalog_url:  https://s3.eu-central-1.wasabisys.com/stac/odse/catalo...
    stac_item_id:      twi_edtm_20060101_20151231

We plot the loaded data as an example below.

In [11]:

Copied!

%%time
ds.twi_edtm_m_30m_s[100000:120000:20, 100000:120000:20].plot(vmin=0, vmax=800)
%%time
ds.twi_edtm_m_30m_s[100000:120000:20, 100000:120000:20].plot(vmin=0, vmax=800)

CPU times: user 3.88 s, sys: 1.05 s, total: 4.93 s
Wall time: 18.3 s

Out[11]:

<matplotlib.collections.QuadMesh at 0x7f9518403f50>

No description has been provided for this image

We can also open a GeoTIFF as a xcube's multi-resolution dataset, where we can select the level of resolution, shown below.

In [12]:

Copied!

%%time
mlds = store.open_data(descriptors[0].data_id, data_type="mldataset")
mlds.num_levels
%%time
mlds = store.open_data(descriptors[0].data_id, data_type="mldataset")
mlds.num_levels

CPU times: user 18.1 ms, sys: 1.13 ms, total: 19.2 ms
Wall time: 490 ms

Out[12]:

In [13]:

Copied!

ds = mlds.get_dataset(5)
ds
ds = mlds.get_dataset(5)
ds

Out[13]:

<xarray.Dataset> Size: 65MB
Dimensions:           (x: 6771, y: 4793)
Coordinates:
  * x                 (x) float64 54kB 9.005e+05 9.014e+05 ... 7.4e+06 7.401e+06
  * y                 (y) float64 38kB 5.501e+06 5.5e+06 ... 9.004e+05 8.995e+05
    spatial_ref       int64 8B 0
Data variables:
    twi_edtm_m_30m_s  (y, x) int16 65MB dask.array<chunksize=(512, 512), meta=np.ndarray>
Attributes:
    stac_catalog_url:  https://s3.eu-central-1.wasabisys.com/stac/odse/catalo...
    stac_item_id:      twi_edtm_20060101_20151231

In [14]:

Copied!

%%time
ds.twi_edtm_m_30m_s[3125:3750, 3125:3750].plot(vmax=800)
%%time
ds.twi_edtm_m_30m_s[3125:3750, 3125:3750].plot(vmax=800)

CPU times: user 84.7 ms, sys: 7.57 ms, total: 92.3 ms
Wall time: 394 ms

Out[14]:

<matplotlib.collections.QuadMesh at 0x7f951af2d970>

In [ ]: