Processing with HDF5 datasets


Unlike the h5py package, which returns numpy.ndarray objects when dataset values are accessed, the h5rdmtoolbox returns xarray.DataArray objects. An xarray.DataArray carries attributes along with the numpy-like multi-dimensional array. It also supports dimensions and coordinates, which attach meaningful (meta)data to the array axes.
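To make this concrete, the following sketch uses xarray directly, independent of h5rdmtoolbox. It builds by hand a DataArray similar to what the toolbox is expected to return for the velocity dataset created below; the random values are placeholders. It also shows that the underlying numpy array remains available through `.values`.

```python
import numpy as np
import xarray as xr

# Coordinate values mirroring the example file created below:
# 11 points along y (0..5 mm) and 5 points along x (0..10 mm)
y = np.linspace(0, 5, 11)
x = np.linspace(0, 10, 5)

# A DataArray bundles the values with named dimensions, coordinates and attributes
vel = xr.DataArray(
    np.random.random((11, 5)),  # placeholder data
    dims=("y", "x"),
    coords={"y": ("y", y, {"units": "mm"}), "x": ("x", x, {"units": "mm"})},
    attrs={"units": "m/s", "long_name": "velocity"},
)

print(vel.dims)            # ('y', 'x')
print(vel.attrs["units"])  # m/s
print(float(vel.x[1]))     # 2.5

# The plain numpy array is always one attribute access away
arr = vel.values
print(type(arr))           # <class 'numpy.ndarray'>
```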

Let’s dive in and explore the practical implications of retrieving xarray.DataArray objects:

import h5rdmtoolbox as h5tbx
import numpy as np

h5tbx.use(None)

using("h5py")

Let’s create an example file. Note that we pass make_scale and attach_scales as arguments to set up the coordinates and associate them with the HDF5 dataset “vel”. The benefits become visible when we access the dataset values in the next steps.

with h5tbx.File() as h5:
    # 1D coordinate datasets, registered as dimension scales
    dsx = h5.create_dataset('x', data=np.linspace(0, 10, 5),
                            attrs=dict(units='mm', long_name='x'),
                            make_scale=True)
    dsy = h5.create_dataset('y', data=np.linspace(0, 5, 11),
                            attrs=dict(units='mm', long_name='y'),
                            make_scale=True)
    # 2D data of shape (11, 5): axis 0 uses scale y, axis 1 uses scale x
    h5.create_dataset('vel', data=np.random.random((11, 5)),
                      attrs=dict(units='m/s', long_name='velocity'),
                      attach_scales=(dsy, dsx))
    h5.dump()


Circumventing the return of `xarray.DataArray` objects

In some cases, xarray.DataArray objects are not needed, and it is more convenient to work with the default h5py interface, i.e. with numpy.ndarray objects.

If you already have the xarray.DataArray, simply access its .values property. Otherwise, there are two ways to obtain a numpy.ndarray directly. The first is to slice the dataset's .values property:

with h5tbx.File(h5.hdf_filename) as h5:
    # Slicing via .values returns a plain numpy array instead of an xarray.DataArray
    data_np = h5.vel.values[:]
type(data_np)

numpy.ndarray

The second is to disable the xarray return via the configuration setter. Used as a context manager, the setting only applies to the code inside the `with` block:

with h5tbx.set_config(return_xarray=False):
    # Inside this block, plain slicing returns numpy arrays
    with h5tbx.File(h5.hdf_filename) as h5:
        data_np = h5.vel[:]
type(data_np)

numpy.ndarray

HDF Dataset with ancillary datasets

Ancillary datasets are datasets that exist in the HDF5 file and are associated with another (parent) dataset. An ancillary dataset must have the same shape as its parent dataset.

A common use case is attaching validation flags or uncertainty data.
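Sticking with plain xarray, the sketch below illustrates why same-shape ancillary data is convenient: validation flags attached as a coordinate travel with the data and can be used directly to mask invalid values. The flag convention (1 = valid, 0 = invalid) and the tiny arrays are assumptions for illustration; how h5rdmtoolbox stores the association inside the HDF5 file is not shown.

```python
import numpy as np
import xarray as xr

vel = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    dims=("y", "x"),
    coords={"y": [0.0, 1.0], "x": [0.0, 1.0, 2.0]},
)

# Hypothetical validation flags: 1 = valid, 0 = invalid
flags = np.array([[1, 1, 0], [1, 0, 1]])

# An ancillary array must match its parent's shape element by element
if flags.shape != vel.shape:
    raise ValueError("ancillary data must have the same shape as its parent")

# Attached as a non-index coordinate, the flags travel with the data
vel = vel.assign_coords(flag=(("y", "x"), flags))

# Entries flagged as invalid become NaN
valid = vel.where(vel.flag == 1)
print(int(valid.isnull().sum()))  # 2
```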

Let’s add a relative uncertainty of about 2.5% to the dataset “vel”. For this, we create the dataset “uncertainty” and attach it to the already existing dataset “vel”:

rel_uncertainty = np.clip(np.random.normal(loc=0.025, scale=0.001, size=(11, 5)), 0, None)
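The generated values are relative uncertainties, i.e. dimensionless fractions of the local velocity, which is why the dataset is stored with an empty units string. If an absolute uncertainty in m/s is needed, it follows by multiplying with the velocity. A minimal numpy sketch, using a seeded stand-in velocity field rather than the file's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in velocity field in m/s with the same shape as "vel"
vel = rng.random((11, 5))

# Relative (dimensionless) uncertainty around 2.5 %, clipped so it is never negative
rel_uncertainty = np.clip(rng.normal(loc=0.025, scale=0.001, size=(11, 5)), 0, None)

# Absolute uncertainty in the velocity's own unit (m/s)
abs_uncertainty = rel_uncertainty * vel

print(abs_uncertainty.shape)  # (11, 5)
```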

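with h5tbx.File(h5.hdf_filename, mode='r+') as h5:
    # Relative uncertainty is dimensionless, hence the empty units string
    h5.create_dataset('uncertainty',
                      data=rel_uncertainty,
                      attach_scales=('y', 'x'),
                      attrs={'units': ''})
    # Link the new dataset to "vel" as an ancillary dataset
    h5.vel.attach_ancillary_dataset(h5.uncertainty)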

h5tbx.dump(h5)


The ancillary dataset appears as an xarray coordinate when the dataset is sliced:

with h5tbx.File(h5.hdf_filename) as h5:
    u = h5.vel[()]

with h5tbx.File(h5.hdf_filename) as h5:
    print('available ancillary datasets: ', h5.vel.ancillary_datasets)
    data = h5.vel.sel(y=3.1, method='nearest')
data.coords

available ancillary datasets:  {'uncertainty': <HDF5 dataset "uncertainty": shape (11, 5), type "<f8", convention "h5py">}

Coordinates:
    y            float64 8B 3.0
  * x            (x) float64 40B 0.0 2.5 5.0 7.5 10.0
    uncertainty  (x) float64 40B 0.02528 0.02414 0.02652 0.02551 0.02515
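The coordinates shown above follow regular xarray behaviour for a two-dimensional non-index coordinate: selecting a single y value turns y into a scalar coordinate and reduces the uncertainty to one dimension along x. The sketch below reproduces this with xarray alone, using a constant uncertainty as a placeholder for the random values above.

```python
import numpy as np
import xarray as xr

y = np.linspace(0, 5, 11)
x = np.linspace(0, 10, 5)
rng = np.random.default_rng(1)

vel = xr.DataArray(
    rng.random((11, 5)),
    dims=("y", "x"),
    coords={
        "y": y,
        "x": x,
        # 2D non-index coordinate with the same shape as the data
        "uncertainty": (("y", "x"), np.full((11, 5), 0.025)),
    },
)

# Nearest-neighbour selection: y=3.1 snaps to the grid point y=3.0
row = vel.sel(y=3.1, method="nearest")

print(float(row.y))                    # 3.0
print(row.coords["uncertainty"].dims)  # ('x',)
```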

