Working with a Convention

This section walks you through using an existing convention. In this role you are most likely a pure user of the project, as defined before.

We’ll be using the built-in convention “h5tbx”:

from h5rdmtoolbox import convention
import h5rdmtoolbox as h5tbx

h5tbx.use('h5tbx')

Without checking which attributes are standardized, let’s try to create a dataset in a new file:

with h5tbx.File() as h5:
    try:
        h5.create_dataset('ds', shape=(3, ))
    except convention.errors.StandardAttributeError as e:
        print(e)
Convention "h5tbx" expects standard attribute "units" to be provided as an argument during dataset creation.

The error message tells us that (at least) “units” is missing. So it is time to check which other standard attributes the convention defines. The easiest way is to print the convention object.

The string representation of the Convention object tells us which attributes are expected for which method. In this example, the following is defined:

  • For the root group, indicated by File.__init__(): “creation_mode”

  • For any dataset, indicated by Group.create_dataset(): “units” and “symbol”. Note that “units” is obligatory and “symbol” is optional.

This means that we can provide “creation_mode” during file creation and must provide “units” during dataset creation. We will test this in the next section.

cv = convention.get_current_convention()
print(cv)
Convention("h5tbx")
contact: https://orcid.org/0000-0001-8729-0482
  File.__init__():
    * creation_mode:
		Creation mode of the data. Specific for engineering.
  Group.create_dataset():
    * units (obligatory) :
		The physical unit of the dataset. If dimensionless, the unit is ''.
    * symbol:
		The mathematical symbol of the dataset.
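The distinction between obligatory and optional standard attributes can be pictured with a small, library-independent sketch. This is not how h5rdmtoolbox implements conventions: `ToyConvention`, `ToyStandardAttribute` and `MissingStandardAttributeError` are hypothetical names invented for illustration. The sketch only shows the idea of checking keyword arguments against a per-method list of expected attributes.

```python
from dataclasses import dataclass, field


class MissingStandardAttributeError(ValueError):
    """Hypothetical error, raised when an obligatory attribute is missing."""


@dataclass(frozen=True)
class ToyStandardAttribute:
    """Hypothetical stand-in for a standard attribute definition."""
    name: str
    obligatory: bool = False


@dataclass
class ToyConvention:
    """Hypothetical stand-in for a convention (not the h5rdmtoolbox class)."""
    name: str
    # Maps a method name to the standard attributes expected by that method.
    methods: dict = field(default_factory=dict)

    def check(self, method: str, **attrs) -> dict:
        """Return the standard attributes found in attrs.

        Raises MissingStandardAttributeError if an obligatory one is absent.
        """
        accepted = {}
        for sattr in self.methods.get(method, []):
            if sattr.name in attrs:
                accepted[sattr.name] = attrs[sattr.name]
            elif sattr.obligatory:
                raise MissingStandardAttributeError(
                    f'Convention "{self.name}" expects standard attribute '
                    f'"{sattr.name}" for {method}().'
                )
        return accepted


toy = ToyConvention(
    name="toy",
    methods={
        "create_dataset": [
            ToyStandardAttribute("units", obligatory=True),
            ToyStandardAttribute("symbol"),  # optional
        ]
    },
)

# "units" is given, "symbol" is optional and may be omitted.
print(toy.check("create_dataset", units="m/s"))

# Omitting the obligatory "units" raises the toy error.
try:
    toy.check("create_dataset", symbol="u")
except MissingStandardAttributeError as e:
    print(e)
```

In this toy model an optional attribute that is not passed is simply skipped, while a missing obligatory attribute stops the call, which mirrors the behaviour seen for “units” above.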

But there is more to it: standard attributes can also validate their values. Normally, any value can be assigned to an attribute, but a standard attribute can have a validator implemented, as is the case for “units”. Let’s now pass “units” but purposely set an invalid value. We will find out later how validators are implemented.

with h5tbx.File() as h5:
    try:
        h5.create_dataset('ds', shape=(3, ), units='invalid')
    except convention.errors.StandardAttributeError as e:
        print(e)
Validation of "invalid" for standard attribute "units" failed.
Pydantic error: 1 validation error for units
value
  Parameter error, Units cannot be understood using ureg package: invalid. Original error: 'invalid' is not defined in the unit registry [type=value_error, input_value='invalid', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/value_error

The error tells us that “invalid” is not understood by the validator: it is not defined in the unit registry.
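To make the idea of a validator concrete, here is a minimal sketch in plain Python. It is an assumption-laden toy, not the validator used by the convention: the function `validate_units` and the whitelist `KNOWN_UNITS` are hypothetical, and the whitelist merely stands in for a real unit registry.

```python
import re

# Tiny stand-in for a unit registry. Assumption: the real validator relies on
# a full unit registry; this whitelist exists only for illustration.
KNOWN_UNITS = {"m", "s", "kg", "K", "Pa", "N"}


def validate_units(value: str) -> str:
    """Return value unchanged if all unit symbols in it are known.

    Raises ValueError otherwise. An empty string means "dimensionless".
    """
    if value == "":
        return value
    # Extract alphabetic unit symbols, e.g. "m/s" -> ["m", "s"].
    symbols = re.findall(r"[A-Za-z]+", value)
    unknown = [s for s in symbols if s not in KNOWN_UNITS]
    if not symbols or unknown:
        raise ValueError(f"Units cannot be understood: {value!r}")
    return value


for candidate in ("m/s", "", "invalid"):
    try:
        validate_units(candidate)
        print(f"{candidate!r}: accepted")
    except ValueError as e:
        print(f"{candidate!r}: rejected ({e})")
```

The point of the sketch is the control flow: the value is checked before it is stored, and a failed check surfaces as an error instead of silently writing an unusable attribute.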

Finally, we provide a valid SI unit:

with h5tbx.File() as h5:
    h5.create_dataset('ds', shape=(3, ), units='m/s')

We can also access all registered standard attributes of the convention like so:

from pprint import pprint
pprint(cv.registered_standard_attributes)
{'creation_mode': <StandardAttribute@__init__[keyword/optional]("creation_mode"): default_value="_SpecialDefaults.NONE" | "Creation mode of the data. Specific for engineering.">,
 'symbol': <StandardAttribute@create_dataset[keyword/optional]("symbol"): default_value="_SpecialDefaults.NONE" | "The mathematical symbol of the dataset.">,
 'units': <StandardAttribute@create_dataset[positional/obligatory]("units"): "The physical unit of the dataset. If dimensionless, the unit is ''.">}

To find out how to create a new convention, please go to the next section.