Creating a convention

A convention can also be defined in a YAML (or JSON) file. It consists of three parts:

1. General information (keywords must start and end with double underscores)
2. Definition of *standard attributes*
3. Definition of *standard attribute validators*
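To make the three-part structure more concrete, the sketch below splits an already parsed convention into its general information and its remaining entries. The JSON content, the attribute entries and the helper split_convention are hypothetical. Only two things come from this page: general-information keywords start and end with double underscores, and `__name__` is one such keyword. The sketch uses the standard json module because a YAML convention can be converted to JSON, as shown further down.

```python
import json

# Hypothetical convention content, already converted from YAML to JSON.
# Only "__name__" and the double-underscore rule come from this page; the
# attribute entries and their layout are assumptions for illustration.
CONVENTION_JSON = """
{
  "__name__": "my-convention",
  "contact_id": {
    "target_method": "__init__",
    "default_value": "$EMPTY",
    "description": "ID of the contact person."
  },
  "comment": {
    "target_method": "create_dataset",
    "default_value": "$NONE",
    "description": "Additional information about the dataset."
  }
}
"""


def split_convention(data: dict) -> tuple:
    """Split a parsed convention into general information and all other entries."""
    # General-information keywords start and end with double underscores.
    general = {k: v for k, v in data.items() if k.startswith("__") and k.endswith("__")}
    other = {k: v for k, v in data.items() if k not in general}
    return general, other


general, attributes = split_convention(json.loads(CONVENTION_JSON))
print(general)             # {'__name__': 'my-convention'}
print(sorted(attributes))  # ['comment', 'contact_id']
```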
  1. General information in the header, indicated by keywords wrapped in double underscores:

  2. Definition of standard attributes

Each standard attribute has several properties: a description, a target_method (the method to which the standard attribute can be passed), a validator, which validates the input, and a default_value. The default_value can be “$EMPTY”, meaning that no default value is set and the attribute is therefore obligatory, or “$NONE”, meaning that the attribute is optional. Finally, a valid value can be given as default_value, which is then written even if no input is provided for this standard attribute.
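The three cases of default_value can be summarized in a few lines of Python. This is a minimal sketch of the semantics described above, not the toolbox's code: the function resolve_attribute and the use of ValueError are hypothetical, and validation of the user input is left out.

```python
# "$EMPTY" and "$NONE" are the markers described above; the function name and
# the error type are hypothetical.
EMPTY = "$EMPTY"
NONE = "$NONE"


def resolve_attribute(name: str, default_value, kwargs: dict):
    """Return the value to write for a standard attribute, or None to write nothing."""
    if name in kwargs:
        return kwargs[name]  # user input always wins (validation not shown)
    if default_value == EMPTY:
        raise ValueError(f'standard attribute "{name}" is obligatory')
    if default_value == NONE:
        return None  # optional attribute: nothing is written
    return default_value  # a valid default value is written instead


print(resolve_attribute("contact_id", EMPTY, {"contact_id": "id1722"}))  # id1722
print(resolve_attribute("comment", NONE, {}))                            # None
print(resolve_attribute("data_type", "experimental", {}))                # experimental
```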

  3. Special type definitions

Here, the allowed values for the standard attribute data_type are listed:

The heart of standard attributes is the validator. A validator becomes effective whenever a standard attribute is written or read. The flow chart below illustrates both procedures for the attribute “units”:

  1. Writing (ds.units = 'm/s'): If “units” is defined in the convention, the validator checks the value. “m/s” is a valid unit, so it is written. Invalid values raise an error.

  2. Reading (print(ds.units)): The validator is also applied when standardized attributes are read. In this case, however, invalid values only raise warnings, so that the user can still work with the file and fix the issue.

[Figure: flow chart of writing and reading the standard attribute “units” (h5RDMtoolbox_standard_attribute_concept.png)]
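The asymmetry between writing and reading can be sketched in plain Python. ToyAttrs and VALID_UNITS are hypothetical stand-ins. A real convention would check units with a proper unit parser, not a fixed set. The sketch only shows that an invalid value is rejected with an exception on write but merely produces a warning on read.

```python
import warnings

# Hypothetical whitelist; a real convention would parse units properly.
VALID_UNITS = {"m/s", "m", "s", "K"}


class ToyAttrs:
    """Dict-backed stand-in for HDF5 attributes with "units" as the only standard attribute."""

    def __init__(self):
        self._data = {}

    def write(self, name, value):
        # Writing: an invalid standard attribute is rejected with an error.
        if name == "units" and value not in VALID_UNITS:
            raise ValueError(f"invalid units: {value!r}")
        self._data[name] = value

    def read(self, name):
        # Reading: an invalid standard attribute only triggers a warning.
        value = self._data[name]
        if name == "units" and value not in VALID_UNITS:
            warnings.warn(f"invalid units: {value!r}")
        return value


attrs = ToyAttrs()
attrs.write("units", "m/s")
print(attrs.read("units"))  # m/s

# Simulate a value written by another tool that bypassed validation:
attrs._data["units"] = "metres per second"
print(attrs.read("units"))  # emits a UserWarning, still returns the value
```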

Reading a Convention from a file

Let’s read the example file above into a Convention object. The object representation displays the standard attributes which are expected for the root group (File.__init__()), group creation (Group.create_group()) and dataset creation (Group.create_dataset()).

Note that the standard attributes marked in bold are obligatory. The others may be omitted during object creation:

import h5rdmtoolbox as h5tbx
h5tbx.convention.utils.yaml2json('example_convention.yaml')
PosixPath('example_convention.json')
cv = h5tbx.convention.from_yaml('example_convention.yaml')
# you may also use .from_json('example_convention.json')
cv
Convention("h5rdmtoolbox-tuturial-convention")

To make the convention effective in this session, it must be enabled. We do this by calling use():

h5tbx.use(cv)
using("h5rdmtoolbox-tuturial-convention")

Now, we get an error if we create an HDF5 file without providing the attribute contact_id. Because it is a required attribute, it must be provided during file initialization:

try:
    with h5tbx.File() as h5:
        pass
except h5tbx.errors.StandardAttributeError as e:
    print(e)
Convention "h5rdmtoolbox-tuturial-convention" expects standard attribute "contact_id" to be provided as an argument during file creation.

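Providing an invalid value raises an error, too. Here, contact_id is fine, but the comment "velocity field" does not match the pattern required by the validator of "comment":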

try:
    with h5tbx.File(contact_id='id1722') as h5:
        h5.create_dataset(name='velocity', shape=(3, 4), units='m/s', comment='velocity field')
except h5tbx.errors.StandardAttributeError as e:
    print(e)
Validation of "velocity field" for standard attribute "comment" failed.
Expected fields: {'comment': FieldInfo(annotation=str, required=True, metadata=[WrapValidator(func=<function regex_0 at 0x750e23171480>, json_schema_input_type=PydanticUndefined)])}
Pydantic error: 1 validation error for comment
comment
  Value error, Invalid format for pattern [type=value_error, input_value='velocity field', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error
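The failing validator above is a regular-expression check. The toolbox reports the failure through pydantic, as the output shows. The actual pattern of the tutorial convention is not shown on this page. The sketch below therefore uses a made-up pattern (an uppercase first letter and at least ten characters in total) only to reproduce the same kind of rejection. COMMENT_PATTERN and validate_comment are hypothetical names.

```python
import re

# Hypothetical pattern, NOT the one used by the tutorial convention:
# an uppercase first letter followed by at least nine more characters.
COMMENT_PATTERN = re.compile(r"[A-Z].{9,}")


def validate_comment(value: str) -> str:
    """Return the value unchanged if it matches the whole pattern, else raise."""
    if COMMENT_PATTERN.fullmatch(value) is None:
        raise ValueError(f"Invalid format for pattern: {value!r}")
    return value


print(validate_comment("Velocity field"))  # Velocity field
# validate_comment("velocity field") raises ValueError (lowercase first letter)
```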

With a valid contact_id and no invalid attributes, the file is created successfully:

with h5tbx.File(contact_id='id1722') as h5:
    h5.dump()
    • contact_id: id1722

Note that if we reopen the file in read-write mode (r+) rather than read-only mode (r), standard attributes that already exist are not checked again. So if the HDF5 file was written with another package, e.g. h5py, a stored value might be invalid:

with h5tbx.File(name=h5.hdf_filename, mode='r+') as h5:
    pass  # note that we were not required to pass "contact_id", as it was already present!

Note that a convention can also be enabled temporarily using the context manager syntax:

with h5tbx.use(cv):
    with h5tbx.File(contact_id='id1722') as h5:
        pass

Importing/Loading an online convention

Conventions are intended to be distributed via online repositories. The YAML file should therefore be uploaded so that it is accessible to all users. The h5RDMtoolbox currently favors Zenodo repositories, which offer long-term storage and assign a DOI. However, files accessible via a URL can also be downloaded.

A tutorial convention is published on Zenodo. Calling from_zenodo() creates the convention object:

cv = h5tbx.convention.from_zenodo(doi_or_recid='10428822')
cv
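The parameter name doi_or_recid indicates that either a Zenodo DOI or a record ID can be passed. Zenodo DOIs have the form 10.5281/zenodo.<record id>. The hypothetical helper below normalizes both forms to a record ID under that assumption. It is not part of the h5RDMtoolbox, and the toolbox may handle further variants.

```python
import re

# Assumption: Zenodo DOIs look like "10.5281/zenodo.<record id>",
# optionally prefixed with the resolver "https://doi.org/".
_ZENODO_DOI = re.compile(r"(?:https?://doi\.org/)?10\.5281/zenodo\.(\d+)")


def to_record_id(doi_or_recid) -> int:
    """Hypothetical helper: return the Zenodo record ID for a record ID or DOI."""
    text = str(doi_or_recid).strip()
    if text.isdigit():
        return int(text)  # already a record ID
    match = _ZENODO_DOI.fullmatch(text)
    if match is None:
        raise ValueError(f"not a Zenodo record ID or DOI: {doi_or_recid!r}")
    return int(match.group(1))


print(to_record_id("10428822"))                 # 10428822
print(to_record_id("10.5281/zenodo.10428822"))  # 10428822
```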

Effect of enabling a convention

The convention above defines which standard attributes are used with which methods. For example, “data_type” is to be provided when an HDF5 file is created. When the convention is enabled, the signatures of the respective methods change. To demonstrate this, let’s implement a small function that prints all parameters of a given method, and inspect the effect of the convention on File.__init__:

cv.properties[h5tbx.Dataset]['standard_name']
<StandardAttribute@create_dataset[keyword/optional]("standard_name"): default_value="None" | "Standard name of the dataset. If not set, the long_name attribute must be given.">
import inspect

def print_method_parameters(method):
    print(f'\nParameters for "{method.__name__}":')
    for param in inspect.signature(method).parameters.values():
        if not param.name == 'self':
            if param.name in h5tbx.convention.get_current_convention().methods[h5tbx.File].get('__init__', {}).keys():
                print(f'  - {h5tbx._repr.make_bold(param.name)}')
            else:
                print(f'  - {param.name}')

print('no convention: ')
h5tbx.use(None)
print_method_parameters(h5tbx.File.__init__)

print(f'\n------------\nwith convention {cv.name}: (standard attributes are made bold)')
h5tbx.use(cv)
print_method_parameters(h5tbx.File.__init__)
no convention: 

Parameters for "__init__":
  - name
  - mode
  - attrs
  - kwargs

------------
with convention h5rdmtoolbox-tutorial-convention: (standard attributes are made bold)

Parameters for "__init__":
  - name
  - mode
  - attrs
  - data_type
  - standard_name_table
  - comment
  - contact
  - references
  - kwargs
h5tbx.use(None)  # fall back to the default convention
using("h5py")
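How can enabling a convention change the signature that inspect reports? One possible mechanism is to wrap the method and attach a new inspect.Signature via the __signature__ attribute, which inspect.signature() honors. The sketch uses hypothetical names (add_keyword_parameters, file_init), and the toolbox may achieve this differently. The extra parameters are inserted before **kwargs, which matches the parameter order printed above.

```python
import functools
import inspect


def add_keyword_parameters(func, names):
    """Hypothetical: wrap func so that its reported signature lists extra keyword-only parameters."""
    sig = inspect.signature(func)
    params = list(sig.parameters.values())
    extra = [inspect.Parameter(n, inspect.Parameter.KEYWORD_ONLY, default=None) for n in names]
    # Keyword-only parameters must come before **kwargs to form a valid signature.
    var_kw = [p for p in params if p.kind is inspect.Parameter.VAR_KEYWORD]
    rest = [p for p in params if p.kind is not inspect.Parameter.VAR_KEYWORD]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    wrapper.__signature__ = sig.replace(parameters=rest + extra + var_kw)
    return wrapper


def file_init(self, name=None, mode="r", attrs=None, **kwargs):
    """Hypothetical stand-in for File.__init__; returns the extra keyword arguments."""
    return kwargs


patched = add_keyword_parameters(file_init, ["data_type", "contact"])
print([p for p in inspect.signature(patched).parameters if p != "self"])
# ['name', 'mode', 'attrs', 'data_type', 'contact', 'kwargs']
```

Only the reported signature changes here: the new arguments still arrive in **kwargs, so a real implementation would additionally have to validate and write them as attributes.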