Creating a convention

A convention can also be defined in a YAML (or JSON) file. It consists of three parts:

1. General information (keywords must start and end with double underscores)
2. Definition of *standard attributes*
3. Definition of *standard attribute validators*

  1. General information at the header indicated by double underscores:

  2. Definition of standard attributes

A standard attribute has several properties: the description, the target_method (the method during which the standard attribute can be passed), the validator, which validates the input, and the default_value. The default_value can take three forms. “$EMPTY” indicates that no default value is set, so the attribute is obligatory. “$NONE” indicates that the attribute is optional. Any other valid value is written whenever no input is provided for this standard attribute.
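The three kinds of default_value can be summarized in a few lines of plain Python. This is only a sketch of the semantics described above, not the toolbox's implementation; the function resolve_attribute and the example default "experimental" are hypothetical.

```python
# Hypothetical sketch of the default_value semantics (not the toolbox's code).
EMPTY = "$EMPTY"  # no default value: the attribute is obligatory
NONE = "$NONE"    # no default value: the attribute is optional

_MISSING = object()  # marks "the user passed nothing"


def resolve_attribute(name, default_value, provided=_MISSING):
    """Return (write, value): whether the attribute is written, and with which value."""
    if provided is not _MISSING:
        return True, provided          # user input always takes precedence
    if default_value == EMPTY:
        raise ValueError(f'standard attribute "{name}" is obligatory')
    if default_value == NONE:
        return False, None             # optional and not given: nothing is written
    return True, default_value         # a concrete default is written instead


print(resolve_attribute("units", EMPTY, "m/s"))
print(resolve_attribute("comment", NONE))
print(resolve_attribute("data_type", "experimental"))
```

Calling resolve_attribute("contact_id", EMPTY) without a value raises a ValueError, which mirrors the obligatory case.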

  3. Special type definitions

Here, the allowed values for the standard attribute data_type are listed:

The heart of a standard attribute is its validator. A validator becomes effective when metadata is written. The flow chart below illustrates the writing and reading procedure, using the attribute “units” as an example.

  1. Writing (ds.units = 'm/s'): If “units” is defined in the convention, the validator checks the value. “m/s” is a correct unit, so it is written. An invalid value raises an error instead.

  2. Reading (print(ds.units)): The validator also becomes effective when standardized attributes are read. In this case, however, an invalid value only raises a warning, so that the user can still work with the file and fix the issue.

[Figure: h5RDMtoolbox_standard_attribute_concept.png, flow chart of writing and reading a standard attribute]
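The asymmetry between writing (error) and reading (warning) can be imitated with the standard library. The following is a minimal sketch under stated assumptions: AttributeStore and validate_units are hypothetical stand-ins, and the set of accepted units is a toy list, not the toolbox's unit validator.

```python
import warnings


def validate_units(value):
    """Toy validator (assumption): accept only a few known unit strings."""
    return value in {"m", "s", "m/s", "kg"}


class AttributeStore:
    """Hypothetical stand-in for the attributes of an HDF5 object."""

    def __init__(self, validators):
        self._validators = validators  # attribute name -> validator function
        self._data = {}

    def write(self, name, value):
        # Writing is strict: an invalid value is rejected with an error.
        validator = self._validators.get(name)
        if validator is not None and not validator(value):
            raise ValueError(f'invalid value {value!r} for standard attribute "{name}"')
        self._data[name] = value

    def read(self, name):
        # Reading is lenient: an invalid value only triggers a warning.
        value = self._data[name]
        validator = self._validators.get(name)
        if validator is not None and not validator(value):
            warnings.warn(f'stored value {value!r} of "{name}" is invalid')
        return value


store = AttributeStore({"units": validate_units})
store.write("units", "m/s")          # valid, so it is written
store._data["units"] = "meter/sec"   # simulate a value written by another tool
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = store.read("units")      # still returned, but a warning is recorded
```

The lenient read path is what lets a user open a file with a faulty attribute, see the warning, and repair the value.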

Reading a Convention from a file

Let’s read the above example file into the class Convention. The object representation displays the standard attributes which are expected for the root group (File.__init__()), group creation (Group.create_group()) and dataset creation (Group.create_dataset()).

Note that the standard attributes marked in bold are obligatory. The others may or may not be provided during object creation:

from h5rdmtoolbox import convention
import h5rdmtoolbox as h5tbx
h5tbx.convention.utils.yaml2json('example_convention.yaml')
PosixPath('example_convention.json')
cv = h5tbx.convention.from_yaml('example_convention.yaml')
# you may also use .from_json('example_convention.json')
cv
Convention("h5rdmtoolbox-tuturial-convention")

In order to make the convention effective in this session, it must be enabled. We do this by calling use():

h5tbx.use(cv)
using("h5rdmtoolbox-tuturial-convention")

Now, we will get an error if we create an HDF5 file without providing the attribute contact_id. As we made it a required attribute, it must be provided during file initialization:

try:
    with h5tbx.File() as h5:
        pass
except h5tbx.errors.StandardAttributeError as e:
    print(e)
Convention "h5rdmtoolbox-tuturial-convention" expects standard attribute "contact_id" to be provided as an argument during file creation.

Providing a wrong value raises an error, too:

try:
    with h5tbx.File(contact_id='id1722') as h5:
        h5.create_dataset(name='velocity', shape=(3, 4), units='m/s', comment='velocity field')
except h5tbx.errors.StandardAttributeError as e:
    print(e)
Validation of "velocity field" for standard attribute "comment" failed.
Expected fields: {'comment': FieldInfo(annotation=str, required=True, metadata=[WrapValidator(func=<function regex_0 at 0x7101943ebc70>, json_schema_input_type=PydanticUndefined)])}
Pydantic error: 1 validation error for comment
comment
  Value error, Invalid format for pattern [type=value_error, input_value='velocity field', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error
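The error message above comes from a regular-expression validator. The pattern used by the example convention is not shown on this page, so the sketch below assumes one purely for illustration (a comment must start with a capital letter and end with a period). It uses only the standard library; validate_comment is a hypothetical name.

```python
import re

# Assumed pattern, for illustration only: capital first letter, trailing period.
COMMENT_PATTERN = re.compile(r"[A-Z].*\.")


def validate_comment(value):
    """Return the value unchanged if it matches the pattern, else raise ValueError."""
    if not isinstance(value, str) or COMMENT_PATTERN.fullmatch(value) is None:
        raise ValueError(f"Invalid format for pattern: {value!r}")
    return value


validate_comment("Velocity field.")      # passes under the assumed pattern
try:
    validate_comment("velocity field")   # fails: lowercase start, no period
except ValueError as e:
    print(e)
```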

With a valid contact_id and no invalid attributes, the file is created successfully:

with h5tbx.File(contact_id='id1722') as h5:
    h5.dump()
    • contact_id: id1722

Note that if we reopen the file in read-write mode (r+) rather than read-only mode (r), the standard attributes that already exist are not checked again. So if the HDF5 file was written with another package, e.g. h5py, a stored value might be invalid:

with h5tbx.File(name=h5.hdf_filename, mode='r+') as h5:
    pass  # note that we were not required to pass "contact_id", as it was already present!

Note that a convention can also be enabled only temporarily using the context manager syntax:

with h5tbx.use(cv):
    with h5tbx.File(contact_id='id1722') as h5:
        pass

Importing/Loading an online convention

Conventions are intended to be distributed via online repositories. The YAML file should therefore be uploaded such that it is accessible to all users. The h5RDMtoolbox currently favors Zenodo repositories, whose advantages are long-term storage and the assignment of a DOI. However, files accessible via a URL can also be downloaded.

A tutorial convention is published on Zenodo (record 10428822). By calling from_zenodo(), the convention object is created:

cv = h5tbx.convention.from_zenodo(doi_or_recid='10428822')
cv
HTTPError: 403 Client Error: Forbidden for url: https://zenodo.org/records/10428822/files/tutorial_convention.yaml
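For a convention file that is reachable via a plain URL, the download step can also be done by hand. The sketch below uses only the standard library and is not the toolbox's downloader; download_convention_file and its sha256 parameter are hypothetical names chosen for this example.

```python
import hashlib
import pathlib
import urllib.parse
import urllib.request


def download_convention_file(url, target_folder, sha256=None):
    """Download the file at `url` into `target_folder` and return its local path.

    If `sha256` is given, the download is rejected when the checksum differs.
    """
    target_folder = pathlib.Path(target_folder)
    target_folder.mkdir(parents=True, exist_ok=True)
    # Use the last path segment of the URL as the local filename.
    url_path = urllib.parse.unquote(urllib.parse.urlparse(url).path)
    target = target_folder / pathlib.PurePosixPath(url_path).name
    with urllib.request.urlopen(url) as response:
        data = response.read()
    if sha256 is not None and hashlib.sha256(data).hexdigest() != sha256:
        raise ValueError(f"checksum mismatch for {url}")
    target.write_bytes(data)
    return target
```

The returned path could then be passed to h5tbx.convention.from_yaml(), as in the local example earlier on this page.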

Effect of enabling a convention

The convention above defines the use of certain attributes with certain methods. For example, “data_type” is to be used when an HDF5 file is created. When the convention is enabled, the signatures of the respective methods change. To demonstrate this, let’s implement a small function that prints all parameters of a given function, and inspect the effect of the convention on the __init__ method:

cv.properties[h5tbx.Dataset]['standard_name']
<StandardAttribute@create_dataset[keyword/optional]("standard_name"): default_value="None" | "Standard name of the dataset. If not set, the long_name attribute must be given.">
import inspect

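def print_method_parameters(method):
    """Print the parameters of `method`; standard attributes of File.__init__ are bold."""
    # standard attributes that the current convention adds to File.__init__
    cv_methods = h5tbx.convention.get_current_convention().methods
    std_attrs = cv_methods[h5tbx.File].get('__init__', {})
    print(f'\nParameters for "{method.__name__}":')
    for param in inspect.signature(method).parameters.values():
        if param.name == 'self':
            continue
        if param.name in std_attrs:
            print(f'  - {h5tbx._repr.make_bold(param.name)}')
        else:
            print(f'  - {param.name}')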

methods = (h5tbx.File.__init__, h5tbx.Group.create_group, h5tbx.Group.create_dataset)

print('no convention: ')
h5tbx.use(None)
print_method_parameters(h5tbx.File.__init__)

print(f'\n------------\nwith convention {cv.name}: (standard attributes are made bold)')
h5tbx.use(cv)
print_method_parameters(h5tbx.File.__init__)
no convention: 

Parameters for "__init__":
  - name
  - mode
  - attrs
  - kwargs

------------
with convention h5rdmtoolbox-tutorial-convention: (standard attributes are made bold)

Parameters for "__init__":
  - name
  - mode
  - attrs
  - data_type
  - standard_name_table
  - comment
  - contact
  - references
  - kwargs
h5tbx.use(None)  # fall back to the default convention
using("h5py")
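A changed signature like the one printed above can be reproduced with the standard library alone. The following is a minimal sketch, not the toolbox's implementation: create_file stands in for a method such as File.__init__, and add_standard_attributes is a hypothetical helper that only changes what inspect.signature() reports.

```python
import inspect


def create_file(name=None, mode="r", attrs=None, **kwargs):
    """Stand-in for a method such as File.__init__ (hypothetical)."""
    return {"name": name, "mode": mode, "attrs": attrs, **kwargs}


def add_standard_attributes(func, names):
    """Insert keyword-only parameters for `names` before **kwargs in the reported signature."""
    sig = inspect.signature(func)
    params = list(sig.parameters.values())
    new = [inspect.Parameter(n, inspect.Parameter.KEYWORD_ONLY, default=None) for n in names]
    if params and params[-1].kind is inspect.Parameter.VAR_KEYWORD:
        params[-1:-1] = new   # keep **kwargs as the last parameter
    else:
        params.extend(new)
    # inspect.signature() honors __signature__; the call behavior is unchanged.
    func.__signature__ = sig.replace(parameters=params)
    return func


before = list(inspect.signature(create_file).parameters)
add_standard_attributes(create_file, ["data_type", "contact"])
after = list(inspect.signature(create_file).parameters)
print(before)
print(after)
```

Because the function still accepts **kwargs, the new keywords are passed through at call time; only the introspected signature differs.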