{ "cells": [ { "cell_type": "markdown", "id": "29624350-5184-46eb-82b7-f99e36fd6493", "metadata": {}, "source": [ "# Creating a convention\n", "\n", "A convention can also be defined in a YAML (or JSON) file. It consists of three parts:\n", "\n", " 1. General information (keywords must start and end with double underscores)\n", " 2. Definition of *standard attributes*\n", " 3. Definition of *standard attribute validators*" ] }, { "cell_type": "raw", "id": "ba403238-01af-4053-bca0-644e2bbeb4c2", "metadata": {}, "source": [ "# filename: example_convention.yaml\n", "# 1. General information at the header\n", "#-------------------------------------\n", "__name__: h5rdmtoolbox-tuturial-convention\n", "__institution__: https://orcid.org/members/001G000001e5aUTIAY\n", "__contact__: https://orcid.org/0000-0001-8729-0482\n", "\n", "\n", "# 2. Definition of standard attributes\n", "#-------------------------------------\n", "units:\n", " target_method: create_dataset\n", " validator: $units\n", " description: The physical unit of the dataset. If dimensionless, the unit is ''.\n", " default_value: $EMPTY\n", "\n", "contact_id:\n", " description: ID of a person to contact for questions.\n", " target_method: __init__\n", " validator: $str\n", " default_value: $EMPTY\n", "\n", "comment:\n", " description: Comment to a dataset\n", " target_method: create_dataset\n", " validator: $regex(^[A-Z].*)\n", " default_value: $NONE\n", " \n", "data_type:\n", " description: 'Type of data in file. Can be numerical or experimental'\n", " target_method: __init__\n", " validator: $dataSourceTypes\n", "\n", "# 3. Special type definitions\n", "#----------------------------\n", "$dataSourceTypes:\n", " - numerical\n", " - experimental" ] }, { "cell_type": "markdown", "id": "3f3b3ebd-9731-4d25-9508-5ada9606e5ed", "metadata": {}, "source": [ "1. 
General information in the header, indicated by double underscores:" ] }, { "cell_type": "raw", "id": "8c431607-700a-4530-8d80-190713d5a8fe", "metadata": {}, "source": [ "__name__: h5rdmtoolbox-tuturial-convention\n", "__institution__: https://orcid.org/members/001G000001e5aUTIAY\n", "__contact__: https://orcid.org/0000-0001-8729-0482" ] }, { "cell_type": "markdown", "id": "5949999e-743e-4001-9c04-6d9113e81c39", "metadata": {}, "source": [ "2. Definition of standard attributes\n", "\n", "A `standard_attribute` has several properties: the `description`, the `target_method` (the method during which the standard attribute can be passed), the `validator`, which validates the input, and the `default_value`. The latter can be \"\\\\$EMPTY\", indicating that no default value is set and the attribute is therefore obligatory, or \"\\\\$NONE\", indicating that the attribute is optional. Alternatively, a valid value can be given, which is written if no input is provided for this standard attribute." ] }, { "cell_type": "raw", "id": "b6b4a930-b80a-4fee-b73a-743fa79df498", "metadata": {}, "source": [ "contact_id:\n", " description: ID of a person to contact for questions.\n", " target_method: __init__\n", " validator: $str\n", " default_value: $EMPTY\n", "\n", "comment:\n", " description: Comment to a dataset\n", " target_method: create_dataset\n", " validator: $regex(^[A-Z].*)\n", " default_value: $NONE\n", " \n", "data_type:\n", " description: 'Type of data in file. Can be numerical or experimental'\n", " target_method: __init__\n", " validator: $dataSourceTypes" ] }, { "cell_type": "markdown", "id": "b58143cc-8010-4734-828f-b5c42bced82c", "metadata": {}, "source": [ "3. 
Special type definitions\n", "\n", "Here, the allowed values for the standard attribute `data_type` are listed:" ] }, { "cell_type": "raw", "id": "736bf707-78f2-4568-bcce-fede4484f2c5", "metadata": {}, "source": [ "$dataSourceTypes:\n", " - numerical\n", " - experimental" ] }, { "cell_type": "markdown", "id": "ab82d88e-d608-44b4-8fd2-6739c16b2162", "metadata": {}, "source": [ "The heart of standard attributes is the **validator**. A validator becomes effective when metadata is written.\n", "The flow chart below illustrates the writing and reading procedure, using the attribute \"units\" as an example.\n", "\n", "1. Writing (`ds.units = 'm/s'`): If \"units\" is defined in the convention, the validator checks the value. \"m/s\" is a valid unit, so it is written. An invalid value raises an error.\n", "2. Reading (`print(ds.units)`): The validator is also applied when standardized attributes are read. In this case, however, an invalid value only raises a warning, so that the user can still work with the file and fix the issue.\n", "\n", "\"h5RDMtoolbox_standard_attribute_concept.png\"" ] }, { "cell_type": "markdown", "id": "45780d72-96bc-44ff-b835-39c9f8ab3637", "metadata": {}, "source": [ "## Reading a convention from a file" ] }, { "cell_type": "markdown", "id": "1a619cd5-f8b3-4b55-a032-6f4da66db529", "metadata": {}, "source": [ "Let's read the above example file into the class `Convention`. The object representation displays the standard attributes which are expected for the root group (`File.__init__()`), group creation (`Group.create_group()`) and dataset creation (`Group.create_dataset()`).\n", "\n", "Note that the standard attributes marked in **bold** are obligatory. 
The others may or may not be provided during object creation:" ] }, { "cell_type": "code", "execution_count": 1, "id": "d26631cc-21b6-4391-a9ce-2f510f8e5c06", "metadata": {}, "outputs": [], "source": [ "from h5rdmtoolbox import convention\n", "import h5rdmtoolbox as h5tbx" ] }, { "cell_type": "code", "execution_count": 2, "id": "6399d43b-0b37-42b1-bb29-53670b449dd7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "WindowsPath('example_convention.json')" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h5tbx.convention.utils.yaml2json('example_convention.yaml')" ] }, { "cell_type": "code", "execution_count": 3, "id": "8bacf694-33d6-4351-b867-018b0d3458d4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Convention(\"h5rdmtoolbox-tuturial-convention\")" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cv = h5tbx.convention.from_yaml('example_convention.yaml')\n", "# you may also use .from_json('example_convention.json')\n", "cv" ] }, { "cell_type": "markdown", "id": "52e14b1b-58cc-45c5-bb2c-503c54d00f3f", "metadata": {}, "source": [ "To make the convention effective in this session, it must be enabled. We do this by calling `use()`:" ] }, { "cell_type": "code", "execution_count": 4, "id": "d4501be9-6f6d-4989-9f93-904cd9b0b909", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "using(\"h5rdmtoolbox-tuturial-convention\")" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h5tbx.use(cv)" ] }, { "cell_type": "markdown", "id": "c3ffe471-98fb-4eb2-a127-3838bd63c139", "metadata": {}, "source": [ "Now, we will get an error if we create an HDF5 file without providing the attribute `contact_id`. 
As we made it a required attribute, it must be provided during file initialization:" ] }, { "cell_type": "code", "execution_count": 5, "id": "82012389-c61b-409e-b090-b6a44f65e368", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Convention \"h5rdmtoolbox-tuturial-convention\" expects standard attribute \"contact_id\" to be provided as an argument during file creation.\n" ] } ], "source": [ "try:\n", "    with h5tbx.File() as h5:\n", "        pass\n", "except h5tbx.errors.StandardAttributeError as e:\n", "    print(e)" ] }, { "cell_type": "markdown", "id": "7092e18c-3bfa-4d05-a233-b14cc67e3ed2", "metadata": {}, "source": [ "Providing an invalid value raises an error, too. Here, the comment does not start with a capital letter, as required by the regex validator `^[A-Z].*`:" ] }, { "cell_type": "code", "execution_count": 6, "id": "dae0ee54-c4f6-4920-a7bf-a1ffb2e80bce", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Validation of \"velocity field\" for standard attribute \"comment\" failed.\n", "Expected fields: {'comment': FieldInfo(annotation=str, required=True, metadata=[WrapValidator(func=)])}\n", "Pydantic error: 1 validation error for comment\n", "comment\n", " Parameter error, Invalid format for pattern [type=value_error, input_value='velocity field', input_type=str]\n", " For further information visit https://errors.pydantic.dev/2.5/v/value_error\n" ] } ], "source": [ "try:\n", "    with h5tbx.File(contact_id='id1722') as h5:\n", "        h5.create_dataset(name='velocity', shape=(3, 4), units='m/s', comment='velocity field')\n", "except h5tbx.errors.StandardAttributeError as e:\n", "    print(e)" ] }, { "cell_type": "markdown", "id": "ba22b498-a8ab-49a0-b074-52e3e0f707fa", "metadata": {}, "source": [ "With a valid `contact_id`, the file is created successfully:" ] }, { "cell_type": "code", "execution_count": 7, "id": "ef0071e7-da00-4ba3-b6a0-3c22869677ee", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n", " \n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "with h5tbx.File(contact_id='id1722') as h5:\n", "    h5.dump()" ] }, { "cell_type": "markdown", "id": "d7319747-a03b-4bc2-94c7-1b439567be16", "metadata": {}, "source": [ "Note that if we reopen the file in read-write mode (`r+`) rather than read-only mode (`r`), the standard attributes that already exist are not checked again. So if the HDF5 file was written with another package, e.g. h5py, a stored value might be invalid:" ] }, { "cell_type": "code", "execution_count": 8, "id": "9d1070ef-2f34-4685-b61a-13757a39d9a4", "metadata": {}, "outputs": [], "source": [ "with h5tbx.File(name=h5.hdf_filename, mode='r+') as h5:\n", "    pass  # note, that we were not required to pass \"data_type\" as it was present already!" ] }, { "cell_type": "markdown", "id": "de884f21-a639-47d7-b8c9-0e93c875d657", "metadata": {}, "source": [ "Note that a convention can also be enabled only **temporarily** using the context manager syntax:" ] }, { "cell_type": "code", "execution_count": 9, "id": "3b774696-397f-4c0b-9383-5e843d701759", "metadata": {}, "outputs": [], "source": [ "with h5tbx.use(cv):\n", "    with h5tbx.File(contact_id='id1722') as h5:\n", "        pass" ] }, { "cell_type": "markdown", "id": "d97cc7ab-3889-455d-832e-64325a361973", "metadata": {}, "source": [ "## Importing/Loading an online convention\n", "\n", "Conventions are intended to be distributed via online repositories. The YAML file should therefore be uploaded so that it is accessible to all users. The `h5RDMtoolbox` currently favors the use of [Zenodo](https://zenodo.org) repositories. The advantages are long-term storage and the assignment of a DOI. However, files accessible via a URL can also be downloaded.\n", "\n", "A tutorial convention is published [here](https://zenodo.org/record/8276817). 
Calling `from_zenodo()` creates the convention object:" ] }, { "cell_type": "code", "execution_count": 10, "id": "7c3da088-a741-4a98-99f6-ab5ffd4fa370", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Convention(\"h5rdmtoolbox-tutorial-convention\")" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cv = h5tbx.convention.from_zenodo(doi_or_recid='10428822')\n", "cv" ] }, { "cell_type": "markdown", "id": "652128cc-7ce4-428d-8809-7150224f40bc", "metadata": {}, "source": [ "## Effect of enabling a convention\n", "\n", "The convention above defines which attributes are used with which methods. For example, \"data_type\" is to be provided when an HDF5 file is created. When the convention is enabled, the **signature of the respective methods is changed**. To prove this, let's implement a small function that prints all parameters of a given method, and inspect the effect of the convention on the `__init__` method:" ] }, { "cell_type": "code", "execution_count": 11, "id": "7f42a1c5-5f33-4c4d-b9de-5eaf84b84e61", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cv.properties[h5tbx.Dataset]['standard_name']" ] }, { "cell_type": "code", "execution_count": 12, "id": "bac5a1ed-d753-4b83-831c-689962b9b9f4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "no convention: \n", "\n", "Parameters for \"__init__\":\n", " - name\n", " - mode\n", " - attrs\n", " - kwargs\n", "\n", "------------\n", "with convention h5rdmtoolbox-tutorial-convention: (standard attributes are made bold)\n", "\n", "Parameters for \"__init__\":\n", " - name\n", " - mode\n", " - attrs\n", " - \u001b[1mdata_type\u001b[0m\n", " - \u001b[1mstandard_name_table\u001b[0m\n", " - \u001b[1mcomment\u001b[0m\n", " - \u001b[1mcontact\u001b[0m\n", " - \u001b[1mreferences\u001b[0m\n", " - kwargs\n" ] } ], 
"source": [ "import inspect\n", "\n", "def print_method_parameters(method):\n", "    print(f'\\nParameters for \"{method.__name__}\":')\n", "    # standard attributes the current convention registers for File.__init__\n", "    std_attrs = h5tbx.convention.get_current_convention().methods[h5tbx.File].get('__init__', {})\n", "    for param in inspect.signature(method).parameters.values():\n", "        if param.name != 'self':\n", "            if param.name in std_attrs:\n", "                print(f' - {h5tbx._repr.make_bold(param.name)}')\n", "            else:\n", "                print(f' - {param.name}')\n", "\n", "methods = (h5tbx.File.__init__, h5tbx.Group.create_group, h5tbx.Group.create_dataset)\n", "\n", "print('no convention: ')\n", "h5tbx.use(None)\n", "print_method_parameters(h5tbx.File.__init__)\n", "\n", "print(f'\\n------------\\nwith convention {cv.name}: (standard attributes are made bold)')\n", "h5tbx.use(cv)\n", "print_method_parameters(h5tbx.File.__init__)" ] }, { "cell_type": "code", "execution_count": 13, "id": "d7f8fcb6-3a82-44aa-8297-92ff486722a6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "using(\"h5py\")" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h5tbx.use(None) # fall back to the default convention" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.18" } }, "nbformat": 4, "nbformat_minor": 5 }