Standard Name Convention#
The “Standard Name Convention” is one realization of a convention promoted by the toolbox. It is based on the idea that every dataset must have a physical unit (or none if it is dimensionless) and that datasets must be identifiable via an identifier attribute rather than the dataset name itself.
The key standard attributes are:
standard_name: A human- and machine-readable dataset identifier based on construction rules and listed in a “Standard Name Table”,
standard_name_table: List of standard_name entries together with their base unit (SI) and a comprehensive description. It also includes additional information about how a standard_name can be transformed into a new standard_name,
units: The unit attribute of a dataset. It need not be an SI unit, but it must be convertible to one and then match the SI unit registered in the Standard Name Table,
long_name: An alternative name if no standard_name is applicable.
This concept was first introduced by the Climate and Forecast community and is called the CF convention. The h5RDMtoolbox adopts the concept and implements a general version of it, so that users can define their own discipline- or problem-specific standard name convention.
Main benefits of the convention are:
achieving self-describing files, which are human- and machine-interpretable,
validating correctness of dataset identifiers (standard_name) and their units
allowing unit-aware processing of data.
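The benefits listed above can be illustrated with a minimal, toolbox-independent sketch. It assumes a toy table that maps each standard name to its registered SI unit and a small hand-written conversion table; `TABLE`, `TO_SI`, `check_dataset` and `to_si` are hypothetical names used only for illustration and are not part of the h5RDMtoolbox API.

```python
# Toy stand-in for a standard name table: name -> registered SI unit.
# The entries and conversion factors are illustrative assumptions.
TABLE = {"x_velocity": "m/s", "static_pressure": "Pa"}

# unit -> (SI unit it converts to, multiplicative factor)
TO_SI = {
    "m/s": ("m/s", 1.0),
    "km/s": ("m/s", 1000.0),
    "Pa": ("Pa", 1.0),
    "bar": ("Pa", 1.0e5),
}


def check_dataset(standard_name, units):
    """Return a list of problems; an empty list means the attributes are consistent."""
    if standard_name not in TABLE:
        return [f"unknown standard_name: {standard_name!r}"]
    if units not in TO_SI:
        return [f"unknown units: {units!r}"]
    if TO_SI[units][0] != TABLE[standard_name]:
        return [f"units {units!r} are not convertible to {TABLE[standard_name]!r}"]
    return []


def to_si(values, units):
    """Unit-aware processing: scale raw values to the corresponding SI unit."""
    factor = TO_SI[units][1]
    return [value * factor for value in values]


print(check_dataset("x_velocity", "km/s"))  # non-SI unit, but convertible
print(check_dataset("x_velocity", "bar"))   # unit of a different quantity
print(to_si([1, 2, 3], "km/s"))
```

A real implementation would delegate the conversion to a units library instead of a lookup table; the point here is only that the identifier and the unit are checked together.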
This chapter walks you through the concept and shows how to apply it.
import h5rdmtoolbox as h5tbx
import warnings
warnings.filterwarnings('ignore')
from h5rdmtoolbox.convention.standard_names.table import StandardNameTable
Standard Name Tables#
Example 1: cf-convention#
The Standard Name Table should be defined in a document (typically XML or YAML). The corresponding object can then be initialized with the respective constructor method (from_yaml, from_web, …).
For reading the original CF-convention table, do the following:
cf = StandardNameTable.from_web("https://cfconventions.org/Data/cf-standard-names/79/src/cf-standard-name-table.xml",
known_hash='4c29b5ad70f6416ad2c35981ca0f9cdebf8aab901de5b7e826a940cf06f9bae4')
cf
The standard names are items of the table object:
cf['x_wind']
cf['x_wind'].units
cf['x_wind'].description
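To make the item access above less of a black box, here is a rough sketch of how such a table could be read with the Python standard library alone. It assumes the XML consists of entry elements carrying an id attribute with canonical_units and description children, which is how the CF file appears to be laid out. The inline sample is hand-written with placeholder descriptions, and `parse_table` is a hypothetical helper, not the toolbox's parser.

```python
import xml.etree.ElementTree as ET

# Shortened, hand-written sample that mimics the assumed layout of the CF table.
SAMPLE_XML = """<standard_name_table>
  <entry id="x_wind">
    <canonical_units>m s-1</canonical_units>
    <description>Placeholder description of x_wind.</description>
  </entry>
  <entry id="air_pressure">
    <canonical_units>Pa</canonical_units>
    <description>Placeholder description of air_pressure.</description>
  </entry>
</standard_name_table>
"""


def parse_table(xml_text):
    """Map each entry id to its units and description."""
    root = ET.fromstring(xml_text)
    table = {}
    for entry in root.iter("entry"):
        table[entry.get("id")] = {
            "units": entry.findtext("canonical_units", default=""),
            "description": entry.findtext("description", default=""),
        }
    return table


table = parse_table(SAMPLE_XML)
print(table["x_wind"]["units"])
print(table["x_wind"]["description"])
```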
Example 2: User defined table#
Initializing standard name tables from a web resource should be the standard process, because a project or community might have defined a table and published it under a DOI.
The h5rdmtoolbox especially supports tables that are published on Zenodo:
snt = StandardNameTable.from_zenodo(10428795)
snt
Here are the standard names of the table:
snt.names
In a notebook, we can also get a nice overview of the table by calling dump():
snt.dump()
Transformation of base standard names#
Not every allowed standard name needs to be included in the table. Some names are so-called transformations of the listed ones. There are two ways to transform a standard name:
Using affixes: Adding a prefix or a suffix
Apply a mathematical operation to the name
1. Adding affixes#
Note that ‘x_velocity’ is not part of the table:
'x_velocity' in snt
… but ‘velocity’ is, and it is a vector. The vector property tells us whether we can add a “vector component name” as a prefix, e.g. “x” or “y”:
snt['velocity'].is_vector()
Which vector components exist is defined in the table:
snt.affixes['component'].values
Thus, when indexing “x_velocity”, the table checks whether the prefix is valid and, if so, returns the new (transformed) standard name:
snt['x_velocity']
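The prefix check described above can be sketched in a few lines of plain Python. This is only an illustration of the idea under simplifying assumptions (a single component prefix, a hard-coded set of components, placeholder descriptions); `BASE_NAMES`, `COMPONENTS` and `resolve` are hypothetical and do not reflect how the toolbox implements the lookup.

```python
# Hypothetical base table: only 'velocity' is flagged as a vector quantity.
BASE_NAMES = {
    "velocity": {"units": "m/s", "vector": True, "description": "Velocity."},
    "static_pressure": {"units": "Pa", "vector": False, "description": "Static pressure."},
}
COMPONENTS = {"x", "y", "z"}  # assumed values of the 'component' affix


def resolve(name):
    """Return the entry for a base name or a valid component-prefixed name."""
    if name in BASE_NAMES:
        return dict(BASE_NAMES[name], name=name)
    prefix, sep, rest = name.partition("_")
    base = BASE_NAMES.get(rest)
    if sep and base is not None and base["vector"] and prefix in COMPONENTS:
        # The transformed name inherits the units and gets an adjusted description.
        return {
            "name": name,
            "units": base["units"],
            "vector": False,
            "description": f"{prefix}-component of: {base['description']}",
        }
    raise KeyError(name)


print(resolve("x_velocity"))
```

Calling `resolve("x_static_pressure")` raises a `KeyError` in this sketch, because static pressure is not a vector and therefore accepts no component prefix.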
2. Apply a mathematical operation#
During data processing, datasets are often transformed with mathematical functions, such as taking the square or the derivative of one quantity with respect to (wrt) another. Some mathematical operations like these are supported in the current version, e.g.:
snt['derivative_of_x_velocity_wrt_x_coordinate']
snt['square_of_static_pressure']
snt['arithmetic_mean_of_static_pressure']
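One way to picture how names like these might be resolved is pattern matching on the name, combined with a rule for deriving the units of the result. The sketch below is an assumption-laden illustration, not the toolbox's implementation: the three patterns, the unit strings and the helper `units_of` are hypothetical, and the units are composed as plain strings rather than with a units library.

```python
import re

# Hypothetical base units of the untransformed names.
BASE_UNITS = {"x_velocity": "m/s", "x_coordinate": "m", "static_pressure": "Pa"}

# (pattern, rule that combines the operand units into the resulting units)
PATTERNS = [
    (re.compile(r"^derivative_of_(.+)_wrt_(.+)$"), lambda a, b: f"({a})/({b})"),
    (re.compile(r"^square_of_(.+)$"), lambda a: f"({a})**2"),
    (re.compile(r"^arithmetic_mean_of_(.+)$"), lambda a: a),
]


def units_of(name):
    """Resolve the units of a base name or of a supported transformation."""
    if name in BASE_UNITS:
        return BASE_UNITS[name]
    for pattern, combine in PATTERNS:
        match = pattern.match(name)
        if match:
            # Operands are resolved recursively, so transformations can be nested.
            operands = [units_of(part) for part in match.groups()]
            return combine(*operands)
    raise KeyError(name)


print(units_of("derivative_of_x_velocity_wrt_x_coordinate"))
print(units_of("square_of_static_pressure"))
print(units_of("arithmetic_mean_of_static_pressure"))
```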
Usage with HDF5 files#
Let’s apply the convention to HDF5 files. We lazily take the existing tutorial convention and remove some standard attributes in order to limit the example to the attributes relevant to the standard name convention:
zenodo_cv = h5tbx.convention.from_zenodo('https://zenodo.org/record/8357399')
sn_cv = zenodo_cv.pop('contact', 'comment', 'references', 'data_type')
sn_cv.name = 'standard name convention'
sn_cv.register()
h5tbx.use(sn_cv)
sn_cv
Find out about the available standard names: We do this by creating a file and retrieving the attribute standard_name_table. Based on the convention, it is set by default, so it is available without explicitly setting it:
with h5tbx.File() as h5:
    snt = h5.standard_name_table
    print('The available (base) standard names are: ', snt.names)
One possible dataset name based on the standard name table is “x_velocity”. This works because “component” is available in the list of affixes. The transformation pattern shows that “component” is a prefix, and “x” is among the available components, so “x_velocity” is a valid transformed standard name from the given table:
print('Available affixes: ', snt.affixes.keys())
print('\nValues for the component prefix:')
snt.affixes['component']
Let’s access the name from the table. It exists and the description is adjusted, too:
snt['x_velocity']
Creating an x-velocity dataset:
with h5tbx.File() as h5:
    h5.create_dataset('u', data=[1, 2, 3], standard_name='x_velocity', units='km/s')
    h5.dump()
Usage with HDF5 files (update)#
from ssnolib import SSNO
with h5tbx.File(mode='w') as h5:
    ds = h5.create_dataset('u', data=3)
    ds.attrs['standard_name', SSNO.hasStandardName] = 'x_velocity'
    ds.rdf.object['standard_name'] = SSNO.StandardName  # https://matthiasprobst.github.io/ssno#StandardName
    ds = h5.create_dataset('v', data=3)
    ds.attrs['standard_name', SSNO.hasStandardName] = 'y_velocity'
    ds.rdf.object['standard_name'] = SSNO.StandardName  # https://matthiasprobst.github.io/ssno#StandardName
    h5.dump(collapsed=False)
hdf_filename = h5.hdf_filename