Standard Name Convention#

The “Standard Name Convention” is one realization of a convention promoted by the toolbox. It is based on the idea that every dataset must have a physical unit (or none if it is dimensionless) and that datasets must be identifiable via an identifier attribute rather than by the dataset name itself.

The key standard attributes are:

  • standard_name: A human- and machine-readable dataset identifier that follows construction rules and is listed in a “Standard Name Table”,

  • standard_name_table: The list of standard names together with their base (SI) units and comprehensive descriptions. It also includes information on how a standard_name can be transformed into a new standard_name,

  • units: The unit attribute of a dataset. It need not be an SI unit, but it must be convertible to one and then match the SI unit registered in the Standard Name Table,

  • long_name: An alternative name if no standard_name is applicable.
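The rule for the units attribute can be illustrated with a minimal sketch in plain Python. It is not the toolbox's implementation: the dictionaries `TABLE` and `UNITS` and the helper `check_units` are hypothetical stand-ins for a standard name table and a unit registry, and they cover only a handful of units.

```python
# Hypothetical miniature "standard name table": name -> registered SI unit.
TABLE = {"x_wind": "m/s", "static_pressure": "Pa"}

# Hypothetical unit registry: unit -> (factor to SI, SI unit it converts to).
UNITS = {
    "m/s": (1.0, "m/s"),
    "km/s": (1000.0, "m/s"),
    "Pa": (1.0, "Pa"),
    "hPa": (100.0, "Pa"),
}


def check_units(standard_name: str, units: str) -> float:
    """Return the factor that converts `units` to the registered SI unit.

    Raises KeyError for unknown names or units and ValueError if the
    unit cannot be converted to the unit registered for the name.
    """
    registered = TABLE[standard_name]
    factor, si_unit = UNITS[units]
    if si_unit != registered:
        raise ValueError(
            f"{units!r} is not convertible to {registered!r} "
            f"required by {standard_name!r}"
        )
    return factor


print(check_units("x_wind", "km/s"))  # 1000.0: non-SI, but convertible
try:
    check_units("x_wind", "Pa")
except ValueError as err:
    print("rejected:", err)
```

A real implementation would delegate the conversion to a unit library rather than a hand-written lookup table.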

This concept was first introduced by the Climate and Forecast (CF) community, where it is known as the CF convention. The h5RDMtoolbox adopts the concept and implements a generalized version of it, so that users can define their own discipline- or problem-specific standard name convention.

Main benefits of the convention are:

  • achieving self-describing files, which are both human- and machine-interpretable,

  • validating the correctness of dataset identifiers (standard_name) and their units,

  • allowing unit-aware processing of data.

This chapter walks you through the concept and shows how to apply it.

import h5rdmtoolbox as h5tbx
import warnings
warnings.filterwarnings('ignore')

from h5rdmtoolbox.convention.standard_names.table import StandardNameTable

Standard Name Tables#

Example 1: CF convention#

A standard name table should be defined in a document (typically XML or YAML). The corresponding object can then be initialized with the matching constructor method (from_yaml, from_web, …).
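To give an idea of what such a document contains, the following sketch parses a small XML snippet with the Python standard library. The snippet is hand-written in the style of the CF table (one `entry` element per standard name with `canonical_units` and `description` children); the descriptions are shortened placeholders, and the helper `parse_table` is hypothetical, not the parser used by the toolbox.

```python
import xml.etree.ElementTree as ET

# Hand-written snippet in the style of the CF table (not the real file).
XML = """
<standard_name_table>
  <version_number>79</version_number>
  <institution>Centre for Environmental Data Analysis</institution>
  <entry id="x_wind">
    <canonical_units>m s-1</canonical_units>
    <description>Wind component along the grid x-axis.</description>
  </entry>
  <entry id="air_temperature">
    <canonical_units>K</canonical_units>
    <description>Bulk temperature of the air.</description>
  </entry>
</standard_name_table>
"""


def parse_table(xml_text):
    """Return (metadata, names) read from a CF-style XML string."""
    root = ET.fromstring(xml_text.strip())
    meta = {
        "version_number": root.findtext("version_number"),
        "institution": root.findtext("institution"),
    }
    # One dictionary entry per <entry id="..."> element.
    names = {
        entry.get("id"): {
            "units": entry.findtext("canonical_units"),
            "description": entry.findtext("description"),
        }
        for entry in root.iter("entry")
    }
    return meta, names


meta, names = parse_table(XML)
print(meta["version_number"], sorted(names))
print(names["x_wind"]["units"])
```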

For reading the original CF-convention table, do the following:

cf = StandardNameTable.from_web("https://cfconventions.org/Data/cf-standard-names/79/src/cf-standard-name-table.xml",
                               known_hash='4c29b5ad70f6416ad2c35981ca0f9cdebf8aab901de5b7e826a940cf06f9bae4')
cf
  • StandardNameTable: (name: cf-standard-name-table.xml, version_number: 79, last_modified: 2022-03-19T15:25:54Z, institution: Centre for Environmental Data Analysis, contact: support@ceda.ac.uk, version: v79.0.0, url: https://cfconventions.org/Data/cf-standard-names/79/src/cf-standard-name-table.xml)
The standard names are items of the table object:

    cf['x_wind']
    
      • units : m/s
      • description : "x" indicates a vector component along the grid x-axis, positive with increasing x. Wind is defined as a two-dimensional (horizontal) air velocity vector, with no vertical component. (Vertical motion in the atmosphere has the standard name upward_air_velocity.).
    cf['x_wind'].units
    
    m/s
    cf['x_wind'].description
    
    '"x" indicates a vector component along the grid x-axis, positive with increasing x. Wind is defined as a two-dimensional (horizontal) air velocity vector, with no vertical component. (Vertical motion in the atmosphere has the standard name upward_air_velocity.).'
    

    Example 2: User-defined table#

    Initializing standard name tables from a web resource should be the standard process, because a project or community might have defined a table and published it under a DOI.

    The h5RDMtoolbox specifically supports tables that are published on Zenodo:

    snt = StandardNameTable.from_zenodo(10428795)
    snt
    
    A target folder was specified. Downloading file to this folder: /home/docs/.local/share/h5rdmtoolbox/1.6.1/standard_name_tables
    
  • StandardNameTable: (institution: Karlsruhe Institute of Technology, contact: https://orcid.org/0000-0001-8729-0482, valid_characters: ['^a-zA-Z0-9_'], pattern: ^[0-9 ].*, last_modified: 2023-07-18 09:05:38.112885+00:00, version: v4.1.0-alpha, zenodo_doi: 10428795)
Here are the standard names of the table:

    snt.names
    
    ['absolute_pressure',
     'ambient_static_pressure',
     'ambient_temperature',
     'auxiliary_fan_rotational_speed',
     'blade_inlet_angle',
     'blade_inlet_diameter',
     'blade_number',
     'blade_outlet_angle',
     'blade_outlet_diameter',
     'coordinate',
     'density',
     'difference_of_total_pressure_to_static_pressure_between_across_fan',
     'difference_of_wall_static_pressure_across_fan',
     'difference_of_wall_static_pressure_across_orifice',
     'dynamic_pressure',
     'dynamic_viscosity',
     'fan_efficiency',
     'fan_flow_coefficient',
     'fan_inlet_area',
     'fan_outlet_area',
     'fan_power_coefficient',
     'fan_pressure_coefficient',
     'fan_rotational_speed',
     'fan_shaft_power',
     'fan_specific_speed',
     'fan_torque',
     'fan_volume_flow_rate',
     'impeller_diameter',
     'impeller_inlet_width',
     'impeller_outlet_width',
     'impeller_volume_flow_rate',
     'impeller_weight',
     'inner_diameter_of_orifice',
     'kinematic_viscosity',
     'mass_flow_rate',
     'outer_diameter_of_orifice',
     'pulse_delay',
     'relative_humidity',
     'static_pressure',
     'temperature',
     'time',
     'total_pressure',
     'turbulent_kinetic_energy',
     'velocity',
     'vorticity',
     'wall_static_pressure',
     'xx_reynolds_stress',
     'yx_reynolds_stress',
     'yy_reynolds_stress',
     'yz_reynolds_stress',
     'zx_reynolds_stress',
     'zy_reynolds_stress',
     'zz_reynolds_stress']
    

    In a notebook, we can also get a nice overview of the table by calling dump():

    snt.dump()
    
    description units vector alias
    absolute_pressure Pressure is force per unit area. Absolute air pressure is pressure deviation to a total vacuum. Pa NaN NaN
    ambient_static_pressure Static air pressure is the amount of pressure exerted by air that is not moving. Ambient static air pressure is the static air pressure of the surrounding air. Pa NaN NaN
    ambient_temperature Air temperature is the bulk temperature of the air, not the surface (skin) temperature. Ambient air temperature is the temperature of the surrounding air. K NaN NaN
    auxiliary_fan_rotational_speed Number of revolutions of an auxiliary fan. 1/s NaN NaN
    blade_inlet_angle Angle of blade at inlet. rad NaN NaN
    blade_inlet_diameter The inner diameter of the test fan (D1). m NaN NaN
    blade_number The blade number is the number of blades of the test fan. NaN NaN
    blade_outlet_angle Angle of blade at inlet. rad NaN NaN
    blade_outlet_diameter The outer diameter of the test fan (D2). m NaN NaN
    coordinate The spatial coordinate. m True NaN
    density Air density is defined as the mass of air divided by its volume. kg/m**3 NaN NaN
    difference_of_total_pressure_to_static_pressure_between_across_fan The difference of static pressure at fan outlet w.r.t. the total pressure upstream of the fan. The total pressure generally is not known at the fan inlet pipe but further upstream, e.g. in a settling chamber. The dataset must provide detailed information, e.g. referencing to the respective pressure measurement device containing the exact location in the setup. Pa NaN difference_of_total_pressure_to_static_pressure_between_fan_outlet_and_fan_inlet
    difference_of_wall_static_pressure_across_fan Static air pressure is the amount of pressure exerted by air that is not moving. Difference of wall static air pressure across a fan is the difference between the static air pressure downstream (at fan_outlet) of the fan and the total air pressure upstream of the fan at the wall (at fan_inlet). Pa NaN NaN
    difference_of_wall_static_pressure_across_orifice Differnece of static air pressure across orifice to compute volume flow rate according to DIN EN ISO 5167. Pa NaN NaN
    dynamic_pressure Dynamic air pressure is a measure for kinetic energy per unit volume of moving air. Pa NaN NaN
    dynamic_viscosity Dynamic air viscosity indicates the resistance of air towards deformation under shear stress. (https://doi.org/10.1016/B978-0-08-096949-7.00020-0). Pa*s NaN NaN
    fan_efficiency Total fan efficiency as defined in (CAROLUS, Thomas. Ventilatoren-Aerodynamischer Entwurf, Schallvorhersage. Konstruktion, 2013, 2. Jg., p.5, eq.1.16). NaN NaN
    fan_flow_coefficient Air flow coefficient is a dimensionless number as defined in (CAROLUS, Thomas. Ventilatoren-Aerodynamischer Entwurf, Schallvorhersage. Konstruktion, 2013, 2. Jg., p.2, eq.1.3). The addition "of_fan" indicates that this coefficient applies to the deployed fan. NaN NaN
    fan_inlet_area The fan cross-sectional area at the location "fan_inlet" for fans with a casing. The position of the referred cross-sectional area is in the pipe upstream of the fan. The area is generally taken to compute the dynamic pressure at the inlet of the fan based on the volume flow rate. m**2 NaN NaN
    fan_outlet_area The fan cross-sectional area at the location "fan_outlet" for fans with a casing. The position of the referred cross-sectional area is in the pipe downstream of the fan. The area is generally taken to compute the dynamic pressure at the outlet of the fan based on the volume flow rate. m**2 NaN NaN
    fan_power_coefficient Power coefficient is a dimensionless number as defined in (CAROLUS, Thomas. Ventilatoren-Aerodynamischer Entwurf, Schallvorhersage. Konstruktion, 2013, 2. Jg., p.2, eq.1.5). The addition "of_fan" indicates that this coefficient applies for the deployed fan. NaN NaN
    fan_pressure_coefficient Total pressure coefficient is a dimensionless number as defined in (CAROLUS, Thomas. Ventilatoren-Aerodynamischer Entwurf, Schallvorhersage. Konstruktion, 2013, 2. Jg., p.2, eq.1.4). The addition "of_fan" indicates that this coefficient applies for the deployed fan. NaN NaN
    fan_rotational_speed Number of revolutions of the test fan. 1/s NaN NaN
    fan_shaft_power Power of fan drive shaft. W NaN NaN
    fan_specific_speed Specific speed of the fan as defined in (CAROLUS, Thomas. Ventilatoren-Aerodynamischer Entwurf, Schallvorhersage. Konstruktion, 2013, 2. Jg., p.2, eq.1.6). NaN NaN
    fan_torque The torque acting on the impeller of the fan. Nm NaN NaN
    fan_volume_flow_rate Air volume flow rate is the volume of air that passes a cross section per unit time. The volume flow rate of the fan is the volume flow entering and leaving the fan. Due to gaps between the impeller and the housing, the volume flow rate is lower than the volume flow rate through the impeller (see impeller_volume_flow_rate). m**3/s NaN NaN
    impeller_diameter The diameter of the impeller of the test fan, also D3. For some fans D2 is equal to D3. m NaN NaN
    impeller_inlet_width The width of the impeller inlet. m NaN NaN
    impeller_outlet_width The width of the impeller outlet. m NaN NaN
    impeller_volume_flow_rate Air volume flow rate is the volume of air that passes a cross section per unit time. The volume flow rate of the impeller is the volume flow entering and leaving the impeller. Due to gaps between the impeller and the housing, this volume flow rate is higher than the volume flow rate through the fan (see fan_volume_flow_rate). m**3/s NaN NaN
    impeller_weight Weight of the impeller. kg NaN NaN
    inner_diameter_of_orifice Inner diameter of an orifice. m NaN NaN
    kinematic_viscosity Dynamic air viscosity indicates the resistance of air towards deformation under shear stress. Kinematic viscosity. Dynamic air viscosity divided by air denisity equals kinematic air viscosity. (https://doi.org/10.1016/B978-0-12-410461-7.00007-9). m**2/s NaN NaN
    mass_flow_rate Air mass flow rate is the mass of air that passes a certain cross sectiont per unit time. kg/s NaN NaN
    outer_diameter_of_orifice Outer diameter of an orifice. m NaN NaN
    pulse_delay Time between two laser pulses. s NaN NaN
    relative_humidity Relative humidity is a measure of the water vapor content of air. NaN NaN
    static_pressure Static air pressure is the amount of pressure exerted by air that is not moving. Pa NaN NaN
    temperature Air temperature is the bulk temperature of the air, not the surface (skin) temperature. (CF Conventions). degC NaN NaN
    time Recording time since start of experiment. s NaN NaN
    total_pressure The sum of dynamic and static air pressure. Pa NaN NaN
    turbulent_kinetic_energy The kinetic energy per unit mass of a fluid. m**2/s**2 NaN NaN
    velocity Velocity. m/s True NaN
    vorticity Vorticity. 1/s True NaN
    wall_static_pressure Static air pressure is the amount of pressure exerted by air that is not moving. Wall static air pressure is the static air pressure at the wall. Pa NaN NaN
    xx_reynolds_stress Reynolds stress is a tensor quantity. "xx" indicates that the variations of x-velocity is used. m**2/s**2 NaN NaN
    yx_reynolds_stress Reynolds stress is a tensor quantity. "yx" indicates that the variations of x- and y-velocity are used. m**2/s**2 NaN NaN
    yy_reynolds_stress Reynolds stress is a tensor quantity. "yy" indicates that the variations of y-velocity is used. m**2/s**2 NaN NaN
    yz_reynolds_stress Reynolds stress is a tensor quantity. "yz" indicates that the variations of y- and z-velocity are used. m**2/s**2 NaN NaN
    zx_reynolds_stress Reynolds stress is a tensor quantity. "zx" indicates that the variations of z- and x-velocity are used. m**2/s**2 NaN NaN
    zy_reynolds_stress Reynolds stress is a tensor quantity. "zy" indicates that the variations of z- and y-velocity are used. in y-axis direction. m**2/s**2 NaN NaN
    zz_reynolds_stress Reynolds stress is a tensor quantity. "zy" indicates that the variations of z-velocity is used. m**2/s**2 NaN NaN

    Transformation of base standard names#

    Not every valid standard name has to be listed in the table. Further names can be derived from the listed ones through so-called transformations. There are two ways to transform a standard name:

    1. Using affixes: adding a prefix or a suffix

    2. Applying a mathematical operation to the name

    1. Adding affixes#

    Note that ‘x_velocity’ is not part of the table:

    'x_velocity' in snt
    
    False
    

    … but ‘velocity’ is, and it is a vector. The vector property tells us whether we can add a “vector component name” as a prefix, e.g. “x” or “y”:

    snt['velocity'].is_vector()
    
    True
    

    The available vector components are defined in the table:

    snt.affixes['component'].values
    
    {'x': 'X indicates the x-axis component of the vector.',
     'y': 'Y indicates the y-axis component of the vector.',
     'z': 'Z indicates the z-axis component of the vector.'}
    

    Thus, when indexing with “x_velocity”, the table checks whether the prefix is valid and, if so, returns the new (transformed) standard name:

    snt['x_velocity']
    
      • units : m/s
      • description : Velocity. X indicates the x-axis component of the vector.
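The lookup logic described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the toolbox's code: `BASE`, `COMPONENTS` and `lookup` are hypothetical names, and only component prefixes are handled.

```python
# Hypothetical miniature table; descriptions copied from the outputs above.
BASE = {
    "velocity": {"units": "m/s", "description": "Velocity.", "vector": True},
    "static_pressure": {
        "units": "Pa",
        "description": "Static air pressure is the amount of pressure "
                       "exerted by air that is not moving.",
        "vector": False,
    },
}
COMPONENTS = {
    "x": "X indicates the x-axis component of the vector.",
    "y": "Y indicates the y-axis component of the vector.",
    "z": "Z indicates the z-axis component of the vector.",
}


def lookup(name):
    """Return the table entry for a base name or a component-prefixed name."""
    if name in BASE:
        return BASE[name]
    # Split "x_velocity" into the candidate prefix "x" and base "velocity".
    prefix, _, base = name.partition("_")
    entry = BASE.get(base)
    if entry is None or prefix not in COMPONENTS:
        raise KeyError(name)
    if not entry["vector"]:
        raise KeyError(f"{base!r} is not a vector, so {name!r} is invalid")
    # The transformed name inherits the units and extends the description.
    return {
        "units": entry["units"],
        "description": f"{entry['description']} {COMPONENTS[prefix]}",
        "vector": False,
    }


print(lookup("x_velocity")["description"])
```

In this sketch, `lookup("x_static_pressure")` raises a `KeyError` because `static_pressure` is not flagged as a vector.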

    2. Applying a mathematical operation#

    During data processing, datasets are often transformed by mathematical functions, such as taking the square of a quantity or the derivative of one quantity with respect to (wrt) another. Some mathematical operations like these are supported in the current version, e.g.:

    snt['derivative_of_x_velocity_wrt_x_coordinate']
    
      • units : 1/s
      • description : Derivative of x_velocity with respect to x_coordinate. Velocity. X indicates the x-axis component of the vector. The spatial coordinate. X indicates the x-axis component of the vector.
    snt['square_of_static_pressure']
    
      • units : Pa**2
      • description : Square of static_pressure. Static air pressure is the amount of pressure exerted by air that is not moving.
    snt['arithmetic_mean_of_static_pressure']
    
      • units : Pa
      • description : Arithmetic mean of static_pressure. Static air pressure is the amount of pressure exerted by air that is not moving.
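How the units of such transformed names follow from the base names can be sketched with regular expressions and simple exponent bookkeeping. Everything here is an assumption made for illustration: `BASE_UNITS`, `combine`, `fmt` and `units_of` are hypothetical helpers, the name patterns are inferred from the three examples above, and the toolbox itself may resolve names and units differently.

```python
import re

# Hypothetical base table: name -> unit as {symbol: exponent}.
BASE_UNITS = {
    "x_velocity": {"m": 1, "s": -1},
    "x_coordinate": {"m": 1},
    "static_pressure": {"Pa": 1},
}


def combine(a, b, sign):
    """Add (sign=+1) or subtract (sign=-1) the exponents of b to/from a."""
    out = dict(a)
    for sym, exp in b.items():
        out[sym] = out.get(sym, 0) + sign * exp
    return {s: e for s, e in out.items() if e != 0}


def fmt(unit):
    """Format an exponent dictionary as a string such as 'Pa**2' or '1/s'."""
    num = [s if e == 1 else f"{s}**{e}" for s, e in unit.items() if e > 0]
    den = [s if e == -1 else f"{s}**{-e}" for s, e in unit.items() if e < 0]
    text = "*".join(num) or "1"
    if den:
        joined = "*".join(den)
        text += "/" + (joined if len(den) == 1 else f"({joined})")
    return text


def units_of(name):
    """Resolve the unit of a base name or of a transformed name."""
    if name in BASE_UNITS:
        return BASE_UNITS[name]
    m = re.fullmatch(r"derivative_of_(.+)_wrt_(.+)", name)
    if m:  # unit of numerator divided by unit of denominator
        return combine(units_of(m.group(1)), units_of(m.group(2)), -1)
    m = re.fullmatch(r"square_of_(.+)", name)
    if m:  # every exponent doubles
        u = units_of(m.group(1))
        return combine(u, u, +1)
    m = re.fullmatch(r"arithmetic_mean_of_(.+)", name)
    if m:  # averaging keeps the unit
        return units_of(m.group(1))
    raise KeyError(name)


for n in ("derivative_of_x_velocity_wrt_x_coordinate",
          "square_of_static_pressure",
          "arithmetic_mean_of_static_pressure"):
    print(n, "->", fmt(units_of(n)))
```

For the three names above, the sketch yields 1/s, Pa**2 and Pa, which agrees with the units reported by the table.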

    Usage with HDF5 files#

    Let’s apply the convention to HDF5 files. For convenience, we take the existing tutorial convention and remove some standard attributes to limit the example to the attributes relevant to the standard name convention:

    zenodo_cv = h5tbx.convention.from_zenodo('https://zenodo.org/record/8357399')
    sn_cv = zenodo_cv.pop('contact', 'comment', 'references', 'data_type')
    sn_cv.name = 'standard name convention'
    sn_cv.register()
    
    h5tbx.use(sn_cv)
    sn_cv
    
    A target folder was specified. Downloading file to this folder: /home/docs/.local/share/h5rdmtoolbox/1.6.1/cache
    
    Convention("standard name convention")
    

    Find out about the available standard names: We do this by creating a file and retrieving the attribute standard_name_table. Because the convention sets it by default, it is available without being set explicitly:

    with h5tbx.File() as h5:
        snt = h5.standard_name_table
    
    print('The available (base) standard names are: ', snt.names)
    
    The available (base) standard names are:  ['coordinate', 'static_pressure', 'time', 'velocity']
    

    One possible dataset based on the standard name table could be “x_velocity”. This is possible because “component” is available in the list of affixes. Based on the transformation pattern, “component” is a prefix. “x” is one of the available components, so “x_velocity” is a valid standard name transformed from the given table:

    print('Available affixes: ', snt.affixes.keys())
    
    print('\nValues for the component prefix:')
    snt.affixes['component']
    
    Available affixes:  dict_keys(['device', 'location', 'reference_frame', 'component'])
    
    Values for the component prefix:
    
    <Affix: name="component", description="Components are prefixes to the standard_name, e.g. x_velocity." transformation_pattern=^(.*)_(.*)$, values=['x', 'y', 'z']>
    

    Let’s access the name from the table. It exists and the description is adjusted, too:

    snt['x_velocity']
    
      • units : m/s
      • description : Velocity refers to the change of position over time. Velocity is a vector quantity. X indicates the x-axis component of the vector.
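The transformation_pattern shown in the Affix output above is an ordinary regular expression, so its behaviour can be inspected with Python's `re` module. The sketch below is not how the toolbox applies the pattern (that is not shown here); `split_component` is a hypothetical helper that splits at the first underscore, because the greedy first group of the pattern would otherwise swallow part of a multi-word base name.

```python
import re

PATTERN = re.compile(r"^(.*)_(.*)$")  # copied from the Affix output above
COMPONENTS = ["x", "y", "z"]


def split_component(name):
    """Return (component, base_name), or None without a known component prefix."""
    if PATTERN.match(name) is None:
        return None  # no underscore, hence no prefix at all
    # The greedy first group would yield 'x_static' for 'x_static_pressure',
    # so this sketch splits at the first underscore instead.
    prefix, _, rest = name.partition("_")
    return (prefix, rest) if prefix in COMPONENTS else None


print(PATTERN.match("x_velocity").groups())         # ('x', 'velocity')
print(PATTERN.match("x_static_pressure").groups())  # ('x_static', 'pressure')
print(split_component("x_velocity"))                # ('x', 'velocity')
print(split_component("static_pressure"))           # None
```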

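    Creating an x-velocity dataset. The units km/s are not SI units, but they are convertible to the registered m/s and are therefore accepted: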

    with h5tbx.File() as h5:
        h5.create_dataset('u', data=[1,2,3], standard_name='x_velocity', units='km/s')
        h5.dump()
    
      • standard_name_table: DOI
        (3) [int64]
        • standard_name: x_velocity
        • units: km/s

    Usage with HDF5 files (update)#

    from ontolutils import SSNO
    
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    Cell In[21], line 1
    ----> 1 from ontolutils import SSNO
    
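    ImportError: cannot import name 'SSNO' from 'ontolutils' (/home/docs/checkouts/readthedocs.org/user_builds/h5rdmtoolbox/envs/v1.6.1/lib/python3.8/site-packages/ontolutils/__init__.py)
    
    The import failed in the environment used to build this page (h5rdmtoolbox 1.6.1, Python 3.8), so the following cell only runs with an ontolutils version that provides SSNO: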
    
    with h5tbx.File(mode='w') as h5:
        ds = h5.create_dataset('u', data=3)
        ds.attrs['standard_name', SSNO.hasStandardName] = 'x_velocity'
        ds.rdf.object['standard_name'] = SSNO.StandardName  # https://matthiasprobst.github.io/ssno#StandardName
        
        ds = h5.create_dataset('v', data=3)
        ds.attrs['standard_name', SSNO.hasStandardName] = 'y_velocity'
        ds.rdf.object['standard_name'] = SSNO.StandardName  # https://matthiasprobst.github.io/ssno#StandardName
        h5.dump(collapsed=False)
    
    hdf_filename = h5.hdf_filename