Glossary#

metadata#

“Information about data” (Michener [Mic06]), or higher-level descriptions of data sets. In HDF5 files, attributes are used to describe data. Standardized attribute names such as long_name or standard_name are special metadata descriptors that follow a specific standard and allow automated exploration and analysis.

convention#

A set of “standard attributes” used to describe data. When a convention is enabled, its standard attributes are automatically added as parameters to methods such as create_dataset.

standard attributes#

Attributes that are used to describe data and that are defined by a convention. Standard attributes validate user input; the validation is implemented with the pydantic package.
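
To illustrate the idea of a validated standard attribute, here is a minimal sketch in plain Python. It uses neither the h5rdmtoolbox API nor pydantic; a hand-written validator function is swapped in to keep the example dependency-free. The names `StandardAttribute`, `validate_long_name`, `CONVENTION` and `apply_convention` are hypothetical, and the rule itself (a non-empty string that starts with a letter) is an assumption made for this example, not a rule taken from a real convention.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class StandardAttribute:
    """Hypothetical stand-in for a standard attribute defined by a convention."""
    name: str
    validator: Callable[[Any], Any]  # returns the cleaned value or raises ValueError


def validate_long_name(value: Any) -> str:
    # Assumed rule for illustration: a non-empty string starting with a letter.
    if not isinstance(value, str) or not value.strip():
        raise ValueError("long_name must be a non-empty string")
    value = value.strip()
    if not value[0].isalpha():
        raise ValueError("long_name must start with a letter")
    return value


# A "convention" is modelled here as standard attributes keyed by name.
CONVENTION: Dict[str, StandardAttribute] = {
    "long_name": StandardAttribute("long_name", validate_long_name),
}


def apply_convention(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Validate user-supplied attributes against the convention."""
    validated = dict(attrs)
    for name, std_attr in CONVENTION.items():
        if name not in attrs:
            raise ValueError(f"missing standard attribute: {name}")
        validated[name] = std_attr.validator(attrs[name])
    return validated


cleaned = apply_convention({"long_name": "  air temperature "})
```

In this sketch an invalid or missing value raises `ValueError` before anything would be written, which is the behaviour the glossary entry describes in general terms.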

layout#

A layout defines the expected structure of an HDF5 file. It may specify exact content, e.g. an attribute name and value, or the expected dimensions or shape of a dataset. It cannot specify the array data of datasets.
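
The kind of check a layout performs can be sketched without any HDF5 library. The example below models a file as a nested dictionary and is not the h5rdmtoolbox layout API; `check_layout`, the `file_model` structure and the specification keys (`attrs`, `ndim`, `shape`) are hypothetical. It inspects attribute values and dataset dimensions only, mirroring the statement that array data is out of scope.

```python
from typing import Any, Dict, List


def check_layout(file_model: Dict[str, Dict[str, Any]],
                 layout: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable layout violations (empty if valid)."""
    problems: List[str] = []
    for path, spec in layout.items():
        dataset = file_model.get(path)
        if dataset is None:
            problems.append(f"{path}: dataset is missing")
            continue
        # Exact attribute name/value checks.
        for key, expected in spec.get("attrs", {}).items():
            actual = dataset.get("attrs", {}).get(key)
            if actual != expected:
                problems.append(
                    f"{path}: attribute {key!r} is {actual!r}, expected {expected!r}"
                )
        # Dimension and shape checks; the array values are never inspected.
        shape = tuple(dataset.get("shape", ()))
        if "ndim" in spec and len(shape) != spec["ndim"]:
            problems.append(f"{path}: ndim is {len(shape)}, expected {spec['ndim']}")
        if "shape" in spec and shape != tuple(spec["shape"]):
            problems.append(f"{path}: shape is {shape}, expected {tuple(spec['shape'])}")
    return problems


# Hypothetical file content and layout specification.
file_model = {
    "/velocity": {"shape": (10, 20), "attrs": {"units": "m/s"}},
    "/pressure": {"shape": (10,), "attrs": {"units": "bar"}},
}
layout = {
    "/velocity": {"ndim": 2, "attrs": {"units": "m/s"}},
    "/pressure": {"ndim": 2, "attrs": {"units": "Pa"}},
}
problems = check_layout(file_model, layout)
```

Returning a list of violations instead of raising on the first one is a design choice of this sketch; it lets a user see every mismatch in a file at once.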

repository#

A repository is a storage place for data, usually online, which assigns a unique identifier to the uploaded data. A popular example is Zenodo. Typically, a repository can be queried for metadata such as author, title, description, or type of data, but not for the content of the data (see database).

database#

A database hosts data and allows the data content to be queried. An example of a database used in the context of HDF5 is MongoDB. Databases allow complex queries that would be slow or impossible on the raw HDF5 files.

FAIR Principles and h5rdmtoolbox#

The FAIR principles (Findable, Accessible, Interoperable, Reusable) provide guidelines for making data more reusable. Below is a mapping of how h5rdmtoolbox features support each principle.

Findable#

F1: (Meta)data are assigned a globally unique and persistent identifier

h5rdmtoolbox features:
  • ORCID integration for author identification

  • DOI assignment via Zenodo upload

  • IRI/URI support for all metadata elements

F2: Data are described with rich metadata

h5rdmtoolbox features:
  • Standard attributes with conventions

  • xarray integration preserves context

  • Automatic metadata collection during file creation

F3: Metadata clearly and explicitly include the identifier of the data they describe

h5rdmtoolbox features:
  • Automatic linking of metadata to datasets via RDF triples

  • Subject/predicate/object structure for all attributes
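
The subject/predicate/object structure can be illustrated with plain tuples. This is a sketch of the data model only, not the h5rdmtoolbox API: `attributes_to_triples`, the dataset IRI under `https://example.org` and the attribute-to-predicate mapping are hypothetical, and the RDFS predicates are chosen purely for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical mapping from attribute names to predicate IRIs.
PREDICATES: Dict[str, str] = {
    "long_name": RDFS + "label",
    "comment": RDFS + "comment",
}


def attributes_to_triples(subject_iri: str, attrs: Dict[str, str]) -> List[Triple]:
    """Turn the attributes of one dataset into (subject, predicate, object) triples."""
    triples: List[Triple] = []
    for name, value in attrs.items():
        predicate = PREDICATES.get(name)
        if predicate is None:
            continue  # attributes without a known predicate are skipped in this sketch
        triples.append((subject_iri, predicate, value))
    return triples


# The subject identifies the dataset that the metadata describes.
dataset_iri = "https://example.org/data/run1.h5#/velocity"
triples = attributes_to_triples(
    dataset_iri, {"long_name": "x velocity", "scale": "1.0"}
)
```

Because every triple carries the dataset IRI as its subject, each statement names the data it describes, which is the point of F3.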

F4: (Meta)data are registered or indexed in a searchable resource

h5rdmtoolbox features:
  • JSON-LD export for search engine indexing

  • MongoDB integration for metadata search

  • Zenodo upload with rich metadata

Accessible#

A1: (Meta)data are retrievable by their identifier using a standardized communications protocol

h5rdmtoolbox features:
  • Zenodo repository integration

  • FileDB for local file search

A2: Metadata are accessible, even when the data are no longer available

h5rdmtoolbox features:
  • JSON-LD export captures all semantic information

  • Separate metadata files can be generated independently
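
As an illustration of metadata that can exist apart from the data file, the sketch below builds a small JSON-LD document with the standard `json` module. It is not the toolbox's export routine: `build_jsonld` is a hypothetical helper, the dataset IRI is a placeholder, and the ORCID shown is the fictitious example identifier that ORCID documents for testing.

```python
import json
from typing import Any, Dict


def build_jsonld(dataset_iri: str, title: str,
                 author_name: str, author_orcid: str) -> Dict[str, Any]:
    """Assemble a minimal JSON-LD description of a dataset and its creator."""
    return {
        "@context": {
            "dcterms": "http://purl.org/dc/terms/",
            "foaf": "http://xmlns.com/foaf/0.1/",
        },
        "@id": dataset_iri,
        "dcterms:title": title,
        "dcterms:creator": {
            "@id": author_orcid,
            "@type": "foaf:Person",
            "foaf:name": author_name,
        },
    }


doc = build_jsonld(
    dataset_iri="https://example.org/data/run1.h5",  # placeholder IRI
    title="Example measurement",
    author_name="Josiah Carberry",
    author_orcid="https://orcid.org/0000-0002-1825-0097",
)
# The serialized text can be stored next to, or independently of, the HDF5 file.
text = json.dumps(doc, indent=2)
```

A file written this way stays readable even if the HDF5 file it describes is later removed, which is the situation A2 addresses.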

Interoperable#

I1: (Meta)data use a formal, accessible, shared, and broadly applicable language

h5rdmtoolbox features:
  • RDF/JSON-LD for semantic interoperability

  • Standard attribute conventions

I2: (Meta)data use vocabularies that follow FAIR principles

h5rdmtoolbox features:
  • QUDT unit ontology for physical quantities

  • FOAF ontology for person descriptions

  • M4I (metadata4ing) ontology for experimental metadata

  • Custom ontology integration support
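
Linking a unit string to a shared vocabulary term can be sketched as a lookup table. The mapping below is hand-written for illustration and is not how the toolbox resolves units; `unit_to_qudt_iri` is a hypothetical helper, and only a handful of common QUDT unit IRIs are listed.

```python
from typing import Dict, Optional

QUDT_UNIT = "http://qudt.org/vocab/unit/"

# Small, hand-maintained mapping from unit strings to QUDT unit IRIs.
UNIT_IRIS: Dict[str, str] = {
    "m": QUDT_UNIT + "M",
    "s": QUDT_UNIT + "SEC",
    "K": QUDT_UNIT + "K",
    "Pa": QUDT_UNIT + "PA",
    "m/s": QUDT_UNIT + "M-PER-SEC",
}


def unit_to_qudt_iri(unit: str) -> Optional[str]:
    """Return the QUDT IRI for a unit string, or None if it is not in the table."""
    return UNIT_IRIS.get(unit.strip())


velocity_unit_iri = unit_to_qudt_iri("m/s")
```

Storing the IRI alongside the human-readable unit string lets other tools interpret the unit without parsing free text.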

I3: (Meta)data include qualified references to other (meta)data

h5rdmtoolbox features:
  • RDF triple linking between datasets

  • Provenance tracking with references

  • Cross-file references

Reusable#

R1: (Meta)data are richly described with a plurality of accurate and relevant attributes

h5rdmtoolbox features:
  • Standard attributes with validators

  • Conventions with domain-specific rules

  • Provenance and processing information

R1.1: (Meta)data are released with a clear and accessible data usage license

h5rdmtoolbox features:
  • License attribute support

  • Zenodo integration handles licensing automatically

R1.2: (Meta)data are associated with detailed provenance

h5rdmtoolbox features:
  • Processing step tracking

  • RDF provenance ontology support

  • Version history preservation

R1.3: (Meta)data meet domain-relevant community standards

h5rdmtoolbox features:
  • NeXus format support for beamline data

  • Custom convention creation

  • SHACL validation against domain shapes