EngMeta

EngMeta is a metadata schema developed as part of NFDI4Ing (Nationale Forschungsdateninfrastruktur für die Ingenieurwissenschaften, the German national research data infrastructure for the engineering sciences).

The schema is written as an .xsd file (XML Schema file). However, the toolbox does not provide a converter that turns it into a YAML file, which would be needed to use the schema as a convention. In this example, most mandatory and recommended fields of the EngMeta schema are therefore translated manually into a convention YAML file. The table below gives an overview of the fields. Note that not all fields are put into the YAML file, because numerous attributes such as the file size, file type etc. can be derived from the file itself and don’t need to be provided by the user.

The fields are (obligation: M = mandatory, R = recommended, O = optional):

Category	Title	Standard Attribute Name	Data Type	Obligation
Descriptive Metadata	Contact Person	contact	`$personOrOrganization`	M
	Producer/Author	creator	`$personOrOrganization`	M
	Contributor	contributor	`$personOrOrganization`	O
	Title	title	`$str`	M
	Description	description	`$str`	O
	Keywords	keywords	`$str`	R
	Subject	subject	`$str`	R
	Dates (Creation, Publication, …)	dates	`$str`	R
Process Metadata	Provenance information	provenance	`$processingStep`	O
Technical Metadata	PID	identifier	`$pid`	M
	Legal Information	rightsStatement	`$rightsStatement`	R
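The obligation column can also be used programmatically, for example to list which fields a set of attributes still lacks. The following is a minimal sketch in plain Python, not part of the toolbox: the list `ENGMETA_FIELDS` is copied from the table above, and the helper `missing_fields` is a hypothetical name introduced only for illustration.

```python
# Field table from above as (standard attribute name, data type, obligation).
ENGMETA_FIELDS = [
    ("contact", "$personOrOrganization", "M"),
    ("creator", "$personOrOrganization", "M"),
    ("contributor", "$personOrOrganization", "O"),
    ("title", "$str", "M"),
    ("description", "$str", "O"),
    ("keywords", "$str", "R"),
    ("subject", "$str", "R"),
    ("dates", "$str", "R"),
    ("provenance", "$processingStep", "O"),
    ("identifier", "$pid", "M"),
    ("rightsStatement", "$rightsStatement", "R"),
]


def missing_fields(attrs, obligation="M"):
    """Return the names with the given obligation that are absent from attrs.

    Hypothetical helper for illustration; it only checks presence,
    not whether the values are valid.
    """
    return [name for name, _dtype, level in ENGMETA_FIELDS
            if level == obligation and name not in attrs]


example = {"contact": {"name": "Matthias Probst"}, "title": "Test file"}
print(missing_fields(example))       # mandatory fields still missing
print(missing_fields(example, "R"))  # recommended fields still missing
```

Such a presence check complements the per-attribute validation of the convention: it says nothing about whether a value is well-formed, only whether a field was provided at all.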

import h5rdmtoolbox as h5tbx
cv = h5tbx.convention.from_yaml('EngMeta.yaml')
cv

h5tbx.use(cv)

using("EngMeta")

contact = cv.registered_standard_attributes['contact']
contact.validator.model_fields

{'contributor': FieldInfo(annotation=personOrOrganization, required=True)}

contact.validate(
    {'name': 'Matthias Probst',
     'id': 'https://orcid.org/0000-0001-8729-0482',
     'role': 'Researcher'}
)

True

contact.validate(
    {'name': 'Matthias Probst',
     'role': 'Invalid Role'}
)

False

contact.validate({'name': 'Matthias Probst'})

True
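The three calls above suggest that `name` is required, `id` is optional, and `role` is optional but restricted to a controlled vocabulary. The sketch below reproduces that behaviour in plain Python so the rules are easy to see. It is not the toolbox's implementation: `validate_person` is a hypothetical function, and `ALLOWED_ROLES` is an assumed placeholder set, since the actual list of roles is defined by the schema.

```python
# Assumed placeholder vocabulary; the real role list comes from the schema.
ALLOWED_ROLES = {"Researcher", "ContactPerson", "Supervisor"}


def validate_person(value):
    """Return True if value looks like a valid person-or-organization entry."""
    if not isinstance(value, dict):
        return False
    name = value.get("name")
    if not isinstance(name, str) or not name.strip():
        return False  # name is required and must be a non-empty string
    if "id" in value and not isinstance(value["id"], str):
        return False  # id is optional, but must be a string if given
    if "role" in value and value["role"] not in ALLOWED_ROLES:
        return False  # role is optional, but must come from the vocabulary
    return True


print(validate_person({"name": "Matthias Probst",
                       "id": "https://orcid.org/0000-0001-8729-0482",
                       "role": "Researcher"}))
print(validate_person({"name": "Matthias Probst", "role": "Invalid Role"}))
print(validate_person({"name": "Matthias Probst"}))
```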

with h5tbx.File(contact=dict(name='Matthias Probst'),
                creator=dict(name='Matthias Probst',
                             id='https://orcid.org/0000-0001-8729-0482',
                             role='Researcher'
                             ),
                pid=dict(id='123', type='other'),
                title='Test file to demonstrate usage of EngMeta schema') as h5:
    fname = h5.hdf_filename
    h5.dump()

/(0)
- contact : {"name": "Matthias Probst"}
- creator : {"name": "Matthias Probst", "id": "https://orcid.org/0000-0001-8729-0482", "role": "Researcher"}
- pid : {"id": "123", "type": "other"}
- title : Test file to demonstrate usage of EngMeta schema

Mapping functions

Some metadata, for example the file size, file type and checksum, can be extracted automatically from the file. If a convention requires such metadata, a mapping function like the following can provide it:

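import hashlib

def extract_metadata(filename):
    """Derive file size, file type and MD5 checksum from an HDF5 file."""
    with h5tbx.File(filename) as h5:
        fsize = h5.filesize

    # Hash the file passed in as `filename` (not the global `fname`)
    with open(filename, 'rb') as f:
        checksum = hashlib.md5(f.read()).hexdigest()

    return dict(file_size=fsize, file_type='hdf5', checksum=checksum)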

extract_metadata(fname)

{'file_size': array(6944) <Unit('byte')>,
 'file_type': 'hdf5',
 'checksum': '4fc13072171dbb2a68a2cf4249f38565'}
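HDF5 files can be large, in which case reading the whole file into memory just to hash it may be undesirable. As a hedged alternative, the sketch below uses only the Python standard library and hashes the file in chunks. The names `file_checksum` and `basic_file_metadata` are hypothetical, and unlike the toolbox's `filesize` property the size is returned as a plain integer number of bytes without a unit.

```python
import hashlib
import os


def file_checksum(filename, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks of chunk_size bytes and return the hex digest."""
    digest = hashlib.new(algorithm)
    with open(filename, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def basic_file_metadata(filename):
    """Collect file size (bytes, plain int) and checksum without opening HDF5."""
    return {
        "file_size": os.path.getsize(filename),
        "checksum": file_checksum(filename),
    }
```

Called as `basic_file_metadata(fname)`, this should yield the same MD5 digest as the one-shot approach, because the digest does not depend on how the input is split into chunks.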
