Zenodo#

The Zenodo repository is a concrete implementation of the RepositoryInterface. Other repositories, such as Figshare (https://figshare.com/), could be implemented against the same interface in the future.

Zenodo provides a sandbox (testing environment) and a production environment. Both work the same way in principle, so a single implementation, ZenodoRecord (the interface to a record in Zenodo), covers both. Pass sandbox=True to use the testing environment.

The diagram below shows the abstract base class with its abstract methods (set in italics). Note that upload_file is not abstract: it relies on the implementation of __upload_file__ in the subclasses, which uploads a file to the repository record. upload_file() is essentially a wrapper that additionally allows generating metadata files for the uploaded files. We will explore this feature later in this section.

The RepositoryInterface also defines how files are handled. A file object, RepositoryFile, provides the mandatory properties as well as a download method. A repository implementation (such as the one for Zenodo) must return a dictionary of RepositoryFile objects from its files property (see the source code for an in-depth explanation and the example at the end of this section).

../../_static/repo_class_diagram.svg
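The relationship between the concrete upload_file() wrapper and the abstract __upload_file__ can be illustrated with a small, self-contained sketch. The class names RepositorySketch, RepositoryFileSketch and InMemoryRepository are hypothetical stand-ins invented for this illustration; they are not the toolbox's classes, and the real signatures may differ.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Dict, Optional


class RepositoryFileSketch:
    """Hypothetical stand-in for RepositoryFile: it only stores a name."""

    def __init__(self, name: str):
        self.name = name

    def __repr__(self) -> str:
        return f"RepositoryFileSketch({self.name})"


class RepositorySketch(ABC):
    """Hypothetical, simplified stand-in for RepositoryInterface."""

    @abstractmethod
    def __upload_file__(self, filename: Path) -> None:
        """Upload a single file; every subclass must implement this."""

    @property
    @abstractmethod
    def files(self) -> Dict[str, RepositoryFileSketch]:
        """Files of the record, keyed by file name."""

    def upload_file(self, filename, metamapper: Optional[Callable] = None) -> None:
        """Concrete wrapper: upload the file and, optionally, a metadata file."""
        filename = Path(filename)
        self.__upload_file__(filename)
        if metamapper is not None:
            # The mapper returns the path of the metadata file it produced.
            self.__upload_file__(Path(metamapper(filename)))


class InMemoryRepository(RepositorySketch):
    """Toy subclass that records file names instead of contacting a server."""

    def __init__(self):
        self._files: Dict[str, RepositoryFileSketch] = {}

    def __upload_file__(self, filename: Path) -> None:
        self._files[filename.name] = RepositoryFileSketch(filename.name)

    @property
    def files(self) -> Dict[str, RepositoryFileSketch]:
        return self._files


repo_sketch = InMemoryRepository()
# The lambda is a dummy mapper that only derives a metadata file name.
repo_sketch.upload_file("data.hdf", metamapper=lambda f: f.with_suffix(".jsonld"))
print(sorted(repo_sketch.files))
```

Only the subclass knows how to transfer a file, while the optional metadata step lives once in the base class. That is why a new repository backend would only need to supply the abstract methods.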

Example usage#

The example below uploads an HDF5 file to the sandbox server:

from h5rdmtoolbox.repository import zenodo
import h5rdmtoolbox as h5tbx

1. Initialize a repository#

As mentioned above, we use the testing environment, hence sandbox=True:

repo = zenodo.ZenodoRecord(None, sandbox=True)

We create a test HDF5 file, which we will later publish in the repository:

with h5tbx.File() as h5:
    h5.create_dataset('velocity', shape=(10, 30), attrs={'units': 'm/s'})
filename = h5.hdf_filename

2. Add repository metadata#

The repository needs metadata. The Zenodo module provides a dedicated class, Metadata, for this purpose. It validates the data expected by the Zenodo API. For required and optional fields, refer to the Zenodo API documentation or the Metadata docstring. Because Metadata is built on pydantic, invalid or missing parameters raise an error:

from h5rdmtoolbox.repository.zenodo import metadata
from datetime import datetime

meta = metadata.Metadata(
    version="0.1.0-rc.1+build.1",
    title='[deleteme]h5tbxZenodoInterface',
    description='A toolbox for managing HDF5-based research data management',
    creators=[metadata.Creator(name="Probst, Matthias",
                      affiliation="KIT - ITS",
                      orcid="0000-0001-8729-0482")],
    contributors=[metadata.Contributor(name="Probst, Matthias",
                              affiliation="KIT - ITS",
                              orcid="0000-0001-8729-0482",
                              type="ContactPerson")],
    upload_type='image',
    image_type='photo',
    access_right='open',
    keywords=['hdf5', 'research data management', 'rdm'],
    publication_date=datetime.now(),
    embargo_date='2020'
)

Finally, apply the metadata to the repository record:

repo.set_metadata(meta)
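The effect of validating metadata at construction time can be illustrated without the toolbox. The sketch below deliberately swaps pydantic for a plain standard-library dataclass, so it is not how Metadata is implemented. MetadataSketch and ALLOWED_ACCESS_RIGHTS are hypothetical names, and the allowed values are an assumption based on the Zenodo REST API documentation.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: access-right values as listed in the Zenodo REST API documentation.
ALLOWED_ACCESS_RIGHTS = {"open", "embargoed", "restricted", "closed"}


@dataclass
class MetadataSketch:
    """Hypothetical, stdlib-only stand-in; not the real Metadata class."""

    title: str
    description: str
    access_right: str = "open"
    keywords: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject values that the repository would refuse anyway.
        if not self.title:
            raise ValueError("title must not be empty")
        if self.access_right not in ALLOWED_ACCESS_RIGHTS:
            raise ValueError(f"invalid access_right: {self.access_right!r}")


valid = MetadataSketch(title="demo", description="a demo record")

try:
    MetadataSketch(title="demo", description="a demo record", access_right="public")
except ValueError as err:
    print(f"rejected: {err}")
```

Failing early like this means a mistake surfaces when the metadata object is created, not after a round trip to the server.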

3. Upload files#

Any file can be added (uploaded) by calling upload_file(...), whether it is a plain-text, CSV or binary file. It is often advisable to describe the content in an additional file and thus provide more (machine-interpretable) information. JSON-LD files are best suited for this, because the format allows describing file content and context in a standardized way.

One of the parameters of upload_file(...) is metamapper. It expects a function that extracts meta information from the input file. If the parameter auto_map_hdf is True and an HDF5 file is passed (detected by the file suffixes .hdf, .hdf5 and .h5), the built-in converter function is called, which writes a JSON-LD file.

When a metamapper function is provided, the target file and the metadata file created by the function are uploaded together.

Adding a metadata file is especially beneficial for large binary files, because users can quickly download and explore the metadata file instead.

repo.upload_file(filename)

List the files just uploaded to the repository:

repo.files
{'tmp0.jsonld': RepositoryFile(tmp0.jsonld),
 'tmp0.hdf': RepositoryFile(tmp0.hdf)}
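The suffix-based detection described above could look roughly like the following sketch. The helper names is_hdf_file and select_metamapper are hypothetical. The case-insensitive comparison and the precedence of an explicit metamapper over auto_map_hdf are assumptions, not confirmed behaviour of upload_file(...).

```python
from pathlib import Path

# Suffixes named in the text above.
HDF_SUFFIXES = {".hdf", ".hdf5", ".h5"}


def is_hdf_file(filename) -> bool:
    """Return True if the file suffix marks the file as HDF5."""
    # Assumption: the comparison ignores case.
    return Path(filename).suffix.lower() in HDF_SUFFIXES


def select_metamapper(filename, metamapper=None, auto_map_hdf=True, hdf_mapper=None):
    """Return the mapper to call for `filename`, or None if there is none."""
    if metamapper is not None:
        # Assumption: an explicitly passed mapper takes precedence.
        return metamapper
    if auto_map_hdf and is_hdf_file(filename):
        return hdf_mapper
    return None


print(is_hdf_file("tmp0.hdf"), is_hdf_file("notes.csv"))
```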

3b Custom metamapper#

We can, of course, write and use our own metadata extraction function:

import pathlib

def my_meta_mapper(filename):
    """Very primitive, and not a JSON-LD file, but
    it serves the purpose of demonstration."""
    # Write the metadata next to the input file, using the suffix .txt
    txt_filename = pathlib.Path(filename).with_suffix('.txt')
    with open(txt_filename, 'w') as f:
        f.write(f'filename: {filename}')
    # Return the path of the metadata file so that it gets uploaded, too
    return txt_filename


repo.upload_file(filename, metamapper=my_meta_mapper)

Check that it worked:

for file in repo.files:
    print(file)
tmp0.hdf
tmp0.jsonld
tmp0.txt
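Since JSON-LD is the recommended format for metadata files, a custom mapper could also write a minimal JSON-LD document rather than plain text. The following is a sketch only. The function name jsonld_meta_mapper, the chosen vocabulary terms and the `_meta.jsonld` file name (picked here to avoid clashing with the automatically generated .jsonld file) are illustrative assumptions.

```python
import json
import pathlib


def jsonld_meta_mapper(filename):
    """Write a minimal JSON-LD description next to `filename` and return its path."""
    path = pathlib.Path(filename)
    doc = {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dcterms": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Distribution",
        "dcterms:title": path.name,
        # The file size in bytes is read from the file system.
        "dcat:byteSize": path.stat().st_size,
    }
    jsonld_filename = path.with_name(path.stem + "_meta.jsonld")
    with open(jsonld_filename, "w") as f:
        json.dump(doc, f, indent=2)
    return jsonld_filename
```

It would be passed in the same way as the text-based mapper above, i.e. as the metamapper argument of upload_file(...).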

Extract a dcat:Dataset#

The Zenodo interface can export the record metadata as a dcat:Dataset:

dataset = repo.as_dcat_dataset()
print(dataset.serialize("ttl"))
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix schema: <https://schema.org/> .
@prefix spdx: <http://spdx.org/rdf/terms#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://orcid.org/0000-0001-8729-0482> a prov:Person ;
    schema:affiliation [ a prov:Organization ;
            foaf:name "KIT - ITS" ] .

<https://sandbox.zenodo.org/api/deposit/depositions/409269> a dcat:Dataset ;
    dcterms:accessRights "open" ;
    dcterms:creator <https://orcid.org/0000-0001-8729-0482> ;
    dcterms:description "A toolbox for managing HDF5-based research data management" ;
    dcterms:identifier "None" ;
    dcterms:issued "2025-11-27T10:51:56.847039+00:00"^^xsd:dateTime ;
    dcterms:license <https://creativecommons.org/licenses/by/4.0/> ;
    dcterms:modified "2025-11-27T10:52:02.747062+00:00"^^xsd:dateTime ;
    dcterms:publisher <https://www.wikidata.org/wiki/Q22661177> ;
    dcterms:title "[deleteme]h5tbxZenodoInterface" ;
    dcat:distribution <https://sandbox.zenodo.org/api/deposit/depositions/409269/files/0c82b430-c0c0-4495-b7f9-c8dd053b4429>,
        <https://sandbox.zenodo.org/api/deposit/depositions/409269/files/14bca485-e6e6-4d27-b01b-7b4d00cecccd>,
        <https://sandbox.zenodo.org/api/deposit/depositions/409269/files/97a19430-877c-467d-8550-4c5636b6b212> ;
    dcat:keyword "hdf5",
        "rdm",
        "research data management" ;
    dcat:landingPage <https://sandbox.zenodo.org/api/deposit/depositions/409269> ;
    dcat:version "0.1.0-rc.1+build.1" .

<https://www.wikidata.org/wiki/Q22661177> a foaf:Organization ;
    foaf:homepage <https://zenodo.org/> ;
    foaf:name "Zenodo" .

<https://sandbox.zenodo.org/api/deposit/depositions/409269/files/0c82b430-c0c0-4495-b7f9-c8dd053b4429> a dcat:Distribution ;
    dcterms:title "tmp0.jsonld" ;
    spdx:checksum [ a spdx:Checksum ;
            spdx:algorithm <https://spdx.org/rdf/terms#checksumAlgorithm_md5> ;
            spdx:checksumValue "9bb62764474c4976045f8d2d55e292f2" ] ;
    dcat:accessURL <https://sandbox.zenodo.org/api/deposit/depositions/409269/files/0c82b430-c0c0-4495-b7f9-c8dd053b4429> ;
    dcat:byteSize 1849 ;
    dcat:downloadURL <https://sandbox.zenodo.org/api/records/409269/draft/files/tmp0.jsonld/content> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/application/ld+json> .

<https://sandbox.zenodo.org/api/deposit/depositions/409269/files/14bca485-e6e6-4d27-b01b-7b4d00cecccd> a dcat:Distribution ;
    dcterms:title "tmp0.txt" ;
    spdx:checksum [ a spdx:Checksum ;
            spdx:algorithm <https://spdx.org/rdf/terms#checksumAlgorithm_md5> ;
            spdx:checksumValue "977e3a0081406e77f537e75d45e02dba" ] ;
    dcat:accessURL <https://sandbox.zenodo.org/api/deposit/depositions/409269/files/14bca485-e6e6-4d27-b01b-7b4d00cecccd> ;
    dcat:byteSize 71 ;
    dcat:downloadURL <https://sandbox.zenodo.org/api/records/409269/draft/files/tmp0.txt/content> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/plain> .

<https://sandbox.zenodo.org/api/deposit/depositions/409269/files/97a19430-877c-467d-8550-4c5636b6b212> a dcat:Distribution ;
    dcterms:title "tmp0.hdf" ;
    spdx:checksum [ a spdx:Checksum ;
            spdx:algorithm <https://spdx.org/rdf/terms#checksumAlgorithm_md5> ;
            spdx:checksumValue "cfcba48edfcdea042e6570e5387caf60" ] ;
    dcat:accessURL <https://sandbox.zenodo.org/api/deposit/depositions/409269/files/97a19430-877c-467d-8550-4c5636b6b212> ;
    dcat:byteSize 6944 ;
    dcat:downloadURL <https://sandbox.zenodo.org/api/records/409269/draft/files/tmp0.hdf/content> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/application/x-hdf> .
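Each distribution in the output above carries an MD5 checksum (spdx:checksumValue). One possible use is to verify a file after downloading it. The sketch below relies only on the standard library; the helper names md5_of_file and matches_checksum are hypothetical and not part of the toolbox.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For the record shown above, the expected value for tmp0.hdf would be the string given under spdx:checksumValue in its dcat:Distribution entry.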