{ "cells": [ { "cell_type": "markdown", "id": "834332c9-e420-4802-8d38-eebdd66d8349", "metadata": {}, "source": [ "# Zenodo\n", "\n", "The [Zenodo](https://zenodo.org/) repository is a concrete implementation of the `RepositoryInterface`. Other repositories such as `Figshare` (https://figshare.com/) could be possible future realizations of it.\n", "\n", "Zenodo provides a sandbox (testing environment) and a production environment. They work the same in principle. Therefore, only one implementation is needed, which is `ZenodoRecord` (the interface to a record in Zenodo). Pass `sandbox=True` to use the testing environment.\n", "\n", "The below diagram shows the abstract base class with its abstract methods (indicated by italics). Note, that `upload_file` is *not* abstract. It depends on the implementation of `__upload_file__` in the subclasses, which uploads a file to the repository record. `upload_file()` is basically a wrapper, which additionally allows generating metadata files of the uploaded files. We will explore this feature later in this section.\n", "\n", "The `RepositoryInterface` further defines the communication with files. A file object `RepositoryFile` is implemented, providing mandatory properties as well as a download method. A repository implementation (just like the one for Zenodo) must return a Dictionary of `RepositoryFile` objects for the `files` class property (see source code for in-depth explanation and the example at the end of this section).\n", "\n", "\"../../_static/repo_class_diagram.svg\"\n",\n", " " ] }, { "cell_type": "markdown", "id": "1bd504f1-59a1-45f6-805e-89700ebf4c32", "metadata": {}, "source": [ "## Example usage\n", "\n", "The example below will upload an HDF file to the sandbox server:" ] }, { "cell_type": "code", "execution_count": 1, "id": "b001b211-4771-48b2-8a0f-0a0b75b3de01", "metadata": {}, "outputs": [], "source": [ "from h5rdmtoolbox.repository import zenodo\n", "import h5rdmtoolbox as h5tbx" ] }, { "cell_type": "markdown", "id": "ac9d7063-43af-42d2-94b7-cf20c5b0e09c", "metadata": {}, "source": [ "### 1. Init a Repo:\n", "\n", "As said, we use the testing interface, hence `sandbox=True`:" ] }, { "cell_type": "code", "execution_count": 2, "id": "b1d04f69-5da4-40e8-9669-580bb4666cb0", "metadata": {}, "outputs": [], "source": [ "repo = zenodo.ZenodoRecord(None, sandbox=True)" ] }, { "cell_type": "markdown", "id": "6303c7d5-684d-4bcf-a17a-217c63d26212", "metadata": {}, "source": [ "We create a test HDF5 file, which we will later publish in the repository:" ] }, { "cell_type": "code", "execution_count": 3, "id": "3811ddda-5023-405d-b109-77b342c568b0", "metadata": {}, "outputs": [], "source": [ "with h5tbx.File() as h5:\n", " h5.create_dataset('velocity', shape=(10, 30), attrs={'units': 'm/s'})\n", "filename = h5.hdf_filename" ] }, { "cell_type": "markdown", "id": "70992ae8-c2d2-4263-b315-203c17325f92", "metadata": {}, "source": [ "### 2. Add repository metadata\n", "The repository needs **metadata**. The Zenodo module has a special class `Metadata` for this purpose. It validates the data expected by the Zenodo API (For required and optional fields, please refer to the [API](https://developers.zenodo.org/#representation) or carefully read the `Metadata` docstring. However, as `pydantic` is used as parent class, invalid or missing parameters will lead to errors):" ] }, { "cell_type": "code", "execution_count": 4, "id": "90d65507-8c0d-428b-8105-7e436461e1be", "metadata": {}, "outputs": [], "source": [ "from h5rdmtoolbox.repository.zenodo import metadata\n", "from datetime import datetime\n", "\n", "meta = metadata.Metadata(\n", " version=\"0.1.0-rc.1+build.1\",\n", " title='[deleteme]h5tbxZenodoInterface',\n", " description='A toolbox for managing HDF5-based research data management',\n", " creators=[metadata.Creator(name=\"Probst, Matthias\",\n", " affiliation=\"KIT - ITS\",\n", " orcid=\"0000-0001-8729-0482\")],\n", " contributors=[metadata.Contributor(name=\"Probst, Matthias\",\n", " affiliation=\"KIT - ITS\",\n", " orcid=\"0000-0001-8729-0482\",\n", " type=\"ContactPerson\")],\n", " upload_type='image',\n", " image_type='photo',\n", " access_right='open',\n", " keywords=['hdf5', 'research data management', 'rdm'],\n", " publication_date=datetime.now(),\n", " embargo_date='2020'\n", ")" ] }, { "cell_type": "markdown", "id": "3a5d6fef-b96b-45a0-8d0c-9157c5192bce", "metadata": {}, "source": [ "... finally make the changes effective by setting the metadata:" ] }, { "cell_type": "code", "execution_count": 5, "id": "e337c361-6d48-46cc-a3d7-292cadfc28fa", "metadata": {}, "outputs": [], "source": [ "repo.set_metadata(meta)" ] }, { "cell_type": "markdown", "id": "11efa913-f440-4431-9de7-1daa84fcbac8", "metadata": {}, "source": [ "### 3. Upload files\n", "\n", "*Any* file can be added (uploaded) by calling `upload_file(...)`. It can be a simple text, CSV or binary file. Often, it is advisable to describe the content in an additional file and hence provide more (machine-interpretable) information. Best is, to use JSON-LD files for this. The JSON-LD format allows describing file content and context in a standardized way.\n", "\n", "One of the parameters of `upload_file(...)` is `metamapper`. It expects a function, that extracts meta information from the input file. If the parameter `auto_map_hdf` is True and a HDF5 file is passed (scans for file suffixes `.hdf`, `.hdf5` and `.h5`), the built-in converter function will be called, which writes a JSON-LD file.\n", "\n", "By providing the `metamapper`-function, the target file and its metadata filename (which the function created) will be uploaded together.\n", "\n", "Adding a metadata file is especially beneficial for large, binary files. Like this, the metadata file can be downloaded and explored quickly by the user." ] }, { "cell_type": "code", "execution_count": 6, "id": "9bc60428-785f-41aa-a356-2114e0d183ac", "metadata": {}, "outputs": [], "source": [ "repo.upload_file(filename)" ] }, { "cell_type": "markdown", "id": "00482800-885f-482e-a136-10ee167c3a72", "metadata": {}, "source": [ "List the just uploaded files in the repository:" ] }, { "cell_type": "code", "execution_count": 7, "id": "500cb79c-76a4-4f3f-a84f-da36b1e63d5b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'tmp0.hdf': RepositoryFile(tmp0.hdf),\n", " 'tmp0.jsonld': RepositoryFile(tmp0.jsonld)}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "repo.files" ] }, { "cell_type": "markdown", "id": "b93926d6-37d8-4ba0-804a-b3892cccb8cd", "metadata": {}, "source": [ "### 3b Custom metamapper\n", "\n", "We could of course write and use our own metadata extract function like so:" ] }, { "cell_type": "code", "execution_count": 8, "id": "fbf04793-b7eb-4377-a904-edb91542b056", "metadata": {}, "outputs": [], "source": [ "import pathlib\n", "\n", "def my_meta_mapper(filename):\n", " \"\"\"very primitive...and not a jsonld file, but \n", " servese the demonstrating purpose.\"\"\"\n", " txt_filename = pathlib.Path(filename).with_suffix('.txt')\n", " with open(txt_filename, 'w') as f:\n", " f.write(f'filename: {filename}')\n", " return txt_filename" ] }, { "cell_type": "code", "execution_count": 9, "id": "02290359-b13b-413d-9b93-9706b3ab087d", "metadata": {}, "outputs": [], "source": [ "repo.upload_file(filename, metamapper=my_meta_mapper, overwrite=True)" ] }, { "cell_type": "markdown", "id": "312c68e0-0a90-43de-aefe-fe5091b3f21b", "metadata": {}, "source": [ "Proof, that it worked:" ] }, { "cell_type": "code", "execution_count": 11, "id": "4e75f4fc-6311-470c-9204-93c1c5d768d0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tmp0.hdf\n", "tmp0.jsonld\n", "tmp0.txt\n" ] } ], "source": [ "for file in repo.files:\n", " print(file)" ] }, { "cell_type": "code", "execution_count": null, "id": "8d8df017-3d3e-469f-84da-eb848c2592dc", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.19" } }, "nbformat": 4, "nbformat_minor": 5 }