{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7e0ce115-a914-4e1d-ac0b-5e8f43dd9409",
   "metadata": {},
   "source": [
    "# EngMeta\n",
    "\n",
    "[EngMeta](https://doi.org/10.18419/darus-500) is a metadata Schema developed as part of the NFDI4Ing (Nationale Forschungsdaten Infrastruktur für Ingenieure).\n",
    "\n",
    "The schema is written a .xsd-file (XML Schema file), however, the toolbox does not provide a converter into YAML file in order to use the schema as a convention. In this example, most mandatory and required fields of the EngMeta schema are manually translated into a convention-YAML file. The below table gives an overview of fields. **Note,** that not all fields are put into the YAML file as numerous attributes like the file size, file type etc. can be derived from the file itself and don't need to be provided by the user.\n",
    "\n",
    "The fields are:\n",
    "\n",
    "| Category   | Title   |      Standard Attribute Name      |  Data Type | Obligation | \n",
    "|----------|----------|:-------------:|------:|------:|\n",
    "| **Descriptive Metadata** | Contact Person |  contact | `$personOrOrganization` | M |\n",
    "| | Producer/Author |  creator | `$personOrOrganization` |M |\n",
    "| | Contributor |  contributor | `$personOrOrganization` | O |\n",
    "| | Title |  title | `$str` |M |\n",
    "| | Description |  description | `$str` | O |\n",
    "| | Keywords |  keywords | `$str` |R |\n",
    "| | Subject |  subject | `$str` |R |\n",
    "| | Dates (Creation, Publication, ...) |  dates | `$str` |R |\n",
    "| **Process Metadata** | Provenance information | provenance | `$processingStep` |O |\n",
    "| **Technical Metadata**| PID | identifier | `$pid` | M |\n",
    "|  | Legal Information | rightsStatement | `$rightsStatement` | R |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "181e1cbe-4cce-4bb6-a1f1-d8f9dcce89dc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Failed to import module h5tbx\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Convention(\"EngMeta\")"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import h5rdmtoolbox as h5tbx\n",
    "cv = h5tbx.convention.from_yaml('EngMeta.yaml')\n",
    "cv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ec38972e-3682-43c1-9c7d-34014e70d09a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "using(\"EngMeta\")"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "h5tbx.use(cv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "e1c66f76-022f-4af0-8022-a03c7d53df0c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'contributor': FieldInfo(annotation=personOrOrganization, required=True)}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "contact = cv.registered_standard_attributes['contact']\n",
    "contact.validator.model_fields"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "1ff37233-79f8-4da2-92d3-a705281520ba",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "contact.validate(\n",
    "    {'name': 'Matthias Probst',\n",
    "     'id': 'https://orcid.org/0000-0001-8729-0482',\n",
    "     'role': 'Researcher'}\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "2394d416-3048-4768-9e90-a956f48c446e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "contact.validate(\n",
    "    {'name': 'Matthias Probst',\n",
    "     'role': 'Invalid Role'}\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "94d4146a-726c-4c46-aca6-7a771748c864",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "contact.validate({'name': 'Matthias Probst'})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "d6837d44-1064-4d0b-aa37-09df20f5bbb8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<head><style>/*\r\n",
       "CSS inspired by xarray: https://github.com/pydata/xarray\r\n",
       "*/\r\n",
       ".h5tb-header > div,\r\n",
       ".h5tb-header > ul {\r\n",
       "    display: inline;\r\n",
       "    margin-top: 0;\r\n",
       "    margin-bottom: 0;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-dataarray-cls,\r\n",
       ".h5tb-dataarray-name {\r\n",
       "    margin-left: 2px;\r\n",
       "    margin-right: 10px;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-dataarray-name {\r\n",
       "    color: #000;\r\n",
       "}\r\n",
       "\r\n",
       ".h5grp-sections {\r\n",
       "    list-style: none;\r\n",
       "    padding: 3px;\r\n",
       "    margin: 0;\r\n",
       "}\r\n",
       "\r\n",
       ".h5grp-sections input {\r\n",
       "    display: none;\r\n",
       "}\r\n",
       "\r\n",
       ".h5grp-sections label {\r\n",
       "    display: inline;\r\n",
       "}\r\n",
       "\r\n",
       ".h5grp-sections > li > input + label > span {\r\n",
       "    display: inline;\r\n",
       "    margin-left: 4px;\r\n",
       "}\r\n",
       "\r\n",
       ".h5grp-sections > li > input:checked + label > span {\r\n",
       "    display: none;\r\n",
       "}\r\n",
       "\r\n",
       ".h5grp-sections input:enabled + label {\r\n",
       "    cursor: pointer;\r\n",
       "}\r\n",
       "\r\n",
       ".h5grp-sections input:not(.h5tb-values-in) ~ ul {\r\n",
       "    display: none;\r\n",
       "}\r\n",
       "\r\n",
       ".h5grp-sections input:not(.h5tb-values-in):checked ~ ul {\r\n",
       "    display: block;\r\n",
       "}\r\n",
       "\r\n",
       ".h5grp-sections > li > input + label {\r\n",
       "    width: 140px;\r\n",
       "    color: #555;\r\n",
       "    font-weight: 500;\r\n",
       "    padding: 4px 0 2px 0;\r\n",
       "}\r\n",
       "\r\n",
       "\r\n",
       ".h5grp-sections > li > input + label:before {\r\n",
       "    display: inline-block;\r\n",
       "    content: '+';\r\n",
       "    font-size: 11px;\r\n",
       "    width: 15px;\r\n",
       "    text-align: center;\r\n",
       "}\r\n",
       "\r\n",
       ".h5grp-sections > li > input:disabled + label:before {\r\n",
       "    color: #777;\r\n",
       "}\r\n",
       "\r\n",
       ".h5grp-sections > li > input:checked + label:before {\r\n",
       "    content: '-';\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-dim-list {\r\n",
       "    display: inline-block !important;\r\n",
       "    list-style: none;\r\n",
       "    padding: 0;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-dim-list li {\r\n",
       "    display: inline-block;\r\n",
       "    padding: 0;\r\n",
       "    margin: 0;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-dim-list:before {\r\n",
       "    content: '(';\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-dim-list:after {\r\n",
       "    content: ')';\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-dim-list li:not(:last-child):after {\r\n",
       "    content: ',';\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-has-index {\r\n",
       "    text-decoration: underline;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-var-list {\r\n",
       "    list-style: none;\r\n",
       "    padding: 0;\r\n",
       "    margin: 0;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-var-list > li {\r\n",
       "    background-color: #fcfcfc;\r\n",
       "    overflow: hidden;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-var-list > li:nth-child(odd) {\r\n",
       "    background-color: #efefef;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-var-list li:hover {\r\n",
       "    background-color: rgba(3, 169, 244, .2);\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-var-list li > span {\r\n",
       "    display: inline-block;\r\n",
       "}\r\n",
       "\r\n",
       "input.h5tb-varname-in + label {\r\n",
       "    width: 140px;\r\n",
       "    padding-left: 0;\r\n",
       "    font-weight: bold;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-dataset {\r\n",
       "    width: 100px;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-attributevalue {\r\n",
       "    width: 100px;\r\n",
       "    text-align: left;\r\n",
       "    color: #888;\r\n",
       "    white-space: nowrap;\r\n",
       "    font-size: 12px;\r\n",
       "}\r\n",
       "\r\n",
       "input.h5tb-varname-in + label:before {\r\n",
       "    content: ' ';\r\n",
       "    display: inline-block;\r\n",
       "    font-size: 11px;\r\n",
       "    width: 15px;\r\n",
       "    padding-left: 2px;\r\n",
       "    padding-right: 2px;\r\n",
       "    text-align: center;\r\n",
       "    color: #aaa;\r\n",
       "    text-decoration: none !important;\r\n",
       "}\r\n",
       "\r\n",
       "input.h5tb-varname-in:enabled + label:hover:before {\r\n",
       "    color: #000;\r\n",
       "}\r\n",
       "\r\n",
       "input.h5tb-varname-in:checked + label:before {\r\n",
       "    color: #ccc;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-dims {\r\n",
       "    width: 280px;\r\n",
       "    white-space: nowrap;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-attr-list {\r\n",
       "    list-style: none;\r\n",
       "    background-color: rgba(0,0,0,.0);\r\n",
       "    padding-bottom: 6px;\r\n",
       "    color: #6a6c73;\r\n",
       "}\r\n",
       "\r\n",
       ".h5tb-attr-list li,\r\n",
       ".h5tb-attr-list li:hover {\r\n",
       "    background-color: rgba(0,0,0,.0);\r\n",
       "}\r\n",
       "\r\n",
       "\r\n",
       "/* Tooltip container */\r\n",
       ".tooltip {\r\n",
       "position: relative;\r\n",
       "display: inline-block;\r\n",
       "cursor: pointer;\r\n",
       "}\r\n",
       "\r\n",
       "/* Tooltip text */\r\n",
       ".tooltip .tooltiptext {\r\n",
       "visibility: hidden;\r\n",
       "width: 300px;\r\n",
       "background-color: #555;\r\n",
       "color: #fff;\r\n",
       "text-align: center;\r\n",
       "border-radius: 6px;\r\n",
       "padding: 5px;\r\n",
       "position: absolute;\r\n",
       "z-index: 1;\r\n",
       "bottom: 125%;\r\n",
       "left: 50%;\r\n",
       "margin-left: -60px;\r\n",
       "opacity: 0;\r\n",
       "transition: opacity 0.3s;\r\n",
       "}\r\n",
       "\r\n",
       "/* Show the tooltip on hover */\r\n",
       ".tooltip:hover .tooltiptext {\r\n",
       "visibility: visible;\r\n",
       "opacity:  1;\r\n",
       "}</style></head>\n",
       "<div class='h5tb-warp'>\n",
       "\n",
       "              <ul style=\"list-style-type: none;\" class=\"h5grp-sections\">\n",
       "                    <li>\n",
       "                        <input id=\"group-ds--5395760600\" type=\"checkbox\" checked>\n",
       "                        <label style=\"font-weight: bold\" for=\"group-ds--5395760600\">\n",
       "                        /<span>(0)</span></label>\n",
       "                  \n",
       "<ul class=\"h5tb-attr-list\"><li style=\"list-style-type: none; font-style: italic\">contact : {\"name\": \"Matthias Probst\"}</li><li style=\"list-style-type: none; font-style: italic\">creator : {\"name\": \"Matthias Probst\", \"id\": \"https://orcid.org/0000-0001-8729-0482\", \"role\": \"Researcher\"}</li><li style=\"list-style-type: none; font-style: italic\">pid : {\"id\": \"123\", \"type\": \"other\"}</li><li style=\"list-style-type: none; font-style: italic\">title : Test file to demonstrate usage of EngMeta schema</li>\n",
       "                    </ul>\n",
       "</li>\n",
       "</ul>\n",
       "</div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "with h5tbx.File(contact=dict(name='Matthias Probst'),\n",
    "                creator=dict(name='Matthias Probst',\n",
    "                             id='https://orcid.org/0000-0001-8729-0482',\n",
    "                             role='Researcher'\n",
    "                             ),\n",
    "                pid=dict(id='123', type='other'),\n",
    "                title='Test file to demonstrate usage of EngMeta schema') as h5:\n",
    "    fname = h5.hdf_filename\n",
    "    h5.dump()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd9c6033-c53d-4947-92b0-1b2d513eea30",
   "metadata": {},
   "source": [
    "## Mapping functions\n",
    "\n",
    "Some metadata can be extracted automatically from the file, like the `file size`, `file type` and `check sum` for example. Such functions are needed, if metadata like this is required:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "206c6ca0-35ac-4221-b626-d27e3abe4a02",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hashlib\n",
    "def extract_metadata(filename):\n",
    "    with h5tbx.File(filename) as h5:\n",
    "        fsize = h5.filesize\n",
    "    \n",
    "    return dict(file_size=fsize, file_type='hdf5', checksum=hashlib.md5(open(fname, 'rb').read()).hexdigest())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "6258751b-b488-45fd-a0b1-1f8dacca3b64",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'file_size': array(6944) <Unit('byte')>,\n",
       " 'file_type': 'hdf5',\n",
       " 'checksum': '4fc13072171dbb2a68a2cf4249f38565'}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "extract_metadata(fname)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3452fe5a-707a-41b1-b937-fb93036eec54",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}