{ "cells": [ { "cell_type": "markdown", "id": "9a1217bc-48a1-4a5d-9996-8f762565383f", "metadata": {}, "source": [ "# Special I/O\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "77fa362b-5622-4b69-b37a-04ed4f59ec48", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "using(\"h5py\")" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import h5rdmtoolbox as h5tbx\n", "h5tbx.use(None)" ] }, { "cell_type": "markdown", "id": "10def77e-2f73-4466-91e8-bf5c211dba60", "metadata": {}, "source": [ "## Creating datasets and CSV file(s)\n", "Datasets can be created directly form a single or from multiple files. Let's first create two simple CSV files:" ] }, { "cell_type": "code", "execution_count": 2, "id": "8b62a6ee-6e07-4907-b7cc-cc23d021b4e6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
xy
00.5434050.004719
10.2783690.121569
20.4245180.670749
30.8447760.825853
\n", "
" ], "text/plain": [ " x y\n", "0 0.543405 0.004719\n", "1 0.278369 0.121569\n", "2 0.424518 0.670749\n", "3 0.844776 0.825853" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "np.random.seed(100)\n", "\n", "# first\n", "df = pd.DataFrame({'x': np.random.random((4, )),\n", " 'y': np.random.random((4, ))})\n", "csv_filename1 = h5tbx.utils.generate_temporary_filename(suffix='.csv')\n", "df.to_csv(csv_filename1, index=None)\n", "df" ] }, { "cell_type": "code", "execution_count": 3, "id": "180c998d-28b0-4759-8924-d91d7073c38a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
xy
00.1367070.811683
10.5750930.171941
20.8913220.816225
30.2092020.274074
40.1853280.431704
50.1083770.940030
60.2196970.817649
70.9786240.336112
\n", "
" ], "text/plain": [ " x y\n", "0 0.136707 0.811683\n", "1 0.575093 0.171941\n", "2 0.891322 0.816225\n", "3 0.209202 0.274074\n", "4 0.185328 0.431704\n", "5 0.108377 0.940030\n", "6 0.219697 0.817649\n", "7 0.978624 0.336112" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# second\n", "df = pd.DataFrame({'x': np.random.random((8, )),\n", " 'y': np.random.random((8, ))})\n", "csv_filename2 = h5tbx.utils.generate_temporary_filename(suffix='.csv')\n", "df.to_csv(csv_filename2, index=None)\n", "df" ] }, { "cell_type": "markdown", "id": "c27151bc-5d91-48f5-b10f-cbbaaf169280", "metadata": {}, "source": [ "Create from a single file:" ] }, { "cell_type": "code", "execution_count": 4, "id": "525c2150-6fb7-48b0-9e78-4fdcad50578a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n", " \n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "with h5tbx.File() as h5:\n", " h5.create_dataset_from_csv(csv_filename=csv_filename1)\n", " h5.dump()" ] }, { "cell_type": "markdown", "id": "b165e857-f03d-4f22-9150-999e79dec640", "metadata": {}, "source": [ "For creating from multiple CSV files, it must be decided whether to stack (datasets must have same size) or concatenate them:" ] }, { "cell_type": "markdown", "id": "629a6069-e850-47ad-aec8-fd41b853dfd8", "metadata": {}, "source": [ "... concatenating:" ] }, { "cell_type": "code", "execution_count": 5, "id": "d8465a39-d70e-4bad-b265-0b5453c027f8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n", " \n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "with h5tbx.File() as h5:\n", " h5.create_datasets_from_csv(csv_filenames=[csv_filename1, csv_filename2], combine_opt='concatenate')\n", " h5.dump()" ] }, { "cell_type": "markdown", "id": "e48d1d10-db87-4a40-8853-fed42d6cca2f", "metadata": {}, "source": [ "... stacking:" ] }, { "cell_type": "code", "execution_count": 6, "id": "ff0942b5-dc0e-45b7-927e-72a7f62a4b41", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n", "
    \n", "
  • \n", " \n", " \n", " \n", "
      \n", "
    \n", "\n", "
      \n", " \n", " \n", " (2, 8) [float64]\n", "
        \n", "
      \n", "
    \n", "\n", "
      \n", " \n", " \n", " (2, 8) [float64]\n", "
        \n", "
      \n", "
    \n", "
  • \n", "
\n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "with h5tbx.File() as h5:\n", " h5.create_datasets_from_csv(csv_filenames=[csv_filename2, csv_filename2], combine_opt='stack')\n", " h5.dump()" ] }, { "cell_type": "markdown", "id": "fdcf6cdb-3b5b-499a-8b89-ecf3d6fdc58f", "metadata": {}, "source": [ "## Creating datasets and image file(s)\n", "A dataset can be created from image data. The data can be provided as a list of numpy arrays:" ] }, { "cell_type": "code", "execution_count": 7, "id": "de901e0e-fa8e-45a1-a992-bd61be3c29b4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n", "
    \n", "
  • \n", " \n", " \n", " \n", "
      \n", "
    \n", "\n", "
      \n", " \n", " \n", " (5, 20, 10) [float32]\n", "
        \n", "
      \n", "
    \n", "
  • \n", "
\n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "with h5tbx.File() as h5:\n", " h5.create_dataset_from_image([np.random.random((20, 10))] * 5,\n", " 'testimg', axis=0)\n", " h5.dump()" ] }, { "cell_type": "markdown", "id": "6e362049-7424-46f6-abb9-261f969a43ef", "metadata": {}, "source": [ "... or as a iterable object which provides the image data one at a time:" ] }, { "cell_type": "code", "execution_count": 8, "id": "5db487b5-e2a7-4bdb-a05e-4a5205bc9220", "metadata": {}, "outputs": [], "source": [ "class ImgReader:\n", " \"\"\"Dummy Image Reader\"\"\"\n", " def __init__(self, imgdir):\n", " self._imgdir = imgdir\n", " self._index = 0\n", " self._size = 5\n", "\n", " def read_img(self):\n", " # provide random image. Use case would read from file...\n", " return np.random.random((20, 10))\n", "\n", " def __iter__(self):\n", " return self\n", "\n", " def __len__(self):\n", " return self._size\n", "\n", " def __next__(self):\n", " if self._index < self._size:\n", " self._index += 1\n", " return self.read_img()\n", " raise StopIteration" ] }, { "cell_type": "code", "execution_count": 9, "id": "8f39eea2-5000-47aa-89ae-4bf3ad35b834", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n", "
    \n", "
  • \n", " \n", " \n", " \n", "
      \n", "
    \n", "\n", "
      \n", " \n", " \n", " (5, 20, 10) [float32]\n", "
        \n", "
      \n", "
    \n", "
  • \n", "
\n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "imgreader = ImgReader('testdir')\n", "with h5tbx.File() as h5:\n", " h5.create_dataset_from_image(imgreader, 'testimg', axis=0)\n", " h5.dump()" ] }, { "cell_type": "code", "execution_count": null, "id": "72f3d9a5-1f8c-4e29-b7e0-b68801311fb1", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.19" } }, "nbformat": 4, "nbformat_minor": 5 }