{
"cells": [
{
"cell_type": "markdown",
"id": "9a1217bc-48a1-4a5d-9996-8f762565383f",
"metadata": {},
"source": [
"# Special I/O\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "77fa362b-5622-4b69-b37a-04ed4f59ec48",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"using(\"h5py\")"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import h5rdmtoolbox as h5tbx\n",
"h5tbx.use(None)"
]
},
{
"cell_type": "markdown",
"id": "10def77e-2f73-4466-91e8-bf5c211dba60",
"metadata": {},
"source": [
"## Creating datasets from CSV file(s)\n",
"Datasets can be created directly from a single CSV file or from multiple CSV files. Let's first create two simple CSV files:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8b62a6ee-6e07-4907-b7cc-cc23d021b4e6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" x y\n",
"0 0.543405 0.004719\n",
"1 0.278369 0.121569\n",
"2 0.424518 0.670749\n",
"3 0.844776 0.825853"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"np.random.seed(100)\n",
"\n",
"# first CSV file: 4 rows\n",
"df = pd.DataFrame({'x': np.random.random((4, )),\n",
"                   'y': np.random.random((4, ))})\n",
"csv_filename1 = h5tbx.utils.generate_temporary_filename(suffix='.csv')\n",
"df.to_csv(csv_filename1, index=False)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "180c998d-28b0-4759-8924-d91d7073c38a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" x y\n",
"0 0.136707 0.811683\n",
"1 0.575093 0.171941\n",
"2 0.891322 0.816225\n",
"3 0.209202 0.274074\n",
"4 0.185328 0.431704\n",
"5 0.108377 0.940030\n",
"6 0.219697 0.817649\n",
"7 0.978624 0.336112"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# second CSV file: 8 rows\n",
"df = pd.DataFrame({'x': np.random.random((8, )),\n",
"                   'y': np.random.random((8, ))})\n",
"csv_filename2 = h5tbx.utils.generate_temporary_filename(suffix='.csv')\n",
"df.to_csv(csv_filename2, index=False)\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "c27151bc-5d91-48f5-b10f-cbbaaf169280",
"metadata": {},
"source": [
"Create from a single file:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "525c2150-6fb7-48b0-9e78-4fdcad50578a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"with h5tbx.File() as h5:\n",
"    h5.create_dataset_from_csv(csv_filename=csv_filename1)\n",
"    h5.dump()"
]
},
{
"cell_type": "markdown",
"id": "b165e857-f03d-4f22-9150-999e79dec640",
"metadata": {},
"source": [
"When creating datasets from multiple CSV files, you must choose whether to stack them (which requires the data in all files to have the same size) or to concatenate them:"
]
},
{
"cell_type": "markdown",
"id": "629a6069-e850-47ad-aec8-fd41b853dfd8",
"metadata": {},
"source": [
"... concatenating:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d8465a39-d70e-4bad-b265-0b5453c027f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"with h5tbx.File() as h5:\n",
"    h5.create_datasets_from_csv(csv_filenames=[csv_filename1, csv_filename2], combine_opt='concatenate')\n",
"    h5.dump()"
]
},
{
"cell_type": "markdown",
"id": "e48d1d10-db87-4a40-8853-fed42d6cca2f",
"metadata": {},
"source": [
"... stacking:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ff0942b5-dc0e-45b7-927e-72a7f62a4b41",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"with h5tbx.File() as h5:\n",
"    # stacking needs equally sized inputs, so the 8-row file is used twice\n",
"    h5.create_datasets_from_csv(csv_filenames=[csv_filename2, csv_filename2], combine_opt='stack')\n",
"    h5.dump()"
]
},
{
"cell_type": "markdown",
"id": "fdcf6cdb-3b5b-499a-8b89-ecf3d6fdc58f",
"metadata": {},
"source": [
"## Creating datasets from image file(s)\n",
"A dataset can be created from image data. The data can be provided as a list of NumPy arrays:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "de901e0e-fa8e-45a1-a992-bd61be3c29b4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"with h5tbx.File() as h5:\n",
"    h5.create_dataset_from_image([np.random.random((20, 10))] * 5,\n",
"                                 'testimg', axis=0)\n",
"    h5.dump()"
]
},
{
"cell_type": "markdown",
"id": "6e362049-7424-46f6-abb9-261f969a43ef",
"metadata": {},
"source": [
"... or as an iterable object that provides the image data one image at a time:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5db487b5-e2a7-4bdb-a05e-4a5205bc9220",
"metadata": {},
"outputs": [],
"source": [
"class ImgReader:\n",
"    \"\"\"Dummy image reader that yields a fixed number of random images.\"\"\"\n",
"\n",
"    def __init__(self, imgdir):\n",
"        self._imgdir = imgdir\n",
"        self._index = 0\n",
"        self._size = 5\n",
"\n",
"    def read_img(self):\n",
"        # Return a random image; a real reader would load it from a file in imgdir.\n",
"        return np.random.random((20, 10))\n",
"\n",
"    def __iter__(self):\n",
"        return self\n",
"\n",
"    def __len__(self):\n",
"        return self._size\n",
"\n",
"    def __next__(self):\n",
"        if self._index < self._size:\n",
"            self._index += 1\n",
"            return self.read_img()\n",
"        raise StopIteration"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8f39eea2-5000-47aa-89ae-4bf3ad35b834",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"imgreader = ImgReader('testdir')\n",
"with h5tbx.File() as h5:\n",
"    h5.create_dataset_from_image(imgreader, 'testimg', axis=0)\n",
"    h5.dump()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.19"
}
},
"nbformat": 4,
"nbformat_minor": 5
}