An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. The most fundamental thing to remember when using h5py is:
Groups work like dictionaries, and datasets work like NumPy arrays
Read .hdf5/.h5 files
>>> import h5py
>>> f = h5py.File('mytestfile.hdf5', 'r')
The File object is your starting point. What is stored in this file? Remember that h5py.File acts like a Python dictionary, so we can check its keys:
>>> list(f.keys())
['mydataset']
There is one dataset, mydataset, which we can retrieve with dictionary-style indexing. The object we obtain isn't an array, but an HDF5 dataset. Like NumPy arrays, datasets have both a shape and a data type:
>>> dset = f['mydataset']
>>> dset.shape
(100,)
>>> dset.dtype
dtype('int32')

Datasets also support array-style slicing. This is how you read and write data from a dataset in the file (note that writing requires the file to be open in a writable mode such as 'r+' or 'a', not 'r'):

>>> import numpy as np
>>> dset[...] = np.arange(100)
>>> dset[0]
0
>>> dset[10]
10
>>> dset[0:100:10]
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
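To see the dictionary analogy end to end, here is a minimal self-contained sketch; the file name "demo.hdf5" and the group/dataset names are illustrative, not from the text above:

```python
import h5py
import numpy as np

# Build a small file so the example is self-contained.
with h5py.File("demo.hdf5", "w") as f:
    grp = f.create_group("subgroup")               # groups nest like folders
    grp.create_dataset("data", data=np.arange(10))

with h5py.File("demo.hdf5", "r") as f:
    assert "subgroup" in f                         # dict-style membership test
    print(list(f.keys()))                          # ['subgroup']
    dset = f["subgroup/data"]                      # POSIX-style path lookup
    print(dset[:5])                                # [0 1 2 3 4]
```

Dict-style lookup works with nested POSIX-style paths, so `f["subgroup/data"]` and `f["subgroup"]["data"]` are equivalent.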
Write .hdf5/.h5 files
At this point, you may wonder how mytestfile.hdf5 is created. We can create a file by setting the mode to 'w' when the File object is initialized. Other modes are 'a' (read/write, creating the file if it does not exist) and 'r+' (read/write, requiring the file to already exist).
>>> import h5py
>>> import numpy as np
>>> f = h5py.File("mytestfile.hdf5", "w")
The File object has a couple of methods which look interesting. One of them is create_dataset, which, as the name suggests, creates a dataset of a given shape and dtype.
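A minimal sketch of create_dataset, reusing the dataset name from earlier in the text; the dtype string "i" is NumPy/h5py shorthand for a 32-bit integer:

```python
import h5py
import numpy as np

with h5py.File("mytestfile.hdf5", "w") as f:
    # Create an empty dataset of shape (100,) and 32-bit integer dtype,
    # then fill it with data using NumPy-style assignment.
    dset = f.create_dataset("mydataset", (100,), dtype="i")
    dset[...] = np.arange(100)
    print(dset.shape, dset.dtype)   # (100,) int32
```

Alternatively, passing `data=` to create_dataset infers the shape and dtype from an existing array, so `f.create_dataset("mydataset", data=np.arange(100))` is a common shortcut.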