h5py

An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. The most fundamental thing to remember when using h5py is:

Groups work like dictionaries, and datasets work like NumPy arrays
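As a minimal sketch of that rule, the snippet below builds a small throwaway file and then touches it both ways. The file name, the group name `measurements` and the dataset name `temps` are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "groups_demo.h5")

with h5py.File(path, "w") as f:
    grp = f.create_group("measurements")          # folder-like container
    grp.create_dataset("temps", data=np.arange(6).reshape(2, 3))

with h5py.File(path, "r") as f:
    top_level = list(f.keys())                    # dictionary-style listing
    has_group = "measurements" in f               # dictionary-style membership test
    temps = f["measurements"]["temps"]            # dictionary-style lookup
    temps_shape = temps.shape                     # array-style attribute
    first_row = temps[0, :]                       # array-style slicing, returns a NumPy array

print(top_level, has_group, temps_shape, first_row)
```

The slice is copied into memory as a regular NumPy array, so `first_row` stays usable after the file is closed.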

Read .hdf5/.h5 files

>>> import h5py
>>> f = h5py.File('mytestfile.hdf5', 'r')

The File object is your starting point. To see what is stored in the file, remember that h5py.File acts like a Python dictionary, so we can list its keys:

>>> list(f.keys())
['mydataset']

The file contains one dataset, mydataset. Looking it up by key (f['mydataset']) returns an HDF5 dataset object rather than an array. Like NumPy arrays, datasets have both a shape and a data type:

>>> import numpy as np
>>> dset = f['mydataset']
>>> dset.shape
(100,)
>>> dset.dtype
dtype('int32')
# Datasets also support array-style slicing, for both reading and writing.
# Writing requires a file opened in a writable mode ('r+', 'a' or 'w'), not 'r'.
>>> dset[...] = np.arange(100)
>>> dset[0]
0
>>> dset[10]
10
>>> dset[0:100:10]
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
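To make the difference between a dataset and an array concrete, here is a small sketch under the assumption that a file with a 100-element `mydataset` exists. The snippet creates one in a temporary directory so that it runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file standing in for mytestfile.hdf5.
path = os.path.join(tempfile.mkdtemp(), "mytestfile.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("mydataset", data=np.arange(100, dtype="int32"))

with h5py.File(path, "r") as f:
    dset = f["mydataset"]
    kind = type(dset).__name__      # an h5py Dataset, not a NumPy array
    every_tenth = dset[0:100:10]    # reads only the selected elements
    whole = dset[:]                 # reads the full dataset into memory

print(kind, every_tenth, type(whole).__name__)
```

Slicing is what turns the on-disk dataset into an in-memory NumPy array, which is why slices can be much cheaper than loading everything when the dataset is large.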

Write .hdf5/.h5 files

At this point, you may wonder how mytestfile.hdf5 was created. We can create a file by setting the mode to w when the File object is initialized; note that w truncates the file if it already exists. Other modes include a (read/write access, creating the file if it does not exist) and r+ (read/write access to a file that must already exist).

>>> import h5py
>>> import numpy as np
>>> f = h5py.File("mytestfile.hdf5", "w")
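The modes can be compared side by side. This is only a sketch: the paths and the dataset names `a` and `b` are hypothetical, and the files live in a temporary directory.

```python
import os
import tempfile

import h5py

# Hypothetical scratch paths used only for this illustration.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes_demo.hdf5")
missing = os.path.join(workdir, "does_not_exist.hdf5")

# "w": create the file, truncating it if it already exists.
with h5py.File(path, "w") as f:
    f.create_dataset("a", data=[1, 2, 3])

# "a": read/write access, creating the file if needed; existing content is kept.
with h5py.File(path, "a") as f:
    f.create_dataset("b", data=[4, 5])
    keys_after_append = sorted(f.keys())

# "r+": read/write access, but the file must already exist.
try:
    h5py.File(missing, "r+")
    missing_raises = False
except OSError:
    missing_raises = True

# "r": read-only access.
with h5py.File(path, "r") as f:
    mode_reported = f.mode

# Opening with "w" again discards everything written so far.
with h5py.File(path, "w") as f:
    keys_after_truncate = list(f.keys())

print(keys_after_append, missing_raises, mode_reported, keys_after_truncate)
```

In practice this means a is the safer choice when a file may already hold data, since w silently starts from an empty file.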

The File object has several useful methods. One of them is create_dataset, which, as the name suggests, creates a dataset of a given shape and dtype:

>>> dset = f.create_dataset("mydataset", (100,), dtype='i')
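A short sketch of the two common ways to call create_dataset: with an explicit shape and dtype, or from existing data. The scratch path and the dataset name `from_array` are assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), "create_demo.hdf5")

with h5py.File(path, "w") as f:
    # Shape and dtype given explicitly: the dataset starts out filled with zeros.
    dset = f.create_dataset("mydataset", (100,), dtype="i")
    dtype_name = str(dset.dtype)
    starts_as_zeros = bool((dset[:] == 0).all())

    # Fill it afterwards with array-style assignment.
    dset[...] = np.arange(100)
    tenth = int(dset[10])

    # Shape and dtype inferred from the data that is passed in.
    from_array = f.create_dataset("from_array", data=np.linspace(0.0, 1.0, 5))
    inferred_shape = from_array.shape
    inferred_dtype = from_array.dtype

print(dtype_name, starts_as_zeros, tenth, inferred_shape, inferred_dtype)
```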

The File object is also a context manager, so the following code works too and closes the file automatically at the end of the block:

>>> import h5py
>>> import numpy as np
>>> with h5py.File("mytestfile.hdf5", "w") as f:
...     dset = f.create_dataset("mydataset", (100,), dtype='i')
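The effect of the context manager can be checked directly. The sketch below relies on a File object evaluating as true while it is open and as false once it is closed; the scratch path is hypothetical.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), "context_demo.hdf5")

with h5py.File(path, "w") as f:
    f.create_dataset("mydataset", (100,), dtype="i")
    open_inside = bool(f)       # the file is open inside the block

closed_after = not bool(f)      # and closed once the block ends

print(open_inside, closed_after)
```

Because the file is closed on exit, anything needed later should be read into a NumPy array inside the block.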

Reading example (with h5py==2.9)

import h5py
import pandas as pd

f = h5py.File('PUGAN_poisson_256_poisson_1024.h5', 'r')
print(f.keys())
# <KeysViewHDF5 ['poisson_1024', 'poisson_256']>
p256 = f['poisson_256']
p1024 = f['poisson_1024']
print(p256.shape, p1024.shape)
# (24000, 256, 3) (24000, 1024, 3)
# Dataset.value works in h5py 2.9 but was removed in h5py 3.0; p256[()] works in both.
print(p256[()])
'''
[[[ -0.07107 1.54135 -1.327173]
[ -0.512117 -0.410398 0.134543]
[ 0.137435 0.697742 0.286036]
...
[ -5.215001 7.463724 1.873999]
[ -7.951387 1.839743 1.807378]
[ -0.972066 -13.259521 -0.414962]]]
'''
# Save the first sample of each dataset to a space-separated CSV file
pd.DataFrame(p256[0]).to_csv('sample256.csv', index=False, sep=' ')
pd.DataFrame(p1024[0]).to_csv('sample1024.csv', index=False, sep=' ')
f.close()
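When the datasets are large, it can help to inspect shapes first and read only one sample. The sketch below uses a tiny synthetic stand-in for the PUGAN file: the dataset names match the example above, but the shapes and values are made up, and np.savetxt is swapped in for pandas to keep the dependencies small.

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "poisson_demo.h5")      # hypothetical stand-in file
csv_path = os.path.join(workdir, "sample256.csv")

# Build a small synthetic file with the same dataset names (made-up data).
rng = np.random.default_rng(0)
with h5py.File(path, "w") as f:
    f.create_dataset("poisson_256", data=rng.random((4, 256, 3)))
    f.create_dataset("poisson_1024", data=rng.random((4, 1024, 3)))

with h5py.File(path, "r") as f:
    # Shapes come from metadata, so nothing is loaded into memory yet.
    shapes = {name: dset.shape for name, dset in f.items()}
    # Indexing reads only the first point cloud from disk.
    sample = f["poisson_256"][0]

# Export the sample as space-separated text and read it back.
np.savetxt(csv_path, sample, delimiter=" ")
reloaded = np.loadtxt(csv_path)

print(shapes, sample.shape, reloaded.shape)
```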

Writing example

from plyfile import PlyData
import numpy as np
import pandas as pd
import h5py


def random_sampling(pointcloud, K=2048):
    """Return exactly K points: subsample if there are enough, otherwise pad with repeats."""
    N = pointcloud.shape[0]
    if N >= K:
        # np.random.choice samples with replacement by default;
        # pass replace=False to avoid duplicate points.
        idx = np.random.choice(N, K)
        return pointcloud[idx]
    else:
        # Too few points: append K-N randomly chosen duplicates
        idx = np.random.choice(np.arange(N), K - N)
        expand_data = pointcloud[idx]
        pointcloud = np.concatenate((pointcloud, expand_data), 0)
        return pointcloud


def read_ply(file_dir):
    # file_dir: path to the PLY file
    plydata = PlyData.read(file_dir)  # read the file
    data = plydata.elements[0].data  # read the data of the first element
    data_pd = pd.DataFrame(data)  # convert to a DataFrame, which can parse structured data
    # np.float was removed in NumPy 1.24; np.float64 is the equivalent explicit type
    data_np = np.zeros(data_pd.shape, dtype=np.float64)  # array that will hold the data
    property_names = data[0].dtype.names  # read the property names
    for i, name in enumerate(property_names):  # copy property by property so all columns share one dtype
        data_np[:, i] = data_pd[name]
    if data_np.shape[1] > 3:
        data_np = data_np[:, :3]  # keep only the first three columns
    print("PLY FILE ", file_dir, " result shape: ", data_np.shape)
    return data_np


d0 = read_ply('longdress_vox10_1051.ply')
d0_2k = random_sampling(d0)
print(d0_2k.shape)
# PLY FILE longdress_vox10_1051.ply result shape: (765821, 3)
# (2048, 3)
d = np.expand_dims(d0_2k, axis=0)
d = np.concatenate([d, d], axis=0)
print(d)
'''
[[[261. 51. 110.]
[256. 544. 226.]
[266. 35. 122.]
...
[270. 143. 83.]
[252. 285. 147.]
[197. 521. 47.]]

[[261. 51. 110.]
[256. 544. 226.]
[266. 35. 122.]
...
[270. 143. 83.]
[252. 285. 147.]
[197. 521. 47.]]]
'''

# Write both datasets, closing the file before it is reopened for reading
with h5py.File("myh5py.hdf5", "w") as f:
    d1 = f.create_dataset("dset1", data=d)
    d2 = f.create_dataset("dset2", data=d)

with h5py.File('myh5py.hdf5', 'r') as f1:
    for key in f1.keys():
        print(key)
    f1_d = f1['dset1']
    f2_d = f1['dset2']
    print(f1_d[()])  # Dataset.value was removed in h5py 3.0; [()] works in 2.x and 3.x
    print(f2_d.shape)
'''
dset1
dset2
[[[321. 685. 184.]
[191. 742. 62.]
[240. 307. 199.]
...
[299. 603. 195.]
[272. 789. 173.]
[146. 824. 155.]]

[[321. 685. 184.]
[191. 742. 62.]
[240. 307. 199.]
...
[299. 603. 195.]
[272. 789. 173.]
[146. 824. 155.]]]
(2, 2048, 3)
'''
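The same write-then-read round trip can be tried without a PLY file. The sketch below uses synthetic point clouds and a hypothetical helper, sample_points, that follows the same idea as random_sampling but draws without replacement when there are enough points. The names, sizes and seed are assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def sample_points(cloud, k, rng):
    """Hypothetical helper: return exactly k points from an (n, 3) cloud."""
    n = cloud.shape[0]
    if n >= k:
        # Enough points: pick k distinct rows.
        idx = rng.choice(n, size=k, replace=False)
        return cloud[idx]
    # Too few points: keep all of them and pad with random repeats.
    extra = rng.choice(n, size=k - n, replace=True)
    return np.concatenate((cloud, cloud[extra]), axis=0)


rng = np.random.default_rng(0)
big = rng.random((5000, 3))     # synthetic cloud with more than k points
small = rng.random((100, 3))    # synthetic cloud with fewer than k points

# Stack two fixed-size clouds into one (2, 2048, 3) batch.
batch = np.stack([sample_points(big, 2048, rng), sample_points(small, 2048, rng)])

path = os.path.join(tempfile.mkdtemp(), "myh5py_demo.hdf5")  # hypothetical scratch file
with h5py.File(path, "w") as f:
    f.create_dataset("dset1", data=batch)

with h5py.File(path, "r") as f:
    restored = f["dset1"][()]   # read the whole dataset back as a NumPy array

print(batch.shape, np.array_equal(batch, restored))
```

Comparing the restored array with the original is a cheap sanity check that shape, dtype and values survived the trip through the file.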


Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!