h5py - Hanx

An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. The most fundamental thing to remember when using h5py is:

Groups work like dictionaries, and datasets work like NumPy arrays
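
To make that rule concrete, here is a minimal sketch. The file name `groups_demo.hdf5` and the group and dataset names are made up for illustration, and the file is written to a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, created in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "groups_demo.hdf5")

with h5py.File(path, "w") as f:
    # Groups behave like dictionaries: create, nest, and look up by key.
    grp = f.create_group("experiment")
    grp.create_dataset("values", data=np.arange(6).reshape(2, 3))

    group_keys = list(f.keys())      # top-level names, like dict keys
    has_values = "values" in grp     # membership test, like a dict
    dset = f["experiment/values"]    # path-style lookup through nested groups

    # Datasets behave like NumPy arrays: shape, dtype, and slicing.
    shape = dset.shape
    first_row = dset[0, :]           # slicing returns a NumPy array

print(group_keys, has_values, shape, first_row)
```

Note that `first_row` is an ordinary in-memory NumPy array, so it stays usable after the file is closed, while `dset` itself does not.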

Read .hdf5/.h5 files

>>> import h5py
>>> f = h5py.File('mytestfile.hdf5', 'r')

The File object is your starting point. What is stored in this file? Remember that h5py.File acts like a Python dictionary, so we can check its keys:

>>> list(f.keys())
['mydataset']

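The file contains a single dataset, mydataset. We can retrieve it by key, just as with a dictionary:

>>> dset = f['mydataset']

The object we obtained isn’t an array, but an HDF5 dataset. Like NumPy arrays, datasets have both a shape and a data type: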

>>> dset.shape
(100,)
>>> dset.dtype
dtype('int32')
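# They also support array-style slicing. This is how you read and write data from a dataset in the file.
# Writing needs NumPy (import numpy as np) and a file opened in a writable mode such as 'r+', not 'r':
>>> dset[...] = np.arange(100)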
>>> dset[0]
0
>>> dset[10]
10
>>> dset[0:100:10]
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
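
The slicing session above can be reproduced end to end with the short sketch below. It assumes a fresh file in a temporary directory (the name `slicing_demo.hdf5` is hypothetical) opened in write mode, so the assignments are allowed.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, created in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "slicing_demo.hdf5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset("mydataset", (100,), dtype="i")

    dset[...] = np.arange(100)   # write the whole dataset
    dset[0:5] = -1               # overwrite only the first five elements

    head = dset[0:10]            # read a contiguous slice
    strided = dset[0:100:10]     # read every tenth element
    everything = dset[()]        # read the full dataset into memory

print(head)
print(strided)
```

Each read returns a NumPy array holding only the requested elements, which is what makes slicing useful for datasets too large to load at once.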

Write .hdf5/.h5 files

At this point, you may wonder how mytestfile.hdf5 is created. We can create a file by setting the mode to w when the File object is initialized. Some other modes are a (for read/write/create access) and r+ (for read/write access).
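
As a rough illustration of how these modes differ, the sketch below opens the same file four times. The file name `modes_demo.hdf5` and the dataset names are assumptions for the demo. `File.mode` reports only whether the file is currently read-only (`'r'`) or writable (`'r+'`), regardless of the mode string used to open it.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, created in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.hdf5")

# "w": create the file, truncating it if it already exists.
with h5py.File(path, "w") as f:
    f.create_dataset("a", data=np.zeros(3))

# "a": open for read/write, creating the file if it is missing.
with h5py.File(path, "a") as f:
    f.create_dataset("b", data=np.ones(3))
    names_after_append = sorted(f.keys())

# "r+": open an existing file for read/write (fails if the file is missing).
with h5py.File(path, "r+") as f:
    f["a"][0] = 42.0
    mode_rw = f.mode

# "r": open an existing file read-only.
with h5py.File(path, "r") as f:
    mode_ro = f.mode
    first_value = float(f["a"][0])

print(names_after_append, mode_rw, mode_ro, first_value)
```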

>>> import h5py
>>> import numpy as np
>>> f = h5py.File("mytestfile.hdf5", "w")

The File object has a couple of methods which look interesting. One of them is create_dataset, which, as the name suggests, creates a dataset of a given shape and dtype:

>>> dset = f.create_dataset("mydataset", (100,), dtype='i')

The File object is also a context manager, so the following code works too and closes the file automatically when the block exits:

>>> import h5py
>>> import numpy as np
>>> with h5py.File("mytestfile.hdf5", "w") as f:
...     dset = f.create_dataset("mydataset", (100,), dtype='i')
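
One practical consequence of the context manager is worth sketching: once the `with` block ends, the file is closed, so any data we want to keep has to be copied into memory first. The file name `context_demo.hdf5` below is hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, created in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "context_demo.hdf5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset("mydataset", (100,), dtype="i")
    dset[...] = np.arange(100)
    open_inside = bool(f)        # a File object is truthy while it is open

closed_after = not bool(f)       # and falsy once the with-block has closed it

with h5py.File(path, "r") as f:
    data = f["mydataset"][:]     # copy into a NumPy array before the file closes

total = int(data.sum())          # the in-memory copy is still usable here
print(open_inside, closed_after, total)
```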

Reading example (with h5py==2.9)

import h5py
from h5py import Dataset, Group, File
import numpy as np
f = h5py.File('PUGAN_poisson_256_poisson_1024.h5', 'r')
print(f.keys())
# <KeysViewHDF5 ['poisson_1024', 'poisson_256']>
p256=f['poisson_256']
p1024=f['poisson_1024']
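print(p256.shape,p1024.shape)
# (24000, 256, 3) (24000, 1024, 3)
# Dataset.value is deprecated and was removed in h5py 3.0; [()] reads the whole dataset in both 2.x and 3.x
print(p256[()])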
'''
[[[ -0.07107    1.54135   -1.327173]
  [ -0.512117  -0.410398   0.134543]
  [  0.137435   0.697742   0.286036]
  ...
  [ -5.215001   7.463724   1.873999]
  [ -7.951387   1.839743   1.807378]
  [ -0.972066 -13.259521  -0.414962]]]
'''
# save the first sample of each dataset to a space-separated text file
import pandas as pd
pd.DataFrame(p256[0]).to_csv('sample256.csv',index=False, sep=' ')
pd.DataFrame(p1024[0]).to_csv('sample1024.csv',index=False, sep=' ')
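
When the contents of a file are not known in advance, a helper like the following can list every dataset along with its shape and dtype. This is only a sketch: `list_datasets` is a hypothetical helper, not part of h5py, and the small file it inspects is synthetic stand-in data that merely reuses the dataset names from the example above.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(h5_path):
    """Return {dataset path: (shape, dtype name)} for every dataset in the file."""
    found = {}

    def visitor(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(h5_path, "r") as f:
        f.visititems(visitor)
    return found


# Synthetic stand-in file (NOT the real PUGAN data), in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "standin.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("poisson_256", data=np.zeros((4, 256, 3), dtype=np.float32))
    f.create_dataset("poisson_1024", data=np.zeros((4, 1024, 3), dtype=np.float32))

summary = list_datasets(path)

with h5py.File(path, "r") as f:
    sample = f["poisson_256"][0]   # reads only the first sample, not the whole dataset

print(summary)
print(sample.shape)
```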

Writing example

import os
from plyfile import PlyData, PlyElement
import numpy as np
import pandas as pd
import h5py
from h5py import Dataset, Group, File
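def random_sampling(pointcloud,K=2048):
    # return exactly K points from an (N, 3) point cloud
    N = pointcloud.shape[0]
    if N >= K:
        # enough points: draw K distinct indices (the default would sample with replacement)
        idx = np.random.choice(N, K, replace=False)
        return pointcloud[idx]
    else:
        # too few points: pad with K-N randomly repeated points
        idx = np.random.choice(np.arange(N), K-N)
        expand_data = pointcloud[idx]
        pointcloud = np.concatenate((pointcloud, expand_data), 0)
        return pointcloud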

def read_ply(file_dir):
    # file_dir: path to the .ply file
    plydata = PlyData.read(file_dir)  # read the file
    data = plydata.elements[0].data  # data of the first element
    data_pd = pd.DataFrame(data)  # convert to a DataFrame, which can parse structured data
    data_np = np.zeros(data_pd.shape, dtype=np.float64)  # array that will hold the data (np.float was removed in NumPy 1.24)
    property_names = data[0].dtype.names  # names of the properties
    for i, name in enumerate(property_names):  # copy property by property so that all values end up with the same dtype
        data_np[:, i] = data_pd[name]
    if data_np.shape[1]>3:
        # keep only the first three properties
        data_np=data_np[:,:3]
    print("PLY FILE ",file_dir," result shape: ", data_np.shape)
    return data_np

d0=read_ply('longdress_vox10_1051.ply')
d0_2k=random_sampling(d0)
print(d0_2k.shape)
# PLY FILE  longdress_vox10_1051.ply  result shape:  (765821, 3)
# (2048, 3)
d=np.expand_dims(d0_2k,axis=0)
d=np.concatenate([d,d],axis=0)
print(d)
'''
[[[261.  51. 110.]
  [256. 544. 226.]
  [266.  35. 122.]
  ...
  [270. 143.  83.]
  [252. 285. 147.]
  [197. 521.  47.]]

 [[261.  51. 110.]
  [256. 544. 226.]
  [266.  35. 122.]
  ...
  [270. 143.  83.]
  [252. 285. 147.]
  [197. 521.  47.]]]
'''

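# write the data; the with-block closes the file so everything is flushed to disk before it is reopened
with h5py.File("myh5py.hdf5","w") as f:
    d1=f.create_dataset("dset1",data=d)
    d2=f.create_dataset("dset2",data=d)

f1 = h5py.File('myh5py.hdf5', 'r')

for key in f1.keys():
    print(key)
f1_d=f1['dset1']
f2_d=f1['dset2']
print(f1_d[()])  # .value was removed in h5py 3.0; [()] reads the whole dataset
print(f2_d.shape)
f1.close()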
'''
dset1
dset2
[[[321. 685. 184.]
  [191. 742.  62.]
  [240. 307. 199.]
  ...
  [299. 603. 195.]
  [272. 789. 173.]
  [146. 824. 155.]]

 [[321. 685. 184.]
  [191. 742.  62.]
  [240. 307. 199.]
  ...
  [299. 603. 195.]
  [272. 789. 173.]
  [146. 824. 155.]]]
(2, 2048, 3)
'''
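
The same write-then-read round trip can be tried without a PLY file. The sketch below is an illustration under assumptions, not the code used above: it replaces the PLY input with random synthetic point clouds, uses a seeded `np.random.default_rng` so the run is repeatable, and stores the batch with gzip compression. `sample_points` and the file name `pointclouds_demo.hdf5` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def sample_points(points, k, rng):
    """Return exactly k points: subsample without replacement, or pad by repetition."""
    n = points.shape[0]
    if n >= k:
        idx = rng.choice(n, size=k, replace=False)
        return points[idx]
    extra = rng.choice(n, size=k - n, replace=True)
    return np.concatenate((points, points[extra]), axis=0)


rng = np.random.default_rng(0)

# Synthetic clouds: one larger and one smaller than the target size.
clouds = [rng.random((5000, 3)), rng.random((300, 3))]
batch = np.stack([sample_points(c, 2048, rng) for c in clouds])

# Hypothetical file name, created in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "pointclouds_demo.hdf5")

with h5py.File(path, "w") as f:
    f.create_dataset("points", data=batch, compression="gzip")

with h5py.File(path, "r") as f:
    restored = f["points"][()]

print(batch.shape, np.array_equal(batch, restored))
```

Stacking the samples into one `(num_clouds, K, 3)` array gives the same layout as the datasets in the reading example, where the first axis indexes individual point clouds.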

Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!
