Hdf5 table

Storing tables in HDF5

The HDF5 Table interface condenses the steps needed to create tables in HDF5. The dataset it creates has a compound datatype (H5T_COMPOUND), so the members (columns) of the table can each have a different datatype.
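
As a rough illustration of what a compound record looks like, the sketch below builds a NumPy structured dtype, which is the in-memory analogue of such a type. It does not use the HDF5 Table interface itself, and the three member names (name, count, energy) are hypothetical, chosen only for the illustration.

```python
import numpy as np

# Hypothetical three-member record: each member has its own datatype,
# much like the members of an HDF5 compound type.
record_type = np.dtype([
    ("name", "S16"),         # fixed-length 16-byte string
    ("count", np.uint16),    # unsigned 16-bit integer
    ("energy", np.float64),  # double-precision float
])

# A small table of three records, filled column by column
records = np.zeros(3, dtype=record_type)
records["name"] = [b"a", b"b", b"c"]
records["count"] = [1, 2, 3]
records["energy"] = [0.5, 1.5, 2.5]

# Packed size of one record in bytes: 16 + 2 + 8
record_size = record_type.itemsize
```

Each row of the array is one record, and each named field plays the role of one member of the compound type.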


Writing a table using Python (PyTables)

PyTables is a package for managing hierarchical datasets, designed to cope efficiently and easily with extremely large amounts of data. It is built on top of the HDF5 library, using the Python language and the NumPy package. The following example shows how to store a table of 10 records with 7 members:

Member     Datatype
name       16-character string
ADCcount   Unsigned short integer
grid_i     32-bit integer
grid_j     32-bit integer
pressure   Float (single precision)
energy     Double (double precision)
idnumber   Signed 64-bit integer

The script was run on the GPC with the following modules:

module load gcc/4.8.1  intel/14.0.0  python/2.7.2  hdf5/1811-v18-serial-gcc

PyTables 3.0.0 was compiled in my scratch directory.

from tables import (IsDescription, StringCol, UInt16Col, Int32Col,
                    Float32Col, Float64Col, Int64Col, open_file)

# Description of one table record: each attribute becomes a column
class Particle(IsDescription):
    name      = StringCol(16)   # 16-character String
    ADCcount  = UInt16Col()     # Unsigned short integer
    grid_i    = Int32Col()      # 32-bit integer
    grid_j    = Int32Col()      # 32-bit integer
    pressure  = Float32Col()    # float  (single-precision)
    energy    = Float64Col()    # double (double-precision)
    idnumber  = Int64Col()      # Signed 64-bit integer


h5file = open_file("tutorial1.h5", mode = "w", title = "Test file")
group = h5file.create_group("/", 'detector', 'Detector information')
table = h5file.create_table(group, 'readout', Particle, "Readout example")
particle = table.row
for i in range(10):
    particle['name']  = 'Particle: %6d' % (i)
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i*i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Insert a new particle record
    particle.append()

# Write any buffered rows to disk before closing the file
table.flush()
h5file.close()
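
To check the result from Python, the table can be read back with PyTables. The following is a minimal sketch, assuming the /detector/readout layout created by the script above. The function names read_readout and high_pressure_cells and the default threshold of 16.0 are illustrative choices, not part of the original script.

```python
import tables


def read_readout(path):
    """Return (name, pressure, energy) tuples for every record in /detector/readout."""
    with tables.open_file(path, mode="r") as h5file:
        table = h5file.root.detector.readout
        rows = []
        for row in table.iterrows():
            name = row["name"]
            # String columns come back as bytes under Python 3
            if isinstance(name, bytes):
                name = name.decode("ascii")
            rows.append((name, float(row["pressure"]), float(row["energy"])))
        return rows


def high_pressure_cells(path, threshold=16.0):
    """Return grid_i for the records whose pressure exceeds the threshold."""
    with tables.open_file(path, mode="r") as h5file:
        table = h5file.root.detector.readout
        # The condition string is evaluated by PyTables (in-kernel query);
        # "limit" is bound to the Python value through condvars.
        query = table.where("pressure > limit", condvars={"limit": threshold})
        return [int(row["grid_i"]) for row in query]
```

Calling read_readout("tutorial1.h5") on the file written above should return 10 tuples. Since pressure is set to i*i, high_pressure_cells("tutorial1.h5") should select only the records with i of 5 or more.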

Reading the table with C++ code using MPI for parallel programming

The following example shows how to read the table from an MPI program, with each MPI process reading one individual record. The code was compiled and tested on the BlueGene with the following modules:

module load vacpp/12.1  xlf/14.1  mpich2/xl hdf5/189-v18-mpich2-xlc