About the data bank and working with HDF files

Data is passed between the modules as HDF5 files. We have a nifty Python class for reading and writing this file type (archive.py, see SVN); examples follow below. An HDF5 file can store any kind of data (ints, floats, nested arrays, …) in an internal address tree, like a file system in miniature. The advantage is that data can be accessed with human-readable keywords instead of, e.g., column numbers as in a CSV file. HDF5 is also a natural fit for the large parameter files for Instrument Design, which are already in HDF format.
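To make the difference between column numbers and keyword paths concrete, here is a small sketch. It does not use archive.py or a real HDF5 file: a nested dict stands in for the address tree, and the helper `lookup` is a hypothetical name introduced only for this illustration.

```python
import csv
import io

# CSV: the reader has to know that the magnitude lives in column 1.
csv_text = "0,21.3\n1,23.9\n"
rows = list(csv.reader(io.StringIO(csv_text)))
magnitude_from_csv = [float(row[1]) for row in rows]

# Address tree: nested groups, addressed by a slash-separated path.
# A nested dict is used here as a stand-in for the HDF5 hierarchy.
tree = {'gal': {'galaxy_index': [0, 1], 'magnitude_g': [21.3, 23.9]}}


def lookup(node, path):
    """Walk the tree one path component at a time (hypothetical helper)."""
    for part in path.strip('/').split('/'):
        node = node[part]
    return node


magnitude_from_tree = lookup(tree, '/gal/magnitude_g')
```

Both variables end up holding the same values, but only the second access says what the data is.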

In each module, the data should be read from file only once, at the very beginning. Correspondingly, the output writing commands belong at the very end of the module. In between, all data must exist only as local workspace variables.

All data (except for unconverted input files) is to be stored in the file data/data_bank.h5.

Here is an example of a module which reads the magnitude in g from data_bank.h5, together with the galaxy index.

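import archive  # project helper class (archive.py, see SVN)

# Read everything this module needs, once, at the start.
with archive.archive('../data/data_bank.h5', 'r') as data_in:
    galaxyIndex = data_in['/gal/galaxy_index']
    magnitude_g = data_in['/gal/magnitude_g']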

The module then calculates some new property, e.g. a selection flag (boolean), an assigned fiber (int), or a throughput curve (array). This new property is written back to data_bank.h5, next to the galaxy index already stored there:

with archive.archive('../data/data_bank.h5','a') as output:
    output['/gal/selection_flag'] = selection_flag

For background on HDF5, see http://www.hdfgroup.org/HDF5/whatishdf5.html

HDF5 on the command line

List the contents of an HDF5 file (recursively)

h5ls -r your-hdf5-file

Dump the contents of an HDF5 file

h5dump your-hdf5-file

The archive.py generator function

The Idea

Data that is generated on the fly should be accessible in the same way as data saved in the file, so that the calling code does not need to distinguish between the two.

How to do it:

Write a class that has the following structure:

class testModule:
    def __init__(self):
        # do some initialisation
        self.var = "test"

    def __call__(self, value):
        # do some processing and return values
        print("%s %s" % (self.var, value))
        return value + 1

Write something like this in the data bank:

#The path of the module, relative to the calling module:
ar['/Generators/test/modpath'] = "../../test/" 

#The name of the file without the .py extension
ar['/Generators/test/modfile'] = "test" 

#The name of the class inside the module
ar['/Generators/test/modclass'] = "testModule" 

#The path in the hdf5 that should call the module/function
ar['/Generators/test/h5path'] = "/test/increment"   

There can be more than one generator. Each one must reside under /Generators/name_of_generator/, otherwise it will not be found automatically.

Inside the module that uses the generator, include the lines:

from archive import registerGenerators
registerGenerators(path_to_your_databank)

This registers all the generators and calls the __init__ method of each class.

You can now access the value using:

with archive.archive() as ar:
    print(ar['/test/increment', {'field': 0, 'value': 20}])
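The mechanism described in this section can be sketched with a toy example. It is not the code from archive.py: `ToyArchive` and `register_generators` are hypothetical stand-ins, the generator class is passed in directly instead of being loaded from modpath/modfile/modclass, and the dict in the key is simply forwarded as keyword arguments. How the real archive handles that dict (for example the 'field' entry) is not shown here.

```python
class TestModule:
    """Generator class in the style shown above."""

    def __init__(self):
        self.var = "test"

    def __call__(self, value):
        return value + 1


class ToyArchive:
    """Hypothetical stand-in for archive.archive, backed by a dict."""

    def __init__(self, stored, generators):
        self._stored = stored          # path -> saved value
        self._generators = generators  # path -> callable instance

    def __getitem__(self, key):
        # A tuple key carries the path plus a dict of call arguments.
        if isinstance(key, tuple):
            path, kwargs = key
        else:
            path, kwargs = key, {}
        # Generated data: call the registered instance for this path.
        if path in self._generators:
            return self._generators[path](**kwargs)
        # Saved data: plain lookup.
        return self._stored[path]


def register_generators(entries):
    """Instantiate each generator class once and index it by its h5path."""
    return {entry['h5path']: entry['cls']() for entry in entries}


generators = register_generators(
    [{'h5path': '/test/increment', 'cls': TestModule}]
)
ar = ToyArchive({'/gal/magnitude_g': [21.3, 23.9]}, generators)

saved = ar['/gal/magnitude_g']                    # stored data
generated = ar['/test/increment', {'value': 20}]  # generated data
```

Both lookups go through the same square-bracket syntax, which is the point of the generator idea: the caller does not need to know whether a value was stored or computed.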