About the data bank and working with HDF files
Data is passed between the modules in the form of HDF5 files. We have a nifty Python class for reading and writing this file type (archive.py, available in SVN); examples follow below. An HDF5 file can store any kind of data (ints, floats, nested arrays, …) in an internal address tree, much like a miniature file system. The advantage is that the data can be accessed with human-readable keywords instead of, e.g., column numbers as in a CSV file. HDF5 is also especially useful for the large parameter files for Instrument Design, which are already in HDF format.
In each module, the data should be read from file only once, at the very beginning. Correspondingly, the output writing commands belong at the very end of the module. In between, all data must exist only as local workspace variables.
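To illustrate the address tree idea, the following sketch mimics path-style lookups with nested dictionaries. It does not use HDF5 or archive.py; the helper `lookup` and all sample values are made up for illustration only.

```python
# Toy model of an HDF5-style address tree built from nested dicts.
# Neither HDF5 nor archive.py is involved; all values are invented.
tree = {
    "gal": {
        "galaxy_index": [0, 1, 2],
        "magnitude_g": [21.3, 19.8, 22.5],
    }
}


def lookup(node, path):
    """Follow a '/'-separated path down the nested dicts."""
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


# Keyword access states what the data is ...
magnitude_g = lookup(tree, "/gal/magnitude_g")

# ... whereas a CSV row only offers positions whose meaning is defined elsewhere.
csv_row = "0,21.3".split(",")
first_magnitude = float(csv_row[1])
```

The path string documents itself, while the column number `1` in the CSV case only makes sense to someone who knows the column order.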
All data (except for unconverted input files) is to be stored in the file data/data_bank.h5.
Here is an example of a module that reads the magnitude in the g band from data_bank.h5, together with the galaxy index.

    import archive

    with archive.archive('../data/data_bank.h5', 'r') as input:
        galaxyIndex = input['/gal/galaxy_index']
        magnitude_g = input['/gal/magnitude_g']

The module then calculates some other, new property, e.g. a selection flag (boolean), an assigned fiber (int) or a throughput curve (array). This new property is written back to data_bank.h5, where the galaxy index is already stored:

    with archive.archive('../data/data_bank.h5', 'a') as output:
        output['/gal/selection_flag'] = selection_flag

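The read-once, compute, write-once layout described above could be organised roughly as follows. This is only a sketch: a plain dict stands in for the archive object, the function names are hypothetical, and the magnitude limit of 22.0 is an invented selection criterion.

```python
# Sketch of the read-once / compute / write-once layout of a module.
# A plain dict stands in for the archive object; the magnitude cut of 22.0
# is an invented example criterion.

def read_inputs(store):
    """Read everything the module needs, once, at the very beginning."""
    return store["/gal/galaxy_index"], store["/gal/magnitude_g"]


def compute_selection(magnitude_g, limit=22.0):
    """Work on local variables only: flag galaxies brighter than the limit."""
    return [m < limit for m in magnitude_g]


def write_outputs(store, selection_flag):
    """Write all results, once, at the very end."""
    store["/gal/selection_flag"] = selection_flag


store = {
    "/gal/galaxy_index": [0, 1, 2],
    "/gal/magnitude_g": [21.3, 19.8, 22.5],
}
galaxy_index, magnitude_g = read_inputs(store)
selection_flag = compute_selection(magnitude_g)
write_outputs(store, selection_flag)
```

Keeping all file access in the first and last step means the computation in the middle can be tested without any file on disk.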
http://www.hdfgroup.org/HDF5/whatishdf5.html
HDF5 on the command line
List the contents of an HDF5 file:
h5ls -r your-hdf5-file
Dump the contents of an HDF5 file:
h5dump your-hdf5-file
The archive.py generator function
The Idea
Data that is generated on demand is easiest to use when it can be accessed in the same way as saved data.
How to do it:
Write a class that has the following structure:

    class testModule:
        def __init__(self):
            # do some initialisation
            self.var = "test"

        def __call__(self, value):
            # do some processing and return values
            print("%s %s" % (self.var, value))
            return value + 1

Write something like this in the data bank:

    # 'ar' is an archive object opened for writing.
    # The path of the module, relative to the calling module:
    ar['/Generators/test/modpath'] = "../../test/"
    # The name of the file without the .py extension:
    ar['/Generators/test/modfile'] = "test"
    # The name of the class inside the module:
    ar['/Generators/test/modclass'] = "testModule"
    # The path in the HDF5 file that should call the module/function:
    ar['/Generators/test/h5path'] = "/test/increment"

There can be more than one generator. Each one needs to reside in /Generators/name_of_generator/, or it won't be found automatically.
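For orientation, here is one way the modpath, modfile and modclass entries could be turned into a class object. This is a guess at the mechanism, not the code in archive.py; the function `load_generator_class` and its `base_dir` parameter are hypothetical.

```python
import importlib
import os
import sys


def load_generator_class(modpath, modfile, modclass, base_dir="."):
    """Import `modfile` from `modpath` (relative to base_dir) and return the class.

    Hypothetical helper; archive.py may resolve generators differently.
    """
    directory = os.path.abspath(os.path.join(base_dir, modpath))
    if directory not in sys.path:
        # Make the generator's directory importable.
        sys.path.insert(0, directory)
    # Ensure files created after interpreter start are picked up.
    importlib.invalidate_caches()
    module = importlib.import_module(modfile)
    return getattr(module, modclass)


# Example call with the values stored in the data bank above
# (only works if ../../test/test.py actually exists):
# generator_class = load_generator_class("../../test/", "test", "testModule")
```

Because the module path is stored relative to the calling module, the result depends on where the caller lives, which is why the sketch takes an explicit `base_dir`.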
Inside the module that uses the generator, include the lines:

    from archive import registerGenerators
    registerGenerators(path_to_your_databank)

This registers all the generators and calls the __init__ method of each class.
You can now access the value using:

    with archive.archive() as ar:
        print(ar['/test/increment', {'field': 0, 'value': 20}])

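The access syntax relies on a Python detail: `ar[path, params]` hands the tuple `(path, params)` to `__getitem__`. The sketch below shows how stored and generated values could share that syntax. `ToyArchive` is a hypothetical stand-in, not the archive.py implementation. How archive.py maps the parameter dictionary (including 'field') onto the call is not documented here, so the sketch simply passes the entries as keyword arguments.

```python
class TestModule:
    """Same shape as the generator class shown earlier."""

    def __init__(self):
        self.var = "test"

    def __call__(self, value):
        return value + 1


class ToyArchive:
    """Hypothetical stand-in: stored values and generators behind one [] syntax."""

    def __init__(self):
        self.stored = {}
        self.generators = {}

    def register(self, h5path, generator_class):
        # Instantiate once at registration time, so __init__ runs here.
        self.generators[h5path] = generator_class()

    def __getitem__(self, key):
        # ar[path, params] arrives as the tuple (path, params).
        if isinstance(key, tuple):
            path, params = key
        else:
            path, params = key, {}
        if path in self.generators:
            # Assumption: parameters are forwarded as keyword arguments.
            return self.generators[path](**params)
        return self.stored[path]


ar = ToyArchive()
ar.stored["/gal/magnitude_g"] = [21.3, 19.8]
ar.register("/test/increment", TestModule)

stored = ar["/gal/magnitude_g"]                   # plain lookup
generated = ar["/test/increment", {"value": 20}]  # calls TestModule()(value=20)
```

From the caller's side both lines look the same, which is the point of the generator mechanism.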