Project

General

Profile

API

The API is primarily a set of I/O functions that handle the input and output of the pipeline's modules.

The Problem

This pipeline consists of different modules that pass data from one to another. The module producing the data has to use the same format as the module consuming it; otherwise the formats do not match and the result is incorrect. Both modules also have to agree on where the data is stored (the same file and the same location within it) so that the consuming module can find it.
So there must be a standard for the data storage format and a standard for where the data is stored.

Background

The idea was to create a folder that contains all the data, which makes passing data between modules a bit easier. The API takes this idea a little further by providing input and output functions for the modules. The programmer therefore does not have to write read and write routines for a module; they call one API function to get the data and another to write it to disk.
This takes work off the programmers and maintainers and, because there is a common data format, makes it easier to use and test different data and parameters. It also reduces the possibility of saving the same parameter twice, possibly with different values.

How it works

All data is (ideally) stored in a single file, which we call the archive. The archive is an HDF5 file; HDF5 is a file format for storing scientific data conveniently and efficiently. Inside the archive, data is organized by paths that work like the paths in a file system. The root directory is called /. It may contain only other directories, no variables, while every other directory can contain both directories and variables.
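To picture this layout, the sketch below models an archive as nested Python dictionaries: a dict plays the role of a directory and any other value is a variable. It only illustrates the file-system analogy and the root rule; it is not how the HDF5 file is actually stored. The names `layout`, `check_root` and `variable_paths`, the `efficiency` variable and all values are hypothetical.

```python
# Hypothetical in-memory model of the archive layout, not the real HDF5 storage:
# a dict plays the role of a directory, any other value is a variable.
layout = {
    "param": {
        "throughput": {
            "wavelength": [400.0, 500.0, 600.0],  # made-up example values
            "efficiency": [0.80, 0.90, 0.85],     # hypothetical second variable
        },
    },
}


def check_root(root):
    """Raise if the root directory holds a variable instead of only directories."""
    for name, value in root.items():
        if not isinstance(value, dict):
            raise ValueError(f"variable '/{name}' is not allowed in the root directory")


def variable_paths(directory, prefix=""):
    """Return the full path of every variable below `directory`, like walking a file system."""
    paths = []
    for name, value in directory.items():
        path = f"{prefix}/{name}"
        if isinstance(value, dict):
            paths.extend(variable_paths(value, path))  # descend into the subdirectory
        else:
            paths.append(path)
    return paths


check_root(layout)
print(variable_paths(layout))
# ['/param/throughput/wavelength', '/param/throughput/efficiency']
```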

Every variable is stored at a path, e.g. "/param/throughput/wavelength". This means that the / directory contains a directory called param, which in turn contains a directory called throughput. The throughput directory contains a variable called wavelength. A variable can be of any type: int, double, array, array of arrays, etc. In this way, every variable has a distinct path.
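The reading of a path described above (every component except the last names a directory, the last names the variable) can be written as a small helper. `split_path` is a hypothetical name used only for illustration; the real API may parse paths differently.

```python
def split_path(path):
    """Split an archive path into (directory components, variable name).

    Hypothetical helper: "/param/throughput/wavelength" -> (["param", "throughput"], "wavelength")
    """
    if not path.startswith("/"):
        raise ValueError(f"archive paths start at the root '/': {path!r}")
    parts = [p for p in path.split("/") if p]  # ignore empty pieces from repeated slashes
    if len(parts) < 2:
        # A variable directly in '/' is not allowed, so at least one directory is required.
        raise ValueError(f"path needs at least one directory and a variable name: {path!r}")
    return parts[:-1], parts[-1]


directories, variable = split_path("/param/throughput/wavelength")
print(directories, variable)  # ['param', 'throughput'] wavelength
```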

The API has one function to get the variable at a given path and one function to set the variable at a given path.
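A minimal in-memory sketch of such a get/set pair, assuming the same dictionary-style `ar[path]` access that the Python section uses. `MemoryArchive` is a hypothetical class: unlike the real archive it keeps everything in RAM rather than in an HDF5 file. Setting a variable creates any missing directories on the way, and getting a path that does not name a variable raises `KeyError`.

```python
class MemoryArchive:
    """Hypothetical in-memory archive: nested dicts are directories, other values are variables."""

    def __init__(self):
        self._root = {}

    @staticmethod
    def _split(path):
        parts = [p for p in path.split("/") if p]
        if not path.startswith("/") or len(parts) < 2:
            # Paths are absolute, and the root may hold only directories.
            raise ValueError(f"invalid variable path: {path!r}")
        return parts[:-1], parts[-1]

    def __setitem__(self, path, value):
        """Set the variable at `path`, creating missing directories on the way."""
        directories, name = self._split(path)
        node = self._root
        for d in directories:
            node = node.setdefault(d, {})
            if not isinstance(node, dict):
                raise ValueError(f"{d!r} in {path!r} is a variable, not a directory")
        if isinstance(node.get(name), dict):
            raise ValueError(f"{path!r} is a directory, not a variable")
        node[name] = value

    def __getitem__(self, path):
        """Get the variable at `path`; raise KeyError if it does not exist."""
        directories, name = self._split(path)
        node = self._root
        for d in directories:
            node = node.get(d)
            if not isinstance(node, dict):
                raise KeyError(path)  # missing directory, or a variable in its place
        if name not in node or isinstance(node[name], dict):
            raise KeyError(path)  # missing variable, or a directory in its place
        return node[name]


ar = MemoryArchive()
ar["/param/throughput/wavelength"] = [400.0, 500.0, 600.0]
print(ar["/param/throughput/wavelength"])  # [400.0, 500.0, 600.0]
```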

How it works with Python

Add data to the archive

import archive  # the project's archive module

with archive.archive("param.h5", 'a') as ar:
    ar["/param/throughput/wavelength"] = wavelength  # e.g. an array of doubles

This opens the archive "param.h5" and adds the content of the variable wavelength (e.g. an array of doubles) to it, stored at the path /param/throughput/wavelength. The flag 'a' appends data to the archive. There is also 'r', which opens the archive for reading, and 'w', which creates a new archive, deleting any existing archive with the given name.
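The three modes can be mimicked with a small file-backed stand-in. The sketch assumes a hypothetical `PickleArchive` that stores a flat {path: value} dict in a pickle file instead of HDF5, ignoring the directory hierarchy to keep the focus on the modes. The class name, the storage format and the `PermissionError` raised on writes in 'r' mode are illustrative assumptions, not the behaviour of the real `archive` module.

```python
import os
import pickle
import tempfile


class PickleArchive:
    """Hypothetical archive with 'r', 'a' and 'w' modes, stored as a pickled {path: value} dict."""

    def __init__(self, filename, mode="r"):
        if mode not in ("r", "a", "w"):
            raise ValueError(f"unsupported mode: {mode!r}")
        self.filename = filename
        self.mode = mode
        self._data = {}

    def __enter__(self):
        # 'r' needs an existing file; 'a' loads it if it exists;
        # 'w' always starts empty, so an existing file is overwritten on close.
        if self.mode == "r" or (self.mode == "a" and os.path.exists(self.filename)):
            with open(self.filename, "rb") as f:
                self._data = pickle.load(f)
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.mode != "r":  # only the writable modes save back to disk
            with open(self.filename, "wb") as f:
                pickle.dump(self._data, f)
        return False  # never suppress exceptions

    def __setitem__(self, path, value):
        if self.mode == "r":
            raise PermissionError("archive was opened read-only with mode 'r'")
        self._data[path] = value

    def __getitem__(self, path):
        return self._data[path]

    def __contains__(self, path):
        return path in self._data


filename = os.path.join(tempfile.mkdtemp(), "param.pkl")
with PickleArchive(filename, "w") as ar:  # new archive
    ar["/param/throughput/wavelength"] = [400.0, 500.0, 600.0]
with PickleArchive(filename, "a") as ar:  # existing content is kept
    ar["/param/throughput/efficiency"] = [0.80, 0.90, 0.85]  # hypothetical variable
with PickleArchive(filename, "r") as ar:
    print(ar["/param/throughput/wavelength"], "/param/throughput/efficiency" in ar)
    # [400.0, 500.0, 600.0] True
```

Reopening the same file with 'w' would start from an empty archive again, which matches the description of 'w' above.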

Get data from the archive

import archive  # the project's archive module

with archive.archive("param.h5", 'r') as ar:
    wavelength = ar["/param/throughput/wavelength"]

This opens the archive for reading and reads the content of the path "/param/throughput/wavelength" into the variable wavelength.