Project

General

Profile

The NOvA C++ API

The NOvA Database Interface (DBI) is currently a C++ package designed for both the online and offline software to use to read and write to the database. The interface is supposed to be simple and flexible while also providing optional controls over how data are read and written.

The first step of making use of the DBI is to either create a new table definition file or to make use of an existing one. Tables and columns may be defined via XML, but a C++ DB table object may also be created by determining the table columns directly from teh database. Even if you do not have to create a new table, it is useful to understand how a table is defined when trying to extract data from the database. Specifically, to make use of a table, one must know the names of the columns in the table.

We note that with the exception of hardware database tables, all DAQ, DCS and Offline database tables exist in their detector-specific "schema" (think "folder"). Therefore, when creating and accessing a database table via the DBI, one must declare the schema (eg, detector). Failure to do so will result in a runtime exception to be thrown.

Defining a Table in XML (<dBTable>)

The NOvA DAQ database interface offers a XML-based approach to defining a table. The dB XML schema is defined in NovaDatabase/config/NovaDatabase.xsd. An example of a table definition in XML is in NovaDatabase/tables/RunHistory/Runs.xml, eg:

<?xml version="1.0"?>

<dBTable
  tableName="runs" 
  dbHost="novadbdev.fnal.gov" 
  dbName="nova_dev" 
  creationDate="2010-09-15" 
  maintainer="jpaley@anl.gov" 
  addInsertTime="true" 
  addInsertUser="true">

  <dBColumn name="run"           colType="int"       null="0"/>
  <dBColumn name="nsubruns"      colType="int"       null="1"/>
  <dBColumn name="nevents"       colType="int"       null="1"/>
  <dBColumn name="partition"     colType="int"       null="1"/>
  <dBColumn name="runtype"       colType="string"    null="1"/>
  <dBColumn name="config"        colType="string"    null="1"/>
  <dBColumn name="configid"      colType="int"       null="0"/>
  <dBColumn name="detector"      colType="string"    null="0"/>
  <dBColumn name="tstart"        colType="timestamp" null="0"/>
  <dBColumn name="tstop"         colType="timestamp" null="1"/>
  <dBColumn name="npot"          colType="float"     null="1"/>
  <dBColumn name="nactivechan"   colType="int"       null="1"/>
  <dBColumn name="shifter"       colType="string"    null="1"/>
  <dBColumn name="begin_comment" colType="string"    null="1"/>
  <dBColumn name="end_comment"   colType="string"    null="1"/>

  <dBPKey>
    <dBCol name="run"/>
    <dBCol name="detector"/>
  </dBPKey>

</dBTable>


A table definition in XML requires:
  • a table name ("tableName" attribute),
  • a database host ("dbHost" attribute),
  • a database name ("dbName" attribute),
  • a list of columns via a sequence of <dBColumn> tags, and
  • a primary key via the <dBPKey> tag
Optionally, one may also include:
  • a database port using the "dbPort" attribute
  • a table "type"; currently the DBI supports the concepts of tables with "runsubrun" and "timestamp" validity context. See section RunSubrun and Timestamp Validity Tables for more details.
  • a list of unique combinations of columns, which would require that each row of the table have unique values for each set of columns
  • a list of checks that a column value must conform to (eg, "runnum>0")
  • a list of rows via a sequence of <dBRow> tags that contain data, and the API allows one to use only that data. This may be useful for testing.

NOTE: the values of the database host, name, port and user in the XML table definition file may be overloaded by the $NOVADB* environment variables.

Within the <dBTable> tag, several other optional attributes may be used to configure the table:

  • Table type "dbType": there are three kinds of tables on may define:
    • "generic": rows are uniquely determined by the columns listed in the <dBPKey> tag.
    • "hardware": effectively tells the DBI to ignore the schema
    • "validity": validity-context database table; see the Validity Tables section for more details
  • Override dB data with XML-defined rows ("xmlOverDB")
  • Disable connections to the database ("ignoreDB"): if you'd like to disable read/writes to the database
  • Create/Update history: if you'd like to keep track of who and/or when rows were either inserted or updated, add the option attributes addInsertTime="true", addInsertUser="true", addUpdateTime="true" and/or addUpdateUser="true" attributes to <dBTable>. This will result in a corresponding column to be added to the table; the DBI will handle populating these columns whenever a row is added or modified. Note that inserttime may later be used as a useful validity context.

Creating a Table via the Database Definition

A table object may also be created if one knows the name of the table, and provides the database connection information via the method:

bool Table::CreateFromDB(const char* schemaName, const char* tableName, 
                         const char* dbhost, const char* dbname,    
                         const char* dbport, const char* dbuser)

An example of how to use this is:


nova::database::Table myTable();

myTable.CreateFromDB(novadaq::cnv::DetInfo::GetName(novadaq::cnv::kFARDET).c_str(),"runs","ifdbprod.fnal.gov","nova_prod","5433","nova_reader");

etc.

Note that the user is expected to know the column names and data types.

Columns (<dBColumn>)

A column in a table may be of basically any data type. A column must have [unique] name (within the table), defined in the name attribute. The data type is defined in the colType attribute.

  • Supported data types: short, int, long, bigint, float, double, bool, timestamp, string, autoincr. A couple of notes:
    • "bigint" = "long". bigint is used in SQL, long is probably more familiar to those working in C++
    • "autoincr" is a column whose value is automatically incremented upon insert. Do_not set this value, it is done for you (the dbi should not allow this column to be set directly anyway)
    • if there is a new data type that you feel needs to be supported, please contact Jon Paley]
  • Allow NULL: if the value of the column must be set before writing to the database or external file, add the null="false" attribute to the <dBColumn> tag.
  • Foreign key: if the column is a foreign key into another table, give the name of the table to the fKeyTable attribute. [NOTE: this feature is currently disabled.]

Primary Key (<dBPKey>)

A primary key is a set of columns that uniquely define a row in the table. Although Postgresql does not require a table to have a primary key, it is unwise to create a table without one defined. Therefore, the C++ dbi requires that one is defined in the XML definition of the table. A primary key is defined by adding a list of columns (previously defined) between the <dBPKey> tag.

Unique combinations of columns (<dBUnique>)

One may further constrain a table such that one or more combinations of column values be unique in each row of a table. As an example, image that one is using a column of type autoincr as a primary key. But the table has columns for DCM, FEB, APD Pixel and a timestamp that must be unique in each row. One may ensure that this combination of columns is unique for each row by adding the column names to a <dBUnique> tag. Note that the uniqueness is only enforced upon insert/update into the database.

Checks on column values (<dBCheck>)

Say we want to constrain the values of a column to adhere to some rule. For example, we want all run numbers to be greater than 10000. Then we can add a line in the XML table definition file that reads:
<dBCheck constraint="runnum>10000"/>

Rows (<dBRow>)

Table rows may be defined in the XML file as either base values or perhaps for testing purposes. The format is something like:

<dBRow>
  <dBCol name="myInt"       value="929"/>
  <dBCol name="myTimeStamp" value="2010-09-29 09:29:29"/>
  <dBCol name="myFloat"     value="929.929"/>
  <dBCol name="myString"    value="929.929"/>
</dBRow>
A couple of things to note:
  • regardless of the data type of the column, the "value" must be given between quotes
  • if the table is of type "RunSubrun", then <dBRow> should look something like <dBRow runmin="10001" runmax="10423">
  • if the table is of type "Timestamp", then <dBRow> should look something like <dBRow tmin="2010-11-29 09:38:23" tmax="2010-11-30 01:56:12">
  • NOT all columns in a row need to be listed with the <dBRow> tag

Validity Tables

A very common "type" of table has rows that contain data that are valid between some range. Currently the DBI supports ranges based on timestamps. To make use of this, the XML table defintion file should have the attribute "tableType" within the "dBTable" element defined as "validity". An example XML definition of a "validity" type table:


<?xml version="1.0"?>

<dBTable
  tableName="jonstest" 
  tableType="validity" 
  dbHost="ifdbprod.fnal.gov" 
  dbName="nova_prod" 
  dbPort="5433" 
  creationDate="2013-06-05" 
  maintainer="jpaley@anl.gov">

  <dBColumn name="occupancy"          colType="double"     null="1"/>
  <dBColumn name="coincidences"         colType="double"     null="1"/>

</dBTable>

Validity tables automatically have the following two columns created, and therefore these column names are special and should never be used in the XML definition:
  • channel (a long integer). A channel can be anything, a pixel, a FEB, a DCM, a block or di-block, etc., as long as the channel numbers are always guaranteed to be unique.
  • tv (a float). Represents the validity start time, where the integer portion represents the Unix time (eg, number of seconds since 01/01/1970 00:00:00 UTC), and the fraction represents the fraction of seconds.

See section Validty Table Details for more details and examples of how to use these tables.

Some More Concepts

At this point we assume that you have a valid XML table definition file. Before we go into details of the C++ DBI, there are a few more concepts to explain:

  • "DISTINCT" columns: If you wish to have the DBI return only one row per unique set of column values, add to the list of "DISTINCT" columns via the Table::AddDistinctColumn() methods. "DISTINCT" columns may later be removed from the list via the Table::RemoveDistinctColumn() methods.
  • "SELECT" columns: If you wish to have only a subset of columns of a table returned from a dB query, add to the list of "SELECT" columns. By default this list is empty and dB queries returns all columns in the table.
  • Connections to the dB: By default, one does not need to explicitly create a connection to the database; one is created and destroyed for the user everytime a read or write is done. However, creating and destroying connections is very inefficient if you will be reading or writing frequently. You may therefore override the default behavior by getting a "persistent" connection to the database. It is then the user's responsibility to close the connection (although this is also done when the table is destroyed).

How to...

  • read from the database using the read-only account: make sure the appropriate environment variables are set:
    setenv NOVADBHOST ifdbprod.fnal.gov
    setenv NOVADBNAME nova_prod
    setenv NOVADBPORT 5433
    setenv NOVADBUSER nova_reader
    setenv NOVADBPWDFILE /nova/app/db/nova_reader_pwd
    
  • create a table object: Table* t = new Table("/NovaDAQDatabase/tables/RunHistory/Runs.xml");
  • set the detector schema: t->SetDetector(novadaq::cnv::kFARDET)
  • get a persistent connection to the dB: t->GetConnection() (returns false if no connection is made)
  • close a persistent connection to the dB: t->CloseConnection() (returns false if no connection exists)
  • add a row to a table:
Row* r = t->NewRow();
r->Set("ColA","ColAData");
r->Set("ColB","ColBData");
t->AddRow(r);
delete r;
  • load data from database into table: t->LoadFromDB();
  • write data from table into database: t->WriteToDB(); NOTE: only new or modified rows are written to the dB.
  • load data from comma-separated-value (csv) file into table: t->LoadFromCSV("/path/to/data.csv");
  • write data from table into csv file: t->WriteToCSV("/path/to/data.csv");
  • update an existing entry:
t->LoadFromDB();
t->GetRow(3).Update("ColA","NewColAData");
t->WriteToDB(); 
  • update rows from a csv file and write to dB:
t->UpdateFromCSV("/path/to/data.csv");
t->WriteToDB()
  • only load rows from the dB that fall within some range. The code below would only read in entries from the database that fall within the run range [1001,1010]
t->Clear();
t->SetValidityRange("run","1001","1010"); // "run" is a column in the table
t->LoadFromDB();
  • only load rows from the dB for a distinct set of columns. The code below assumes the table has columns called "dcmid", "febid" and "pixelid". The primary key is some combination of these three columns as_well as some other distinguishing columns (eg, a timestamp).
t->Clear();
t->AddDistinctColumn("dcmid");
t->AddDistinctColumn("febid");
t->AddDistinctColumn("pixelid");
t->LoadFromDB();
  • load rows from the dB in some sort of order:
t->Clear();
t->AddOrderColumn("run");
t->AddOrderColumn("subrun");
t->LoadFromDB();
  • load rows from the dB according to when they were inserted:
t->Clear();
t->AddOrderColumn("inserttime");
t->LoadFromDB();
  • load rows from the dB according to when they were updated:
t->Clear();
t->AddOrderColumn("updatetime");
t->LoadFromDB();

(WARNING: "updatetime" could be NULL if the row was never updated, so this could give unexpected results...)

How to Use Runsubrun/Timestamp Type Tables

If you're looking for nitty-gritty details of how these tables are supported in the DBI, documentation is here. Otherwise, if all you care about are the basics of getting started (and hopefully that's all you need), then read on.

How to Load Rows from the Database

Again, nothing different, eg: trsr->LoadFromDB()

My offline code won't compile/run when I add dB code!

Be sure to add the following lines near the bottom of the GNUmakefile in your package:

include SoftRelTools/arch_spec_novadb.mk

override LIBLIBS += -lNovaDatabase -lNovaDAQUtilities -L${XERCESCROOT}/lib -lxerces-c -L${PGSQL_DIR}/lib -lpq

Using NovaCSVtoDB to Load Data into a Database Table

  1. Put the data you wish to load into the dB in a comma-separated-value (csv) file (here we name the file "table.csv").
  2. set the NOVADBUSER to $USER (this assumes that you have write access to the database) and be sure you have a krb5 ticket.
  3. Execute the command: > NovaCSVtoDB table.xml table.csv

Creating a table in the Database

In general, table creation in the production database is restricted to just a few users. Anyone will be granted permissions to create tables in the development database for testing purposes.

Assuming you have a XML table definition file, execute the following for a non-validity-based database table:

> createTableInDB -d FarDet -f "full/path/to/table.xml"

To create a validity-based database table:

> createValidityTableInDB -d FarDet -f "full/path/to/table.xml"