The Data Pool Manager

The DPM used in the Fermilab Control System has the following features:

  • Merges data requests much more aggressively than the legacy C-based and Java-based consolidators did.
  • Uses protocol compiler messaging so all client languages have equal access.
  • Uses ACNET service discovery, so DPMs can be added to and removed from the control system with minimal impact on clients.
  • Has a TCP interface (port 6801) so languages that lack, or don't need, full ACNET support can still get accelerator data efficiently.

This project is intended to be the home for everything DPM-related. It currently contains documentation, issue tracking, code reviews, and repositories for:

  • the Erlang-based DPM
  • the Erlang DPM protocol library

Application Programming Interface

Available data sources can be found on the Data Sources page.

Accessing DPM Virtual Machines

DPM is written in Erlang and runs in an Erlang virtual machine. We've configured the start-up parameters to allow remote connections. To connect to the DPM running on CLX25, log in to any of the CLX machines and start an Erlang VM, specifying the node on which to open a remote shell (replace NAME, in the command line, with some unique name):

$ erl -sname NAME -setcookie newdpm -remsh dpm@clx25
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.0  (abort with ^G)
(dpm@clx25)1> 

You'll see the shell's start-up messages and then a prompt. The prompt shows the machine on which it's running. To disconnect from DPM, hit ^G and then q<enter>.

(dpm@clx25)1> (press ^G)
User switch command
 --> q
$ 
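Choosing a unique NAME by hand is easy to get wrong when several people connect at once. The following is a minimal sketch of a helper that builds the connect command shown above. The function name dpm_remsh_cmd and the naming scheme (login name plus shell PID) are assumptions for illustration; the cookie and the dpm@HOST node name are taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the erl command line for a remote shell on a DPM.
# Usage: dpm_remsh_cmd [HOST]   (HOST defaults to clx25, as in the example above)
dpm_remsh_cmd() {
    host="${1:-clx25}"
    # Keep only characters that are safe in a short node name.
    user=$(printf '%s' "${USER:-user}" | tr -c 'A-Za-z0-9_' '_')
    # The login name plus this shell's PID makes a name collision unlikely.
    printf 'erl -sname %s -setcookie newdpm -remsh dpm@%s\n' \
        "remsh_${user}_$$" "$host"
}
```

Calling dpm_remsh_cmd clx25 only prints the command, so it can be reviewed before it is pasted into a terminal or run with eval.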

Checking Log Entries

An Erlang VM can run multiple shells. When the node starts up, all output is sent to the initial shell. If we want to view the logs, we need to redirect the output of the log process to our shell. The following two commands can be entered from a remote shell and will set things up so the log entries will get sent to the remote shell's output:

(dpm@clx25)1> rb:start().
(dpm@clx25)2> group_leader(group_leader(), erlang:whereis(rb_server)).

Now you can use the rb:list(), rb:rescan(), and rb:show() commands. Before disconnecting, you should shut down the log viewer:

(dpm@clx25)3> rb:stop().

Development

This project is written in Erlang and uses several Fermilab-authored Erlang applications. Any of the CLX nodes has an appropriate Erlang environment for contributing to this project. The source code is available from the project's Redmine repository:

$ git clone ssh://p-acsys-dpm@cdcvs.fnal.gov/cvs/projects/acsys-dpm

The master branch should only have "finalized" code on it; it should compile cleanly and be reasonably bug-free (i.e., tested). It is recommended that, after cloning the repository, you make a local branch called "devel" to which you can apply your development commits. When the code has been tested, it can then be merged back into master.

Using the devel branch

After cloning the project, you can create and switch to the devel branch by doing this:

$ git checkout -b devel
$ git status

The second command will show you're on the devel branch.

It is easy to switch between the two branches:

$ git checkout master
$ git checkout devel

Make sure you're on the devel branch while you develop. Use "git add" and "git commit" to apply your development history to the devel branch. When you've tested your changes and you're ready to share them, do the following:

1. $ git checkout master
   Switches your working directory to the contents of the master branch.
2. $ git pull
   Pulls the latest changes from the remote repository and applies them to the master branch in both your local repository and your working directory. If master was already up to date (no new patches arrived from the remote repository), go to step 6.
3. $ git checkout devel
   Go back to your development branch, because it hasn't been tested against the latest master.
4. $ git rebase master
   Re-applies your development commits on top of the new head of master. If your development branch has extensive changes, you may have conflicts to resolve. Once the rebase is done, continue to step 5.
5. Re-test your changes. When you're ready, go to step 1.
6. $ git merge devel
   If you did everything correctly, you'll see that master was "fast-forwarded" to devel.
7. $ git push
   Pushes master to the project's repository. There is a small chance that someone pushed new changes to master after your git pull in step 2. If this happens, you'll have to do a git pull and resolve any conflicts before retrying step 7.

Do not push your devel branch to the repository!
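The steps above can be sketched as a single shell function. This is an illustration under stated assumptions, not part of the project: the function name publish_devel and its return codes are made up here, and --ff-only is added to the merge so that step 6 fails outright instead of creating a merge commit when master cannot be fast-forwarded.

```shell
#!/bin/sh
# Hypothetical wrapper around steps 1-7. Run it from inside the cloned
# repository, with your tested commits on the local devel branch.
# Returns 0 when master was pushed, 2 when devel was rebased and needs
# re-testing, and 1 on any git failure.
publish_devel() {
    git checkout master || return 1            # step 1
    before=$(git rev-parse HEAD)
    git pull || return 1                       # step 2
    after=$(git rev-parse HEAD)

    if [ "$before" != "$after" ]; then
        # master moved: rebase devel onto it (steps 3-4), then stop so the
        # changes can be re-tested by hand (step 5).
        git checkout devel || return 1
        git rebase master || return 1
        echo "master changed; devel was rebased. Re-test, then run again."
        return 2
    fi

    git merge --ff-only devel || return 1      # step 6
    git push                                   # step 7
}
```

Stopping after the rebase is deliberate: the re-test in step 5 is a manual judgment call, so the sketch leaves you on devel and expects you to call it again when you are satisfied.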

Releasing

Once your contribution is in the system, it needs to be applied to the operational DPMs. The first step is pushing the new code out into the download area.

$ make release

At this point, you may want to run xcons-update-all to push the changes to the download areas of all the CLX nodes, but this step isn't required.

Finally, the DPMs need to be restarted. There are eight DPM instances running, one on each of CLX5, CLX18, CLX19, CLX20, CLX21, CLX22, CLX23, and CLX25. On each node, do the following:

$ ssh clx25
$ acnet restart erl_dpm
$ ^D

NOTE: Unless there's a severe bug that needs to be fixed, it's a good idea to release new code a few nodes at a time. Let one or two nodes run with the new code for a day to make sure no serious bugs were introduced. Because clients find DPMs through service discovery, a client that crashes a DPM (due to a bug) will reconnect to the next one and work its way through the pool, crashing each DPM until none are running!
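A staged rollout like the one described in the note can be sketched as a small shell function. The function name restart_dpms and the DRY_RUN switch are assumptions for illustration. The sketch also assumes that the acnet command is on the PATH of a non-interactive ssh session, which the text above does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: restart erl_dpm on each node named on the command line.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed;
# set DRY_RUN=0 to run them over ssh.
restart_dpms() {
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh $node acnet restart erl_dpm"
        else
            # Stop at the first failure so a bad release doesn't spread.
            ssh "$node" acnet restart erl_dpm || return 1
        fi
    done
}

# Example staging (commented out so that sourcing this file has no effect):
#   DRY_RUN=0 restart_dpms clx5 clx18     # day 1: two nodes only
#   DRY_RUN=0 restart_dpms clx19 clx20 clx21 clx22 clx23 clx25   # later
```

Defaulting to a dry run is a design choice: it lets you review exactly which nodes will be touched before any DPM is restarted.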

Performance Measurements

During development, we focused on correctness over performance and, for the most part, the resulting performance has been acceptable. As DPM's use became more widespread, we saw instances of high CPU or memory usage. The culprits tended to be applications acquiring data in inefficient ways (e.g., doing hundreds of one-shots each second instead of setting up a list). We can't force programmers to write efficient code, so we need to improve DPM to handle these pathological cases. Performance issues aren't show-stoppers, though, because we can always add more DPMs to the pool. Right now (November 2016) we have eight instances running.

We documented our efforts to improve DPM's performance here: Profiling DPM.

Webinar

In December 2015, Rich Neswold and Dennis Nicklaus presented a webinar for Erlang Solutions on the new Erlang-based DPM. It can be found here: Erlang in High Energy Physics Research