The Data Pool Manager
The DPM used in the Fermilab Control System has the following features:
- Much more aggressive at merging data requests than legacy C-based or Java-based consolidators were.
- Uses protocol compiler messaging so all client languages have equal access.
- Uses ACNET service discovery, so DPMs can be added and removed from the control system with minimal impact to clients.
- Has a TCP interface (port 6801) so languages that don't have/need full ACNET support can still get accelerator data efficiently.
This project is intended to be the place to find all things DPM-related. It currently contains documentation, issue tracking, code reviews, and repositories for:
- the Erlang-based DPM
- the Erlang DPM protocol library
Application Programming Interface
Available data sources can be found on the Data Sources page.
Accessing DPM Virtual Machines
DPM is written in Erlang and runs in an Erlang virtual machine. We've configured the start-up parameters to allow remote connections. To connect to the DPM running on CLX25, log in to any of the CLX machines and start an Erlang VM, but specify where to start a remote shell (replace NAME, in the command line, with some unique name):
$ erl -sname NAME -setcookie newdpm -remsh dpm@clx25
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V7.0 (abort with ^G)
(dpm@clx25)1>
You'll see the shell's start-up messages and then a prompt. The prompt shows the node on which the shell is running. To disconnect from DPM, hit ^G and then enter q at the prompt that appears:
(dpm@clx25)1> (press ^G)
User switch command
 --> q
$
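Because every remote shell needs its own node name, one way to avoid collisions is to derive NAME from your login name and the shell's process ID. The helper below is a minimal sketch and not part of the DPM project: the function name `dpm_remsh_cmd` and the `dbg_` prefix are made up for illustration. It only prints the command, so you can inspect it before running it.

```shell
# Hypothetical helper (not part of the DPM project): print an erl command
# line whose node name is unlikely to clash with anyone else's.
# The body runs in a subshell so its variables don't leak into your session.
dpm_remsh_cmd() (
    # $1: node to attach to; defaults to the node used in the text above.
    target="${1:-dpm@clx25}"
    # Login name, with anything other than letters, digits and "_"
    # replaced by "_", so the result is safe to use in a short node name.
    user="$(id -un 2>/dev/null | tr -c 'A-Za-z0-9_\n' '_')"
    # "$$" is the PID of the invoking shell, unique per login session.
    printf 'erl -sname dbg_%s_%s -setcookie newdpm -remsh %s\n' \
        "$user" "$$" "$target"
)
```

Running `dpm_remsh_cmd` prints the command line; `eval "$(dpm_remsh_cmd)"` prints and then runs it.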
Checking Log Entries
An Erlang VM can run multiple shells. When the node starts up, all output is sent to the initial shell. If we want to view the logs, we need to redirect the output of the log process to our shell. The following two commands can be entered from a remote shell and will set things up so the log entries will get sent to the remote shell's output:
(dpm@clx25)1> rb:start().
(dpm@clx25)2> group_leader(group_leader(), erlang:whereis(rb_server)).
Now you can use the rb:show() commands. Before disconnecting, you should shut down the log viewer so it isn't left running when the shell exits (the standard rb module provides rb:stop() for this).
This project is written in Erlang and uses several Fermilab-authored Erlang applications. Any of the CLX nodes will have an appropriate Erlang environment available to allow contributing to this project. The source code is available from the project's Redmine repository:
$ git clone ssh://firstname.lastname@example.org/cvs/projects/acsys-dpm
The master branch should only have "finalized" code on it; it should compile cleanly and be reasonably bug-free (i.e. tested). It is recommended that, after cloning the repository, you make a local branch called "devel" to which you can apply your development commits. When the code has been tested, it can then be merged back into master.
After cloning the project, you can create and switch to the devel branch by doing this:
$ git checkout -b devel
$ git status
The second command will show you're on the devel branch.
It is easy to switch between the two branches:
$ git checkout master
$ git checkout devel
Make sure you're on the devel branch while you develop. Use "git add" and "git commit" to apply your development history to the devel branch. When you've tested your changes and you're ready to share them, do the following:
|1.||This will switch your working directory to the contents of the master branch.|
|2.||Pulls the latest changes from the remote repository and applies them to the master branch.|
|3.||You need to go back to your development branch because it wasn't tested against the latest master.|
|4.||This re-applies your development branch to the new head of master.|
|5.||Re-test your changes. When you're ready, go to step 1.|
|6.||If you did everything correctly, you'll see that|
|7.||Now you can push master.|
Do not push your devel branch to the repository!
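The steps above map onto standard git commands roughly as follows. This is a sketch rather than the project's official procedure: the exact commands are assumptions inferred from the step descriptions, the remote is assumed to be called `origin`, and `run`, `sync_devel` and `publish_master` are hypothetical helper names. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 1-4: bring master up to date, then replay devel on top of it.
# Re-test afterwards, and repeat until the pull brings in nothing new.
sync_devel() {
    run git checkout master &&
    run git pull &&
    run git checkout devel &&
    run git rebase master
}

# Final steps, once the rebased devel branch has been re-tested:
# fast-forward master to devel and publish master only (never devel).
publish_master() {
    run git checkout master &&
    run git merge --ff-only devel &&
    run git push origin master
}
```

Using `--ff-only` makes the merge fail instead of creating a merge commit if master has moved since the last rebase, which is the signal to run `sync_devel` and re-test again.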
Once your contribution is in the system, it needs to be applied to the operational DPMs. The first step is pushing the new code out into the download area.
$ make release
At this point, you may want to do an xcons-update-all to push the changes to the download areas of all the CLX machines, but this step isn't required.
Finally, the DPMs need to be restarted. There are eight instances of DPMs running; one on each
CLX25. For each node you do the following:
$ ssh clx25
$ acnet restart erl_dpm
$ ^D
NOTE: Unless there's a severe bug that needs to be fixed, it's a good idea to release new code a few nodes at a time. Let one or two nodes run with new code for a day to make sure serious bugs weren't introduced. Since the DPMs are found using service discovery, a client that crashes a DPM (due to a bug) will work itself through each DPM, crashing each until none are running!
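A staged rollout like the one described in the note can be scripted so that only the nodes you name are restarted. The following is a minimal sketch under stated assumptions: `run` and `restart_dpms` are hypothetical helper names, and it assumes `acnet` is on the PATH of a non-interactive ssh session (the procedure above runs it from an interactive login). Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Restart erl_dpm on each node named on the command line, one at a time.
# Stops at the first failure so a bad release isn't rolled out any further.
restart_dpms() {
    for node in "$@"; do
        run ssh "$node" acnet restart erl_dpm || return 1
    done
}
```

For example, `restart_dpms clx25` restarts a single node; after it has run cleanly for a day, call the function again with the remaining node names.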
During development, we focused on correctness over performance and, for the most part, the resulting performance has been acceptable. As DPM's use became more widespread, we saw instances of high CPU or memory usage. The culprits tended to be applications using non-optimal ways to acquire data (e.g. doing hundreds of one-shots each second instead of setting up a list). We can't force programmers to write correct code, so we need to improve DPM to handle these pathological cases. It should be mentioned that performance issues aren't show-stoppers because we can always add more DPMs to the pool. Right now (November 2016) we have eight instances running.
We documented our efforts to improve DPM's performance here: Profiling DPM.
In December of 2015, Rich Neswold and Dennis Nicklaus provided a webinar to Erlang Solutions discussing the new DPM based on Erlang. It can be found here: Erlang in High Energy Physics Research