Cosmic Ray Tagger - Experts Only » History » Version 85

Rui An, 10/08/2019 11:01 AM


Cosmic Ray Tagger - Experts Only

If you are a new CRT expert, please register to the mailing list by clicking here: subscribe
(replace Firstname and Lastname with yours)

Grafana - Online Monitoring system

Connection instructions to crtevb through ws01:

First, from your machine to ws01:

kinit username
ssh username@ubdaq-prod-ws01.fnal.gov -L 3000:localhost:3000 -L 8086:localhost:8086
(use uboonedaq as username if you do not have an account on this server)

Then, from ws01 to crtevb:

kinit username
ssh username@ubdaq-prod-crtevb.fnal.gov -L 3000:localhost:3000 -L 8086:localhost:8086

Open a local web browser: http://localhost:3000/dashboard/db/uboone-crt

username: ubooneshift
password: argon!smc

- Check that the total event rate is within 35k-37k events per second.
- Check that the number of connected FEBs is 9 for Bottom, 13 for FT side, 27 for Pipe side and 24 for Top.
- Check that the driver status is 0 for crt01, crt02, crt03 and crt04.
- Check that the GPS status is 1 for all FEBs.
- Check that the DAQ status is 1 for all FEBs.
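A minimal sketch of the range checks behind this list (the rate value below is an illustrative example, not read from Grafana; in practice you read these numbers off the dashboard):

```shell
# Expected values from the checklist above (the rate is an example value)
rate=36000
febs_total=$((9 + 13 + 27 + 24))   # Bottom + FT side + Pipe side + Top

if [ "$rate" -ge 35000 ] && [ "$rate" -le 37000 ]; then
  echo "rate OK"
else
  echo "rate OUT OF RANGE"
fi
echo "expected connected FEBs: $febs_total"
```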

Slow Control - How to access

If you have an account on ws01, please login with it. If you log in as ubooneshift, please notify the uB shifters (+01 630 840 6967)

1. On your mac, in a terminal, do the following:

kinit username
ssh -C -N -q -K -L 5902:localhost:5902 username@ubdaq-prod-ws01.fnal.gov &

This runs the vnc tunnel in the background.

2. Click on an empty screen on your mac desktop and get the apple menu. Under the “go” menu, select “Connect to Server”

3. This opens a new mini window; enter "vnc://localhost:5902" for the Server Address and click "Connect"

4. This should ask you for the shared GUI password, which is xxxxxxsmc, where “xxxxxx” is our usual doc-db password.

5. Now you are connected to the shared GUI.

To operate Wiener PL512: password to expert panel is xxxxxxCRT, where “xxxxxx” is our usual doc-db password.

Channel mapping:

Bottom: Ch0 & Ch1
Pipe Side: Ch2 & Ch3 & Ch4
Feedthrough Side: Ch5 & Ch6
Top: Ch7 & Ch8 & Ch9

DAQ - Instructions for standalone mode (this is the current mode):

------- Recovery after server shutdown -------

Only the user crtuser will be used for DAQ actions:

For initialising the driver:
ssh to crtevb following the same instructions listed previously: (ssh )
switch to crtuser using: su - crtuser
pw: xxx123 (hint: cosmic ray tagger for xxx)
from crtevb: ssh to each of crt0X-priv (crt01, crt02, crt03, crt04, crt05)

ssh 
cd /home/kreslo/DAQdriver/New_GPS
sudo ifup eth0
screen -S febdrv
sudo ./febdrv eth0 <min poll time>   (min poll time = 210 for crt01, crt02 and crt05; 270 for crt03 and crt04)
Ctrl-A D
exit

For initialising the Grafana monitoring:
ssh to crtevb following the same instructions listed previously: (ssh )
switch to crtuser using: su - crtuser

cd /home/kreslo/DAQdriver
sudo service grafana-server start
./start_slowmon

Tip: Grafana can go offline by itself and fail to restart as shown below (reason still unknown). When this last happened, on 06/09/2019, the same command worked again after about 12 hours.

sudo service grafana-server start
Starting Grafana Server: ... [ OK ]
FAILED

Please check that the data points are visible and updating in Grafana. If no data points are visible, it could be that InfluxDB is not running, or is running in the wrong configuration. In that case, try:

sudo influxd &

After this command, check that the data points appear in Grafana.
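Before starting a new influxd, one can first check whether an instance is already up. This is a sketch (it uses pgrep from procps and is an addition to, not part of, the standard procedure):

```shell
# Check for a running influxd before starting a new one (sketch)
if pgrep -x influxd > /dev/null; then
  echo "influxd already running"
else
  echo "influxd not running"   # in production you would now run: sudo influxd &
fi
```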

Configure all FEBs and turn on the bias, from crtevb:

cd /home/kreslo/DAQdriver
./conf_start_crt01_calibrated
./conf_start_crt02_calibrated
./conf_start_crt03_calibrated
./conf_start_crt04_calibrated

Start cron script (automatically restart a run every 4 hours) and start first run:

cd /home/kreslo/DAQdriver
./startcron
./start_run

Start epics for CRT
ssh to crtevb as uboonepro (ssh ):

cd /home/uboonepro
sh sc_daemon.sh

To force a restart of the daemon, or to stop it, modify the last line:
python $HOME/.local/bin/sc2epicsdaemon.py start
replacing start with restart or stop.

------- Shutdown before server shutdown or for debugging works on FEBs -------

ssh to crtevb as crtuser following same instructions listed previously: (ssh )

cd /home/kreslo/DAQdriver
./stop_run
./stopcron

Stop the DAQ and turn off the bias on all FEBs - to be executed before power down!
ssh to crtevb as crtuser following same instructions listed previously: (ssh )

cd /home/kreslo/DAQdriver
./shutdown_global

This also stops Grafana by killing all InfluxDB processes; otherwise restarting Grafana will encounter a glitch!


---------------------------------------- Shutdown Servers -----------------------------------

After the DAQ is off, turn off the servers one by one (crt-evb and ubdaq-prod-crt01,02,03,04). On each machine:

ssh 
sudo poweroff

--------------------------------------------------------------------------------------------------
Normal operation: run start - stop (currently executed by cron every day at 01:10)

Stop run and kill storage clients:
cd /home/kreslo/DAQdriver
./stop_run

Restart run with automatic name assignment:
cd /home/kreslo/DAQdriver
./start_run
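The daily 01:10 cron cycle mentioned above could be expressed as a crontab entry like this sketch (the combined command is hypothetical; the actual crontab on crtevb may chain the scripts differently):

```shell
# Hypothetical crontab entry (crtuser): stop and restart the run at 01:10 every day
10 1 * * * cd /home/kreslo/DAQdriver && ./stop_run && ./start_run
```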

Log message (appears in the slow control dashboard message window):
./logmessage My-Status-Message-with-no-spaces
This is to be done only when general data-taking conditions change, like a debugging period, beam down, etc.
The default statement for this message is: Standalone-DAQ-in-progress..
It will change in the future to: uBooNE_DAQ-in-progress..

-------------- OBSOLETE!--------------
To open (reopen) ports on all crt0x servers:
./openall
------------------------------------------
Low level HOWTO (no need to use it directly, just for records)

To start febinfluxdb instances (if not yet running)
./start_slowmon

To stop them:
killall febinfluxdb

To open 5 text monitors:
./startmon

DAQ control: (not affecting storage clients) - no need to use directly, included in start_run and stop_run
./daq_start_global
./daq_stop_global

DAQ - Instructions for artdaq: (We are not running in this configuration yet!!!)

To log in and run the DAQ:

Login to the gateway machine with all the port forwarding for the grafana page setup
ssh ubdaq-prod-ws01.fnal.gov -L 3000:localhost:3000 -L 8086:localhost:8086

Login to the CRT event builder
ssh ubdaq-prod-crtevb

Setup the bernfebdaq code
source /artdaq_products/setup
setup bernfebdaq -qe10:prof:s41:eth

Make sure you have a process management tool (PMT) configuration file handy. It should be one line per process, with columns for type of process, host, and port number. An example:
BoardReaderMain ubdaq-prod-crt02 5205
BoardReaderMain ubdaq-prod-crt03-priv 5205
BoardReaderMain ubdaq-prod-crt04-priv 5205
EventBuilderMain ubdaq-prod-crtevb-priv 5235

Make a symbolic link of that file to pmtConfig in your working directory, and then start/initialize the DAQ processes
SDAQ_Initialize.sh
(if you want to see these scripts, use `which` to get location)
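The link step can be sketched as follows. Here myPmtConfig.txt is a placeholder name for the PMT configuration file described above, and the SDAQ call is left commented out since it requires the bernfebdaq environment:

```shell
# Write an example PMT configuration file (placeholder name, content from the example above)
cat > myPmtConfig.txt <<'EOF'
BoardReaderMain ubdaq-prod-crt02 5205
EventBuilderMain ubdaq-prod-crtevb-priv 5235
EOF

# Link it to the name the SDAQ scripts expect
ln -sf myPmtConfig.txt pmtConfig

# SDAQ_Initialize.sh   # then boot the DAQ processes (needs the bernfebdaq setup)
```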

Check status of processes. They should be ‘Booted’
SDAQ_Status.sh

Configure the processes
SDAQ_ConfigProcesses.sh
Note, it uses hard-coded FHICL files located in /artdaq_products/bernfebdaq/<version>/fcl
BoardReader_BernDataZMQ_02.fcl
BoardReader_BernDataZMQ_03.fcl
BoardReader_BernDataZMQ_04.fcl
EventBuilder_uBooNECRT.fcl

The current directory is also included in the FHICL_FILE_PATH in the SDAQ scripts, so it should pick up these files from your current directory first.
Check status should now return ‘Ready’
Start a run
SDAQ_StartRun.sh -r <run_number>
Status should now be ‘Running’
Stop a run cleanly
SDAQ_StopRun.sh
Stop a run dirtily/kill all processes
SDAQ_Terminate.sh

What are the board readers doing?
In the current mode, the BernZMQ generator looks for data coming from a ZMQ publisher. The port number is part of the BoardReader configuration. Data is published on the crt0* machines on tcp port 5556.
Data is produced from the FEB driver (febdrv), connected to the chain of FEBs on eth0. You do not need to start or stop febdrv: it’s just running. All we do is connect to the data coming out.

Fake data mode
The DAQ can be configured to take data from a "fake" generator; this is useful for stability checks without interfering with the ongoing data streaming.
There are python scripts that make this fake data and publish it on port 5566: you can run these to produce data, and then run the DAQ listening to this data.
Configure fcl files to listen to that port
For now, you will need to start and stop the python scripts on your own on each machine. Also, you need to use system python (2.6), not the distributed python (2.7) used in the artdaq setup, as I haven’t gotten the zmq module installed for that one yet.
Note: there’s NO other change to the configurations, so it operates in EXACTLY the same way
This could be used to run the DAQ on a separate cluster...

While you are running:
Watch the grafana page: http://localhost:3000/dashboard/db/uboone-crt
Assuming you set the port forwarding up like I told you
Things to watch for:
Make sure everything on the home page stays relatively constant
Event rates are stable and at around 300 Hz per FEB
The max events in buffer doesn’t go much higher than 200 (at 1024 we start to lose data!)
Connected FEBs stays constant
Driver status is all ok
If something starts to look like it’s not working well, STOP THE ARTDAQ RUN
Terminate if you have to

Looking at the data
The output from the production DAQ are art-ROOT files. Examples are in my home directory (/home/wketchum).
You can see the contents using the eventdump fcl (now included in release)
art -c eventdump.fcl -s /home/wketchum/uboone_bernfebdaq_002001_000001_000001_20161109T144150_20161109T144223.root -n 1
Sample ArtModules are included in $BERNFEBDAQ_DIR/bernfebdaq/ArtModules directory. You can run the “TimeCoincidence Module” which prints out some timing info by doing
art -c RunTimeCoincidence.fcl -s /home/wketchum/uboone_bernfebdaq_002001_000001_000001_20161109T144150_20161109T144223.root -n 1
Data format is located in $BERNFEBDAQ_CORE/bernfebdaq-core/Overlays/BernZMQFragment.hh

Installing the DAQ
If this hasn’t already been done, you need the artdaq products. You can do a pull products
http://scisoft.fnal.gov/scisoft/bundles/tools/pullProducts
./pullProducts /artdaq_products slf6 artdaq-v1_13_02 s41-e10 prof
I suggest putting the products in an “artdaq_products” directory...
If you’re doing this on the CRT cluster, I suggest crt05, since it’s a spare machine and not connected to any FEBs.
You can then pull down the repository for bernfebdaq
git clone ssh://p-uboonedaq@cdcvs.fnal.gov/cvs/projects/uboonedaq-bernfebdaq
OR git clone http://cdcvs.fnal.gov/projects/uboonedaq-bernfebdaq
I’m tagging and making releases, but currently the most up-to-date thing for debugging is the hotfix/v00_03_04 branch
To build, you need to setup and then build three products
Check the source scripts to make sure they point to the right areas
Make an install directory and copy into the .upsfiles and .updfiles directories from your artdaq_products installation area
source setup_bernfebdrv.sh; buildtool install
source setup_bernfebdaq_core.sh; buildtool install
source setup_bernfebdaq.sh; buildtool install

Disk space in crtevb

Instructions for cleaning up disk space in crtevb:

CRT runs already on tape are moved to crtevb:/raid/CRT_data/TransferredData

In any gpvm server, set up the uboone and samweb tools:

source /grid/fermiapp/products/uboone/setup_uboone.sh
setup uboonecode v06_36_00 -q e14:prof

Check whether the runs corresponding to one day are in SAM (example for October 16th):

samweb list-files "file_name=%20171016%crtdaq" | sort

This will list the files in SAM for that day:

ProdRun20171016_001007-crt04.1.crtdaq
ProdRun20171016_092319-crt01.1.crtdaq
ProdRun20171016_092319-crt02.1.crtdaq
ProdRun20171016_092319-crt03.1.crtdaq
ProdRun20171016_092319-crt04.1.crtdaq
ProdRun20171016_121007-crt01.1.crtdaq
ProdRun20171016_121007-crt02.1.crtdaq
ProdRun20171016_121007-crt03.1.crtdaq
ProdRun20171016_121007-crt04.1.crtdaq
ProdRun20171016_161007-crt01.1.crtdaq
ProdRun20171016_161007-crt02.1.crtdaq
ProdRun20171016_161007-crt03.1.crtdaq
ProdRun20171016_161007-crt04.1.crtdaq
ProdRun20171016_201007-crt01.1.crtdaq
ProdRun20171016_201007-crt02.1.crtdaq
ProdRun20171016_201007-crt03.1.crtdaq
ProdRun20171016_201007-crt04.1.crtdaq

In crtevb:/raid/CRT_data/TransferredData, delete the files already declared in SAM. Please double check that all files are in SAM before deleting any of them.

cd /raid/CRT_data/TransferredData
ls -lhrt ProdRun20171016_*-crt0*.1
sudo rm ProdRun20171016_*-crt0*.1
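The double check mentioned above can be sketched with `comm`. Both lists are inlined here for illustration; in practice you would build local.txt from `ls` in TransferredData (appending .crtdaq) and insam.txt from the `samweb list-files` output:

```shell
# Sketch: list local files that are NOT yet declared in SAM (must be empty before rm).
printf '%s\n' \
  ProdRun20171016_092319-crt01.1.crtdaq \
  ProdRun20171016_092319-crt02.1.crtdaq | sort > local.txt
printf '%s\n' \
  ProdRun20171016_092319-crt01.1.crtdaq \
  ProdRun20171016_092319-crt02.1.crtdaq | sort > insam.txt

# comm -23 prints lines only in the first (local) file, i.e. files missing from SAM
comm -23 local.txt insam.txt > missing.txt
if [ -s missing.txt ]; then
  echo "NOT SAFE to delete: some files are not in SAM"
else
  echo "all local files are declared in SAM"
fi
```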

Raw data transfer failing in crtevb

Instructions for restoring the raw data transfer and declaration in SAM.

Log files for this process are generated every day at (example for April 21st, 2018 ):
crtevb:/home/uboonepro/data_transfer/transfer_logs/transfer_log_delay_48h_2018-04-21.log

check the error message.

If error:

Reading files from list /home/uboonepro/data_transfer/file_lists/file_list_delay_48h_2018-04-21.txt
File: /raid/CRT_data/ProdRun20180416_001006-crt04.1
Generatring JSON...
File name is ProdRun20180416_001006-crt04.1.crtdaq
Start / end time = 2018-04-16T00:09:56 / 2018-04-16T04:10:12
Copying data...
error: globus_ftp_client: the server responded with an error
550 File exists
program: globus-url-copy -rst-retries 1 -gridftp2 -nodcau -restart -stall-timeout 14400 file:////raid/CRT_data/ProdRun20180416_001006-crt04.1.crtdaq.json gsiftp://stkendca46a.fnal.gov/pnfs/fnal.gov/usr/uboone/scratch/uboonepro/dropbox/data/uboone/crt/exited status 1
delaying 23 ...
retrying...

then something bad happened when copying the previous file.

In any gpvm server, remove the problematic file (ProdRun20180416_001006-crt04.1.crtdaq) and its .json from

/pnfs/uboone/scratch/uboonepro/dropbox/data/uboone/crt

This will restart the process next time the transfer is executed.

If any different error, please contact me (David L).

Slowmon data transfer to EPICS failing

Usually this can be fixed by restarting the data transfer process on ubdaq-prod-crtevb, as follows:

ssh to crtevb as uboonepro (ssh ):

cd /home/uboonepro
sh sc_daemon_restart.sh

If this does not work, use ps xw to check whether a process running python /home/uboonepro/.local/bin/sc2epicsdaemon.py start exists. If so, kill that process (using kill -KILL if necessary), and then execute /home/uboonepro/sc_daemon_start.sh.

Network traffic monitoring

Monitoring the network of the CRT DAQ machines may help explain the peaks in poll duration / events per poll. The dashboards are included in the current Grafana dashboard "febdrv expert". Please check whether peaks in the number of events per poll / poll duration are correlated with peaks in the network traffic of one of the CRT DAQ machines (especially crt01).
Instructions on how the monitoring scripts run, and which ones, will follow here soon.

Troubleshooting

If some FEB shows abnormal behavior (like a low or strange rate), try to reconfigure the FEB(s) in that line: stop run, stop cron, load the configuration files, and try to start again. Sometimes a power cycle of a FEB line can also help: stop run, stop cron, turn the power of that line off and back on, make sure all FEBs of that plane are connected (otherwise turn the line off and on again), start cron, start run.

All FEBs connected, but one FEB has a low (~10 Hz) event rate

- The raw data show that only the reference events of that FEB are generated/stored.
- Cause: a short on one channel prevents the high voltage from reaching the SiPMs, so no physical events are triggered/stored.
- Turning the bias voltage off does not change the behavior of this FEB, but it is better to turn it off anyway.

The FEB needs to be modified. The shorted channel (identify which one it is with a voltmeter at the CRT panel) has to be manually disconnected from the FEB (on the circuit board of the FEB) and then placed back. High voltage should then be achievable again and the event rate should return to nominal, minus one channel (so about 1/16 lower than before, ~100 - 1000 Hz). This procedure needs physical access to the FEB, so check which FEB is affected and where it is located, organize the access (Top: a crane with basket is needed; sides: a lift; fall protection? ODH?) and discuss the further steps with the run coordinators.
The FEB need to be modified. The channel with the short (measure with the voltmeter at the CRT panel which it is) has to be manually disconnected from the FEB (on the circuit board of the FEB) and then placed back. High voltage should now again be can achieved and the event rate should be back to be nominal with a decrease of one channel (so be 1/16 lower than before ~100 - 1000 Hz). This procedure needs physical access to the FEB, so check which/where the affected FEB is located and organize the access (Top: crane with basket is needed, sides: lift, fall protection? ODH?) and discuss the further steps with run coordinators.