Louise Suter, 08/09/2017 03:48 PM


ExpertBulletinBoard

Please make sure that you are familiar with the SHIFT Bulletin Board.

Current release

Coming out of summer shutdown, we are running DAQ R10_Final on both FD and ND now.

Last update on 01/17/2017 High Temperature Event

Check that the loadshed script is running by looking at the timestamp of the last log entry in /var/log/loadshed.log on novadaq-far-master.fnal.gov. The timestamp should be within 5 minutes of the current time.

ssh -l novadaq novadaq-far-master.fnal.gov "tail /var/log/loadshed.log" 
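
The five-minute check can also be scripted. The following is only a minimal sketch and is not part of the loadshed tooling: the function name `log_is_fresh` is hypothetical, it assumes that timestamped log lines start with a date such as `Wed Jan 18 15:05:06 CST 2017` followed by `: `, and it relies on GNU `date -d` to parse that date.

```shell
#!/bin/sh
# Hypothetical helper (not part of the loadshed script).
# Usage: log_is_fresh "LOG_LINE" [NOW_EPOCH] [MAX_AGE_SECONDS]
# Returns 0 if the timestamp at the start of LOG_LINE is no more than
# MAX_AGE_SECONDS (default 300, i.e. 5 minutes) older than NOW_EPOCH.
log_is_fresh() {
    line=$1
    now=${2:-$(date +%s)}
    max_age=${3:-300}
    # Everything before the first ": " is assumed to be the timestamp.
    stamp=${line%%: *}
    # GNU date is assumed; a stamp that cannot be parsed counts as stale.
    then_epoch=$(date -d "$stamp" +%s 2>/dev/null) || return 1
    age=$((now - then_epoch))
    [ "$age" -le "$max_age" ]
}

# Possible use (not executed here): take the last timestamped line of the
# log, skipping the "Tempsensors:" and "Farm:" summary lines.
#   last=$(ssh -l novadaq novadaq-far-master.fnal.gov "tail /var/log/loadshed.log" \
#          | grep -E '^[A-Z][a-z][a-z] [A-Z][a-z][a-z] ' | tail -n 1)
#   log_is_fresh "$last" || echo "loadshed log looks stale"
```

The comparison is done in epoch seconds, so it does not depend on the time zone of the machine where the check runs, only on the two clocks being roughly in sync.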

Example output looks like:

Wed Jan 18 15:05:01 CST 2017: R4I: -84, R4E: -178, R9I: -186, R9E: -224
Tempsensors: W: 0, E: 0, C: 0, Z: 0, T: 4
Farm: W: 1, E: 1, C: 1, Z: 0, T: 128
Wed Jan 18 15:05:06 CST 2017: Current Errorlevel is: 0

The numbers in the first line are the differences between the current temperatures and the baseline temperatures, in units of 0.01 °F. The baseline temperatures are:

RACK_04_INTERNAL_BASELINE=6300
RACK_04_EXTERNAL_BASELINE=8000
RACK_09_INTERNAL_BASELINE=6300
RACK_09_EXTERNAL_BASELINE=7000

Hence, the first line can be interpreted as:

Wed Jan 18 15:05:01 CST 2017: Difference from baseline (degrees F): R4I: -0.84, R4E: -1.78, R9I: -1.86, R9E: -2.24
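
Since the logged values are in units of 0.01 F, converting them to degrees is a division by 100. As a rough sketch (the function name `deltas_to_degrees` is hypothetical, and the field layout is assumed from the example log line rather than taken from the loadshed script), a logged line can be converted with awk:

```shell
#!/bin/sh
# Hypothetical helper: convert the R4I/R4E/R9I/R9E values of a loadshed
# log line from hundredths of a degree F to degrees F.
# Usage: deltas_to_degrees "LOG_LINE"
deltas_to_degrees() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i < NF; i++) {
            # A label such as "R4I:" is followed by its value, e.g. "-84,".
            if ($i ~ /^R[0-9]+[IE]:$/) {
                val = $(i + 1)
                sub(/,$/, "", val)
                out = out (out == "" ? "" : ", ") $i " " sprintf("%.2f", val / 100)
            }
        }
        print out
    }'
}

deltas_to_degrees "Wed Jan 18 15:05:01 CST 2017: R4I: -84, R4E: -178, R9I: -186, R9E: -224"
```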

The second and third lines are printed by the following statements:

    echo "Tempsensors: W: $NUM_WARNING, E: $NUM_ERROR, C: $NUM_CRITICAL, Z: $NUM_CRAZY, T: $TEMPSENSOR_COUNT" 
    echo "Farm: W: $FARM_NUM_WARNING, E: $FARM_NUM_ERROR, C: $FARM_NUM_CRITICAL, Z: $FARM_NUM_CRAZY, T: $NODE_COUNT" 
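
Judging only from the variable names, W, E, C, and Z appear to count the sensors or nodes in the warning, error, critical, and "crazy" states, and T appears to be the total; this reading is an assumption, not something confirmed here. Under that assumption, the sketch below pulls a single count out of a summary line. The function name `status_count` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the count reported for KEY (W, E, C, Z or T)
# in a "Tempsensors:" or "Farm:" summary line.
# Usage: status_count "SUMMARY_LINE" KEY
status_count() {
    printf '%s\n' "$1" | sed -n "s/.*[ ,]$2: \([0-9][0-9]*\).*/\1/p"
}

# Example: report when any farm node is counted as critical.
farm_line="Farm: W: 1, E: 1, C: 1, Z: 0, T: 128"
if [ "$(status_count "$farm_line" C)" -gt 0 ]; then
    echo "critical farm nodes: $(status_count "$farm_line" C) of $(status_count "$farm_line" T)"
fi
```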

  • If you need to power off DAQ nodes immediately, or the loadshed script is not running, you can power off the FD DAQ nodes by following this page.
  • Procedures to follow during a temperature incident are here.

Last update on 12/05/2016 Resource List

  • FarDet:
    Managers: ConfigurationManager, DDTManager, DataLogger, EventDispatcher4, GlobalTrigger, MessageAnalyzer, MessageFacilityServer, MessageViewer, RunControlServer, SNEWSMessage, SpillServer, TDUManager, TriggerScalrs4
    BNEVB NEW Lists (as of 2017-07-19): 10-45 excluding 12, 16, 30, 41, and 56-124 excluding 57. As of 2017-08-04, also exclude 28.
    Timing chains: ALL - DiB-{01-14}{s,t}
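
To avoid typing the BNEVB list by hand, the ranges and exclusions above can be expanded with a short script. This is only an illustrative sketch: the function name `bnevb_numbers` is hypothetical, and because the mapping from these numbers to host names is not given here, it prints bare numbers only.

```shell
#!/bin/sh
# Hypothetical helper: print the BNEVB numbers from the resource list,
# one per line: 10-45 and 56-124, minus the excluded entries
# (12, 16, 28, 30, 41 and 57).
bnevb_numbers() {
    { seq 10 45; seq 56 124; } | grep -v -x -E '12|16|28|30|41|57'
}

# Print how many entries remain after the exclusions.
bnevb_numbers | wc -l
```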

Last update on 09/09/2016 Current timing chain in use

FD Chain 2 and ND Chain 2. FD Chain 1 and ND Chain 3 are backup timing chains. ND Chain 1 (tdu-near-master-01) is where SpillServer runs.

Effective from 08/04/2016

ssh multiplexing has been disabled, per the discussion at the DAQ meeting on 08/04/2016. Flashing red boxes may appear in DAQAppMgr. This is expected until the "network problem" is fixed.

Effective from 07/21/2016

DDT filter processes fail to start on farm nodes 178 - 202.

Contact Pengfei Ding with questions.

Related ECL#88790

During the last few DAQ restarts, ddt-filter processes were not started successfully by DDT Manager. The progress bar in DDT Manager kept oscillating between 77% and 78%.

You do not need to wait for it to reach 100% before continuing with the run start steps in Run Control. However, you do need to wait for it before clicking "begin run".

If the progress bar shows no progress two minutes after "begin run" becomes available in Run Control:
1. Click "Abort" in DDT Manager.
2. Manually start the processes on farm nodes that are in pink status:
   a. If all farm nodes in a group are pink, click the "Start Group" button.
   b. If some farm nodes in a group are pink and others are green, right-click the individual pink farm nodes and select "Start Process".
3. Click "begin run" after all boxes in DDT Manager turn green.