Louise Suter, 10/31/2017 12:42 PM


ExpertBulletinBoard

Please make sure that you are familiar with the SHIFT Bulletin Board.

Current release and config:

We are running DAQ R12 on both FD and ND!!!
FD config: FarDetGlobalConfigP1_TriggerStabilityTest
ND config: NearDetGlobalConfigP1_SNtest_MapMaker

Permanent: Instructions for fixing OnMon "shared memory" problems

We have been having shared memory problems with the ND OnMon, for unknown reasons. The symptoms are that the producer and viewer both show an error on startup that says "IPC.cxx:42 Can't find shared memory segment" (or something similar), and the viewer is typically frozen. First try running the kill-all-onmon script; if that doesn't work, you will have to clean up the SHM manually. Instructions are here

Permanent: Instructions for resizing a control room VNC

If an expert connects to one of the VNCs remotely and the window resizes, the window should be resized once they are done. The instructions for resizing the VNC window are here

Permanent: No terminals open on ND CR-05 (ND run control) ("new" Oct. 2017)

Please make sure there are no terminal windows open on this machine unless you or another expert is currently doing work, or you have been instructed by someone to leave one open. This includes checking that no windows are minimized. Simply keep typing "exit" until the windows disappear. Having a terminal window open prevents the KillPartition script from working.

Update on 10/06/2017 We have switched to R12!

Note that the instructions for switching to R12 (and rolling back if anything happens) can be found here

Update on 10/06/2017 ND kill partition script.

The kill partition script for the ND was found not to work: it did not kill things running on the expert desktop. An ECL entry is here. To make it work, the shifter ran the script manually and then had to manually kill things running on the expert desktop.

Update on 01/17/2017 High Temperature Event

  • Procedures to follow during a temperature incident are here

Please check if the loadshed script is running by checking the timestamp of the last log entry in the /var/log/loadshed.log file on novadaq-far-master.fnal.gov. The timestamp should be within 5 minutes of the current time.

ssh -l novadaq novadaq-far-master.fnal.gov "tail /var/log/loadshed.log" 
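
The five-minute freshness check can also be scripted. The following is a minimal Python sketch, not part of the loadshed tooling: it assumes each log line starts with a timestamp in the form `Wed Jan 18 15:05:06 CST 2017:`, that month names are in English, and that the `now` value passed in is in the same time zone as the log. The names `parse_log_time` and `is_fresh` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed maximum age of the last log entry, per the 5-minute rule above.
MAX_AGE = timedelta(minutes=5)


def parse_log_time(line):
    """Return the (naive) timestamp found at the start of a log line."""
    # The timestamp is the first six whitespace-separated fields;
    # the sixth field (the year) carries a trailing colon.
    _weekday, month, day, clock, _tz, year = line.split()[:6]
    text = f"{month} {day} {clock} {year.rstrip(':')}"
    # The time zone field is ignored: the caller is assumed to supply
    # "now" in the same zone as the log.
    return datetime.strptime(text, "%b %d %H:%M:%S %Y")


def is_fresh(last_line, now, max_age=MAX_AGE):
    """True if the last log entry is within max_age of now."""
    return abs(now - parse_log_time(last_line)) <= max_age
```

For example, `is_fresh(last_line, datetime.now())` could be called on the final line returned by the `tail` command above, provided the local clock is in the same time zone as the log.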

Example output from the loadshed log looks like:

Wed Jan 18 15:05:01 CST 2017: R4I: W, R4E: X, R9I: Y, R9E: Z
Tempsensors: W: 0, E: 0, C: 0, Z: 0, T: 4
Farm: W: 1, E: 1, C: 1, Z: 0, T: 128
Wed Jan 18 15:05:06 CST 2017: Current Errorlevel is: 0

The numbers in the first line show the difference between the current temperature and the baseline temperature, in units of 0.01 °F. The baseline temperatures are:

RACK_04_INTERNAL_BASELINE=6300
RACK_04_EXTERNAL_BASELINE=8000
RACK_09_INTERNAL_BASELINE=6300
RACK_09_EXTERNAL_BASELINE=7000

Hence, the first line can be interpreted as:

 
Wed Jan 18 15:05:01 CST 2017: Amount temps away nominal (degrees f): R4I: -2.64, R4E: -3.13, R9I: -3.34, R9E: -3.93
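
The conversion can be written out as a short Python sketch. It assumes that the labels R4I, R4E, R9I and R9E correspond to the RACK_04/RACK_09 INTERNAL/EXTERNAL baselines listed above, and that the baselines use the same 0.01 °F units as the offsets (so 6300 means 63.00 °F). The function name `interpret` is hypothetical, and the raw value in the usage note is back-computed from the interpreted line, not taken from a real log.

```python
# Baselines copied from the list above, assumed to be in hundredths of a
# degree Fahrenheit and keyed by the short labels used in the log line.
BASELINES = {"R4I": 6300, "R4E": 8000, "R9I": 6300, "R9E": 7000}


def interpret(offsets):
    """Map each sensor label to (offset in F, absolute temperature in F).

    `offsets` holds the raw log values, i.e. differences from the
    baseline in units of 0.01 F.
    """
    result = {}
    for name, raw in offsets.items():
        offset_f = raw / 100
        absolute_f = (BASELINES[name] + raw) / 100
        result[name] = (offset_f, absolute_f)
    return result
```

Under these assumptions, a raw value of -264 for R4I gives an offset of -2.64 °F and an absolute temperature of 60.36 °F.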

The second and third lines correspond to the following echo statements:

    echo "Tempsensors: W: $NUM_WARNING, E: $NUM_ERROR, C: $NUM_CRITICAL, Z: $NUM_CRAZY, T: $TEMPSENSOR_COUNT" 
    echo "Farm: W: $FARM_NUM_WARNING, E: $FARM_NUM_ERROR, C: $FARM_NUM_CRITICAL, Z: $FARM_NUM_CRAZY, T: $NODE_COUNT" 
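
For scripted checks, these two lines can be parsed into named counters. The sketch below is illustrative only: the label names are inferred from the variable names in the echo statements, "count" is used for the T field (TEMPSENSOR_COUNT or NODE_COUNT), and `parse_counts` is a hypothetical helper.

```python
import re

# Meaning of each letter, inferred from the variable names in the echo lines.
LABELS = {"W": "warning", "E": "error", "C": "critical", "Z": "crazy", "T": "count"}


def parse_counts(line):
    """Parse e.g. 'Farm: W: 1, E: 1, C: 1, Z: 0, T: 128'.

    Returns (source, counts) where source is 'Tempsensors' or 'Farm'
    and counts maps each label name to an integer.
    """
    source, _, rest = line.partition(":")
    pairs = re.findall(r"([WECZT]):\s*(\d+)", rest)
    counts = {LABELS[key]: int(value) for key, value in pairs}
    return source.strip(), counts
```

Applied to the example output above, the Farm line yields one node each in the warning, error and critical states out of 128 nodes.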

  • If you need to power off DAQ nodes immediately or the loadshed script is not running, you can power off FD DAQ nodes by following this page.

Last update on 08/04/2017 Resource List

  • FarDet:
    Managers: ConfigurationManager, DAQApplicationManager, DataLogger, EventDispatcher4, GlobalTrigger, MessageAnalyzer, MessageFacilityServer, MessageViewer, RunControlServer, SNEWSMessage, SpillServer, TDUManager, TriggerScalrs4
    BNEVB NEW Lists (as of 2017-10-31): 10-46 excluding 12, 16, 28, 30, 34, 36, 41, 46, and 56-196 excluding 57, 88, 113, 168.
    Timing chains: ALL - DiB-{01-14}{s,t}
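
When the BNEVB list has to be compared against what is actually configured, it can help to expand the ranges into explicit node numbers. This is a minimal Python sketch that assumes the ranges are inclusive and that the exclusions apply exactly as written above; `expand` and `BNEVB_NODES` are hypothetical names, and only node numbers (not host names) are produced.

```python
def expand(first, last, excluded):
    """Return the node numbers first..last (inclusive) minus the excluded ones."""
    return [n for n in range(first, last + 1) if n not in excluded]


# Ranges and exclusions copied from the BNEVB list above.
BNEVB_NODES = (
    expand(10, 46, {12, 16, 28, 30, 34, 36, 41, 46})
    + expand(56, 196, {57, 88, 113, 168})
)
```

With these inputs the list contains 166 node numbers (29 from the first range and 137 from the second).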

Last update on 09/09/2016 Current timing chain in use

FD Chain 2 and ND Chain 2. FD Chain 1 and ND Chain 3 are backup timing chains. ND Chain 1 (tdu-near-master-01) is where SpillServer runs.

Effective from 08/04/2016

SSH multiplexing has been disabled, as discussed at the DAQ meeting on 08/04/2016. Flashing red boxes might be seen in DAQAppMgr. This is expected until the "network problem" is fixed.

Effective from 07/21/2016

DDT filter processes fail to start on farm nodes 178 - 202.

Contact Pengfei Ding for questions

Related ECL#88790

During the last few DAQ restarts, ddt-filter processes were not started successfully by DDT Manager. The progress bar in DDT Manager kept oscillating between 77% and 78%.

You do not need to wait for it to reach 100% before continuing with the run start steps in Run Control. However, you do need to wait for it before clicking "begin run".

If the progress bar still shows no progress two minutes after "begin run" becomes available in Run Control:
1. Click "Abort" in DDT Manager.
2. Manually start the process on farm nodes that are in pink status:
   a. If all farm nodes in a group are in pink status, click the "Start Group" button.
   b. If some farm nodes in the group are pink and others are green, right-click each pink farm node and select "Start Process".
3. Click "begin run" after all boxes in DDT Manager turn green.
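
The choice between "Start Group" and "Start Process" in the steps above can be summarised as a small decision function. This Python sketch is purely illustrative and does not talk to DDT Manager; the input layout, the function name `recovery_actions`, and the group and node names in the usage note are all hypothetical.

```python
def recovery_actions(groups):
    """Return the manual actions implied by the node statuses.

    `groups` maps a group name to a dict of {node name: "pink" or "green"}.
    The result is a list of (action, target) tuples.
    """
    actions = []
    for group, nodes in groups.items():
        pink = [name for name, status in nodes.items() if status == "pink"]
        if not pink:
            continue  # every node in this group is already green
        if len(pink) == len(nodes):
            # Whole group is pink: one click on "Start Group" is enough.
            actions.append(("Start Group", group))
        else:
            # Mixed group: start only the pink nodes individually.
            actions.extend(("Start Process", name) for name in pink)
    return actions
```

For instance, a group whose two nodes are both pink produces a single "Start Group" action, while a group with one pink and one green node produces one "Start Process" action for the pink node only.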