Timing System Wiki

A useful technical note describing the Timing System and the scripts used in TDU delay calculation can be found here:
Timing System Tech Note

The Timing system as deployed and wired on the detector can be found here:
NDOS Timing Chain

Information on how to power cycle the Timing System can be found here:
Power Cycle NDoS Systems

Timing Chains

Each detector has two independent timing chains connected to it. One is the active timing chain, which we use to synchronize the detectors. The other is a backup chain, which is not actively used, but is physically connected to the detector elements and can be enabled at any time.

At the head of each timing chain is a Master TDU (MTDU). The mapping between timing chains and MTDUs for each detector is listed below:

FarDet

tdu-master-arm-01 -- Timing Chain 1
tdu-master-arm-02 -- Timing Chain 2
tdu-master-arm-03 -- Not physically connected to the detector

NearDet

tdu-near-master-arm-01 -- Spill Server (Not physically connected to the detector)
tdu-near-master-arm-02 -- Timing Chain 2
tdu-near-master-arm-03 -- Timing Chain 1

When to scrub TDUs

If the detector interlocks trip, the slave TDUs have lost power and the timing chains will need to be scrubbed. Any master TDU not physically connected to the detector by a timing chain (the spill server on the ND or tdu-master-arm-03 on the FD) can remain untouched.

If the MINOS Surface Building or FD computing room loses power then all MTDUs will be off. This requires a scrub of all master TDUs and timing chains.

If there is a suspected issue with the spill server, only the spill server TDU needs to be scrubbed (the run can be left going, and the detector does not need to be in internal timing mode). Before scrubbing the spill server, turn off the spill server auto-restart rule in the FD message analyzer. Remember to reactivate the rule after the spill server scrub. If a scrub of the spill server does not resolve the issue, consult a timing expert before proceeding to scrub the detector TDUs.

Timing Chain Recovery procedure

This section describes steps that can be taken to reset (scrub) the timing system on the Near or Far Detector. This procedure is necessary when an interlock is tripped or when the DCMs have bad timing links. It is also required in ... This procedure is FOR DAQ EXPERTS ONLY. Please read the full section; there are some important notes.

Always scrub BOTH timing chains on the detector

Stop run and release resources before scrubbing a timing chain

Detector must be in internal timing mode before scrubbing

How to scrub a timing chain

  1. Stop the run and release resources (unless you are scrubbing TDU-01 on the NearDet or TDU-03 on the FarDet; these are not physically connected to the detector, so scrubbing them won't disrupt a run). If you are scrubbing the Spill Server (TDU-01 on the NearDet), turn off the auto restart spill server rule in Message Analyzer. Remember to turn this back on when you are done scrubbing.
  2. SWITCH TO INTERNAL TIMING MODE (unless you are scrubbing TDU-01 on the NearDet or TDU-03 on the FarDet): Press the "DCM Internal Timing" icon on FD-01 or ND-01 and follow the instructions. Do this twice to ensure that the switch is fully completed.
  3. Copy the TDU registers in the ECL as a comment on your Expert Activity form.
    From novadaq-{far,near}-master, SSH into the TDU in question as user root: ssh tdu{-near}-master-ppc-0{1,2,3} -l root and execute tduRegDump:
    [novadaq@novadaq-near-master ~]$ ssh tdu-near-master-ppc-01 -l root
    [root@tdu-near-master-ppc-01:~]$ tduRegDump
    ------------------------------------------------------
    Register        Value    Description
    ------------------------------------------------------
    0x0000        0x0400    Control
                  Bit-10: Enable Accelerator Events
    0x0001        0x0c48    Status
                  Bit-03: Okay to SYNC
                  Bit-06: No TCLK/MIBS
                  Bit-10: Reserved
                     .
                     .
                     .
    

    Copy the full output of the tduRegDump command into the ECL comment.
  4. Open a standalone instance of TDUControl. Do this by opening a terminal and doing the following:
    ssh novadaq@novadaq-{near,far}-master
    setup_online
    TDUControl -m
    
  5. Select and connect to an MTDU from the GUI that appears. Choose "custom" in the drop-down menu and type the name of the server (e.g. tdu-near-master-arm-02) in the server field beside it, in the upper left corner of the TDUControl interface window.
  6. Select Config->Enable Expert Mode
    You should be on the Timing Control tab
    Have you switched to internal timing? DO NOT CONTINUE IF YOU HAVE NOT SWITCHED TO INTERNAL TIMING!!!
  7. Click on the "ALL" button next to the red label "Scrub and load stored delays from memory" (unless advised otherwise by the expert.)
  8. You will then be disconnected from the TDU during the scrub and will not be able to connect again till it is finished. This takes ~10 minutes.
  9. Wait 10 minutes, then connect via TDUControl again to verify the scrub is complete.
  10. Log into the TDU and copy the TDU registers into the ECL (see above).
  11. Verify that the status and error registers match the values specified in the table below for the TDU you're scrubbing:
    TDU ID       ROLE              STATUS      ERROR-1      ERROR-2
    =================================================================
    Near-01      Spill Server      0x0c48      0x4400       0x0
    Near-02      ND Chain 2        0x1c48      0x0          0x0
    Near-03      ND Chain 1        0x1008      0x0          0x0
    -----------------------------------------------------------------
    Far-01       FD Chain 1        0x1008      0x0          0x0
    Far-02       FD Chain 2        0x1008      0x0          0x0
    Far-03       FD Spare          0x1008      0x0          0x0
    ==================================================================
    

    There are some exceptions to these values:
    1. The Spill Server may show ERROR-1 of 0x0400. This is fine.
    2. The Spill Server may show ERROR-1 of 0x4000. This is fine during shutdowns.
    3. If the scrub was performed without loading delays from memory (non-standard) the status will read 0x0008.
    4. If the delays were then subsequently calculated by the TDU delay-learn the status would be 0x000A.
    5. For TDUs connected to the accelerator (Near-01, Near-02) during shutdowns, the primary register may read 0x0500. This is okay during beam shutdowns. When beam is up, this is a problem.
      If the TDU in question does not have the expected state as described above, repeat the scrub procedure. You should contact a timing expert if you are unsure how to proceed.
  12. Exit expert mode on TDUControl: Unselect Config->Enable Expert Mode.
  13. Connect to the MTDU from the gui.
  14. ISSUE A SYNC: Press the red SYNC button in the main non-expert TDUControl panel. Wait about 30 seconds.
  15. Position the mouse over the "Time of Last Sync" hex value, the UTC time will be displayed in the lowest left corner of TDUControl. Verify that this roughly matches up with the current UTC time.
  16. Verify again that no errors are present in the error register by checking against the table above. Repeat the scrub if errors exist.
  17. Disconnect from chain in TDUControl.
  18. Repeat steps 3-17 for the other timing chain. Perform timing chain scrubs sequentially, not simultaneously.
  19. After scrubbing, the TCR webserver and the TDU web monitor need to be restarted.
    1. Log into novadaq-far-master-02 or novadaq-near-master as novadaq, run setup_online, and execute the following commands:
      DAQOperationsTools/bin/startTDUWeb-{near,far}.sh
      DAQOperationsTools/bin/startTCRMonitor-{near,far}.sh
      
    2. If the TDUs were power cycled, you will also have to remove the first memory segment that gets created by the web monitor. The default first segment has an id of 0, which doesn't play nicely with the monitoring. Log on to each TDU as root and do:
      ipcs (this just shows you the memory segments, if the shmid is 0, do the next line)
      ipcrm -m 0
      
      Following the above, you'll need to repeat the startTCRMonitor command (or use the icon) to start the web monitoring.
  20. Proceed to restart a run. The process of starting a run will switch DCMs back to external timing.
For a more detailed description of the timing system as a whole and the various procedures involved in calibrating the timing chain, see the technote below. The steps above automatically recover timing delays. Consult with a timing expert and the document below if manual recovery of timing delays is required.
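
The register check in step 11 can be sketched as a small lookup. This is only an illustration in Python and is not part of the NOvA tools: the function name and the dictionary layout are hypothetical, the expected values are copied from the table above, and only exceptions 1 and 2 are encoded. Exceptions 3 to 5 depend on how the scrub was done or on the beam state and still need a human check.

```python
# Expected (STATUS, ERROR-1, ERROR-2) per TDU, copied from the table in step 11.
EXPECTED = {
    "Near-01": (0x0C48, 0x4400, 0x0),
    "Near-02": (0x1C48, 0x0, 0x0),
    "Near-03": (0x1008, 0x0, 0x0),
    "Far-01": (0x1008, 0x0, 0x0),
    "Far-02": (0x1008, 0x0, 0x0),
    "Far-03": (0x1008, 0x0, 0x0),
}

# Exceptions 1 and 2: alternative ERROR-1 values tolerated on the Spill Server.
SPILL_SERVER_ERROR1_OK = {0x0400}           # always acceptable
SPILL_SERVER_ERROR1_OK_SHUTDOWN = {0x4000}  # acceptable only during shutdowns


def registers_ok(tdu_id, status, error1, error2, shutdown=False):
    """Return True if the register values match the table (hypothetical helper)."""
    exp_status, exp_error1, exp_error2 = EXPECTED[tdu_id]
    allowed_error1 = {exp_error1}
    if tdu_id == "Near-01":
        allowed_error1 |= SPILL_SERVER_ERROR1_OK
        if shutdown:
            allowed_error1 |= SPILL_SERVER_ERROR1_OK_SHUTDOWN
    return (
        status == exp_status
        and error1 in allowed_error1
        and error2 == exp_error2
    )
```

For example, registers_ok("Far-01", 0x1008, 0x0, 0x0) is True, while the same call with an ERROR-1 of 0x4000 is False and would mean repeating the scrub.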

TDU connection details

For Far Detector

You must be on the FD cluster (generally novadaq-far-master-02) to talk to the TDUs. The MTDUs can be reached on the PPC side with ssh root@tdu-master-ppc-{01,02}. Masters and slaves can be reached on the ARM side through the TDUControl interface. The naming convention for masters is tdu-master-arm-{01,02}; for slaves it is tdu-slave-aa-bb, where aa is the timing chain (01,02) and bb is the diblock number (01-14). These can be selected from the pull-down menu in the TDUControl interface.

For Near Detector

You must be on the ND cluster (generally novadaq-near-master) to talk to the TDUs. The MTDUs can be reached on the PPC side with ssh root@tdu-near-master-ppc-{01,02}. Masters and slaves can be reached on the ARM side through the TDUControl interface. The naming convention for masters is tdu-near-master-arm-{01,02}; for slaves it is tdu-near-slave-aa-bb, where aa is the timing chain (01,02) and bb is the diblock number (01-04). NOTE: currently the ND TDUs are not supported in the pull-down menu of the TDUControl interface. The names described above must instead be entered into the server field to the left of the pull-down menu (which must be set to Custom) to connect to the unit.
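
As an illustration of the naming conventions above, the ARM-side hostnames can be generated programmatically. This Python sketch is not part of TDUControl; the function names are hypothetical, and only the name patterns and number ranges quoted above are taken from this page.

```python
def _prefix(detector):
    """Hostname prefix for a detector; detector must be 'far' or 'near'."""
    if detector == "far":
        return "tdu"
    if detector == "near":
        return "tdu-near"
    raise ValueError("detector must be 'far' or 'near'")


def master_arm_name(detector, index):
    """ARM-side hostname of master TDU number `index` (e.g. 1 -> ...-arm-01)."""
    return f"{_prefix(detector)}-master-arm-{index:02d}"


def slave_names(detector, chain):
    """ARM-side hostnames of all slave TDUs on timing chain `chain`."""
    # Diblocks 01-14 at the Far Detector, 01-04 at the Near Detector.
    n_diblocks = 14 if detector == "far" else 4
    prefix = _prefix(detector)
    return [
        f"{prefix}-slave-{chain:02d}-{diblock:02d}"
        for diblock in range(1, n_diblocks + 1)
    ]
```

For the Near Detector, where the pull-down menu is not supported, a name produced this way is what would be typed into the custom server field.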

Steps for manually loading timing delays

If something has gone wrong with the automatic loading of timing delay values, they may need to be loaded manually. Follow the instructions below, but consult with a timing expert first.

  1. Disconnect TDUDelay monitor in the FD-01 or ND-01 vnc session if it is monitoring a chain that will be calibrated.
  2. Log on as novadaq to novadaq-far-master-02 or novadaq-near-master
  3. setup_online
  4. Scrub the timing chain (described above). This is required.
  5. The TDUControl must be disconnected from any units in the timing chain that will be reset
  6. Launch script tdu_load_delay_from_database.sh (which lives in the TDUUtilities package).
  7. When prompted select the detector from window
  8. Select the timing chain (1 or 2)
  9. Select which slave TDUs to load delays in (default is all of them). NOTE: The master TDU (MTDU) will always have its delay set in this procedure. To set the delay on a single slave without affecting the MTDU, see below.
  10. Select the global configuration to read the delays from in the database
  11. The program then runs, printing text as it sets each delay value (ignore the Qt errors, they are not relevant). Messages will print indicating which unit is being set to what value. This can take ~30 seconds as each delay requires 4 registers to be set.
  12. Message will pop up indicating calibration is done.
  13. Reconnect the TDUDelay monitor to the chain that was just calibrated. Initial connection can take up to 30 seconds. If done properly, all delay values in the monitor should appear green. If there are problems, call a timing expert. Monitor must be checked, if something is not green consult with timing experts before continuing. Check both chains in monitor (must disconnect from one chain before selecting and connecting to the other).
  14. IMPORTANT NOTE 2: In current version, the script does not go back and clear errors on the MTDU when it is done. After the calibration you will see a SERDES error (0x4000) in the MTDU. Errors on the TDU latch, so this does not mean there is a problem currently. The error came when the clock was interrupted in the calibration. To manually clear the error see instructions below.

Manually clearing TDU errors

Through the TDUControl interface (master or slave):
  1. Open the TDUControl interface (if one is not already open, it can be launched with TDUControl -m once the online environment is set up).
  2. Select the MTDU to connect to (for FD, select TDU-Master-ARM-(01,02) from the tab; for ND, type tdu-near-master-arm-(01,02) in the space on the left) and press connect.
  3. Select Config->Enable Expert Mode
  4. Select the Expert Controls tab (found next to Timing Control)
  5. From pull down menu select "Errors"
  6. Make sure value field is 0x0000
  7. Press set button.
  8. This should clear the SERDES Error (and others). If errors persist, call expert. NOTE an ND TDU may have a MIBs Parity Error if it is connected to the Accelerator to time stamp spills. This error is OK.

Connecting to a TDU

Far Detector

For the far detector you can use the latest version of TDUControl to connect to all of the TDUs that are currently installed at the far site. If you are on site you can simply choose an appropriate unit from the drop down menus and hit the connect button.

If you are offsite, then you will need to connect through the tunnel/proxy feature that TDUControl provides. To use this, make sure that you have a valid kerberos ticket. In the config menu select the proxy/tunnel config option and enter the appropriate information for the relay (typically you can use novadaq-far-master.fnal.gov as the base relay). Make sure that the account that you are using for the tunnel is one that you have permissions for in the k5login file (i.e. you can normally login to it). Select the check box to enable tunneling.

When you go to connect an ssh based tunnel will be created and the connection to the far site will be directed through it. When you disconnect the tunnel will be torn down.

Notes: Auto detection of open local ports is not yet implemented so you may have to change the local port to match one that is not in use (if there are other people proxying through tunnels too).

TDU Consoles (Far Det)

The master TDUs have their console ports attached to novadaq-far-master-02 with USB cables. This makes it possible to access the consoles via the linux terminal devices /dev/ttyUSB0, /dev/ttyUSB1 ... /dev/ttyUSBxxx

To connect to these consoles, log in to master-02. Then, as root, start up a screen session. There is a configuration file for screen that attempts to map all the right ports to the right devices. Type "screen" (to start a brand new screen session), "screen -r" (to resume a detached session), or "screen -x /root" (if you are sharing a session with another person). Once connected you can cycle through the different consoles with the standard GNU screen commands (ctrl-a n for next terminal, ctrl-a p for previous terminal, etc.).

When there is a power outage, the TDU PPC boards will typically reboot faster than the servers they rely on for their boot files. This causes a race condition where they are unable to boot and will "hang" in their boot loader. If this happens, you will be able to see their console (hit enter a few times if nothing is on the screen) and will get a prompt that looks something like:

redboot >

At this point type "reset" to reset the tdu and continue the reboot process.

You should be able to watch the system boot from the console.

Near Detector

Currently there are six timing distribution units which are deployed for NOvA. Four of the units are "master" systems which include a GPS receiver and can decode accelerator signals. The other two are "slave" units and are essentially repeaters for the commands and clocks that are issued by the masters.

Any TDU (master or slave) can be connected to using a special network port that is built into the devices and gives the user access to the ARM (a type of micro processor) portion of the TDU. To do this a small utility called "TDUControl" has been developed, and provides the user with some basic functionality for working with the system. This interface is available from most DAQ and control room machines.

Start the application from the command line using:

TDUControl

Alternatively if you wish to start the application AND allow the DAQ's Runcontrol to pass information and automated sync signals to the TDU, then start the client as:

TDUControl -r

To also "auto connect" to one of the TDUs at the detector use either:
TDUControl -r --primary
TDUControl -r --secondary

The "Primary" TDU is located in the NDOS building; the "Secondary" is located in the MINOS surface building. Switching between them for sync operations requires moving the primary sync line for the detector from the primary master's output to the secondary's slave output.

As of 22Aug2011, the TDU that was in NDOS has been moved to the Test Stand for use in debugging GPS-related issues. A second TDU is, however, installed in the MINOS surface building and can be used as a backup to run the detector in the event that the other TDU in MINOS fails.

The topology looks like:

(removed) TDU-master (Primary NDOS)    --> Detector DCM chain
TDU-master (Secondary MINOS) --> TDU-slave (Secondary NDOS) --> detector DCM chain
TDU-master (Tertiary MINOS)  === TDU-slave (Secondary NDOS) --> detector DCM chain

The cable to move is the one that leads to the DCM chain. In practice this involves moving the yellow cable from the output of the master, down to the output of the slave which is right below it (move it to either the "top out" or "side out", "chain out" will also work but is intended for connecting additional TDU slaves)

Once the program is started:

If you selected an auto connect, the program will connect to the TDU and read the firmware version that it finds. Not all firmware versions are equal and if there is any incompatibility it will be noted in a popup box.

Firewalls and Tunneling

The control software uses standard TCP sockets to transmit data. As a result it can be tunneled through a firewall to allow clients to work transparently with TDU hardware that is behind a firewall or on the private data networks.

Tunneling Through the Far Detector

Spill Server monitor

In the spill server monitor, if the NSSTDUApp is not running (it is shown in pink), click the "spill server backbone restart" desktop icon to restart the process. Images are below.


Power On and Configuration of a TDU

When a TDU starts under v1.10 firmware the GPS clock is NOT automatically started. After power cycling a TDU you must manually start the clock through the following procedure:

1. Connect to the TDU and activate the "Expert Mode"
2. On the expert control tab, select the "Static Control" register
3. Enter the value 0x0020 into the "value" box under the register selection box.
4. Click the "Set" button to the right of the box.
5. Wait for a normal update cycle (approx 5-10s); the value to the right of "Time of Last Sync" should now read something very close to the current time.

Manual configuration

To manually start the TDU time registers (without using the graphical interface) you will need to log directly into the PowerPC side of the TDU. In order to start the clock on the master TDU you will need to both "enable count" and send a sync command.

  • Count is enabled by bit 2 (value 0x0002) in register 0.
  • Sync is sent by bit 5 (value 0x0020) in register 0.
  • Since the count enable is atomic, these can be combined by writing 0x0022 to register 0:
    tduControl set 0x0 0x22
    

This should start the TDU GPS linked clocks.
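
The value 0x0022 is simply the bitwise OR of the two values listed above. A short Python check illustrates this; the constant names are made up for this illustration, and only the hex values and the tduControl command come from the text.

```python
COUNT_ENABLE = 0x0002  # "enable count" value for register 0
SYNC = 0x0020          # "sync" value for register 0

# Combine both requests into a single register write.
combined = COUNT_ENABLE | SYNC

# Print the command line that would be typed on the PowerPC side.
print(f"tduControl set 0x0 {combined:#x}")
```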

Auto Delay Calibration

The auto delay calibration system is accessed through the TDU control registers. This is done by either logging in through the TDUControl interface to the MASTER tdu's ARM board or by logging into the PowerPC side of the master. Once logged in you need to write a value of 0x0010 to the primary control register (address 0x0). This will initiate the auto delay calibration sequence.

The sequence will then indicate that it is busy by raising bit 1 of the status register (address 0x0001). When the delay is done being calculated, it will raise bit 2 of the status register. This should be checked for completion before continuing.

The general sequence should look like:

  1. Login to the master tdu for the chain
  2. Start the calibration
    1. Write 0x0010 to 0x0000
    2. Read 0x0001 and verify that bit 1 is high ( i.e. readback & 0x0001 true)
  3. Send a sync pulse by writing 0x0200 to address 0x0.
    1. Continue to poll 0x0001 until ((readback & 0x0001) false) AND ((readback & 0x0002) == true )
  4. Verify that the calibration values make sense.
    1. Read Register 0x0002, 0x0003, 0x0004
    2. Convert the values to nano seconds. Each count is 7.8125ns.
    3. Excessively large values indicate that the loopback on that string was not in place or a component was turned off (e.g. a DCM was down so that it couldn't pass the pulses down and back on its chain).
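
The polling and unit conversion in the sequence above can be sketched in Python. This is a hedged illustration, not an existing tool: read_register is a hypothetical callable (for example a wrapper around "tduControl get") that takes a register address and returns its integer value, and the timeout and polling interval are arbitrary assumptions. The masks and the 7.8125 ns per count come from the text.

```python
import time

NS_PER_COUNT = 7.8125  # from the text: each count is 7.8125 ns
BUSY_MASK = 0x0001     # status bit raised while the calibration is running
DONE_MASK = 0x0002     # status bit raised once the delay has been calculated
STATUS_ADDR = 0x0001   # status register address


def counts_to_ns(count):
    """Convert a delay register value from counts to nanoseconds."""
    return count * NS_PER_COUNT


def wait_for_calibration(read_register, timeout_s=60.0, poll_s=1.0):
    """Poll the status register until busy is low and done is high.

    Returns True on completion and False if timeout_s (an assumed value)
    elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = read_register(STATUS_ADDR)
        if not (status & BUSY_MASK) and (status & DONE_MASK):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

After wait_for_calibration returns True, the values read from registers 0x0002, 0x0003 and 0x0004 can be passed through counts_to_ns to judge whether they are plausible.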

Other Operations

You will see an interface that looks like:

You can now connect to one of the standard timing units using the drop down menu and clicking the "connect" button:

Note: Currently the "Near Detector Master", and Teststand Master and Slaves are the only ones operational. The other listings are place holders for future deployment.

If the TDU you want to connect to is not in the list you can enter either its IP address or full hostname in the custom server box.

Once you are connected the interface will communicate with the TDU and retrieve basic information about it.

To disconnect, click the disconnect button.

Note: The TDU only permits a single control interface to be connected to it at a time. If you are unable to connect, chances are that someone else is using the TDU and has the connection tied up.

Synchronizing the System

Under the current version of the DAQ system three methods of detector synchronization are possible. These modes are "manual" sync (i.e. pushing the big red button), auto-sync (the TDU is resync'd on a regular basis without user interaction), and RunControl initiated at start of run or on subsequent DAQ Message.

Suspending the IOC during Sync

Currently there is an incompatibility in the way that the DCM switches the timing lines between the internal and external modes. As a result, when the IOC is running it can interrupt the timing command line and cause commands and sync signals to be lost. To work around this, the following procedure is implemented.

On the DCMs we run two pseudo servers that respond to broadcast UDP packets. When the servers receive a packet they execute the command that either suspends or restarts the IOC process. The commands to start these pseudo servers are:

./socat udp4-recvfrom:9101,broadcast,fork exec:"/home/anorman/killscript.sh dcmIOC STOP" 

and
./socat udp4-recvfrom:9102,broadcast,fork exec:"/home/anorman/killscript.sh dcmIOC CONT" 

To activate one of the servers a client sends a broadcast packet to the appropriate port.

echo "stop" | socat stdio udp4-datagram:192.168.139.255:9101,broadcast,range=192.168.139.0/2
..... issue timing commands .....
echo "go"   | socat stdio udp4-datagram:192.168.139.255:9102,broadcast,range=192.168.139.0/2

This workaround requires the "socat" utilities to be installed on both the DCMs and the machine that you are broadcasting the packet from. See "compiling socat for x86 and powerpc" for details on the build process. Alternatively, you can generate a UDP broadcast in any of the normal fashions (e.g. the Qt network layer) and achieve the same functionality.
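
As one example of generating the broadcast without socat on the sending side, here is a minimal Python sketch using the standard socket module. The function name is hypothetical; the broadcast address, ports and payloads are the ones from the socat commands above. It has not been tested against the DCMs.

```python
import socket

BROADCAST_ADDR = "192.168.139.255"  # broadcast address from the socat example
STOP_PORT = 9101  # pseudo server that suspends dcmIOC
CONT_PORT = 9102  # pseudo server that resumes dcmIOC


def send_ioc_command(payload, port, addr=BROADCAST_ADDR):
    """Send one UDP datagram to the given port with broadcasting enabled."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Required before the OS will accept a broadcast destination address.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, (addr, port))


# Intended use, mirroring the shell example:
#   send_ioc_command(b"stop\n", STOP_PORT)
#   ... issue timing commands ...
#   send_ioc_command(b"go\n", CONT_PORT)
```

The payload content does not matter to the socat pseudo servers shown above, since they run the script on receipt of any packet; "stop" and "go" are kept only to match the shell example.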

TDUControl can be configured and enabled to perform these types of broadcasts prior to and after major sync operations.

Run Control based Syncs

To permit Run Control to communicate with the TDU, the user need only start the TDUControl client with a connection to RC using the "-r" option:
TDUControl -r

Or connect/reconnect to the run control system via the "Config" menu. Under this menu you will see three options:
  • Start RunControl Listen
  • Stop RunControl Listen
  • Restart RunControl Listen

These options are fairly self explanatory. If for some reason you find that RunControl is unable to talk to the TDU, simply restart the run control listen thread.

Note: The interaction with run control is not turned on by default in order to permit standalone control and debugging with the TDUControl client.

Manual Sync and Control

On the "Timing Controls" tab of the interface there are different buttons which can send commands to the entire detector. These commands have the following effect:

  • Sync to Zero: resets all clocks to zero (TDU slaves, DCMs, FEBs)
  • Sync to Current Time: sets all clocks to the current GPS time (TDU slaves, DCMs, FEBs)
  • Start DAQ: sends a "start" command to FEBs
  • Stop DAQ: sends a "stop" command to FEBs
  • Start Time: sends a "start count" command to FEBs
  • Stop Time: sends a "stop count" command to FEBs
  • Load GPS Time: pre-loads the current GPS time into all systems (but does not issue the final sync or start commands)

    In normal operation, if you need to manually re-sync the detector, click the big red "Sync Detector to Current Time" button.

Quick Reference

My TDU thinks it's the year 2525...

This happens when the firmware booting and GPS locking aren't triggered in a certain order during a cold start of the system. To correct this, use the following procedure.

  1. Connect to the TDU in question using the TDUControl interface.
  2. Check to see what the TDU thinks the current time is. The current GPS time is displayed in the lower right hand corner of the interface:

    This time is updated every few seconds, and should match the date and time you think it is (note: time is in GMT not CST or CDT).
  3. Flip to the "Firmware" tab of the interface and hit the following buttons in order:
    1. "Boot FPGA" (this will internally reset the FPGA which handles the timing system)
    2. "Reload GPS" (this will resync the TDU to the GPS and setup the current time)

      At this point the TDU should be resynced to the GPS system and its times should be correct. Let it update the current time and check that it is now correct.

TDU Firmware

There are two types of firmware that are programmed into the TDU. The first type affects the ARM processor and its interactions with the LCD screen, USB/Serial console port, XPort Ethernet interface and the primary FPGA that is installed on the TDU main board. This firmware also provides a command interface for access to the address/register range that is published by the firmware that is loaded into the primary FPGA.

The second type of firmware controls the TDU's timing ports, the decoding of accelerator signals, interactions with the GPS receiver, and publishes an address/register map for configuring and accessing these devices.

Upgrading the ARM software

The ARM software can only be upgraded via a direct JTAG interface with the ARM board. To upgrade this software please contact either Greg Deuerling or Neal Wilcer.

Upgrading the FPGA firmware

There are two general methods for upgrading the FPGA firmware. Both involve rewriting the flash memory on the TDU. Most users will use the TDUControl interface to update the firmware. This interface uses the ARM's XPort Ethernet interface to push pages to the flash memory. The interface handles the correct buffering and pushing of the firmware image to the devices. To use this interface do the following:

  1. Start TDUControl
  2. Switch to Expert Mode (from the config menu)
  3. Connect to the TDU you wish to flash
  4. Select the "firmware" tab
  5. Unlock the flash memory using the "Unlock FPGA" button
  6. Erase the flash memory using the "Erase FPGA" button (this can take a minute)
  7. Browse for the firmware image you wish to load. Firmware files are located in /export/tdu/usr/firmware.
    (Note that in a test of the 9/27/11 version of TDUControl, the Browse button brought TDUControl down, but entering the full file pathname of the firmware in the text field, e.g. /export/tdu/usr/firmware/TDU_Master_V100_10Forked.rbf worked.)
  8. Open the firmware file using the "Open File" button (this buffers the firmware into memory and verifies it)
  9. Download the firmware to the TDU using the "Download Firmware" button.

The download process will display a progress bar on the screen and give a running update on the console of what page the program is currently downloading.

The download/flash process will take 5-10 minutes. If the network connection is interrupted, start the process over again. There is no way to efficiently resume a failed transfer.

When the download process is done:

  1. Lock the FPGA using the "Lock FPGA" button
  2. Reload the FPGA firmware and reboot it for the changes to take effect (use the "Boot FPGA" button)
  3. Reload the GPS using the "Reload GPS" button

The TDU should now be updated and ready to go.

If this process fails and leaves the TDU in a very bad state (i.e. one where you can not even connect to it) you will need to reflash the device using the JTAG interface. Contact Neal Wilcer and Greg Deuerling to do this.

Firmware Versions and Documentation

Each firmware version is documented in a set of release notes and a firmware guide which documents all the functionality and configuration parameters for each version. These documents can be found on NOVA-DocDB.

The command set for interacting with the ARM processor over the XPort or console can be found at:

TDU stuck sending sync

If there are no 3D tracks in the detector or data is not flowing, one of the potential reasons is that the TDU failed to transmit a time sync and could be stuck. To verify and recover from this, use the following steps:

  1. Check the state of the control register of the MTDU. This can be done in two ways. From the TDU control interface in the second column of information in the upper part of display there is a field that says "Control" with a hex number next to it. Check and see if bit 5 is set (0x0020). If that bit is set the TDU thinks it is sending a sync, which is why new syncs will not work. Alternatively, steps above can be followed to connect to the ppc side of the TDU and then "tduControl get 0x0" reads the control register. If the sync bit is not set, then the TDU is not stuck sending a sync, there is another problem and a timing expert should be called.
  2. Next check the status register. In the TDUControl interface this is the "Status" field right under Control. Or, from the ppc side, use "tduControl get 0x1". Check whether bit 15 is set (0x8000). If bit 15 is set, the timing command line is active (the TDU thinks it is in the middle of sending timing information). The only way to unstick this is to scrub the system; follow the above procedures for a timing chain recovery.
  3. If bit 15 is not set (the nice case), the control register just needs to be reset. Write 0x0400 to the control register: from the ppc side, run "tduControl set 0x0 0x0400"; or, from the TDUControl interface, select Config->Enable Expert Mode, select the "Expert Controls" tab, select "Static Control" from the pull-down list, type 0x0400, and press the set button. After a couple of seconds the control register should reflect this change. Now that the register is unstuck, issue a sync again and the time should update. If this does not work, call a timing expert.
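
The decision logic of the three steps above can be summarised in a few lines of Python. This is a sketch for illustration only: the function name and the returned strings are hypothetical, while the bit masks and the recovery actions are the ones described in the steps.

```python
SYNC_BIT = 0x0020         # control register: TDU believes it is sending a sync
CMD_LINE_ACTIVE = 0x8000  # status register: timing command line is active


def diagnose_stuck_sync(control, status):
    """Map control/status register values to the action described above."""
    if not control & SYNC_BIT:
        # Step 1: the sync bit is clear, so this is a different problem.
        return "not stuck sending a sync: call a timing expert"
    if status & CMD_LINE_ACTIVE:
        # Step 2: only a scrub can release the command line.
        return "command line active: scrub the timing chain"
    # Step 3: the nice case, a control register reset is enough.
    return "reset control register: tduControl set 0x0 0x0400"
```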

Time Transfer (Soudan to Ash River)

Soudan to Ash River Time Transfer Instructions and Notes

TCR Monitoring

The TCR (Timing Calibration Reference) is connected to each of the TDUs at Ash River. This unit produces a reference pulse on the GPS 1 second boundary that each of the DCMs timestamps. The deviation of the time stamps from the known 1 second boundary gives an indication of the accuracy with which the TDUs are initialized.

The application TCRMonitor runs on each system that has a TCR connected to it. This application reads out the information and produces log messages. These messages are logged to:

/daqlogs/NovaSpillServer/TCRMonitor

The TCRMonitor applications can be started in several ways. The easiest is through the use of the desktop icon. On screen 1 of each detector there is a desktop icon called "Restart TCR Monitors". Double click this and a dialog box will appear asking you if you really want to restart the TCR Monitors. Select YES to proceed. This will only affect the TCR Monitors on the given detector, so be aware of which machine you're using.

The other way to restart the TCR Monitors is with the startTCRMonitor-{far,near}.sh script run from the novadaq account on novadaq-far-master or novadaq-near-master:

On novadaq-far-master

> startTCRMonitor-far.sh

On novadaq-near-master

> startTCRMonitor-near.sh

After you've restarted the TCR Monitors, you should ensure that they started successfully. There are several ways to do this:
1. Refresh the TDU webpage. The TDU status at the top of the page should be GREEN for all TDUs listed.
2. Check that a new log file for each TCRMonitor instance (1 for each non-spill-server TDU) has been created and is updating. Do this by logging into novadaq-{near,far}-master and looking in the /daqlogs/NovaSpillServer/TCRMonitor/ directory.
3. Log into the non-spill-server TDUs and check whether the process is running:

> ps aux | grep TCRMonitor
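Check 2 above (a new log file exists and is updating) can be sketched in Python as a simple freshness test on file modification times. The function name and the 120 second threshold are assumptions chosen for illustration; the log file naming scheme is not assumed, so the sketch just lists whatever files in the directory were modified recently.

```python
import time
from pathlib import Path


def recently_updated_logs(log_dir: str, max_age_s: float = 120.0) -> list[str]:
    """Names of files in log_dir modified within the last max_age_s seconds.

    max_age_s is an illustrative threshold, not a documented update interval.
    """
    now = time.time()
    return sorted(
        p.name
        for p in Path(log_dir).iterdir()
        if p.is_file() and now - p.stat().st_mtime <= max_age_s
    )
```

Run on novadaq-{near,far}-master against /daqlogs/NovaSpillServer/TCRMonitor/, the result should contain one entry per non-spill-server TDU shortly after a restart.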

TDU Web Monitoring

Info

To check the current state of each TDU and its TCR reference:
http://novadaq-far-master.fnal.gov/tdu_status.html

This page shows the proper status of the system when viewed from the control room. To see the proper status from outside the control room, add the following proxy rules to your proxy.pac file:

function FindProxyForURL(url, host){
    // First resolve the host down to its ip address
    var resolved_ip = dnsResolve(host);

    // Check if the ipaddress is on the FNAL campus network
    if(
       isInNet(resolved_ip,"131.225.0.0","255.255.0.0")
    ){
      return "DIRECT";
    }

    // Check if the ipaddress is on the Ash River daq subnet
    if(
        isInNet(resolved_ip, "192.168.136.0", "255.255.252.0") ||
        isInNet(resolved_ip, "198.124.68.0",  "255.255.255.0")
      ){
        return "PROXY novadaq-far-master-02.fnal.gov:9000";
    } 

    return "DIRECT";
}
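To see which addresses the rules above send through the proxy, the same network tests can be mirrored in Python with the standard ipaddress module. This is only a sketch for reasoning about the rules: the networks, masks, and proxy string are copied from the PAC file, the function name is hypothetical, and unlike the PAC file it takes an already resolved IP address instead of a host name.

```python
import ipaddress

# Networks and proxy string copied from the PAC rules above.
FNAL_CAMPUS = ipaddress.ip_network("131.225.0.0/255.255.0.0")
ASH_RIVER_NETS = [
    ipaddress.ip_network("192.168.136.0/255.255.252.0"),
    ipaddress.ip_network("198.124.68.0/255.255.255.0"),
]
PROXY = "PROXY novadaq-far-master-02.fnal.gov:9000"


def find_proxy_for_ip(ip: str) -> str:
    """Mirror FindProxyForURL for an already resolved IPv4 address."""
    addr = ipaddress.ip_address(ip)
    if addr in FNAL_CAMPUS:
        return "DIRECT"
    if any(addr in net for net in ASH_RIVER_NETS):
        return PROXY
    return "DIRECT"
```

Note that the 255.255.252.0 mask covers 192.168.136.0 through 192.168.139.255, so any address in that range is routed through the proxy.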

Troubleshooting

The web monitoring can be launched from novadaq-near-master and novadaq-far-master in the novadaq account with:

> startTDUWeb-{near,far}.sh

There are several error modes that can be encountered on the TDU webpage. Perhaps the most common is one in which the TDU status is defunct. This means that the TCR Monitor is not running on the TDU in question. To resolve, simply restart the TCR Monitors for that detector:

> startTCRMonitor-{near,far}.sh

For most other problems, it's usually enough to simply restart the web servers. Use the web monitoring launch command above.

In most cases, these two steps will solve the problem, but in some situations, a timing chain scrub will be necessary. This is usually the case when the web servers and the TCR Monitor have been restarted and the "More than 1m off" error message is still present. However, a scrub should be issued as a last resort, especially when there is beam. One should consult with a timing expert if possible before issuing a scrub to solve this problem.

Power-cycling TDUs remotely

Power-cycling Near Detector TDUs

You should only need to power-cycle the master TDUs.

  • For master TDUs on the surface in the MINOS Surface Building:
    1. ssh to novadaq-near-master.
    2. Open a browser to 172.30.17.216, or telnet to it.
    3. User name: apc. Password: same as for the VNC session.
    4. You can then control each power outlet there (power on/off immediately).
  • For underground slave TDUs:
    1. ssh to novadaq-near-master.
    2. Open a browser to 172.30.16.230, or telnet to it.
    3. User name: apc. Password: same as for the VNC session.
    4. You can then control each power outlet there (power on/off immediately).

Power-cycling Far Detector TDUs

  • For master TDUs:
    1. ssh to novadaq-far-master.
    2. Open a browser to 192.168.136.242, or telnet to it.
    3. Use "apc" for both the user name and the password to log in.
    4. You can then control each power outlet there (power on/off immediately).
      • The menu selections to power an outlet off and on are:
        1 for Device Manager
        3 to check power status, ESC
        2 for Outlet Management
        1 for Outlet Control/Configuration
        10 to select the outlet USBHUB-TDU123-TCR
        1 for Control of the selected outlet
        2 to turn off, 1 to turn back on
        

Let Pengfei Ding know if you run into any problems when power-cycling remotely.

Switching Timing Chains

First, you must change the database configuration:

1. Locate DAQConfigEditor on novacr01

2. Click on the DCM Hardware tab, and the timing_system_settings subtab

3. Change the column 'port' to read '0' for timing chain 1, and '1' for timing chain 2

These tabs, and this column, are illustrated in the picture below:

Then log on to novadaq-far-master-02 and edit the file /nova/config/FarDet/Partition1/daq-operations.cfg. To enable timing chain 2, it should read:

MASTER_TDU_HOST TDU-Master-ARM-02

If you are looking to enable timing chain 1, the number at the end of this line should read 01.
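The two settings that have to agree (the database 'port' value and the MASTER_TDU_HOST line) can be summarized in a short Python sketch. The chain-to-port and chain-to-host mapping comes from the steps above; the function name is hypothetical, and the sketch assumes the file holds one "KEY VALUE" pair per line, as the MASTER_TDU_HOST line suggests. It operates on text passed in, not on the real file.

```python
import re

# From the steps above: timing chain -> (database 'port' value, master TDU host).
TIMING_CHAINS = {
    1: ("0", "TDU-Master-ARM-01"),
    2: ("1", "TDU-Master-ARM-02"),
}


def set_master_tdu_host(cfg_text: str, chain: int) -> str:
    """Return cfg_text with the MASTER_TDU_HOST line set for the given chain."""
    _, host = TIMING_CHAINS[chain]
    new_text, count = re.subn(
        r"^([ \t]*MASTER_TDU_HOST[ \t]+)\S+",
        lambda m: m.group(1) + host,
        cfg_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        # Refuse to guess if the line is missing or duplicated.
        raise ValueError(f"expected exactly one MASTER_TDU_HOST line, found {count}")
    return new_text
```

For timing chain 2 the lookup gives port '1' and host TDU-Master-ARM-02; for timing chain 1 it gives port '0' and host TDU-Master-ARM-01.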

If the run is still going, stop it now and release resources. Then reserve resources and restart the run as usual.

Problems? Please email Pengfei Ding.