Hayes Merritt, 05/19/2015 10:12 AM


Timing System Wiki

A useful technical note describing the Timing System and the scripts used in the TDU delay calculation can be found here:
Timing System Tech Note

The Timing system as deployed and wired on the detector can be found here:
NDOS Timing Chain

Information on how to power cycle the Timing System can be found here:
Power Cycle NDoS Systems

Connecting to a TDU

Far Detector

For the far detector you can use the latest version of TDUControl to connect to all of the TDUs that are currently installed at the far site. If you are on site you can simply choose an appropriate unit from the drop down menus and hit the connect button.

If you are offsite, you will need to connect through the tunnel/proxy feature that TDUControl provides. To use this, make sure that you have a valid Kerberos ticket. In the config menu, select the proxy/tunnel config option and enter the appropriate information for the relay (typically you can use novadaq-far-master.fnal.gov as the base relay). Make sure that the account that you are using for the tunnel is one that you have permissions for in the k5login file (i.e. you can normally log in to it). Select the check box to enable tunneling.

When you connect, an ssh-based tunnel will be created and the connection to the far site will be directed through it. When you disconnect, the tunnel will be torn down.

Note: Auto detection of open local ports is not yet implemented, so you may have to change the local port to one that is not in use (for example, if other people are also proxying through tunnels).
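
Since automatic detection of free local ports is not implemented, one way to pick a port by hand is to ask the operating system for an unused one. The following is a minimal Python sketch of that idea. The function name find_free_local_port is hypothetical and is not part of TDUControl, and the port it reports could still be claimed by another process before you enter it in the tunnel config.

```python
import socket


def find_free_local_port(host="127.0.0.1"):
    """Ask the OS for a local TCP port that is currently unused.

    Hypothetical helper, not part of TDUControl.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the OS choose any free port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(find_free_local_port())
```

The printed number can then be typed into the local port field of the proxy/tunnel config.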

TDU Consoles (Far Det)

The master TDUs have their console ports attached to novadaq-far-master-02 with USB cables. This makes it possible to access the consoles via the Linux terminal devices /dev/ttyUSB0, /dev/ttyUSB1 ... /dev/ttyUSBxxx.

To connect to these consoles, log in to master-02. Then, as root, start up a screen session. There is a configuration file for screen that attempts to map all the right ports to the right devices. Type "screen" (to start a brand new screen session), "screen -r" (to resume a detached session), or "screen -x /root" (if you are sharing a session with another person). Once connected, you can cycle through the different consoles with the standard GNU screen commands (ctrl-a n for next terminal, ctrl-a p for previous terminal, etc.).

When there is a power outage, the TDU PPC boards will typically reboot faster than the servers they rely on for their boot files. This causes a race condition where they are unable to boot and will "hang" in their boot loader. If this happens, you will be able to see their console (hit enter a few times if nothing is on the screen) and will get a prompt that looks something like:

redboot >

At this point, type "reset" to reset the TDU and continue the reboot process.

You should be able to watch the system boot from the console.

Near Detector

Currently there are six timing distribution units which are deployed for NOvA. Four of the units are "master" systems which include a GPS receiver and can decode accelerator signals. The other two are "slave" units and are essentially repeaters for the commands and clocks that are issued by the masters.

Any TDU (master or slave) can be connected to using a special network port that is built into the devices and gives the user access to the ARM (a type of microprocessor) portion of the TDU. To do this, a small utility called "TDUControl" has been developed; it provides the user with some basic functionality for working with the system. This interface is available from most DAQ and control room machines.

Start the application from the command line using:

TDUControl

Alternatively if you wish to start the application AND allow the DAQ's Runcontrol to pass information and automated sync signals to the TDU, then start the client as:

TDUControl -r

To also "auto connect" to one of the TDUs at the detector use either:
TDUControl -r --primary
TDUControl -r --secondary

The "Primary" TDU is located in the NDOS building; the "Secondary" is located in the MINOS surface building. Switching between them for sync operations requires moving the primary sync line for the detector from the primary master's output to the secondary's slave output.

As of 22Aug2011, the TDU that was in NDOS has been moved to the Test Stand for use in debugging GPS related issues. A second TDU is, however, installed in the MINOS surface building and can be used as a backup to run the detector in the event that the other TDU in MINOS fails.

The topology looks like:

(removed) TDU-master (Primary NDOS)    --> Detector DCM chain
TDU-master (Secondary MINOS) --> TDU-slave (Secondary NDOS) --> detector DCM chain
TDU-master (Tertiary MINOS)  === TDU-slave (Secondary NDOS) --> detector DCM chain

The cable to move is the one that leads to the DCM chain. In practice this involves moving the yellow cable from the output of the master down to the output of the slave, which is right below it. Move it to either the "top out" or the "side out"; "chain out" will also work but is intended for connecting additional TDU slaves.

Once the program is started:

If you selected an auto connect, the program will connect to the TDU and read the firmware version that it finds. Not all firmware versions are equal and if there is any incompatibility it will be noted in a popup box.

Firewalls and Tunneling

The control software uses standard TCP sockets to transmit data. As a result it can be tunneled through a firewall to allow clients to work transparently with TDU hardware that is behind a firewall or on the private data networks.

Tunneling Through the Far Detector

Power On and Configuration of a TDU

When a TDU starts under v1.10 firmware the GPS clock is NOT automatically started. After power cycling a TDU you must manually start the clock through the following procedure:

1. Connect to the TDU and activate the "Expert Mode"
2. On the expert control tab, select the "Static Control" register
3. Enter the value 0x0020 into the "value" box under the register selection box.
4. Click the "Set" button to the right of the box.
5. Wait for a normal update cycle (approx 5-10s). The value to the right of "Time of Last Sync" should now read something very close to the current time.

Manual configuration

To manually start the TDU time registers (without using the graphical interface) you will need to log directly into the PowerPC side of the TDU. In order to start the clock on the master TDU you will need to both "enable count" and send a sync command.

  • Count is enabled by bit 1 (value 0x0002) in register 0.
  • Sync is sent by bit 5 (value 0x0020) in register 0.
  • Since the count enable is atomic, these can be combined by writing 0x0022 to register 0:
    tduControl set 0x0 0x22
    

This should start the TDU GPS linked clocks.
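
The combined value is simply the bitwise OR of the two register values listed above. A minimal Python sketch of the arithmetic follows. The constant and function names are made up for illustration, and the script only prints the command line; it does not talk to a TDU.

```python
COUNT_ENABLE = 0x0002  # "enable count" value in register 0 (from the list above)
SYNC = 0x0020          # sync value in register 0 (from the list above)


def combine_control_bits(*values):
    """OR control-register values together into one 16-bit word."""
    word = 0
    for value in values:
        word |= value
    return word & 0xFFFF


if __name__ == "__main__":
    word = combine_control_bits(COUNT_ENABLE, SYNC)
    # Print the equivalent command line rather than executing it.
    print(f"tduControl set 0x0 {word:#x}")
```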

Auto Delay Calibration

The auto delay calibration system is accessed through the TDU control registers. This is done by either logging in through the TDUControl interface to the MASTER tdu's ARM board or by logging into the PowerPC side of the master. Once logged in you need to write a value of 0x0010 to the primary control register (address 0x0). This will initiate the auto delay calibration sequence.

The sequence will then indicate that it is busy by raising bit 1 (mask 0x0001) of the status register (address 0x0001). When the delay has been calculated, it will raise bit 2 (mask 0x0002) of the status register. Check for completion before continuing.

The general sequence should look like:

  1. Login to the master tdu for the chain
  2. Start the calibration
    1. Write 0x0010 to 0x0000
    2. Read 0x0001 and verify that bit 1 is high (i.e. (readback & 0x0001) is true)
  3. Send a sync pulse by writing 0x0200 to address 0x0.
    1. Continue to poll 0x0001 until ((readback & 0x0001) is false) AND ((readback & 0x0002) is true)
  4. Verify that the calibration values make sense.
    1. Read registers 0x0002, 0x0003, 0x0004
    2. Convert the values to nanoseconds. Each count is 7.8125 ns.
    3. Excessively large values indicate that the loopback on that string was not in place or a component was turned off (e.g. a DCM was down so that it couldn't pass the pulses down and back on its chain)
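
The completion test in step 3 and the unit conversion in step 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the status and count values in the example are made up, and reading the registers themselves (for example via tduControl) is left out.

```python
BUSY_MASK = 0x0001      # status bit raised while the calibration is running
DONE_MASK = 0x0002      # status bit raised once the delay has been calculated
NS_PER_COUNT = 7.8125   # nanoseconds per count, as stated in step 4


def calibration_done(status):
    """True when the busy bit is low and the done bit is high (step 3)."""
    return (status & BUSY_MASK) == 0 and (status & DONE_MASK) != 0


def counts_to_ns(counts):
    """Convert a delay register value from counts to nanoseconds."""
    return counts * NS_PER_COUNT


if __name__ == "__main__":
    # Made-up readbacks of the status register, for illustration only.
    for status in (0x0001, 0x0001, 0x0002):
        print(f"status={status:#06x} done={calibration_done(status)}")
    # Made-up values standing in for registers 0x0002, 0x0003, 0x0004.
    for counts in (12, 40, 52):
        print(f"{counts} counts = {counts_to_ns(counts)} ns")
```

A real polling loop would pause between reads of the status register and give up after a timeout.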

Other Operations

You will see an interface that looks like:

You can now connect to one of the standard timing units using the drop down menu and clicking the "connect" button:

Note: Currently the "Near Detector Master", and Teststand Master and Slaves are the only ones operational. The other listings are place holders for future deployment.

If the TDU you want to connect to is not in the list, you can enter either its IP address or its full hostname in the custom server box.

Once you are connected the interface will communicate with the TDU and retrieve basic information about it.

To disconnect, click the disconnect button.

Note: The TDU only permits a single control interface to be connected to it at a time. If you are unable to connect, chances are that someone else is using the TDU and has the connection tied up.

Synchronizing the System

Under the current version of the DAQ system, three methods of detector synchronization are possible: "manual" sync (i.e. pushing the big red button), auto-sync (the TDU is resynced on a regular basis without user interaction), and RunControl-initiated sync at the start of a run or on a subsequent DAQ message.

Suspending the IOC during Sync

Currently there is an incompatibility in the way that the DCM switches the timing lines between the internal and external modes. As a result, a running IOC can interrupt the timing command line and cause commands and sync signals to be lost. To work around this, the following procedure is implemented.

On the DCMs we run two pseudo servers that respond to broadcast UDP packets. When the servers receive a packet they execute the command that either suspends or restarts the IOC process. The commands to start these pseudo servers are:

./socat udp4-recvfrom:9101,broadcast,fork exec:"/home/anorman/killscript.sh dcmIOC STOP" 

and
./socat udp4-recvfrom:9102,broadcast,fork exec:"/home/anorman/killscript.sh dcmIOC CONT" 

To activate one of the servers a client sends a broadcast packet to the appropriate port.

echo "stop" | socat stdio udp4-datagram:192.168.139.255:9101,broadcast,range=192.168.139.0/2
..... issue timing commands .....
echo "go"   | socat stdio udp4-datagram:192.168.139.255:9102,broadcast,range=192.168.139.0/2

This workaround requires the installation of the "socat" utilities on both the DCMs and the machine that you are broadcasting the packet from. See "compiling socat for x86 and powerpc" for details on the build process. Alternatively, you can generate a UDP broadcast in any of the normal fashions (e.g. the Qt network layer) and achieve the same functionality.

TDUControl can be configured and enabled to perform these types of broadcasts prior to and after major sync operations.
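
As one example of generating the broadcast without socat on the sending machine, the following minimal Python sketch uses the standard socket module. The ports and the broadcast address are taken from the socat commands above; the function name send_broadcast is hypothetical, and this is not the code that TDUControl itself uses.

```python
import socket

STOP_PORT = 9101  # pseudo server that suspends the IOC
CONT_PORT = 9102  # pseudo server that restarts the IOC
BROADCAST_ADDR = "192.168.139.255"  # address used in the socat example


def send_broadcast(payload, port, address=BROADCAST_ADDR):
    """Send one UDP datagram with the broadcast option enabled.

    Returns the number of bytes sent. Hypothetical helper.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        return sock.sendto(payload, (address, port))
```

Calling send_broadcast(b"stop\n", STOP_PORT) before the timing commands and send_broadcast(b"go\n", CONT_PORT) afterwards mirrors the two echo | socat lines above.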

Run Control based Syncs

To permit Run Control to communicate with the TDU, the user need only start the TDUControl client with a connection to RC using the "-r" option:
TDUControl -r

Or connect/reconnect to the run control system via the "Config" menu. Under this menu you will see three options:
  • Start RunControl Listen
  • Stop RunControl Listen
  • Restart RunControl Listen

These options are fairly self explanatory. If for some reason you find that RunControl is unable to talk to the TDU, simply restart the run control listen thread.

Note: The interaction with run control is not turned on by default in order to permit standalone control and debugging with the TDUControl client.

Manual Sync and Control

On the "Timing Controls" tab of the interface there are different buttons which can send commands to the entire detector. These commands have the following effect:

  • Sync to Zero: resets all clocks to zero (TDU slaves, DCMs, FEBs)
  • Sync to Current Time: sets all clocks to the current GPS time (TDU slaves, DCMs, FEBs)
  • Start DAQ: sends a "start" command to FEBs
  • Stop DAQ: sends a "stop" command to FEBs
  • Start Time: sends a "start count" command to FEBs
  • Stop Time: sends a "stop count" command to FEBs
  • Load GPS Time: pre-loads the current GPS time into all systems (but does not issue the final sync or start commands)

    In normal operation, if you need to manually re-sync the detector, click the big red "Sync Detector to Current Time" button.

Quick Reference

My TDU thinks it's the year 2525...

This happens when the firmware booting and GPS locking aren't triggered in a certain order during a cold start of the system. To correct this, use the following procedure.

  1. Connect to the TDU in question using the TDUControl interface.
  2. Check to see what the TDU thinks the current time is. The current GPS time is displayed in the lower right hand corner of the interface:

    This time is updated every few seconds, and should match the date and time you think it is (note: time is in GMT not CST or CDT).
  3. Flip to the "Firmware" tab of the interface and hit the following buttons in order:
    1. "Boot FPGA" (this will internally reset the FPGA which handles the timing system)
    2. "Reload GPS" (this will resync the TDU to the GPS and setup the current time)

      At this point the TDU should be resynced to the GPS system and its times should be correct. Let it update the current time and check that it is now correct.

TDU Firmware

There are two types of firmware that are programmed into the TDU. The first type affects the ARM processor and its interactions with the LCD screen, USB/Serial console port, XPort Ethernet interface and the primary FPGA that is installed on the TDU main board. This firmware also provides a command interface for access to the address/register range that is published by the firmware that is loaded into the primary FPGA.

The second type of firmware controls the TDU's timing ports, the decoding of accelerator signals, interactions with the GPS receiver, and publishes an address/register map for configuring and accessing these devices.

Upgrading the ARM software

The ARM software can only be upgraded via a direct JTAG interface with the ARM board. To upgrade this software, please contact either Greg Deuerling or Neal Wilcer.

Upgrading the FPGA firmware

There are two general methods for upgrading the FPGA firmware. Both involve rewriting the flash memory on the TDU. Most users will use the TDUControl interface to update the firmware. This interface uses the ARM's XPort Ethernet interface to push pages to the flash memory. The interface handles the correct buffering and pushing of the firmware image to the devices. To use this interface do the following:

  1. Start TDUControl
  2. Switch to Expert Mode (from the config menu)
  3. Connect to the TDU you wish to flash
  4. Select the "firmware" tab
  5. Unlock the flash memory using the "Unlock FPGA" button
  6. Erase the flash memory using the "Erase FPGA" button (this can take a minute)
  7. Browse for the firmware image you wish to load. Firmware files are located in /export/tdu/usr/firmware.
    (Note that in a test of the 9/27/11 version of TDUControl, the Browse button brought TDUControl down, but entering the full file pathname of the firmware in the text field, e.g. /export/tdu/usr/firmware/TDU_Master_V100_10Forked.rbf worked.)
  8. Open the firmware file using the "Open File" button (this buffers the firmware into memory and verifies it)
  9. Download the firmware to the TDU using the "Download Firmware" button.

The download process will display a progress bar on the screen and give a running update on the console of what page the program is currently downloading.

The download/flash process will take 5-10 minutes. If the network connection is interrupted, start the process over again. There is no way to efficiently resume a failed transfer.

When the download process is done:

  1. Lock the FPGA using the "Lock FPGA" button
  2. Reload the FPGA firmware and reboot it for the changes to take effect (use the "Boot FPGA" button)
  3. Reload the GPS using the "Reload GPS" button

The TDU should now be updated and ready to go.

If this process fails and leaves the TDU in a very bad state (i.e. one where you can not even connect to it) you will need to reflash the device using the JTAG interface. Contact Neal Wilcer and Greg Deuerling to do this.

Firmware Versions and Documentation

Each firmware version is documented in a set of release notes and a firmware guide which documents all the functionality and configuration parameters for each version. These documents can be found on NOVA-DocDB.

The command set for interacting with the ARM processor over the XPort or console can be found at:

Water Leak Recovery procedure

This section describes steps that can be taken to reset the TDU timing delays after a water leak or interlock trip on the Near or Far Detector. This procedure is FOR DAQ EXPERTS ONLY. Please read the full section; there are some important notes.

For a more detailed description on the timing system as a whole and the various procedures involved in calibrating the timing chain see the technote below. The rest of this wiki focuses on the steps outlined in section 6.

IMPORTANT NOTE 1: Setting the timing delay on a TDU interrupts the clock for up to 1 full second. If there are APDs running cold on a DCM connected to that TDU, interrupting the clock stops the TECCs. When the clock returns, cooling starts again at once and this leads to an over-current, turning off the DCM. When recovering the timing system, make sure the DCMs are warm.

TDU connection details

For Far Detector
You must be on the FD cluster (generally novadaq-far-master-02) to talk to the TDUs. The MTDUs can be connected to on the PPC side with ssh root@tdu-master-ppc-{01,02}. Masters and slaves can be connected to through the ARM side in the TDUControl interface. The naming convention for masters is tdu-master-arm-{01,02}. For slaves it is tdu-slave-aa-bb, where aa is the timing chain (01,02) and bb is the diblock number (01-14). These can be selected from the pull-down menu in the TDUControl interface.

For Near Detector
You must be on the ND cluster (generally novadaq-near-master) to talk to the TDUs. The MTDUs can be connected to on the PPC side with ssh root@tdu-near-master-ppc-{01,02}. Masters and slaves can be connected to through the ARM side in the TDUControl interface. The naming convention for masters is tdu-near-master-arm-{01,02}. For slaves it is tdu-near-slave-aa-bb, where aa is the timing chain (01,02) and bb is the diblock number (01-04). NOTE: currently the ND TDUs are not supported in the pull-down menu of the TDUControl interface. The names described above must instead be entered into the server field to the left of the pull-down menu (which must be set to Custom) to connect to the unit.
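
The slave naming conventions above can be expanded mechanically, which is handy when the ND names have to be typed into the Custom server field. The following Python sketch assumes two-digit, zero-padded chain and diblock fields as written above. The function name is hypothetical, and the lists are just every name the convention allows, not a statement of which units are actually installed.

```python
def slave_hostnames(prefix, chains, diblocks):
    """Build names of the form <prefix>-aa-bb with two-digit fields."""
    return [
        f"{prefix}-{chain:02d}-{diblock:02d}"
        for chain in chains
        for diblock in diblocks
    ]


# Far Detector: chains 01-02, diblocks 01-14.
FD_SLAVES = slave_hostnames("tdu-slave", range(1, 3), range(1, 15))
# Near Detector: chains 01-02, diblocks 01-04.
ND_SLAVES = slave_hostnames("tdu-near-slave", range(1, 3), range(1, 5))

if __name__ == "__main__":
    for name in ND_SLAVES:
        print(name)
```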

Steps for timing recovery after a water leak

After any kind of trip of the interlock, the affected racks lose power. This turns off and clears the error registers in both TDU chains. The procedure below should be applied to both chains.

  1. Disconnect TDUDelay monitor in the FD-01 or ND-01 vnc session if it is monitoring a chain that will be calibrated.
  2. Log on as novadaq to novadaq-far-master-02 or novadaq-near-master
  3. setup_online
  4. Scrub the timing chain (instructions in the section below). This is required.
  5. TDUControl must be disconnected from any units in the timing chain that will be reset.
  6. Launch the script tdu_load_delay_from_database.sh (which lives in the TDUUtilities package).
  7. When prompted, select the detector from the window.
  8. Select the timing chain (1 or 2).
  9. Select which slave TDUs to load delays in (default is all of them). NOTE: The master TDU (MTDU) will always have its delay set in this procedure. To set the delay on a single slave without affecting the MTDU, see below.
  10. Select the global configuration to read the delays from in the database.
  11. The program then runs, printing text as it sets each delay value (ignore the Qt errors, they are not relevant). Messages will print indicating which unit is being set to what value. This can take ~30 seconds as each delay requires 4 registers to be set.
  12. A message will pop up indicating that the calibration is done.
  13. Reconnect the TDUDelay monitor to the chain that was just calibrated. The initial connection can take up to 30 seconds. If done properly, all delay values in the monitor should appear green. The monitor must be checked; if something is not green, consult with timing experts before continuing. Check both chains in the monitor (you must disconnect from one chain before selecting and connecting to the other).
  14. IMPORTANT NOTE 2: In current version, the script does not go back and clear errors on the MTDU when it is done. After the calibration you will see a SERDES error (0x4000) in the MTDU. Errors on the TDU latch, so this does not mean there is a problem currently. The error came when the clock was interrupted in the calibration. To manually clear the error see instructions below.

Manually clearing TDU errors

Through the TDUControl interface (master or slave):
  1. Open the TDUControl interface (if one is not already open, it can be launched with TDUControl -m once the online environment is set up).
  2. Select the MTDU to connect to (for FD select TDU-Master-ARM-(01,02) from the tab; for ND type tdu-near-master-arm-(01,02) in the space on the left), then press connect.
  3. Select Config->Enable Expert Mode
  4. Select the Expert Controls tab (found next to Timing Control)
  5. From pull down menu select "Errors"
  6. Make sure value field is 0x0000
  7. Press set button.
  8. This should clear the SERDES Error (and others). If errors persist, call an expert. NOTE: An ND TDU may have a MIBs Parity Error if it is connected to the Accelerator to time stamp spills. This error is OK.

IMPORTANT NOTE 3: If the error remains after trying to clear it, scrub the timing chain and then re-apply the delays. Scrubbing a timing chain also needs to be done with warm DCMs.

Through the PPC side(master only):
  1. From novadaq-far-master-02, ssh root@tdu-master-ppc-{01,02} (or from novadaq-near-master, tdu-near-master-ppc-{01,02}). Note that only the master TDUs can be connected to in this way.
  2. setup_online --opt --xcompile (The options are required)
  3. The status of registers can be viewed with tduRegDump (displays all registers)
  4. To clear errors, "tduControl set 0x14 0x0"
  5. logout

How to scrub a timing chain

Through the TDUControl interface:
  1. Open TDUControl interface
  2. Select and connect to an MTDU
  3. Select Config->Enable Expert Mode
  4. Select the Expert Control tab
  5. From pull down menu select "Dynamic Control"
  6. Enter 0x004 in the field (this scrubs the MTDU and the slaves in the chain)
  7. Press set.
  8. You will then be disconnected from the TDU during the scrub and will not be able to connect again till it is finished. This takes ~5 minutes.
  9. After waiting 5 minutes connect again to verify the scrub is complete.
  10. Disconnect again and proceed back to the instructions for a water leak recovery.

Through the PPC interface:
  1. From novadaq-far-master-02, ssh root@tdu-master-ppc-{01,02} (or from novadaq-near-master, tdu-near-master-ppc-{01,02}). Note that only the master TDUs can be connected to in this way.
  2. setup_online --opt --xcompile (The options are required)
  3. The status of registers can be viewed with tduRegDump (displays all registers)
  4. To scrub, "tduControl set 0x9 0x4" (This is the equivalent to the instructions above for the control interface.)
  5. IMPORTANT NOTE 4: Scrubbing through the PPC side does not automatically lock you out of the TDU as the ARM side does. PLEASE WAIT 5 MINUTES BEFORE ISSUING OTHER COMMANDS SO AS NOT TO INTERFERE WITH THE SCRUBBING PROCESS.

Manually setting TDU delay value for one SLAVE (this procedure is not the same for a Master)

  1. From novadaq-near-master or novadaq-far-master-02 cd into /daqlogs/${DetId}/TDUDelayCalibrationConstants/TimingChain{1,2}/
  2. In here are date stamped files for the TDU (top, side, and total) and DCM delays. Open the TDU files with a recent timestamp and note the delay values (these values are in 128 MHz clock ticks)
  3. Convert delay values to hex (many online calculators available)
  4. Connect to TDU control interface (details above)
  5. Connect to the desired slave unit (for FD select from the list, for ND slaves are tdu-near-slave-{01,02}-{01,02})
  6. Select config->Enable Expert Mode and then the Expert Controls tab
  7. From tab select "TDU Delay" register
  8. Type in the hex value from the "total" delay file, press set (you can press read to read the value back; the Delay value at the top of the interface should also change)
  9. Select tab for "Side DCM Delay"
  10. Type in hex value from the "side" delay file, press set
  11. Select tab for "Top DCM Delay"
  12. Type in hex value for the "top" delay file, press set
  13. Select tab for "Static Control"
  14. Type in 0x2000, press set. NOTE: This step is critical. It reprograms the timing circuit with the register values written above; the delay is not set without it.
  15. Disconnect from the TDU and check the TDU delay monitor to verify that the delay value is correct. Contact a timing expert with any problems.
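
For the decimal-to-hex conversion in step 3, an online calculator is not strictly needed. The following Python sketch formats a delay in clock ticks as the hex string to type into the value field. It assumes the delay registers are 16 bits wide, which is inferred from the four-digit hex values used elsewhere on this page and is not confirmed. The function name and the example value are made up.

```python
def delay_to_hex(ticks):
    """Format a delay in clock ticks as a four-digit hex string.

    Assumes a 16-bit register; hypothetical helper.
    """
    if not 0 <= ticks <= 0xFFFF:
        raise ValueError(f"delay {ticks} does not fit in 16 bits")
    return f"0x{ticks:04X}"


if __name__ == "__main__":
    # Made-up delay value, for illustration only.
    print(delay_to_hex(1234))
```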

TDU stuck sending sync

If there are no 3D tracks in the detector or data is not flowing, one potential reason is that the TDU failed to transmit a time sync and could be stuck. To verify and recover from this, use the following steps:

  1. Check the state of the control register of the MTDU. This can be done in two ways. From the TDU control interface in the second column of information in the upper part of display there is a field that says "Control" with a hex number next to it. Check and see if bit 5 is set (0x0020). If that bit is set the TDU thinks it is sending a sync, which is why new syncs will not work. Alternatively, steps above can be followed to connect to the ppc side of the TDU and then "tduControl get 0x0" reads the control register. If the sync bit is not set, then the TDU is not stuck sending a sync, there is another problem and a timing expert should be called.
  2. Next check the status register. In the TDUControl interface this is the "Status" field right under Control. Or from the ppc side "tduControl get 0x1". You want to check and see if bit 15 is set (0x8000). If bit 15 is set then the timing command line is active (the TDU thinks it is in the middle of sending timing information). The only way to unstick this is to scrub the system and then reset the timing delays. This means stopping the run, warming the detector, and then following the scrub and delay reset instructions above. If bit 15 is not set, the recovery can be done simply and without warming the detector, proceed to next step.
  3. In this simpler case, the control register just needs to be reset. Write 0x0400 to the control register. From the PPC side use "tduControl set 0x0 0x0400", or from the TDUControl interface select Config->Enable Expert Mode, select the "Expert Controls" tab, and select "Static Control" from the pull-down list. Then type 0x0400 and press the set button. After a couple of seconds the control register should reflect this change. Now that the register is unstuck, issue a sync again and the time should update. If this does not work, then it is time to call a timing expert.
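
The decision logic in the three steps above can be summarized in a few lines of Python. This is only a sketch of the checks described here: the function name and the returned strings are made up, and the control and status values must still be read by hand from the TDUControl display or with "tduControl get".

```python
SYNC_BIT = 0x0020          # control register (0x0), bit 5: sync being sent
CMD_LINE_ACTIVE = 0x8000   # status register (0x1), bit 15: command line active


def diagnose_stuck_sync(control, status):
    """Map control/status register values to the recovery action above."""
    if not control & SYNC_BIT:
        return "not stuck sending sync: call a timing expert"
    if status & CMD_LINE_ACTIVE:
        return "stop run, warm detector, scrub chain, reset delays"
    return "write 0x0400 to the control register, then re-issue the sync"


if __name__ == "__main__":
    # Made-up register values, for illustration only.
    print(diagnose_stuck_sync(control=0x0022, status=0x0000))
```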

Time Transfer (Soudan to Ash River)

Soudan to Ash River Time Transfer Instructions and Notes

TCRMonitor Log Files

  • We currently have TCRMonitor running on all FarDet TDUs. The output is logged and is located in /daqlogs/NovaSpillServer/TCRMonitor/ on far-master-02.

Power-cycling TDUs remotely

Power-cycling Near Detector TDUs

  • For underground slave TDUs,
    1. ssh to novadaq-near-master
    2. open a browser to 172.30.16.230 or telnet to it.
    3. use user name "apc" and PW for VNC session to login.
    4. you can then control each power outlet there (power on/off immediately).
  • For TDU masters on the ground in MINOS Surface Building
    1. ssh to novadaq-ctrl-master
    2. open a browser to 192.168.25.126 or telnet to it.
    3. use user name "apc" and PW for VNC session to login.
    4. you can then control each power outlet there (power on/off immediately)

Let Pengfei Ding know if you run into any problems when power-cycling remotely.