Data compression and zero suppression

Data compression and zero suppression are currently handled by functions in lardata's RawData package, in raw.cxx. The 35t detector simulation, SimWireDUNE35t, passes each raw adc vector of shorts, along with user-set fcl parameters, to raw::Compress, an overloaded function that selects the method of data compression.

Compress function

The compress function accepts the raw adc vector along with the type of data compression requested. The type can be zero suppression, Huffman coding, a combination of both (i.e. zero suppression followed by Huffman coding), or none, in which case the function returns the original adc vector. Compressed data can be uncompressed by the "Uncompress" function, which applies the appropriate decompression routine to recover the adc vector. If zero suppression has been applied, the entries that were suppressed as below threshold are filled with the pedestal value, so the decompressed vector matches the original only inside the saved blocks.

Zero suppression

The zero suppression algorithm scans the raw adc vector of shorts for values that differ from the pedestal, given by the pedestal parameter, by an absolute amount greater than the threshold set by the zerothreshold parameter. Values that exceed the threshold are saved in blocks, and each block is padded before and after by a number of below-threshold values set by the nearestneighbor parameter. Neighboring blocks are merged, and the beginning and end of the vector are handled as special cases. The zero suppression can also save the blocks on neighboring channels that coincide in time with the blocks around the signal values, up to the number of channels specified by the NeighboringChannels parameter in detsimmodules_dune.fcl; setting this to zero disables this functionality. The nearest neighboring channel zero suppression is performed by passing a circular buffer of the adc vectors on either side of the primary adc vector being zero suppressed and checking, for each time index, whether any value in the entire buffer is above threshold.
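The single-channel block search described above can be sketched as follows. This is a minimal illustration and not the code in raw.cxx: the names Block and findBlocks are hypothetical, the neighboring-channel logic is left out, and it is assumed that blocks which overlap or touch are merged into one.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper type: one saved block of ticks.
struct Block {
    std::size_t begin;   // index of the first saved tick
    std::size_t length;  // number of saved ticks
};

// Sketch only: find the blocks to keep on a single channel.
// A tick is "above threshold" if |adc - pedestal| > threshold.
// Each such tick saves itself plus nearestNeighbor ticks on either side.
std::vector<Block> findBlocks(const std::vector<short>& adc, int pedestal,
                              int threshold, std::size_t nearestNeighbor) {
    std::vector<Block> blocks;
    for (std::size_t i = 0; i < adc.size(); ++i) {
        if (std::abs(adc[i] - pedestal) <= threshold) continue;

        // Window around tick i, clipped to the ends of the vector.
        const std::size_t lo = (i > nearestNeighbor) ? i - nearestNeighbor : 0;
        const std::size_t hi = std::min(adc.size(), i + nearestNeighbor + 1);  // one past the end

        if (!blocks.empty() && lo <= blocks.back().begin + blocks.back().length) {
            // Window overlaps or touches the previous block: extend it.
            blocks.back().length = hi - blocks.back().begin;
        } else {
            blocks.push_back({lo, hi - lo});
        }
    }
    return blocks;
}
```

For example, with a pedestal of 400, a threshold of 10 and nearestNeighbor set to 2, a single tick of 450 at index 4 gives one block starting at index 2 with length 5.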

The boolean parameter fADCStickyCodeFeature determines if the zero suppression will take into account the 35t's adc sticky code hardware bug which causes adc values' 6 least significant bits to "stick" at 000000 or 111111 with some probability. If the parameter is set to true, the zero suppression will treat any adc value ending in 000000 or 111111 (in binary) as the equivalent of falling below threshold.
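The sticky code test amounts to a mask on the six least significant bits. A minimal sketch, with hasStickyCode as a hypothetical name rather than a function from raw.cxx:

```cpp
// Sketch only: true if the 6 least significant bits of the adc value
// are all zeros (000000) or all ones (111111).
inline bool hasStickyCode(short adc) {
    const unsigned low6 = static_cast<unsigned>(adc) & 0x3Fu;
    return low6 == 0x00u || low6 == 0x3Fu;
}
```

With the feature enabled, a value for which this test is true would be treated as below threshold regardless of its distance from the pedestal.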

Zero suppressed data coding format

The zero suppressed adc vectors are arranged as follows:

  • First entry is the total number of entries in original adc vector
  • Second entry is the number of blocks of saved values above threshold (and some number of values before and after, determined by nearestneighbor parameter)
  • Original index of each block's first value
  • Total length of each block
  • Contents of each block
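
The layout above can be illustrated with a short packing and unpacking sketch. The names SavedBlock, packZeroSuppressed and unpackZeroSuppressed are hypothetical, and the sketch assumes that all block start indices come first, then all block lengths, then the block contents in the same order; the actual ordering should be checked against raw.cxx.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: one saved block and its position.
struct SavedBlock {
    short begin;                // original index of the block's first value
    std::vector<short> values;  // saved adc values
};

// Sketch only: [size, nblocks, begins..., lengths..., contents...]
std::vector<short> packZeroSuppressed(std::size_t originalSize,
                                      const std::vector<SavedBlock>& blocks) {
    std::vector<short> out;
    out.push_back(static_cast<short>(originalSize));
    out.push_back(static_cast<short>(blocks.size()));
    for (const auto& b : blocks) out.push_back(b.begin);
    for (const auto& b : blocks) out.push_back(static_cast<short>(b.values.size()));
    for (const auto& b : blocks) out.insert(out.end(), b.values.begin(), b.values.end());
    return out;
}

// Sketch only: rebuild the full-length vector, filling suppressed
// entries with the pedestal value.
std::vector<short> unpackZeroSuppressed(const std::vector<short>& zs, short pedestal) {
    const std::size_t size = static_cast<std::size_t>(zs.at(0));
    const std::size_t nblocks = static_cast<std::size_t>(zs.at(1));
    std::vector<short> adc(size, pedestal);
    std::size_t data = 2 + 2 * nblocks;  // index of the first content word
    for (std::size_t b = 0; b < nblocks; ++b) {
        const std::size_t begin = static_cast<std::size_t>(zs.at(2 + b));
        const std::size_t length = static_cast<std::size_t>(zs.at(2 + nblocks + b));
        for (std::size_t k = 0; k < length; ++k) adc.at(begin + k) = zs.at(data++);
    }
    return adc;
}
```

Under these assumptions, a 6-entry vector with one saved block {401, 450, 402} starting at index 2 packs to 6, 1, 2, 3, 401, 450, 402, and unpacking with a pedestal of 400 gives 400, 400, 401, 450, 402, 400.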

Huffman coding

The Huffman coding scheme is based on differences between adc values in adjacent time bins. Huffman coding was developed for MicroBooNE and is not used for most DUNE studies. The coding scheme is:

  • No change for 4 ticks -> 1
  • No change for 1 tick -> 01
  • +1 change -> 001
  • -1 change -> 0001
  • +2 change -> 00001
  • -2 change -> 000001
  • +3 change -> 0000001
  • -3 change -> 00000001
  • Absolute value of change > 3 -> write actual raw value to short

The 15th bit of each short flags whether the word holds Huffman codes or a raw value, where 1 is Huffman coded and 0 is raw. The lowest bits of each word are padded out with zeros.
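
The mapping from tick-to-tick differences to codewords can be sketched as below. This only illustrates the table above: deltaCodeword and encodeSymbols are hypothetical names, the codewords are returned as text strings, the first tick is assumed to be written as a raw value, and the packing of codewords into 16-bit words with the flag bit is not attempted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch only: codeword for a single-tick difference, or an empty
// string if |delta| > 3 and the raw value must be written instead.
std::string deltaCodeword(int delta) {
    switch (delta) {
        case 0:  return "01";
        case 1:  return "001";
        case -1: return "0001";
        case 2:  return "00001";
        case -2: return "000001";
        case 3:  return "0000001";
        case -3: return "00000001";
        default: return "";
    }
}

// Sketch only: turn an adc vector into a list of symbols.
// "raw:<value>" marks a value stored unencoded.
std::vector<std::string> encodeSymbols(const std::vector<short>& adc) {
    std::vector<std::string> symbols;
    if (adc.empty()) return symbols;
    symbols.push_back("raw:" + std::to_string(adc[0]));
    std::size_t i = 1;
    while (i < adc.size()) {
        // Count unchanged ticks starting at i, up to 4.
        std::size_t run = 0;
        while (run < 4 && i + run < adc.size() && adc[i + run] == adc[i + run - 1]) ++run;
        if (run == 4) {  // no change for 4 ticks -> "1"
            symbols.push_back("1");
            i += 4;
            continue;
        }
        const std::string code = deltaCodeword(adc[i] - adc[i - 1]);
        if (code.empty()) {
            symbols.push_back("raw:" + std::to_string(adc[i]));
        } else {
            symbols.push_back(code);
        }
        ++i;
    }
    return symbols;
}
```

For the input 400, 400, 400, 400, 400, 401, 399, 410 this produces raw:400, 1, 001, 000001, raw:410.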

Online zero suppression

The 35-ton DAQ includes ZS (zero suppression) to allow the possibility of continuous running. Here is a description of the algorithm used there.

See also the talk by JJ Russell at the September 2015 DUNE collaboration meeting. One change from that talk is that the algorithm is now based on running sums of the signals preceding a bin rather than on the value in the bin itself. There are new parameters specifying the number of ticks in the sum and a noise threshold on the tick signals contributing to each sum.

35-ton ZS algorithm

The online algorithm has seven parameters:
  • NS - Number of ticks included in the running sum
  • NL - Front porch for the signal
  • ND - Size of the dead band
  • NT - Back porch for the signal
  • TS - Threshold for the ADC count in a bin to be included in the running sum
  • TL - Threshold for declaring the start of a signal
  • TD - Threshold for declaring a channel is in the dead band

The algorithm processes each channel independently, stepping through the array of 12-bit ADC values in temporal order. For each tick, a running sum RS is evaluated over the preceding NS ticks. The sum comprises a tick count NRS and the ADC sum ARS. Ticks with bit patterns indicating a stuck bit (six least significant bits 000000 or 111111) are excluded from both the count and the ADC sum. Ticks with signal at or below TS are included in the count but excluded from the ADC sum.

Ticks remain outside a signal region until the running sum crosses above threshold, specifically until

ARS(i) > NRS(i)*TL

The tick i where this condition is met is called ISTART.
Ticks are then considered to be in the signal region until they fall below the dead threshold

ARS(i) <= NRS(i)*TD

for ND consecutive ticks. The tick i where that condition is met is called ISTOP. The subsequent tick is outside the signal region and the search for the next signal begins.

ADC signals are retained for NL ticks preceding ISTART and NT after ISTOP, i.e. for [ISTART-NL, ISTOP+NT]. Signals that fall outside all such ranges are suppressed.
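
The steps above can be sketched for a single channel as follows. This is an illustration under stated assumptions and not the DAQ firmware or software: the names ZSParams, isStuck and onlineZeroSuppress are hypothetical, the running sum is taken over ticks i-NS to i-1 (excluding tick i itself), thresholds are compared against the ADC values as given with no pedestal handling, ND is assumed to be at least 1, and a signal still open at the end of the array is retained up to the last tick.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical container for the seven parameters.
struct ZSParams {
    std::size_t NS, NL, ND, NT;  // tick counts
    int TS, TL, TD;              // ADC thresholds
};

// Stuck code: six least significant bits all 0 or all 1.
inline bool isStuck(int adc) {
    const unsigned low6 = static_cast<unsigned>(adc) & 0x3Fu;
    return low6 == 0x00u || low6 == 0x3Fu;
}

// Sketch only: returns one flag per tick, true if the tick is retained.
std::vector<bool> onlineZeroSuppress(const std::vector<int>& adc, const ZSParams& p) {
    const std::size_t n = adc.size();
    std::vector<bool> keep(n, false);
    bool inSignal = false;
    std::size_t istart = 0;  // ISTART of the current signal
    std::size_t quiet = 0;   // consecutive ticks at or below the dead threshold

    // Retain [ISTART-NL, last], clipped to the array.
    auto retain = [&](std::size_t last) {
        const std::size_t lo = (istart > p.NL) ? istart - p.NL : 0;
        for (std::size_t k = lo; k <= std::min(n - 1, last); ++k) keep[k] = true;
    };

    for (std::size_t i = 0; i < n; ++i) {
        // Running sum over the preceding NS ticks (assumed to exclude tick i).
        long ars = 0, nrs = 0;
        const std::size_t first = (i > p.NS) ? i - p.NS : 0;
        for (std::size_t j = first; j < i; ++j) {
            if (isStuck(adc[j])) continue;    // excluded from count and sum
            ++nrs;                            // counted
            if (adc[j] > p.TS) ars += adc[j]; // summed only if above TS
        }

        if (!inSignal) {
            if (ars > nrs * p.TL) {  // ARS(i) > NRS(i)*TL
                inSignal = true;
                istart = i;
                quiet = 0;
            }
        } else {
            quiet = (ars <= nrs * p.TD) ? quiet + 1 : 0;  // ARS(i) <= NRS(i)*TD
            if (quiet >= p.ND) {     // tick i is ISTOP
                retain(i + p.NT);
                inSignal = false;
            }
        }
    }
    if (inSignal) retain(n - 1);  // assumption: keep an unfinished signal
    return keep;
}
```

Because the sum in this sketch covers only preceding ticks, ISTART lags the first tick of a pulse by at least one tick, which is one reason a front porch NL is needed to retain the leading edge.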