Bug #9672

artdaq systems that use art v1_14_03+ seem to have difficulties producing readable data products

Added by Kurt Biery over 4 years ago. Updated about 4 years ago.

Status: Closed
Priority: High
Assignee: Kyle Knoepfel
Category: Navigation
Target version: 1.15.02
Start date: 07/20/2015
% Done: 100%
Scope: Internal
Experiment: DarkSide
SSI Package: art

Description

We recently released a new version of artdaq (v1_12_11) that included the ability to use art v1_14_03 (s12) and v1_15_01 (s14).

In tests of this new artdaq in ds50daq at the WH14NE teststand (with Huffman compression turned off), I've found that the data files that are written with s12 and s14 do not seem to have any DS-50 data products in them. This problem does not appear when using s11 (art v1_14_02). When this problem occurs, the fragment and event sizes that are reported at various points in the artdaq system seem to be correct, so I wonder if the issue is simply with the writing of the disk file.

When I run s12/s14 tests with Huffman compression enabled in the EventBuilders, the online monitoring Aggregator complains that it can't find the compressed V1720 data when it tries to decompress it for use in making histograms. In these tests, the disk files also seem to be empty.

History

#1 Updated by Kurt Biery over 4 years ago

To reproduce the problem with the artdaq-demo, the following steps can be used:
  1. log into "woof" or a similar development environment (one that has art v1_14_03 installed)
  2. create a directory to hold the artdaq-demo code, etc. ("mkdir <workDir>")
  3. "cd <workDir>"
  4. "git clone http://cdcvs.fnal.gov//projects/artdaq-demo"
  5. "cd artdaq-demo"
  6. "git checkout feature/art1.14.03test"
  7. "cd tools"
  8. "./quick-start.sh --tag 'feature/art1.14.03test'" (answer yes to the are-you-sure question)
  9. wait for the code checkouts and builds to complete
  10. once the code is built, start with two fresh shells
  11. in each of them, run "source <workDir>/setupARTDAQDEMO"
  12. in the first shell, run "start2x2x2System.sh"
  13. in the second shell, run "manage2x2x2System.sh init"
    • then "manage2x2x2System.sh -N 101 start"
    • wait a minute
    • then "manage2x2x2System.sh stop"
  14. shut down the DAQ processes by typing <ctrl-c> in the first shell (to kill the start2x2x2System.sh command)
  15. in one of the two shells, "cd /tmp"
  16. look for the latest ROOT file, owned by you, that has run number 101 in its name and does not have "mod" in its name
  17. run "rawEventDump -n 2 <data file>"
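The file-selection step above (newest ROOT file for run 101, owned by you, without "mod" in its name) can be scripted. The following is a minimal sketch, not part of artdaq-demo: the latest_run_file function is hypothetical, it assumes the file names contain no whitespace, and the "*101*" pattern is a loose match that would also catch, say, run 1010.

```sh
#!/bin/sh
# Hypothetical helper (not part of artdaq-demo): print the newest *.root file in
# directory $1 whose name contains run number $2, that is owned by the current
# user, and whose name does not contain "mod". Assumes no whitespace in names.
latest_run_file() {
  dir=$1
  run=$2
  # -maxdepth is a GNU/BSD find extension; it limits the search to $dir itself.
  files=$(find "$dir" -maxdepth 1 -type f -user "$(id -un)" \
            -name "*${run}*.root" ! -name '*mod*')
  # No matching file: report failure to the caller.
  [ -n "$files" ] || return 1
  # ls -t sorts by modification time, newest first; keep only the first entry.
  # $files is deliberately unquoted so each path becomes a separate argument.
  ls -t $files | head -n 1
}

# Example: rawEventDump -n 2 "$(latest_run_file /tmp 101)"
```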

#2 Updated by Kyle Knoepfel over 4 years ago

I have followed your instructions. Upon attempting to open each file, I see:

%MSG-s ArtException:  art 21-Jul-2015 08:35:46 CDT JobSetup
cet::exception caught in art
---- FileOpenError BEGIN
  ---- FatalRootError BEGIN
    Fatal Root Error: @SUB=TFile::ReadBuffer
    error reading all requested bytes from file RootOutput-e011-93ee-1a23-9a46.root, got 276 of 300
  ---- FatalRootError END

  RootInputFileSequence::initFile(): Input file RootOutput-e011-93ee-1a23-9a46.root was not found or could not be opened.
---- FileOpenError END
%MSG

Is this one of the errors you were seeing as well?

#3 Updated by Kyle Knoepfel over 4 years ago

Ahem. Ignore the previous note; it was operator error. I've opened the ROOT files you instructed me to open, and I confirm that no products have been written to them. I will test the same code with 1.14.02.

#4 Updated by Kyle Knoepfel over 4 years ago

  • Category set to Navigation
  • Status changed from New to Assigned
  • Assignee set to Kyle Knoepfel
  • % Done changed from 0 to 50
  • SSI Package art added
  • SSI Package deleted ()

We understand the source of this error. Between art v1_14_02 and v1_14_03, the product presence information was reshuffled to avoid accessing stale memory, and one of the art::ProductRegistryHelper functions was adjusted in a way that unintentionally created problems for artdaq. The fix appears to be relatively straightforward, and it looks like it can be implemented within art itself.

#5 Updated by Kyle Knoepfel over 4 years ago

  • % Done changed from 50 to 90

I have updated the relevant function in art to appropriately include the product presence information. With this fix, the following TBranchElement objects are now present in the ROOT file:

art::TriggerResults_TriggerResults__DAQ.
artdaq::Fragments_daq_ASCII_DAQ.
artdaq::Fragments_daq_TOY1_DAQ.
artdaq::Fragments_daq_TOY2_DAQ.
artdaq::Fragments_daq_V1720_DAQ.
artdaq::Fragments_daq_V1724_DAQ.
artdaq::Fragments_daq_missed_DAQ.
artdaq::Fragments_daq_unidentified_DAQ.
art::TriggerResults_TriggerResults__DAQAG.

I see the same set of branches for the file produced with art v1.14.02. In particular, the *TOY{1,2}* branches have products whose presence values are non-null. The fix will be committed to the repository by Chris Green.
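Each branch name listed above packs four fields into one string. A minimal sketch for splitting them, assuming the EDM-style layout art uses for event-data branches (FriendlyClassName_moduleLabel_instanceName_processName, followed by a trailing ".") and assuming that only the friendly class name may itself contain underscores. The split_branch name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: split an art event-data branch name into its four fields,
# assuming the layout FriendlyClassName_moduleLabel_instanceName_processName.
# Fields are peeled off from the right, so underscores are tolerated only in the
# friendly class name. Output is "class|label|instance|process".
split_branch() {
  name=${1%.}               # drop the trailing "."
  process=${name##*_}       # text after the last "_"
  name=${name%_*}
  instance=${name##*_}      # may be empty, e.g. for TriggerResults
  name=${name%_*}
  label=${name##*_}
  class=${name%_*}
  printf '%s|%s|%s|%s\n' "$class" "$label" "$instance" "$process"
}

# Example: prints "artdaq::Fragments|daq|TOY1|DAQ"
split_branch 'artdaq::Fragments_daq_TOY1_DAQ.'
```

Applied to the whole list, this shows that the products come from two processes, DAQ and DAQAG, and that every artdaq::Fragments instance (ASCII, TOY1, TOY2, V1720, V1724, missed, unidentified) is produced by the daq module.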

#6 Updated by Kyle Knoepfel over 4 years ago

  • Status changed from Assigned to Feedback
  • % Done changed from 90 to 100

A fix has been committed to the art 1.14 and 1.15 branches (art:0667489af993114f3018d2c31ebf5e84f20b5e63 and art:ee4809da0b577dae268ac73f764ae6646427ec3e).

What are your requirements for a new release that includes the fix?

Also, on an unrelated note, I occasionally ran into double-free errors when typing <ctrl-c>, per your instructions above. Here's an example of the error I saw:

[2015-07-22 09:19:54] INFO  WEBrick::HTTPServer#start done.
Wed Jul 22 09:19:54 -0500 2015: Signal of Class Signal of Class 2 received.  Exiting
Wed Jul 22 09:19:54 -0500 2015: [mpiexec@woof.fnal.gov] Sending Ctrl-C to processes as requested
Wed Jul 22 09:19:54 -0500 2015: 2 received.  Exiting
Wed Jul 22 09:19:54 -0500 2015: [mpiexec@woof.fnal.gov] Press Ctrl-C again to force abort
Wed Jul 22 09:19:55 -0500 2015: Signal of Class 15 received.  ExitingSignal of Class 15 received.  Exiting
Wed Jul 22 09:19:55 -0500 2015: Signal of Class Signal of Class 1515 received.  Exiting
Wed Jul 22 09:19:55 -0500 2015:  received.  Exiting
Wed Jul 22 09:19:55 -0500 2015: *** glibc detected *** BoardReaderMain: double free or corruption (!prev): 0x0000000001f85ea0 ***
Wed Jul 22 09:19:55 -0500 2015: ======= Backtrace: =========
Wed Jul 22 09:19:55 -0500 2015: /lib64/libc.so.6[0x336d275e66]
Wed Jul 22 09:19:55 -0500 2015: /lib64/libc.so.6[0x336d2789b3]
Wed Jul 22 09:19:55 -0500 2015: /home/knoepfel/products/root/v5_34_30/Linux64bit+2.6-2.12-e7-prof/lib/libCore.so(_ZN4ROOT17TGenericClassInfoD1Ev+0x20)[0x7fea725d9ef0]
Wed Jul 22 09:19:55 -0500 2015: /lib64/libc.so.6(__cxa_finalize+0x9d)[0x336d235ebd]
Wed Jul 22 09:19:55 -0500 2015: /home/knoepfel/products/root/v5_34_30/Linux64bit+2.6-2.12-e7-prof/lib/libHist.so(+0x11e746)[0x7fea769b6746]
Wed Jul 22 09:19:55 -0500 2015: ======= Memory map: ========
[ etc. ]

It would be worth checking how memory is cleaned up when the processes shut down.

#7 Updated by Kurt Biery over 4 years ago

As far as I know, we don't have any particular requirements for the next art release that includes this fix. It could be 1.15.02, and I don't believe we need a release in the 1.14.xx series with this fix.

#8 Updated by Kyle Knoepfel over 4 years ago

  • Status changed from Feedback to Resolved

We will list this issue as resolved. The fix will be included in the next release of art.

#9 Updated by Kyle Knoepfel over 4 years ago

  • Target version set to 1.15.02

#10 Updated by Christopher Green about 4 years ago

  • Status changed from Resolved to Closed

