Support #20426

Daily call with Giovanna and Karol

Added by Kurt Biery over 2 years ago. Updated over 2 years ago.

Work in progress

#1 Updated by Kurt Biery over 2 years ago

Notes from 23-July-2018:
  • the highest priority is the creation of the snapshot software area so that shifters and detector tests will have a stable area to use. (This was completed by the end of the day on Monday.)
  • there were reports of events not arriving to the Online Monitoring. (This was not confirmed. For other reasons, the experiment is moving to running online monitoring from disk files, rather than a real-time data feed.)

#2 Updated by Kurt Biery over 2 years ago

Notes from 24-July-2018
  • Ron and I connected to the usual Zoom number, but neither Giovanna nor Karol connected. They were likely busy deploying the newest timing system firmware. If I see one of them on the pdunedaq Slack channel later today, I'll ask about any issues.

#3 Updated by Kurt Biery over 2 years ago

Notes from 25-July-2018, Giovanna, Kurt, Ron, Wes
  • ongoing work at protoDUNE - checking for noise from the cryostat; installing the new timing system (firmware?)
    • only one timing unit is working at the moment, "fanout0", which is being used on Partition 1
  • Giovanna created a couple of new standard configurations (eliminated DataLoggers, zero-padding of file index)
  • She asked for assistance in tracking down an issue with ZeroMQ timing out (need to check exact config param name)
    • it's not clear whether this parameter is being book-kept correctly since they see cross-talk of trigger messages coming from the timing BRs in multiple partitions
  • She tried one run stop/start, and that didn't work because of an RCE problem.
  • I asked about another snapshot release - hopefully not soon, but the current one will become obsolete once the new timing system is operational. Giovanna thought that maybe a patch to that snapshot would be appropriate, rather than cutting a whole new snapshot.
  • We touched briefly on the email from Patrick regarding rates. Wes pointed out that the bursty behavior for a single EB is likely because the RoutingMaster is using a first-come, first-served routing policy. We talked a bit about variations to that, which Wes is thinking about.
  • We talked about the current plan for running Online Monitoring from disk files, rather than a real-time feed. The analysis of the data (if it includes TPC data) is so slow that the extra latency added by running from disk files is hardly noticeable (my words).
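
To illustrate Wes's point about bursty delivery to a single EB, here is a minimal sketch and not the actual RoutingMaster code. It assumes that each EventBuilder announces free buffers by sending tokens and that events are assigned to tokens; the function names and the token model are hypothetical.

```python
from collections import Counter, deque


def route_fcfs(tokens, n_events):
    """First-come, first-served: assign events in the order tokens arrived."""
    queue = deque(tokens)
    n_assign = min(n_events, len(queue))
    return [queue.popleft() for _ in range(n_assign)]


def route_round_robin(tokens, n_events):
    """Alternative policy: cycle over builders, one token from each per pass."""
    remaining = Counter(tokens)      # free-buffer tokens held per builder
    builders = sorted(remaining)     # fixed visiting order
    assigned = []
    progress = True
    while len(assigned) < n_events and progress:
        progress = False
        for builder in builders:
            if len(assigned) == n_events:
                break
            if remaining[builder] > 0:
                remaining[builder] -= 1
                assigned.append(builder)
                progress = True
    return assigned


# Hypothetical token arrival order: EB0 reports three free buffers at once,
# then EB1 does the same.
tokens = ["EB0", "EB0", "EB0", "EB1", "EB1", "EB1"]
fcfs_order = route_fcfs(tokens, 6)
round_robin_order = route_round_robin(tokens, 6)
```

With this token order, the first-come, first-served policy sends three consecutive events to EB0 before EB1 receives any, while the round-robin variant alternates between the two builders.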

#4 Updated by Kurt Biery over 2 years ago

Notes from 26-July-2018, Wes, Kurt, Ron
  • Giovanna was too busy trying to get (non-artdaq) things to work to call in
  • We wondered if there is a FHiCL-to-JSON converter in the artdaq_database package. This could be useful in instances in which experiments (e.g. the protoDUNE Trigger System) want to use JSON, but we want them to use FHiCL.
    • at the moment, the Trigger folks are storing a JSON blob in their FHiCL configuration file and using the jsoncpp third-party package to parse that blob in their code
  • Wes will look into the 'ZeroMQ trigger message crosstalk between partitions' problem that was mentioned yesterday.
  • We still need to reply to Enrico about the desired order of sending commands to artdaq processes. I will send a tentative reply, but I feel like we need to talk about this in the group.
  • Wes and I talked about a different way of transporting routing information between processes - an in-memory database on each computer that has super-fast replication between the nodes. This would be extremely useful for monitoring, even if it doesn't turn out to be fast enough for routing. I sent an email to Giles Barr at Oxford to mention this as a possible area of collaboration between UK and Fermilab folks on DUNE DAQ back-end software. I'll also file an Idea Issue.

#5 Updated by Kurt Biery over 2 years ago

Notes from 27-July-2018; Wes, Ron, Karol
  • no urgent items, further integration of trigger BR going on (mostly getting DIM metric monitoring going now), general desire to stress/push to higher rates (currently getting ~6 Hz with 3 APAs [half detector]).
  • If doing testing with hardware, it is best to use Rack Side components, and for artdaq testing it is likely better to focus on the larger system (3 APAs) rather than the smaller one (just 1 APA), though the hardware folks are doing some tests with the latter

#6 Updated by Kurt Biery over 2 years ago

Notes from 30-July-2018; Wes, Ron, Kurt
  • neither Giovanna nor Karol called in
  • we discussed travel to CERN (John in late August), plan for InhibitMaster (incorporate into ToySimulators, artdaq proper?), handling of backpressure, stop/start, and 1st start.
  • Ron will work on a list of suggested changes from the develop branches of artdaq packages

#7 Updated by Kurt Biery over 2 years ago

Notes from 31-July-2018; Giovanna, Wes, Kurt
  • Giovanna asked about progress in understanding the rate limitations. We explained our plan for tackling bugs first with the selected changes from the develop branch and looking at rate limitations next.
    • That said, it's true that the EB crashes that we see at 8+ Hz at protoDUNE might be lessened with various bug fixes
  • Giovanna suggested mapping out the phase space of number of RCEs, number of EBs, and trigger rate to see which system configurations are stable
  • Partition 0 will now always have the trigger
  • Manuel has restored the fake trigger rate input in the RC screens for Partitions 1-5
  • Wes asked about an indicator on the RC GUI when the InhibitMaster is inhibiting triggers - not yet
  • GLM reports noise on some of the APAs. She is planning to take two runs per day, their morning and afternoon, for noise assessment
  • We mentioned our plan to try to get archival configuration information back in the art/ROOT file. GLM asked if this might 'just work' now that the individual FHiCL documents are smaller. We should check, but we should also look into splitting the whole wad up into smaller pieces.
    • Giovanna asked about saving space by not duplicating identical configurations.
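
One possible way to avoid duplicating identical configurations, as Giovanna asked about, is to key each document by a hash of its text. This is only a sketch under that assumption, not how the art/ROOT file or artdaq stores configurations today; the function name and process names are hypothetical.

```python
import hashlib


def store_configs(configs):
    """Keep one copy of each distinct document, keyed by its SHA-256 digest."""
    blobs = {}   # digest -> document text (stored once)
    index = {}   # process name -> digest of its configuration
    for name, text in configs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        blobs.setdefault(digest, text)
        index[name] = digest
    return blobs, index


# Hypothetical example: two processes share an identical configuration.
configs = {
    "EventBuilder1": "max_fragment_size: 100",
    "EventBuilder2": "max_fragment_size: 100",
    "BoardReader1": "board_id: 7",
}
blobs, index = store_configs(configs)
```

Here three processes need only two stored documents, and each process can still recover its own text through the index.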

#8 Updated by Kurt Biery over 2 years ago

  • Status changed from New to Work in progress
Notes from 01-Aug-2018; Eric, Wes, Ron, Kurt
  • neither Giovanna nor Karol called in
  • I presented some preliminary findings on the number of EventBuilders needed to handle the rate with various numbers of RCEs and various trigger rates at protoDUNE. Wes suggested that I use the absence of trigger inhibits to determine good system configurations, and I will do that. (I've used a second-order indicator so far - stable data sizes at the EventBuilder.)
  • Ron reported that his protoDUNE+ version of artdaq code (the protoDUNE branch plus selected changes from the develop branch) exhibits a misbehavior that neither the protoDUNE nor the develop branch does. After some discussion on the best way forward, we agreed to create a branch from a recent version of the develop branch and focus on testing that. (Issue #20500)
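
The scan over RCE count, EventBuilder count, and trigger rate, with "no trigger inhibits" as the stability criterion, could be organized roughly as below. This is a sketch only: `toy_inhibit_count` is a made-up stand-in with arbitrary numbers, not a measurement or a model of protoDUNE, and would be replaced by a lookup of the inhibits actually observed in each test run.

```python
from itertools import product


def stable_configurations(n_rces, n_ebs, rates_hz, inhibit_count):
    """Return the (RCEs, EBs, rate) points at which no trigger inhibits occurred."""
    return [
        point
        for point in product(n_rces, n_ebs, rates_hz)
        if inhibit_count(*point) == 0
    ]


def toy_inhibit_count(rces, ebs, rate_hz):
    """Placeholder only: arbitrary numbers, not protoDUNE data."""
    load = rces * rate_hz     # arbitrary load measure
    capacity = 20 * ebs       # arbitrary per-EB capacity
    return max(0, load - capacity)


stable = stable_configurations([2, 4], [1, 2], [4, 8], toy_inhibit_count)
```

Swapping the placeholder for real run results would give the list of stable system configurations directly.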
