Bug #13272

Support #13174: Some issues have been found with the RootOutput automatic file closing changes

Spurious output files saved in artdaq context

Added by Kyle Knoepfel about 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: I/O
Target version:
Start date: 07/18/2016
Due date:
% Done: 100%
Estimated time: 0.00 h
Spent time:
Occurs In:
Scope: Internal
Experiment: -
SSI Package: art
Duration:

Description

When moving to art 2.01.00 and forcing a file switch on the SubRun or Run boundary, a spurious output file is created.

History

#1 Updated by Kyle Knoepfel about 3 years ago

  • Estimated time set to 7.00 h

I have been able to run your example on woof, and I can confirm the spurious output files. In addition, it appears that the sequence of signals to which the state machine reacts is:

$ manage2x2x2System.sh init
$ manage2x2x2System.sh -N 1 start

  InputFile
  Run
  SubRun
  Event
  Event
  .
  .
  .
  Event

$ manage2x2x2System.sh stop

  Run <-------- triggers expected file close
  SubRun
  Event
  InputFile <-- triggers spurious file close

Is this signal emission what you would expect/hope for in artdaq? I'd like to run a similar process with a pre-2.1 version of art. Which artdaq-demo version should I use? And are any adjustments necessary to the generateAggregationMain.rb script?
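To make the listing above concrete, here is a minimal toy model of the signal sequence. It is not art's state machine: the closing rule (close the open file whenever a Run or InputFile signal arrives and something has been written) is an assumption chosen only to mirror the two annotations in the listing, and the function name closed_files is hypothetical.

```python
# Toy model only: the closing rule below is an assumption that mirrors
# the "triggers ... file close" annotations, not art's real logic.
START = ["InputFile", "Run", "SubRun", "Event", "Event", "Event"]
STOP = ["Run", "SubRun", "Event", "InputFile"]  # emitted at "stop"


def closed_files(signals, boundaries=("Run", "InputFile")):
    """Return (files closed so far, contents of the file still open)."""
    files = []
    current = []  # principals written to the currently open file
    for sig in signals:
        # A boundary signal closes the open file if it holds anything.
        if sig in boundaries and current:
            files.append(current)
            current = []
        # InputFile is a transition only; nothing is written for it.
        if sig != "InputFile":
            current.append(sig)
    return files, current


files, still_open = closed_files(START + STOP)
for index, contents in enumerate(files, start=1):
    print(index, contents)
```

In this model the first closed file holds the real Run, SubRun and Events, while the second holds only the Run, SubRun and Event that arrive at stop time, which corresponds to the spurious file.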

#2 Updated by Kyle Knoepfel about 3 years ago

  • Estimated time deleted (7.00 h)

#3 Updated by Kurt Biery about 3 years ago

Hi Kyle,
I don't believe that we have enough knowledge to say whether a particular sequence of signals is desired or not. From the perspective of a user of the system, I would suspect that the current set of signals is not desirable, given that it produces the spurious data file. However, I could imagine that there is a reason for the InputFile signal to be generated to meet other needs.

The sequence of a Run signal, a SubRun signal, and an Event signal at artdaq "Stop" time surprises me a little - I would naively have expected them to be generated in the reverse order, but maybe that is just my lack of knowledge.

As for trying the system with pre-2.1 art, we could use the same artdaq-demo version, but build it with e9, s21. Should I create a new directory on mu2edaq01 with that?
Kurt

#4 Updated by Kyle Knoepfel about 3 years ago

Thanks, Kurt. No need to create another directory--I can set things up on woof now that I know how.

#5 Updated by Kyle Knoepfel about 3 years ago

  • Status changed from Assigned to Feedback
  • Estimated time set to 0.00 h

I now understand what is happening.

First, since it has been determined that the appropriate fileSwitch.boundary value is Event, for which spurious files are not produced, no action may be required. As has been discussed (Issue #13273), a couple of new configuration parameters will be added to facilitate the desired behavior of switching on (Sub)Run boundaries as well as (e.g.) specifying a maximum number of events.

However, since it is possible that this type of problem could be encountered with the new configuration parameters as well, it is worth discussing what is happening. Whenever the 'manage2x2x2System.sh stop' command is executed, the following happens in the NetMonInput source:

  1. The received msg_type_code is 3, signifying an EndSubRun.
  2. NetMonInput::readAndConstructPrincipal points the outR, outSR, and outE Principal pointers to principals newly created with flush values.
  3. NetMonInput::readNext returns true.

This sequence of operations may be just fine if the semantic of stop is to "pause" the system. If so, then the art state machine or the Source template could require adjustment so that the correct sequence of signals is emitted. If the semantic of stop is to "shut down" the system, then the NetMonInput source should be adjusted.
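The two readings of "stop" can be sketched as follows. This is a hedged Python stand-in, not the NetMonInput code: the Principals class, the FLUSH marker, the read_next function and the stop_means_shutdown switch are all hypothetical, and returning False for the "shut down" reading is only one conceivable adjustment. Only the msg_type_code value 3, the flush-valued principals and the true return value come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

END_SUBRUN = 3   # msg_type_code value quoted in the steps above
FLUSH = "flush"  # hypothetical stand-in for a flush-valued principal


@dataclass
class Principals:
    """Stand-in for the outR, outSR and outE principal pointers."""
    run: Optional[str] = None
    subrun: Optional[str] = None
    event: Optional[str] = None


def read_next(msg_type_code: int, out: Principals,
              stop_means_shutdown: bool = False) -> bool:
    """Toy version of the stop-time path; handles EndSubRun only."""
    if msg_type_code != END_SUBRUN:
        raise ValueError("this sketch only models the EndSubRun message")
    if stop_means_shutdown:
        # Assumed adjustment: hand nothing back, report no more input.
        return False
    # Behavior described above: flush principals are created and the
    # call reports success, so a Run, SubRun and Event are processed.
    out.run = out.subrun = out.event = FLUSH
    return True


paused = Principals()
print(read_next(END_SUBRUN, paused), paused)
```

Under the "pause" reading the caller still receives three principals to process, which is what leads to the extra Run, SubRun and Event signals at stop time.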

I would like to hold off on adjustments to the state machine at the moment. Those may become necessary for other reasons once the other aspects of the parent issue (#13174) are resolved.

The upshot: The problem is sufficiently understood. If the semantics of stop are indeed to "pause" the system, then artdaq likely needs to make no changes, and any changes that might be required would be on the art side. However, since this only arises when forcing output-file closures on a Run boundary, which has been determined to be inappropriate for artdaq, the most reasonable way forward seems to be to avoid such a configuration. We can further consider state-machine adjustments whenever the other output-file closing issues are addressed. Does that sound reasonable?

#6 Updated by Kurt Biery about 3 years ago

It sounds quite reasonable to avoid the fileSwitch.boundary:Run configuration and only consider state machine adjustments after file-closing issues have been addressed.

I'm not sure that I fully understand the difference between pausing the system and shutting it down, in this context. Does "the system" mean "art"? My sense is that "pausing" art is the right way to think about the artdaq "Stop" transition.

#7 Updated by Kyle Knoepfel about 3 years ago

Okay. I will forge ahead with the other output-closing-related issue and return to the state machine as necessary.

#8 Updated by Kyle Knoepfel about 3 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100

#9 Updated by Kyle Knoepfel about 3 years ago

The current head of the art develop branch has a fix to this problem. Kurt, can you rebuild your local copy of art and verify that the problem has been solved?

Implemented with art:b13c74c.

#10 Updated by Kurt Biery about 3 years ago

I ran some tests, and they also show that the modified code has fixed the problem.

Here are the output files from some of my tests, in reverse time order. Runs 808-810 all took place in a single invocation of artdaq/art. Run 811 happened after I restarted the artdaq/art system. The file pattern was artdaqdemo_r%06r_sr%02s.root. The file change-over was triggered by a max file size condition in runs 808-810 and by a maximum number of events in run 811. The configuration for run 811 is shown at the bottom.

[biery@mu2edaq01 tmp]$ ls -altF | head -16
total 28414212
drwxr-xr-x  2 biery mu2e      28672 Aug  3 21:12 ./
-rw-r--r--  1 biery mu2e     258486 Aug  3 21:12 artdaqdemo_r000811_sr01_4.root
-rw-r--r--  1 biery mu2e    1743884 Aug  3 21:12 artdaqdemo_r000811_sr01_3.root
-rw-r--r--  1 biery mu2e    1743884 Aug  3 21:10 artdaqdemo_r000811_sr01_2.root
-rw-r--r--  1 biery mu2e    1743884 Aug  3 21:09 artdaqdemo_r000811_sr01.root
-rw-r--r--  1 biery mu2e     453001 Aug  3 21:05 artdaqdemo_r000810_sr01_9.root
-rw-r--r--  1 biery mu2e    1761033 Aug  3 21:05 artdaqdemo_r000810_sr01_8.root
-rw-r--r--  1 biery mu2e    1761033 Aug  3 21:03 artdaqdemo_r000810_sr01_7.root
-rw-r--r--  1 biery mu2e    1761033 Aug  3 21:01 artdaqdemo_r000810_sr01_6.root
-rw-r--r--  1 biery mu2e    1761033 Aug  3 21:00 artdaqdemo_r000810_sr01.root
-rw-r--r--  1 biery mu2e     850925 Aug  3 20:58 artdaqdemo_r000809_sr01_4.root
-rw-r--r--  1 biery mu2e    1761033 Aug  3 20:57 artdaqdemo_r000809_sr01_3.root
-rw-r--r--  1 biery mu2e    1761033 Aug  3 20:56 artdaqdemo_r000809_sr01.root
-rw-r--r--  1 biery mu2e     359036 Aug  3 20:54 artdaqdemo_r000808_sr01.root

outputs: {
  normalOutput: {
    module_type: RootOutput
    fileName: "/home/biery/tmp/artdaqdemo_r%06r_sr%02s.root" 
    #maxEventsPerFile : 200
    #maxSize : 500
    #fileSwitch : {
    #   boundary : SubRun
    #   force : true
    #}
    fileProperties : {
       granularity : Event
       maxSubRuns : 1
       maxSize : 1000
       maxEvents : 1000
    }
    compressionLevel: 0
    #tmpDir : "/home/biery" 
  }
} 
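
For readers unfamiliar with the file pattern, the sketch below shows how artdaqdemo_r%06r_sr%02s.root appears to map onto the names in the listing. It is an illustration inferred from those names, not art's implementation: the helper expand_pattern is hypothetical, it handles only the %r (run) and %s (subrun) placeholders with an optional zero-padded width, and it does not model the numeric suffixes (_2, _3, ...) seen on later files.

```python
import re


def expand_pattern(pattern: str, run: int, subrun: int) -> str:
    """Expand %<width>r and %<width>s with zero-padded run/subrun numbers.

    Hypothetical helper; the placeholder meaning is inferred from the
    file names in the listing above.
    """
    values = {"r": run, "s": subrun}

    def substitute(match: re.Match) -> str:
        width = int(match.group(1) or 0)
        return str(values[match.group(2)]).zfill(width)

    return re.sub(r"%(\d*)([rs])", substitute, pattern)


print(expand_pattern("artdaqdemo_r%06r_sr%02s.root", run=811, subrun=1))
# prints "artdaqdemo_r000811_sr01.root"
```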

#11 Updated by Kyle Knoepfel about 3 years ago

Thank you, Kurt. Please let us know on what time scale you would like a new release that incorporates these changes. We somewhat expect that you would like these soon, so please don't hesitate to say so.

#12 Updated by Kyle Knoepfel about 3 years ago

  • Status changed from Resolved to Closed
  • Target version set to 2.02.02

