Idea #22372

DAQInterface - can it check, before start, for stale shared memory segments for the appropriate partition?

Added by Ron Rechenmacher 6 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 04/12/2019
Due date:
% Done: 100%
Estimated time:
Experiment: -
Duration:

Description

Wondering if DAQInterface can, before start, when all processes are known to be down (i.e., after a valid shutdown), check for stale shared memory segments?

History

#1 Updated by John Freeman 6 months ago

That would definitely be useful. The key there - no pun intended - is that DAQInterface would need to know what shared memory segments artdaq was planning to create, given the set of processes about to be launched and the partition in question, and then check to see if those segments were already there. Not sure how straightforward that would be, but we can talk about it at the meeting on Monday.

#2 Updated by Eric Flumerfelt 6 months ago

I can certainly modify SharedMemoryEventManager so that the shared memory key at least reflects the partition number...probably 0xBE##PPPP and 0xCE##PPPP where ## represents the partition number and PPPP is the PID.

#3 Updated by Eric Flumerfelt 4 months ago

I have implemented this change to SMEM on artdaq:feature/22372_SMEM_PartitionNumberInKey. The keys will be 0xEE##PPPP for events and 0xBB##PPPP for broadcasts, where ## indicates the partition number and PPPP indicates the PID.
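To make the key layout above concrete, here is a minimal Python sketch, not the artdaq implementation. It assumes the prefix and the partition field each occupy one byte and that PPPP is the low 16 bits of the PID (so larger PIDs would be truncated). The names make_key and split_key are hypothetical. Comment #5 below reports that the partition field ends up one higher than the partition number, so the sketch takes the field value as given rather than modelling that offset.

```python
EVENT_PREFIX = 0xEE      # events, per the comment above
BROADCAST_PREFIX = 0xBB  # broadcasts, per the comment above


def make_key(prefix, partition_field, pid):
    """Pack prefix, partition byte, and the low 16 bits of the PID into a 32-bit key.

    Assumption: one byte for the prefix, one byte for the partition field,
    two bytes for the PID. This layout is inferred from "0xEE##PPPP".
    """
    if not 0 <= partition_field <= 0xFF:
        raise ValueError("partition field must fit in one byte")
    return (prefix << 24) | (partition_field << 16) | (pid & 0xFFFF)


def split_key(key):
    """Return (prefix, partition_field, pid_low16) for a 32-bit key."""
    return (key >> 24) & 0xFF, (key >> 16) & 0xFF, key & 0xFFFF
```

With this layout, anything that can list the keys on a node (for example ipcs) can pick out the segments belonging to one partition by looking at the second byte alone.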

#4 Updated by John Freeman about 2 months ago

  • % Done changed from 0 to 80
  • Assignee set to John Freeman
  • Status changed from New to Assigned

I've created branch feature/issue22372_handle_orphaned_shmem, current head 85f214e137da06797dc23a0c5952a64b015c73d2. Right now it contains a script called "mopup_shmem.sh" which takes as an argument the partition number whose artdaq processes' shared memory blocks you want to clean up. It largely does what you'd expect it to (e.g., it refuses to run ipcrm if it sees that for the partition in question there's a live instance of DAQInterface which hasn't been confirmed to be in the "stopped" state).
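For readers who want a feel for the selection logic, here is a rough Python sketch of the idea. It is not the contents of mopup_shmem.sh. It assumes the Linux `ipcs -m` text layout (hex key in the first column, shmid in the second) and the key layout from comment #3. The function names and the daqinterface_stopped flag are hypothetical stand-ins for whatever check the script actually performs. It only builds the ipcrm command lines and does not run them.

```python
def parse_ipcs(ipcs_text):
    """Yield (key, shmid) pairs from the text printed by `ipcs -m`."""
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Data rows start with a hex key; header and blank lines do not.
        if len(fields) >= 2 and fields[0].startswith("0x"):
            yield int(fields[0], 16), int(fields[1])


def cleanup_commands(ipcs_text, partition_field, daqinterface_stopped,
                     prefixes=(0xEE, 0xBB)):
    """Return ipcrm command lines for segments whose key matches the partition.

    partition_field is the byte value expected in the key (assumption: second
    byte of the key). Nothing is returned unless the caller has confirmed that
    DAQInterface for this partition is stopped.
    """
    if not daqinterface_stopped:
        return []
    commands = []
    for key, shmid in parse_ipcs(ipcs_text):
        prefix = (key >> 24) & 0xFF
        field = (key >> 16) & 0xFF
        if prefix in prefixes and field == partition_field:
            commands.append(["ipcrm", "-m", str(shmid)])
    return commands
```

Returning the commands instead of executing them keeps the sketch safe to try on a live node; a caller could pass each list to subprocess.run once satisfied with the selection.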

My next step's a simple one: I'll have DAQInterface call this script before launching artdaq processes on a given node.

#5 Updated by John Freeman about 2 months ago

  • % Done changed from 80 to 100
  • Status changed from Assigned to Resolved
With commit e2bc8279da6025165cfd93c61680a20b19f277e5 at the head of DAQInterface's feature/issue22372_handle_orphaned_shmem:
  • The DAQInterface side of this issue is now resolved
  • As per an agreement via email with Eric, the process of resolving this issue has involved using artdaq's feature/22372_SMEM_PartitionNumberInKey branch, so that artdaq feature branch can be considered reviewed, with the caveat that the partition number appears as one value higher in the keys of the shared memory blocks created by artdaq (e.g., partition 1 results in keys like "0x02003005" and "0xbb02b1be")
Please note:
  • On the boot transition DAQInterface will check for shared memory blocks associated with its partition and clean them up before launching the processes by calling a new script called "mopup_shmem.sh" on every node on which artdaq processes will run
  • To see the output of mopup_shmem.sh, set the "debug level" in the boot file to 4. You can also run "ipcs" before and after the boot transition.
  • When reviewing this, a good technique to create orphaned shared memory blocks is to kill -9 DAQInterface after it's in the ready state. Then you can proceed to see if this feature does what it's supposed to by relaunching DAQInterface and putting it through the boot transition.
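
The before-and-after "ipcs" comparison suggested above can be done by eye, but a small helper makes it less error-prone. The following is a sketch only, assuming the Linux `ipcs -m` text layout with the hex key in the first column; shm_keys and removed_keys are hypothetical names, and the two text snapshots are expected to be captured by the reviewer before and after the boot transition.

```python
def shm_keys(ipcs_text):
    """Return the set of shared memory keys listed in `ipcs -m` output."""
    keys = set()
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Data rows start with a hex key; header and blank lines do not.
        if fields and fields[0].startswith("0x"):
            keys.add(int(fields[0], 16))
    return keys


def removed_keys(before_text, after_text):
    """Return, as sorted hex strings, the keys present before but not after."""
    gone = shm_keys(before_text) - shm_keys(after_text)
    return [format(key, "#010x") for key in sorted(gone)]
```

If the orphaned blocks created by the kill -9 show up in the result of removed_keys, the cleanup ran as intended on that node.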

