Bug #21077

SharedMemoryFragmentManager issues with multiple readers

Added by Eric Flumerfelt about 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Category: Needed Enhancements
Target version:
Start date: 10/09/2018
Due date:
% Done: 0%
Estimated time:
Experiment: -
Co-Assignees:
Duration:

Description

When multiple processes are attached to the shared memory as readers, ReadyForRead can return true and control then enters GetBufferForReading. If another reader reserves that buffer in the meantime, GetBufferForReading cannot reserve it and can loop forever.

The fix on branch artdaq-core/feature/SharedMemoryReader_GetBufferForReading_LimitedRetries (the branch should have been named SharedMemoryManager_GetBufferForReading_LimitedRetries) removes the infinite loop and replaces it with a loop bounded at 5 iterations. When GetBufferForReading fails, SharedMemoryFragmentManager calls ReadyForRead again.

Without this change, readers in transfer_driver shmem nonblocking mode tests can hang forever near the end of data.
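
The pattern described above (a cheap readiness check, then a reservation attempt that gives up after a fixed number of scans) can be sketched roughly as follows. This is a simplified illustration, not the artdaq-core implementation: ToyPool, BufferState, pollOnce, and the maxAttempts parameter are hypothetical names, and the real SharedMemoryManager tracks more state (reader position, sequence IDs, a search lock) than is shown here.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only; not the artdaq-core types.
enum class BufferState : int { Empty, Full, Reading };

template <std::size_t N>
struct ToyPool {
    std::array<std::atomic<BufferState>, N> state;

    ToyPool() {
        for (auto& s : state) s.store(BufferState::Empty);
    }

    // Non-reserving check. The answer can be stale by the time the caller
    // acts on it, because another reader may claim the buffer first.
    bool readyForRead() const {
        for (auto const& s : state) {
            if (s.load() == BufferState::Full) return true;
        }
        return false;
    }

    // Try to claim one Full buffer, scanning at most maxAttempts times.
    // Returns the buffer index, or -1 if nothing could be reserved.
    int getBufferForReading(int maxAttempts = 5) {
        assert(maxAttempts > 0);
        for (int attempt = 0; attempt < maxAttempts; ++attempt) {
            for (std::size_t i = 0; i < N; ++i) {
                BufferState expected = BufferState::Full;
                // Only one reader can win this exchange for a given buffer.
                if (state[i].compare_exchange_strong(expected, BufferState::Reading)) {
                    return static_cast<int>(i);
                }
            }
        }
        return -1;  // bounded: give up instead of spinning forever
    }
};

// Caller side: on failure, return to the readiness check on the next poll
// rather than looping inside the reservation call.
template <std::size_t N>
int pollOnce(ToyPool<N>& pool) {
    if (!pool.readyForRead()) return -1;
    return pool.getBufferForReading();  // may still be -1 if another reader won
}
```

In this sketch, a reader that loses the race gets -1 back and simply polls again later, which is the behavior the fix aims for near the end of data when no further buffers will become available.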


Related issues

Related to artdaq - Bug #20976: multiple_art_processes_example broken (Closed, 09/28/2018)

History

#1 Updated by Eric Flumerfelt about 1 year ago

  • Status changed from New to Resolved
[eflumerf@ironwork artdaq_core]$ git checkout feature/SharedMemoryReader_GetBufferForReading_LimitedRetries 
Switched to branch 'feature/SharedMemoryReader_GetBufferForReading_LimitedRetries'
[eflumerf@ironwork artdaq_core]$ git diff feature/Sep10_pdune_candidate 
diff --git a/artdaq-core/Core/SharedMemoryManager.cc b/artdaq-core/Core/SharedMemoryManager.cc
index d0fc529..00666f3 100644
--- a/artdaq-core/Core/SharedMemoryManager.cc
+++ b/artdaq-core/Core/SharedMemoryManager.cc
@@ -213,7 +213,7 @@ void artdaq::SharedMemoryManager::Attach()
        else
        {
                TLOG(TLVL_ERROR) << "Failed to connect to shared memory segment with key 0x" << std::hex << shm_key_
-                       << ", errno = " << strerror(errno) << ".  Please check " 
+                       << ", errno=" << errno << " (" << strerror(errno) << ")" << ".  Please check " 
                        << "if a stale shared memory segment needs to " 
                        << "be cleaned up. (ipcs, ipcrm -m <segId>)";
        }
@@ -229,9 +229,8 @@ int artdaq::SharedMemoryManager::GetBufferForReading()
        auto rp = shm_ptr_->reader_pos.load();

        TLOG(13) << "GetBufferForReading lock acquired, scanning " << shm_ptr_->buffer_count << " buffers";
-       bool retry = true;
        int buffer_num = -1;
-       while (retry)
+       for(int retry = 0; retry < 5; retry++)
        {
                ShmBuffer* buffer_ptr = nullptr;
                uint64_t seqID = -1;
@@ -291,7 +290,7 @@ int artdaq::SharedMemoryManager::GetBufferForReading()
                        TLOG(13) << "GetBufferForReading returning " << buffer_num;
                        return buffer_num;
                }
-               retry = false;
+               retry = 5;
        }

        if(buffer_num==-1) TLOG(13) << "GetBufferForReading returning -1 because no buffers are ready";

#2 Updated by Eric Flumerfelt about 1 year ago

  • Related to Bug #20976: multiple_art_processes_example broken added

#3 Updated by Eric Flumerfelt about 1 year ago

  • Target version set to artdaq_core v3_04_03

#4 Updated by Eric Flumerfelt about 1 year ago

  • Status changed from Resolved to Closed

