art/Utilities/LinuxProcMgr treats non-fatal (transient) errors (e.g. EINTR) as fatal.
Under heavy load, an art job is likely to throw an exception of the form:
%MSG-s ArtException: Raw2HDF5:raw2hdf5@EndJob 03-Feb-2018 07:34:44 CST ModuleEndJob
cet::exception caught in art
---- OtherArt BEGIN
  ---- Configuration BEGIN
    Failed to open: cat /proc/21650/status
  ---- Configuration END
  ---- OtherArt BEGIN
    ---- Configuration BEGIN
      Failed to open: /proc/21650/stat for schedule: 0
    ---- Configuration END
  ---- OtherArt END
---- OtherArt END
%MSG
The code should check for transient errno values (e.g. EINTR) and
retry the failed call as appropriate. See, for example, the
read(2) man page for details.
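A minimal sketch of the retry pattern requested above, assuming plain POSIX open(2)/read(2) calls. The helper names (open_retry, read_file) are hypothetical and are not taken from LinuxProcMgr; only EINTR is treated as transient here, which is an assumption about what "as appropriate" would mean for these /proc reads.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: open a file read-only, retrying while the call is
// interrupted by a signal (EINTR). Returns -1 on any other failure, with
// errno left describing the cause.
int open_retry(char const* path)
{
  int fd;
  do {
    fd = ::open(path, O_RDONLY);
  } while (fd == -1 && errno == EINTR);
  return fd;
}

// Hypothetical helper: read the whole file at `path` into `out`.
// Interrupted reads are retried; any other error is reported as failure.
bool read_file(std::string const& path, std::string& out)
{
  int const fd = open_retry(path.c_str());
  if (fd == -1) {
    return false; // non-transient open failure; errno is still set
  }
  char buf[4096];
  bool ok = false;
  for (;;) {
    ssize_t const n = ::read(fd, buf, sizeof buf);
    if (n > 0) {
      out.append(buf, static_cast<std::size_t>(n));
    } else if (n == 0) {
      ok = true; // end of file
      break;
    } else if (errno != EINTR) {
      break; // non-transient read failure
    }
    // n == -1 with errno == EINTR: loop and try the read again
  }
  ::close(fd);
  return ok;
}
```

A caller would then treat a false return as a real failure, for example after read_file("/proc/self/status", text), rather than failing on the first interrupted call.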
#2 Updated by Kyle Knoepfel almost 3 years ago
- Tracker changed from Bug to Support
- Category set to Infrastructure
- Status changed from Assigned to Resolved
- % Done changed from 0 to 100
- SSI Package art added
After discussion, it was deemed sufficient to report the value of
errno upon a file-open failure.
Implemented with commit art:c28eff0.
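For illustration only, reporting errno on a file-open failure could look like the sketch below. This is not the code from commit art:c28eff0; the function names are hypothetical, and std::runtime_error stands in for the cet::exception that art actually throws.

```cpp
#include <cerrno>
#include <stdexcept>
#include <string>
#include <system_error>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: build a diagnostic that includes both the numeric
// errno value and its textual description.
std::string open_failure_message(std::string const& path, int err)
{
  return "Failed to open: " + path + " (errno " + std::to_string(err) +
         ": " + std::generic_category().message(err) + ")";
}

// Hypothetical helper: open a file read-only, throwing with the errno
// details on failure. std::runtime_error is a stand-in exception type.
int open_or_throw(std::string const& path)
{
  int const fd = ::open(path.c_str(), O_RDONLY);
  if (fd == -1) {
    int const err = errno; // capture before any other call can change it
    throw std::runtime_error(open_failure_message(path, err));
  }
  return fd;
}
```

With the errno value in the message, a transient cause such as EINTR can be told apart from a permanent one such as ENOENT when the exception is reported.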