art "readFile" errors affecting MicroBooNE production jobs
Hello art experts,
For the last week or two, we on the Fermilab MicroBooNE production team have been seeing a large number of LArSoft jobs fail with the following art error:
IFCatalogInterface destructor:
%MSG-s ArtException: PostEndJob 05-Sep-2020 03:28:09 UTC ModuleEndJob
cet::exception caught in art
---- OtherArt BEGIN
  ---- LogicError BEGIN
    Source readFile() did not return a valid FileBlock: FileBlock should be valid or readFile() should throw.
  ---- LogicError END
---- OtherArt END
%MSG
Art has completed and will exit with status 1.
The error seems to consistently affect roughly the first 10-30% of jobs in a submission, but those same jobs often run without issue when resubmitted. It has been hitting many different workflows, using a variety of artroot input samples that have all been used in the past without problems, so I don't think it is specific to any particular input artroot file or LArSoft workflow.
We are using art version: v3_01_02
and ifdh_art version: v2_07_03 or v2_07_07 (I have seen jobs using both of these versions fail with this readFile error)
Any insights you could provide would be much appreciated! I have attached the full log file from one of the failed jobs in case it is helpful.