Allow root to retry xroot file open
A few months ago, mu2e started regularly using xroot for reading
art input files. We find we get errors like:
---- FatalRootError BEGIN
Fatal Root Error: @SUB=TUnixSystem::GetHostByName
getaddrinfo failed for 'fndca1.fnal.gov': Temporary failure in name resolution
---- FatalRootError END
at the percent level. On a ticket, sysadmins and networking
experts can't find any real errors in DNS or networks,
so we seem to be stuck with this error for the foreseeable future.
While investigating, we found that root can retry. Philippe Canal said:
The error is not supposed to be fatal. If the (ROOT) error handler
used in your executable is turning it into a fatal error then this
would prevent any retries ....
So it appears art is blocking root from retrying. This ticket
requests that art allow the retries. Ideally we would have some control
over the retry pattern since this problem may be completely fixed
with one retry, so we would want to enable one retry only. Other
more serious errors that fail every retry for hours would
just eat up grid time. Also ideally, there would be some distinction
between errors. So for example, "file not found" or "url syntax error"
would not retry, but "DNS lookup" would retry for a few minutes,
and "xroot server not responding" would retry for an hour.
We are using art v2_10_04.
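To make the request concrete, here is a minimal sketch of the kind of per-error retry policy described above. Everything in it is hypothetical and for illustration only: the `OpenError` categories, `RetryPolicy`, `policyFor`, `openWithRetry`, and the specific delays are not part of art or ROOT. The durations just follow the examples in the text (no retry for "file not found" or a URL syntax error, a few minutes for DNS lookup, an hour for an unresponsive server).

```cpp
#include <chrono>
#include <functional>

// Hypothetical error categories, taken from the examples in this ticket.
enum class OpenError { None, FileNotFound, UrlSyntax, DnsLookup, ServerNotResponding };

// Hypothetical policy: total time to keep retrying, and the wait between attempts.
struct RetryPolicy {
  std::chrono::seconds maxTotal;
  std::chrono::seconds delay;
};

// Assumed values for illustration; a real policy would be configurable.
RetryPolicy policyFor(OpenError e) {
  using std::chrono::seconds;
  switch (e) {
    case OpenError::DnsLookup:           return {seconds{300}, seconds{10}};
    case OpenError::ServerNotResponding: return {seconds{3600}, seconds{60}};
    default:                             return {seconds{0}, seconds{0}};  // do not retry
  }
}

// Calls tryOpen until it succeeds, hits a non-retryable error, or the
// accumulated wait would exceed the budget for the most recent error.
OpenError openWithRetry(std::function<OpenError()> const& tryOpen,
                        std::function<void(std::chrono::seconds)> const& sleepFor) {
  std::chrono::seconds waited{0};
  for (;;) {
    OpenError const err = tryOpen();
    if (err == OpenError::None) return err;
    RetryPolicy const p = policyFor(err);
    if (p.delay.count() == 0 || waited + p.delay > p.maxTotal) return err;
    sleepFor(p.delay);
    waited += p.delay;
  }
}
```

The sleep function is passed in so the loop can be exercised without actually waiting; in a real job it would wrap `std::this_thread::sleep_for`.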
#3 Updated by Kyle Knoepfel about 1 year ago
- Category set to Infrastructure
- Status changed from Assigned to Resolved
- % Done changed from 0 to 100
This bug has been fixed with commit art:9f251fa6. The commit also re-enables art's custom ROOT handler, which was accidentally disabled for the art 2.11 series.
It is quite difficult for us to anticipate the set of XRootD/file-handling errors that should induce a retry and those that should be fatal. For that reason, we will update the list of non-fatal errors as they are encountered. For now, the following errors are not fatal:
- Any error from
- Any error from TNetXNGFile::Open that is not marked as "FATAL" by XRootD
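
As a rough illustration of the TNetXNGFile::Open rule only, the check could look like the sketch below. This is not the code from the commit. The helper name `allowsRetry` is hypothetical, and it rests on two assumptions: that the error handler receives the reporting function's name as a location string, and that XRootD's severity shows up in the message text as the word "FATAL".

```cpp
#include <string>

// Hypothetical helper, not art's actual code: returns true if an error report
// should be let through as non-fatal so that ROOT can retry the open.
bool allowsRetry(std::string const& location, std::string const& message) {
  // Assumption: `location` is the name of the function that reported the error.
  if (location != "TNetXNGFile::Open") return false;
  // Assumption: XRootD's severity appears in the message text as "FATAL".
  return message.find("FATAL") == std::string::npos;
}
```

With these assumptions, a report from TNetXNGFile::Open whose message carries no "FATAL" marker would be passed through, while one marked "FATAL", or one from any location not on the list, would still stop the job.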