Flaky Networking resulting in FTS backlog
According to Robert I: The FTS version currently running waits indefinitely for HTTP operations to complete, so any request that hangs without responding can block everything else from happening.
He can upgrade to the latest version, which adds a timeout so that a hung request can no longer stall the system permanently (that is, until a restart).
It seems the system needs a restart. Further investigation to follow in the morning.
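For illustration only, the difference between waiting forever and giving up after a timeout can be sketched in a few lines of Python. This is not FTS code: the function name `fetch_with_timeout` and the 30-second default are assumptions made up for this example, and only standard-library calls are used.

```python
import socket
import urllib.request


def fetch_with_timeout(url, timeout_s=30.0):
    """Hypothetical helper: return the response body, or None on timeout.

    Without the timeout argument, urlopen falls back to the global socket
    default, which is normally "no timeout", so a silent server would
    block the caller indefinitely.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read()
    except OSError as exc:
        # A connect timeout arrives wrapped in URLError (see .reason);
        # a read timeout is raised directly as socket.timeout.
        reason = getattr(exc, "reason", exc)
        if isinstance(reason, socket.timeout):
            return None
        raise  # any other network error is not handled here
```

A caller that gets `None` back can log the failure and move on to the next transfer instead of stalling the whole queue.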
#2 Updated by Michael Baird about 7 years ago
Most likely related: listing files on /bluearc/nova/data from novadaq-ctrl-farm-16.fnal.gov is extremely slow. A simple "ls" or "find" command takes a VERY long time (on the order of minutes) to return results. All of the nearline data is kept in /bluearc/nova/data, and this machine (novadaq-ctrl-farm-16) is the one that makes all of the nearline plots that are posted to the web, so not being able to access these directories has caused all nearline monitoring to stop.