Problem advertising requests affecting FE logs
Our osg-xsede.grid.iu.edu FE seems to work fine from the users' point of view, but our FE graphs look odd, and I would like some thoughts on what could cause this. See the attached graph. The concern is the valleys in running jobs versus glideins. Full access to the graphs:
We now think that this problem has to do with the factory advertisements. For example, for one of our entries on the frontend:
[2013-08-08T23:04:35+00:00 21526] Idle (match eff old uniq ) Run ( here max ) | Total Idle Run | Idle MaxRun Down Factory
[2013-08-08T23:04:35+00:00 21526] 8784( 532k 8783 8784 0) 529k( 2376 610k) | 2379 1 2378 | 3232 12863 Up Sum of useful factories
[2013-08-08T23:04:35+00:00 21526] 0( 0 0 0 0) 0( 0 0) | 1333 6 1326 | 0 0 Down Sum of down factories
So the guess is that the jobs running on glideins from the "Down" entries are not accounted for, and that this is what produces the valleys in the graphs.
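To put a rough number on that guess, here is a minimal Python sketch that reads the two summary lines quoted above and compares the running glideins in the "Up" and "Down" rows. It assumes, based only on the header line, that the middle "|"-delimited section holds the Total, Idle and Run glidein counts, and that the graphs only count the "Up" row. That second assumption is the hypothesis, not something we have confirmed. The function and variable names are made up for illustration and are not part of glideinWMS.

```python
# The two summary lines, copied from the frontend log excerpt above.
LOG_LINES = [
    "[2013-08-08T23:04:35+00:00 21526] 8784( 532k 8783 8784 0) 529k( 2376 610k) | 2379 1 2378 | 3232 12863 Up Sum of useful factories",
    "[2013-08-08T23:04:35+00:00 21526] 0( 0 0 0 0) 0( 0 0) | 1333 6 1326 | 0 0 Down Sum of down factories",
]


def parse_summary(line):
    """Return (state, running) for one summary line.

    Assumed layout (from the header line only):
      ... | Total Idle Run | Idle MaxRun <Up|Down> <description>
    """
    sections = line.split("|")
    # Middle section: glidein Total, Idle and Run counts.
    _total, _idle, running = (int(field) for field in sections[1].split())
    # Last section: the third field is the factory state, "Up" or "Down".
    state = sections[2].split()[2]
    return state, running


running_by_state = {}
for log_line in LOG_LINES:
    state, running = parse_summary(log_line)
    running_by_state[state] = running_by_state.get(state, 0) + running

all_running = sum(running_by_state.values())
# Share of running glideins that would disappear from a graph
# that only counts the "Up" row.
hidden_fraction = running_by_state["Down"] / all_running

print(running_by_state)
print(f"hidden from the graph: {hidden_fraction:.1%}")
```

For this one snapshot, 1326 of the 3704 running glideins (about 36%) sit in the "Down" row, which would be a large enough drop to show up as a valley if those glideins are indeed left out of the graphs.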
Looking back through my email, I found that I had a similar discussion with the OSG factory admins:
"Jeff and I looked into these a little bit, and we think these two manifestations are probably unrelated to one another. The "down" entries we suspect come from instances where the factory collector randomly loses classads, and thus fails to communicate with a frontend for a few minutes. Checking our collector logs, we occasionally see things like:
05/30/13 07:37:47 DaemonCore: Can't receive command request from 126.96.36.199 (perhaps a timeout?)
That's your IP address, but actually we see this happening to all our frontends indiscriminately. It's probably nothing to worry about, unless it causes you some inconvenience."