Frontend in downtime affecting other frontends?
I think we just hit a new factory bug!
One of the FEs for this entry is requesting glideins, but getting none!
We found the culprit:
we had the NEBio frontend in downtime, but for whatever reason the factory stopped submitting for Engage as well!
We removed the downtime for NEBiogrid, and things seem to work now (at least for Engage).
So, we have a workaround for now, but this needs to be fixed.
#1 Updated by Douglas Strain over 7 years ago
- Status changed from Assigned to Feedback
- Assignee changed from Douglas Strain to Parag Mhashilkar
Currently, the code sets the entry into downtime once it finds a security class / frontend in
downtime. After that, later frontends will not be able to get glideins (reqidle is set to zero somewhere).
This exactly matches Igor's description of the problem.
I have corrected this issue by not setting the entry into downtime and instead doing a "continue", as if it were a bad proxy. I think this is the proper solution, but maybe we should also hook this up with the new code to provide feedback to the frontend (add a new ticket for that? not sure if it's been merged?).
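To illustrate the difference, here is a simplified sketch of the idea. It is not the actual factory code: the function names, the (name, req_idle) tuples and the downtime set are all made up for illustration, and the real loop does a lot more.

```python
# Hypothetical sketch only; names and data shapes are assumptions,
# not the real GlideinWMS factory code.

def requests_with_entry_flag(frontends, frontends_in_downtime):
    """Old behaviour: one frontend in downtime flags the whole entry."""
    entry_in_downtime = False
    allowed = {}
    for name, req_idle in frontends:
        if name in frontends_in_downtime:
            # The flag is entry-wide, so it also hits every later frontend.
            entry_in_downtime = True
        if entry_in_downtime:
            req_idle = 0
        allowed[name] = req_idle
    return allowed


def requests_with_continue(frontends, frontends_in_downtime):
    """Fixed behaviour: skip only the frontend that is in downtime."""
    allowed = {}
    for name, req_idle in frontends:
        if name in frontends_in_downtime:
            # Treat it like a bad proxy: ignore this request, keep going.
            continue
        allowed[name] = req_idle
    return allowed


frontends = [("NEBioGrid", 5), ("Engage", 10)]
downtime = {"NEBioGrid"}

old_result = requests_with_entry_flag(frontends, downtime)
new_result = requests_with_continue(frontends, downtime)
```

With the entry-wide flag, Engage ends up with zero idle glideins requested just because it is processed after NEBioGrid. With the "continue", only NEBioGrid is skipped and Engage keeps its request.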
I have sent this to Parag for review. Krista is also aware of the code, as she rewrote a bunch of broken things a while back.