Bug #20320

Improve how subentries are picked for metasites

Added by Marco Mascheroni 12 months ago. Updated 8 months ago.

Status: Closed
Priority: Normal
Category: Factory
Start date: 07/16/2018
% Done: 0%

Description

Currently, every time glideFactoryLib.submitGlideins is called, we pick one of the subentries (the one with the lowest submit+idle count) and submit all the glideins for this iteration to that subentry. This creates problems when one of the subentries is broken and its glideins go immediately to held: since held glideins do not count toward submit+idle, the broken entry keeps looking the least loaded and we pick it again at every iteration. I am thinking of different ways of solving this:

1) Consider held jobs when picking the submit file. I can try this, but I suspect it won't help because held jobs may be removed.
2) Round robin: every iteration we pick a different subentry and submit all the glideins there. We don't care about load anymore. The problem is that you may randomly overload one of the entries, especially because submitGlideins submits N glideins at each iteration.
2.a) Same idea as round robin, but at each iteration you submit to all entries instead of just one. There might be some load issues because you increase the number of condor_submit calls, but they are going to be done sequentially, so I don't think we should worry.
3) Assuming we end up doing multiple condor_submit calls per submitGlideins invocation, we can be smarter in selecting how many glideins to submit to each subentry. For example, assuming 4 subentries (1 full, 1 empty and 2 half empty) and assuming you want to submit 4 glideins, you can split them in a 0:4:2:2 ratio, that is 0, 2, 1 and 1 glideins (the issue still being that the empty one is probably the broken one...)
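
To make the options easier to compare, here is a minimal Python sketch of the selection logic only. It is not the glideFactoryLib code: the function names, the subentry names and the notion of "free slots" per subentry are all assumptions made for illustration, and nothing here calls condor_submit.

```python
from itertools import cycle


def pick_least_loaded(load):
    """Selection as described in this ticket: one subentry, lowest submit+idle.

    `load` maps a (hypothetical) subentry name to its submit+idle count.
    """
    return min(load, key=load.get)


def split_by_free_slots(n_glideins, free_slots):
    """Option 3 sketch: split n_glideins proportionally to free slots.

    `free_slots` is an assumed per-subentry measure of remaining room.
    Leftover glideins after rounding down go to the largest remainders.
    """
    total = sum(free_slots.values())
    if total == 0 or n_glideins <= 0:
        return {name: 0 for name in free_slots}
    quotas = {name: n_glideins * free / total
              for name, free in free_slots.items()}
    shares = {name: int(quota) for name, quota in quotas.items()}
    leftover = n_glideins - sum(shares.values())
    by_remainder = sorted(quotas,
                          key=lambda name: quotas[name] - shares[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# Hypothetical snapshot: glideins sent to sub_broken go to held, so its
# submit+idle count never grows and it is picked at every iteration.
load = {"sub_ok_1": 5, "sub_broken": 0, "sub_ok_2": 3}
picked = [pick_least_loaded(load) for _ in range(3)]

# Option 2 sketch: round robin ignores load and just rotates.
rotation = cycle(sorted(load))
rr_picked = [next(rotation) for _ in range(4)]

# Option 3 with the example from the list: 1 full, 1 empty, 2 half empty.
free_slots = {"full": 0, "empty": 4, "half_a": 2, "half_b": 2}
shares = split_by_free_slots(4, free_slots)
```

With this snapshot the least-loaded rule returns sub_broken three times in a row, the rotation visits each subentry in turn, and the proportional split gives 0, 2, 1 and 1 glideins. As noted in option 3, a subentry that looks empty because it is broken still receives the largest share, so the split alone does not solve the original problem.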

History

#1 Updated by Marco Mascheroni 11 months ago

  • Status changed from New to Feedback
  • Assignee changed from Marco Mascheroni to Marco Mambelli

#2 Updated by Marco Mambelli 11 months ago

  • Assignee changed from Marco Mambelli to Marco Mascheroni

#3 Updated by Marco Mascheroni 10 months ago

  • Assignee changed from Marco Mascheroni to Marco Mambelli

#4 Updated by Marco Mambelli 10 months ago

  • Status changed from Feedback to Resolved
  • Assignee changed from Marco Mambelli to Marco Mascheroni

#5 Updated by Marco Mambelli 8 months ago

  • Status changed from Resolved to Closed

