Glidein summary broken for multicore glideins
The summary info that the glidein prints to stdout is broken for multicore glideins.
The problem is that the code expects a single StarterLog, which is not the case for multi-core glideins.
#1 Updated by Igor Sfiligoi over 5 years ago
- Status changed from New to Feedback
- Assignee changed from Igor Sfiligoi to Parag Mhashilkar
I have fixed the multi-core glidein parsing in a good enough way.
It works nicely for single-core jobs, but is still broken for multi-core jobs running in the glideins.
The main problem stems from the fact that the StarterLogs do not log the number of cores used;
we would need a radical change in how we do monitoring to properly fix that.
I think the fix is a major step forward compared to what we have now, and thus a good enough short-term fix
(and good enough for current CMS AnaOps needs).
We should open a separate ticket to get a more complete solution in place.
The code is in v3/5654.
#3 Updated by Igor Sfiligoi over 5 years ago
- Status changed from Closed to Feedback
After some more experience, Jeff and I think the current solution is not fully satisfactory.
Specifically, we do not scale down the number of jobs by the number of cores used
(while we do scale down the walltime). As a result:
- Different attributes have different semantics
- It screws up the avg job duration calculation in the monitoring
I thus propose to scale down the number of jobs.
This will require rounding, since this count is supposed to be an integer.
I propose to always round it up (e.g. 1.25 -> 2).
If I don't hear otherwise, I am going to implement this ASAP.
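For reference, a minimal sketch of the rounding proposed above. It is not the actual glidein code: the function name `scaled_job_count` and its arguments are hypothetical, and it assumes that "scaling down" means dividing the raw job count by the number of cores.

```python
def scaled_job_count(job_count, cores):
    """Scale a raw job count down by the core count, always rounding up.

    Hypothetical helper for illustration only; names and semantics are
    assumptions, not taken from the glidein code base.
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if job_count < 0:
        raise ValueError("job_count must be >= 0")
    # Integer ceiling division: avoids float rounding and keeps the result an int.
    return -(-job_count // cores)


# 5 jobs on 4 cores: 5 / 4 = 1.25, which rounds up to 2.
print(scaled_job_count(5, 4))
```

Always rounding up means any glidein that ran at least one job reports a non-zero count, at the cost of slightly overstating the count when the division is not exact.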