Bug #5654

Glidein summary broken for multicore glideins

Added by Igor Sfiligoi over 5 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assignee: Parag Mhashilkar
Category: Glidein
Target version:
Start date: 04/15/2014
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
First Occurred:
Occurs In:
Stakeholders: CMS, OSG
Duration:
Description

The summary info that the glidein prints to stdout is broken for multicore glideins.

The problem is due to the code expecting a single starterlog, which is not the case for multi-core glideins.
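
A minimal sketch of the idea behind the fix, not the actual glidein code: instead of opening one fixed log file, collect every starter log and summarize them all. The sketch assumes that each slot writes its own file whose name starts with `StarterLog` (for example `StarterLog.slot1`); the function name `find_starter_logs` and the directory layout are hypothetical.

```python
import glob
import os


def find_starter_logs(log_dir):
    """Return every starter log file found in log_dir, sorted by name.

    Assumption (not confirmed by this ticket): per-slot logs share the
    'StarterLog' filename prefix.
    """
    pattern = os.path.join(log_dir, "StarterLog*")
    # Keep regular files only, so a directory with a matching name is ignored.
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

A summary routine would then loop over the returned list rather than read a single file.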


Subtasks

Bug #5909: Scale down the monitoring info on number of jobs run by the glideins (Closed, Igor Sfiligoi)

History

#1 Updated by Igor Sfiligoi over 5 years ago

  • Status changed from New to Feedback
  • Assignee changed from Igor Sfiligoi to Parag Mhashilkar

I have fixed the multi-core glidein parsing in a good enough way.
It works nicely for single-core jobs, but is still broken for multi-core jobs running in the glideins.

The main problem stems from the fact that the StarterLogs do not log the number of cores used;
we would need a radical change in how we do monitoring to properly fix that.

I think the fix is a major step forward compared to what we have now, thus a good enough short term fix.
(and good enough for current CMS AnaOps needs)
We should open a separate ticket to get a more complete solution in place.

The code is in v3/5654.

Please review.

#2 Updated by Parag Mhashilkar over 5 years ago

  • Status changed from Feedback to Closed
  • Assignee changed from Parag Mhashilkar to Igor Sfiligoi

Looks ok. Merged it to branch_v3_2

#3 Updated by Igor Sfiligoi over 5 years ago

  • Status changed from Closed to Feedback

After some more experience, Jeff and I think the current solution is not fully satisfactory.

I.e. we do not scale down the number of jobs by the number of cores used.
(while we do it for walltime)

This is problematic for 2 reasons:
  • Different attributes have different semantics
  • It screws up the avg job duration calculation in the monitoring

I thus propose to scale down the number of jobs.
This will require rounding, since this count is supposed to be an integer.
I propose to round it up all the time (e.g. 1.25 -> 2.0)
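
To illustrate the proposal, here is a rough sketch, assuming the scaling is a plain division of the raw job count by the number of cores. The function names and the numbers in the usage note are hypothetical and are not taken from the v3/5654 code.

```python
import math


def scale_job_count(raw_jobs, cores):
    """Scale a raw job count down by the core count, always rounding up.

    Assumption: 'scaling down' means dividing by the number of cores.
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return int(math.ceil(raw_jobs / float(cores)))


def avg_job_duration(walltime, jobs):
    """Average duration per job; returns 0.0 when no jobs were counted."""
    if jobs == 0:
        return 0.0
    return walltime / float(jobs)
```

With 4 cores, `scale_job_count(5, 4)` gives 2, matching the 1.25 -> 2 rounding above. If the walltime has already been scaled to 1000 s while the job count is left at a raw value of 4, `avg_job_duration(1000.0, 4)` gives 250.0; with the count scaled the same way, `avg_job_duration(1000.0, scale_job_count(4, 4))` gives 1000.0.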

If I don't hear otherwise, I am going to implement this ASAP.

#4 Updated by Igor Sfiligoi over 5 years ago

  • Assignee changed from Igor Sfiligoi to Parag Mhashilkar

I have made the changes and committed them to a new branch (out of latest branch_v3_2)
called v3/5654_p2

Please review.

#5 Updated by Parag Mhashilkar over 5 years ago

  • Status changed from Feedback to Closed

Created #5909 to take care of/track remaining tasks. Closing this one.


