Feature #22403

Updated by Kyle Knoepfel over 1 year ago

The HPC resources used by Mu2e use whole-node scheduling: we get the whole node for the requested time or until we are finished with it, whichever is earlier. On a typical KNL node we run something like 32 processes with 8 threads each, or some other variation that adds up to 256 threads.

Our big CPU driver is stage 1 MC jobs that use the EmptyEvent source. Currently we submit these jobs requesting a fixed number of events per job, and the execution times of such jobs vary widely. Suppose we submit jobs that we expect to have a mean duration of 4 hours and a tail to 8 hours. On a typical node the first process might end after 3 hours and the last after 6 hours, so that first process alone has wasted 1/64 of the allocation (1 of 32 processes idle for half of the overall time).

Over an ensemble of jobs, I bet the wasted time averages to about 25% of the total available cycles. Chris Jones has told me that CMS is getting pushback from the HPC centers about this.
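The arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the evenly spread finish times are purely an illustrative assumption, not measured Mu2e data. The sketch assumes the node is charged until its last process exits.

```python
def slot_idle_fraction(finish_hours, node_end_hours, n_slots):
    """Share of the node allocation wasted by ONE process slot that
    finishes at finish_hours while the node is held until node_end_hours."""
    return (node_end_hours - finish_hours) / (n_slots * node_end_hours)


def node_idle_fraction(finish_hours):
    """Share of the node allocation wasted by ALL slots, assuming the
    node is charged until the last process exits."""
    node_end = max(finish_hours)
    allocated = node_end * len(finish_hours)
    idle = sum(node_end - t for t in finish_hours)
    return idle / allocated


# The example from the text: first process ends at 3 h, last at 6 h, 32 slots.
first_slot = slot_idle_fraction(3.0, 6.0, 32)  # 1/64 of the allocation

# Illustrative assumption: 32 finish times spread evenly from 3 h to 6 h.
evenly_spread = [3.0 + 3.0 * i / 31 for i in range(32)]
whole_node = node_idle_fraction(evenly_spread)  # about 0.25
```

Under that evenly spread assumption the node idles for about a quarter of its allocation, which is at least consistent with the rough 25% guess above. A real estimate would need the measured distribution of job durations.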

For any job that uses EmptyEvent, we can choose a different strategy: tell the job to run for a fixed time. For example, we might submit jobs with a time limit of 6 hours and tell art to stop processing events after 5:30 or 5:45. We are not charged for the time that is left over after the last process exits, so it's not critical to do a detailed optimization of this backoff.

We request that art provide an option on EmptyEvent to tell the job to run until it has used a fixed amount of wall-clock time. If it is too expensive to check the elapsed wall-clock time after every event, then please provide an option to check the elapsed time every N events. We prefer that it be a configuration error to provide both a maximum wall-clock time and a maximum number of events. For jobs that use EmptyEvent this would mean only a modest change in our workflow management and bookkeeping.
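To make the requested behavior concrete, here is a minimal Python sketch of such an event loop. It is not art code and does not reflect any existing art API: the names `generate_events`, `max_events`, `max_seconds`, `check_every`, and `clock` are all hypothetical. Rejecting a configuration with neither limit is an extra assumption of this sketch, made only to avoid an unbounded loop.

```python
import time


def generate_events(process_event, max_events=None, max_seconds=None,
                    check_every=1, clock=time.monotonic):
    """Call process_event(i) until an event-count or wall-clock limit is hit.

    Returns the number of events processed. All parameter names are
    hypothetical; this only illustrates the behavior requested above.
    """
    if max_events is not None and max_seconds is not None:
        # The request: supplying both limits is a configuration error.
        raise ValueError("specify max_events or max_seconds, not both")
    if max_events is None and max_seconds is None:
        # Assumption of this sketch only: refuse to run without any limit.
        raise ValueError("specify one of max_events or max_seconds")
    if check_every < 1:
        raise ValueError("check_every must be at least 1")

    start = clock()
    n = 0
    while True:
        if max_events is not None and n >= max_events:
            break
        # Read the clock only every check_every events to keep the check cheap.
        if (max_seconds is not None and n % check_every == 0
                and clock() - start >= max_seconds):
            break
        process_event(n)
        n += 1
    return n
```

Because the clock is read only every `check_every` events, the job can overrun the time limit by up to `check_every - 1` events plus the event in flight, which is one reason to keep some backoff between the art time limit and the batch time limit.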

At this time we are not interested in this feature for RootInput, since it would require a very intrusive change in our workflow management. It's possible that we might request this feature in RootInput at a later date, but I hope not.