More flexible mechanisms for a job to request resources (cpus, memory, ...)
GlideinWMS uses condor.
Condor allows a job to specify the exact amount of each resource it needs:
1 cpu and 2GB of memory, or 4 cpus and 5GB of memory
It is also possible to ask for the whole node, whatever its size.
It would be nice to be able to ask for a range:
between 3 and 7 cpus or
all the cpus you have left
This may require new mechanisms in condor.
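As a rough illustration of the two kinds of ranged request described above, here is a minimal Python sketch. It is not Condor or GlideinWMS code: the function name `resolve_cpu_request` and its parameters are hypothetical, and it assumes that a request simply does not match when the slot has fewer cpus left than the minimum.

```python
from typing import Optional


def resolve_cpu_request(available: int, min_cpus: int = 1,
                        max_cpus: Optional[int] = None) -> Optional[int]:
    """Return the cpus a ranged request would claim, or None if it cannot match.

    Hypothetical helper for illustration only; not part of any Condor API.
    """
    if min_cpus < 1:
        raise ValueError("min_cpus must be at least 1")
    if max_cpus is not None and max_cpus < min_cpus:
        raise ValueError("max_cpus must not be smaller than min_cpus")
    if available < min_cpus:
        # Assumption: too few cpus left means no match at all.
        return None
    if max_cpus is None:
        # Open-ended request: "all the cpus you have left".
        return available
    # Bounded request: take as many as are available, up to the maximum.
    return min(available, max_cpus)


# "between 3 and 7 cpus" against slots with 8, 5 and 2 cpus left
print(resolve_cpu_request(8, min_cpus=3, max_cpus=7))
print(resolve_cpu_request(5, min_cpus=3, max_cpus=7))
print(resolve_cpu_request(2, min_cpus=3, max_cpus=7))
# "all the cpus you have left" against a slot with 6 cpus left
print(resolve_cpu_request(6))
```

Returning None for an unsatisfiable request is only one possible policy; an alternative would be to report the minimum and let matchmaking reject the slot.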
#3 Updated by Parag Mhashilkar about 4 years ago
Email from Brian requesting the change
On Apr 19, 2016, at 10:42 AM, Brian Bockelman wrote:

Hi Parag, Marco^2,

(Also some HTCondor folks who might be amused by the setup)

CMS is starting to commission “resizable jobs” - jobs whose RequestCpus attribute is an expression. The relevant portion of the ClassAd looks approximately like this:

MinCores = 1
MaxCores = 4
OriginalCpus = 4
JOB_GLIDEIN_Cpus = "$$(Cpus:0)"
RequestResizedCpus = (Cpus>MaxCores) ? MaxCores : ((Cpus < MinCores) ? MinCores : Cpus)
JobCpus = ((JobStatus =!= 1) && (JobStatus =!= 5) && !isUndefined(MATCH_EXP_JOB_GLIDEIN_Cpus) && (int(MATCH_EXP_JOB_GLIDEIN_Cpus) isnt error)) ? int(MATCH_EXP_JOB_GLIDEIN_Cpus) : OriginalCpus
RequestCpus = WMCore_ResizeJob ? (!isUndefined(Cpus) ? RequestResizedCpus : JobCpus) : OriginalCpus

(with RequestMemory and MaxWallTimeMins adjusted accordingly).

In English, this should read:

If WMCore_ResizeJob is true,
- When matchmaking, RequestCpus should evaluate to the number of cores in the candidate slot (adjusted to the MinCores / MaxCores range)
- If the job is running, RequestCpus should be the number of cores in the matched slot
- Otherwise, evaluate to the original CPU request.
- “Original CPU” is the number of cores used to benchmark the workflow (useful point of reference for the memory and estimated wall time).

Right now, it appears all works with the frontend because, when RequestCpus isn’t an integer, it’ll default to 1. The monitoring code appears to mostly be based on the slots, so it should also calculate running jobs correctly (very important for setting max jobs running).

*However*, it’d be far more useful if glideinWMS could understand the setup natively:

0) Evaluate expressions in job ads. This implies a switch from using “condor_q -xml” to the python bindings, I think.
1) Understand MinCore