Feature #16147

Enhance the automatic CPUs detection to compensate for PBS misconfiguration

Added by Marco Mambelli over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: High
Category: -
Target version:
Start date: 04/10/2017
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

Two PBS variables (specifically the first one, which is per node) should tell me how many cores I can use:
  • PBS_NUM_PPN Number of procs per node allocated to the job
  • PBS_NP Number of execution slots (cores) for the job (on all available nodes)

Sometimes there are inconsistencies.

On an OSG cluster with Moab, Hyak_CE, setting the ALLPROCS flag results in the following:
PBS_NP=12
PBS_NUM_NODES=1
PBS_NUM_PPN=1

PBS_NP is the number of cores across all nodes used by the job (# of processors) and PBS_NUM_PPN is the number of cores on this node (# of processors per node), so looking at the second (PBS_NUM_PPN) should be more correct. Here, however, the values are inconsistent: I can use 12 processors across 1 node, yet on this node I can use only 1.

The suggestion is to use the largest of:
  • PBS_NUM_PPN
  • the number of occurrences of the host in PBS_NODEFILE
  • PBS_NP, if PBS_NUM_NODES=1

And flag a warning if these values differ.

This would compensate for misconfigurations by being optimistic (max CPUs); in the worst case, the jobs on the node slow down.
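The rule above could be sketched roughly as follows. This is only an illustration of the suggestion, not the code in the actual change: the function names (detect_pbs_cpus, count_host_in_nodefile), the short-hostname matching, and the choice to ignore values that are missing or not positive are all assumptions.

```python
import os
import socket
import warnings


def _to_int(value):
    """Return value as an int, or 0 when it is missing or not a number."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def count_host_in_nodefile(nodefile, hostname):
    """Count the lines of a PBS node file that name this host.

    Assumption: names are compared both in full and by short host name.
    """
    if not nodefile:
        return 0
    try:
        with open(nodefile) as f:
            names = [line.strip() for line in f if line.strip()]
    except OSError:
        return 0
    short = hostname.split(".")[0]
    return sum(1 for n in names if n == hostname or n.split(".")[0] == short)


def detect_pbs_cpus(env=None, hostname=None):
    """Return the largest plausible CPU count, or None if PBS gives no hint."""
    env = os.environ if env is None else env
    hostname = hostname or socket.gethostname()

    candidates = {
        "PBS_NUM_PPN": _to_int(env.get("PBS_NUM_PPN")),
        "PBS_NODEFILE": count_host_in_nodefile(env.get("PBS_NODEFILE"), hostname),
    }
    # PBS_NP counts cores on all nodes, so it is only usable for one node.
    if _to_int(env.get("PBS_NUM_NODES")) == 1:
        candidates["PBS_NP"] = _to_int(env.get("PBS_NP"))

    # Assumption: unavailable (non-positive) values are ignored, not compared.
    found = {name: n for name, n in candidates.items() if n > 0}
    if not found:
        return None
    if len(set(found.values())) > 1:
        warnings.warn("Inconsistent PBS CPU counts: %s" % found)
    return max(found.values())


if __name__ == "__main__":
    # Values reported on Hyak_CE: a warning is raised and 12 is chosen.
    hyak = {"PBS_NP": "12", "PBS_NUM_NODES": "1", "PBS_NUM_PPN": "1"}
    print(detect_pbs_cpus(hyak, hostname="node1"))  # 12
```

With more than one node, PBS_NP is left out of the comparison, so only PBS_NUM_PPN and the node file count are considered.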

History

#1 Updated by Marco Mambelli over 3 years ago

  • Status changed from New to Feedback
  • Assignee changed from Marco Mambelli to Dennis Box

Changes are in v3/16147

#2 Updated by Dennis Box over 3 years ago

  • Assignee changed from Dennis Box to Marco Mambelli

#3 Updated by Marco Mambelli over 3 years ago

  • Status changed from Feedback to Resolved

#4 Updated by Parag Mhashilkar over 3 years ago

  • Status changed from Resolved to Closed
