Feature #20007

Milestone #15372: art multi-threading phase 1

Add nthreads back-off based on grid-provided information

Added by Kyle Knoepfel about 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: Infrastructure
Target version:
Start date: 05/22/2018
Due date:
% Done: 100%
Estimated time: 2.00 h
Spent time:
Scope: Internal
Experiment: -
SSI Package: art
Duration:

Description

When art runs on a grid site, the TBB initializer must not be given more threads than the grid node allows for the job. According to the grid folks:

HTCondor already sets an environment variable 'OMP_NUM_THREADS' to indicate the number of CPUs available in the slot. I think that should suffice for your use case? The environment variable should also be available on offsite pilots.

Can you submit a few test jobs (onsite and offsite) to see if the environment variable is sufficient?

If not, we can work on alternate ways of getting you the cpu information from the slot.
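The back-off described above could be sketched roughly as follows. This is a minimal C++17 illustration, not art's actual implementation: the helper names `positive_int_from_env` and `backed_off_nthreads` are hypothetical, and the sketch assumes `OMP_NUM_THREADS` holds a single positive integer (anything else is treated as "no limit provided").

```cpp
#include <algorithm>
#include <cstdlib>
#include <limits>
#include <optional>

// Hypothetical helper (not part of art): read a positive integer from the
// named environment variable. Returns an empty optional if the variable is
// unset, empty, not a plain integer, non-positive, or too large.
std::optional<unsigned> positive_int_from_env(char const* name)
{
  char const* raw = std::getenv(name);
  if (raw == nullptr || *raw == '\0') {
    return std::nullopt;
  }
  char* end = nullptr;
  long const value = std::strtol(raw, &end, 10);
  if (*end != '\0' || value < 1) {
    return std::nullopt;
  }
  if (static_cast<unsigned long>(value) > std::numeric_limits<unsigned>::max()) {
    return std::nullopt;
  }
  return static_cast<unsigned>(value);
}

// Hypothetical back-off: use the requested thread count unless the slot
// advertises a smaller number through OMP_NUM_THREADS.
unsigned backed_off_nthreads(unsigned requested)
{
  auto const allowed = positive_int_from_env("OMP_NUM_THREADS");
  return allowed ? std::min(requested, *allowed) : requested;
}
```

With `OMP_NUM_THREADS=4` in the environment, a request for 8 threads would be reduced to 4, while a request for 2 would be left alone. The result is what would then be handed to the TBB initializer.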

History

#1 Updated by Kyle Knoepfel about 1 year ago

  • Parent task set to #15372

This feature has been implemented, but I have left the status as Feedback until the experiments confirm that this environment variable meets their needs.

#2 Updated by Kyle Knoepfel about 1 year ago

An email from Chris Green:

Assuming we are going to use (say) OMP_NUM_THREADS in art, then job scripts should:

HTCondor (as discussed):
  :
Older Condor:
  environment = "OMP_NUM_THREADS=$$(CPUS)"
SLURM:
  export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
MOAB/Torque:
  export OMP_NUM_THREADS=$PBS_NUM_PPN

Note that we could do everything in art with the exception of older Condor installations.

Best,

Chris.
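The remark that most of this could be done inside art might look something like the sketch below, which checks the scheduler variables named in the email directly instead of relying on the job script to export `OMP_NUM_THREADS`. The function names and the lookup order are assumptions made for illustration, not art's behavior. Unlike the SLURM line in the email, the sketch does not fall back to 1 when nothing is set; it reports that no limit was found. Per the email, older Condor installations would still need the submit-file line.

```cpp
#include <array>
#include <cstdlib>
#include <optional>

// Hypothetical helper: positive integer value of an environment variable,
// or an empty optional if it is unset, empty, malformed, or non-positive.
std::optional<long> positive_env_value(char const* name)
{
  char const* raw = std::getenv(name);
  if (raw == nullptr || *raw == '\0') {
    return std::nullopt;
  }
  char* end = nullptr;
  long const value = std::strtol(raw, &end, 10);
  if (*end != '\0' || value < 1) {
    return std::nullopt;
  }
  return value;
}

// Hypothetical lookup: the first usable variable wins. The order
// (OMP_NUM_THREADS, then SLURM, then MOAB/Torque) is an assumption.
std::optional<long> slot_cpu_limit()
{
  constexpr std::array<char const*, 3> candidates{
    "OMP_NUM_THREADS", "SLURM_CPUS_PER_TASK", "PBS_NUM_PPN"};
  for (char const* name : candidates) {
    if (auto const value = positive_env_value(name)) {
      return value;
    }
  }
  return std::nullopt;
}
```

An empty result would mean the job is not running under any of these schedulers (or the variables were not propagated), in which case the requested thread count would presumably be used unchanged.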

#3 Updated by Kyle Knoepfel about 1 year ago

  • Status changed from Feedback to Closed

