Grunts and PBS

Grunts overview

The five grunts each have four 8-core AMD 6128 processors (32 cores per node), 64 GiB of RAM and a RAID I disk, and run Scientific Linux Fermi v5.n. They are connected to a switch via QDR InfiniBand and are also on a private 1 Gb Ethernet network. They are running PBS.

We are now prepared to move towards using PBS as the resource scheduler for the grunts. Eventually this will mean that you must allocate cores on the grunts through PBS before doing any interactive or batch processing work, including development activities. For now we do not need to use these PBS facilities in any strict way, because we have only a few users and they talk to each other regularly. This page describes the current PBS configuration and how it is being used, along with future plans for PBS use.

Current use

We now have two PBS queues: grunt and perf. The grunt queue is general purpose and open to anyone. The perf queue has limited access and a higher priority than the grunt queue, so waiting perf jobs are launched before any waiting grunt jobs. There is no preemption in the system, so a grunt job that is already executing will be permitted to complete even when a perf job is waiting to run.

The perf queue is used for the Geant4 performance runs. These jobs are run about once per month and take less than 15 hours. Soon Jun will be running these jobs and we've requested that he run them off-hours (after 5pm).

Configuration of PBS

The PBS configuration for the grunts currently indicates that there are 32 slots per node, one per core. This means you can allocate as few as one core on one node to do some work. The PBS core allocation is advisory only: nothing stops your script from using more cores than it was allocated once it has been launched, but it shouldn't. PBS tells your script which cores/nodes have been allocated to the job in the file pointed to by $PBS_NODEFILE.
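Because the allocation is advisory, a job script has to read its own allocation and stay within it. The following is a minimal sketch of how that might look, assuming the usual Torque-style node file that lists each node name once per allocated slot. The function name count_alloc is made up for this illustration.

```shell
#!/bin/sh
# Print "<cores> <nodes>" for the PBS node file given as $1.
# Assumes one line per allocated slot (core), as in Torque-style node files.
count_alloc() {
    cores=$(wc -l < "$1" | tr -d ' ')
    nodes=$(sort -u "$1" | wc -l | tr -d ' ')
    echo "$cores $nodes"
}

# Inside a PBS job, report the allocation. Outside a job PBS_NODEFILE is
# unset, so this does nothing.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    count_alloc "$PBS_NODEFILE"
fi
```

The core count can then be passed to whatever thread or process limit your program accepts, so that the job uses no more than it asked for.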

We have set the default number of cores per node to 32. This means that if you request one node without specifying a lower core count (ppn), you get all 32 of its cores.

Example uses

Remember that -I means interactive. If a job requests more than one core or node, an interactive shell prompt will be presented to you on the first of the nodes when the job starts.

Allocating an entire node:

qsub -l nodes=1:ppn=32 -q grunt -I

Allocating four cores from one node:

qsub -l nodes=1:ppn=4 -q grunt -I

Allocating four cores on two specific nodes (total of 8 cores):

qsub -l nodes=grunt2:ppn=4+grunt3:ppn=4 -q grunt -I

Allocating an entire node (because of defaults):

qsub -l nodes=1 -q grunt -I
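The examples above are all interactive. For batch work the same resource requests can go into the job script as #PBS directives. The following is a rough sketch only: the job name, the script name and the commented-out program are hypothetical, and the fallbacks for unset PBS variables are there just so that the script can be tried outside PBS.

```shell
#!/bin/sh
#PBS -N example_job
#PBS -q grunt
#PBS -l nodes=1:ppn=4
#PBS -j oe

# PBS starts the job in your home directory; return to the directory
# the job was submitted from (or stay put when run outside PBS).
cd "${PBS_O_WORKDIR:-.}" || exit 1

# One line per allocated slot in the node file; default to 1 outside PBS.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NCORES=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
else
    NCORES=1
fi

echo "running with $NCORES core(s)"
# ./my_program --threads "$NCORES"   # hypothetical program; replace with your own
```

If this is saved as, say, example_job.sh, it would be submitted with "qsub example_job.sh". Options given on the qsub command line override the #PBS lines in the script.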