General Physics Computing Facility (GPCF)

Overview of GPCF

The General Physics Computing Facility (GPCF) provides interactive and batch computing at Fermilab. The interactive login component comprises multiple virtual machines on each physical box, together with shared “scratch” disk. The batch resources are non-virtualized boxes running a Condor pool shared among all the experiments. Together, these two parts provide the development, debugging, and testing components of the facility. The experiments will not include desktop clusters in their requirements, since tasks historically carried out on desktops are now done on the interactive login machines. Because these machines are uniformly procured and maintained, and centrally located in CD computer rooms, their administrative load is more manageable. It is assumed that users will bring their own desktop and laptop hardware, with CD responsibility limited to consulting support for OS installation, Kerberos, and security features.

Local batch processing is provided by a Condor pool to which users in each experiment submit jobs. To this end, each node in the pool will have accounts for the union of all users and mount points for the union of all mounts needed by all experiments. Job submission will be via a general tool, “jobsub”, which picks up the experiment information from the node on which it is invoked; when the job starts on the general cluster, it sets its group to that experiment. Ensuring that each group has both dedicated and shared resources on this Condor pool is managed through Condor's ranking provisions, and user priorities within each group can be set as each experiment determines. Condor also provides a tool, “condor_ssh_to_job”, that supports interactive debugging with ps, top, gdb, strace, and lsof, and even allows port forwarding, X forwarding, file transfer, and other useful operations.
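
The paragraph above describes jobsub's behavior but not its implementation; the following is a minimal sketch of the idea in Python, assuming a hypothetical hostname-to-experiment mapping and the standard condor_submit command. The node prefixes, group names, and file layout are illustrative only, not the actual GPCF configuration.

    import getpass
    import socket
    import subprocess
    import tempfile

    # Hypothetical mapping from interactive-node hostname prefix to
    # experiment; the real GPCF node names are not given here.
    EXPERIMENT_BY_PREFIX = {
        "novagpvm": "nova",
        "minervagpvm": "minerva",
        "mu2egpvm": "mu2e",
    }

    def detect_experiment():
        # Pick up the experiment from the node the user submits from.
        host = socket.gethostname().split(".")[0]
        for prefix, experiment in EXPERIMENT_BY_PREFIX.items():
            if host.startswith(prefix):
                return experiment
        raise RuntimeError("unrecognized submit node: %s" % host)

    def submit(executable):
        # Write a submit description file and hand it to condor_submit.
        experiment = detect_experiment()
        submit_text = "\n".join([
            "universe = vanilla",
            "executable = %s" % executable,
            # The accounting group drives per-experiment shares and
            # per-user priorities within the shared pool.
            '+AccountingGroup = "group_%s.%s"'
                % (experiment, getpass.getuser()),
            "queue",
        ])
        with tempfile.NamedTemporaryFile(
                mode="w", suffix=".sub", delete=False) as f:
            f.write(submit_text + "\n")
            path = f.name
        subprocess.check_call(["condor_submit", path])

    if __name__ == "__main__":
        submit("myjob.sh")

Once a job is running, it can be inspected interactively with, for example, condor_ssh_to_job 1234.0 (using the job's cluster and process IDs), which opens a shell alongside the job on its execute node.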

Most of the processing will be performed on GRID resources, which fall into two important categories for Fermilab users: 1) the General Purpose GRID Cluster (GP Grid), and 2) the GRID at large. The GP Grid resources are characterized by their access to the centrally served file systems that are also available on the interactive and local batch systems; this makes configuring and running jobs on all three resources similar and straightforward for users. The second category is potentially much larger and includes CDF, CMS, and OSG computing sites beyond Fermilab. These nodes do not have the centrally served disk mounts and therefore have no immediate access to experiment software and data.
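
The practical consequence of this distinction is in how a job locates its software and data: on the GP Grid a job can reference the mounted experiment areas directly, while offsite it must carry its inputs along. A minimal sketch follows, with a hypothetical mount point and tarball name; the actual staging mechanism is not specified in this document.

    def submit_description(executable, onsite):
        # Build a Condor submit description that either relies on the
        # centrally served mounts (GP Grid) or ships its inputs with
        # the job (the GRID at large, where those mounts do not exist).
        desc = {
            "universe": "vanilla",
            "executable": executable,
        }
        if onsite:
            # GP Grid nodes see the same centrally served file systems
            # as the interactive and local batch machines, so the job
            # can reference the experiment area directly.  The path is
            # a hypothetical example.
            desc["environment"] = "EXPERIMENT_AREA=/grid/app/myexperiment"
        else:
            # Offsite nodes have no such mounts: software and data must
            # travel with the job, e.g. via Condor's file transfer
            # mechanism.  The tarball name is a hypothetical example.
            desc["should_transfer_files"] = "YES"
            desc["transfer_input_files"] = "myexperiment_software.tar.gz"
            desc["when_to_transfer_output"] = "ON_EXIT"
        return desc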