Information about job submission to OSG sites

This page captures some of the known quirks about certain sites when submitting jobs there.

What this page is

Most OSG sites will work with the jobsub default requests of 2000 MB of RAM, 35 GB of disk, and 8 hours of run time, but some sites enforce stricter limits. Additionally, some sites only support certain experiments rather than the entire Fermilab VO. Here we list the OSG sites where users can submit jobs, along with all known cases where either the standard jobsub defaults may not work or the site only supports certain experiments. The information here is provided on a best-effort basis and is subject to change without notice.

What this page is NOT

This page is NOT a status board or health monitor for the OSG sites. Just because your submission fits within the guidelines here does not mean that your job will start quickly, nor does this page track downtimes at the remote sites. Its sole purpose is to help you avoid submitting jobs with disk/memory/CPU/site combinations that will never work. Limited offsite monitoring is available at https://fifemon.fnal.gov/monitor/dashboard/db/offsite-monitoring

Organization

The following table lists the available OSG sites, their Glidein_site name (what you should put in the --site option), what experiment(s) the site will support, and finally any known limitations on disk, memory, or CPU.
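
For reference, a typical offsite submission might look like the sketch below. The experiment name, site, script path, and resource values are placeholders only; adjust the --memory, --disk, and --expected-lifetime requests to the limits in the table, and check jobsub_submit --help for the exact options supported by your jobsub version.

jobsub_submit -G dune --resource-provides=usage_model=OFFSITE --site=Caltech --memory=2000MB --disk=35GB --expected-lifetime=8h file:///path/to/your_script.sh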

Important notes and caveats: READ THEM ALL!

NOTE 1: In some cases you may be able to request more than the jobsub defaults and be fine. If you do try a site and put in requirements that exceed the jobsub defaults, sometimes a

jobsub_q --better-analyze --jobid=<your job id>

will give you useful information about why a job has not started to run
(e.g. it may recommend lowering the disk or memory request to a certain value). Where a site has been successfully tested above 2000 MB, the table lists the largest memory request known to work.

NOTE 2: Under supported experiments, "All" means all experiments except for CDF, D0, and LSST. It does include DES and DUNE.

NOTE 3: The estimated maximum lifetime is just an estimate based on a periodic sampling of glidein lifetimes. It may change from time to time and it does NOT take into account any walltime limitations of the local job queues at the site itself. It also does not guarantee that there are resources available at any given moment to start a job with the longest possible lifetime. You can modify your requested lifetime with the --expected-lifetime option.

NOTE 4: Take care to convert appropriately when using the --memory switch with units in jobsub_submit. To stay consistent with HTCondor, 1 GB = 1024 MB in the --memory option, not 1000 MB. So --memory=2GB is really --memory=2048MB, and so on. If you are trying to structure your submission to fit within a certain constraint and you are using GB as your units, remember to convert accordingly. All memory numbers on this page are in MB, the default HTCondor memory unit.
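
As a worked example (using the 4000 MB limit from the Caltech row of the table below): --memory=3.9GB requests 3.9 × 1024 ≈ 3994 MB and fits under a 4000 MB limit, while --memory=4GB requests 4096 MB and exceeds it.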

NOTE 5: Asking for exactly the estimated maximum time is not a good idea because you can't guarantee that your job will match exactly at the beginning of the glidein lifetime. If you need close to the max time, be sure to ask for slightly under it. Of course, don't ask for the full time if you don't really need it!
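
For example, at a site with an estimated 24 h maximum, a request such as --expected-lifetime=23h leaves a margin for the job to match a glidein that has already been running for a while, whereas --expected-lifetime=24h could only ever match a brand-new glidein.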

| Site Name | --site option (sorted) | Supported Experiments | Known limitations | Maximum memory | Estimated maximum job lifetime |
| Brookhaven National Laboratory | BNL | All | jobsub defaults are OK | 3000 MB | 24 h |
| Boston University ATLAS T2 | BU | All | jobsub defaults are OK | 2500 MB | 8-12 h |
| Caltech T2 | Caltech | All | jobsub defaults are OK | 4000 MB | 25 h |
| IN2P3 Computing Center Lyon | CCIN2P3 | DUNE only | jobsub defaults are OK; 4 CPUs max | 10000 MB | 30 h |
| CERN Tier 0 | CERN | DUNE only | dunepro jobs only | 2500 MB | 24 h |
| CIEMAT | CIEMAT | DUNE only | jobsub defaults are OK | 2500 MB | 24 h |
| Clemson | Clemson | All | jobsub defaults are OK | 14000 MB | 24 h |
| Colorado | Colorado | All | jobsub defaults are OK | 16000 MB | 48 h |
| Cornell | Cornell | All | jobsub defaults are OK | 2500 MB | unknown |
| GPGrid | FermiGrid | All + CDF + LSST + D0 | jobsub defaults are OK | 16000 MB | 95 h |
| University of Florida HPC | Florida | DUNE only | jobsub defaults are OK | 4000 MB | 24 h |
| FNAL CMS Tier 1 | FNAL | All | jobsub defaults are OK | 16000 MB | 24 h |
| Czech Academy of Sciences | FZU | DUNE and NOvA only | request --disk=20000MB or less | 2500 MB | unknown |
| University of Washington | Hyak_CE | All | available resources vary widely | tested to 7000 MB | 3.5 h |
| JINR HTCondor CE | JINR_CLOUD | NOvA only (mu2e soon) | no multicore jobs | 2500 MB | 46 h |
| JINR Tier 2 | JINR_Tier2 | NOvA only | no multicore jobs | 2500 MB | 46 h |
| Imperial College London | London | DUNE only | multicore allowed | 2048 MB/core | 24-48 h |
| University of Liverpool | Liverpool | DUNE only | jobsub defaults are OK; no multicore yet | 2500 MB | 24 h |
| University of Manchester | Manchester | uboone, DUNE | currently SL7 only; ask for one CPU for every 2 GB of memory requested (e.g. 2 CPUs for a memory request between 2 and 4 GB) | tested to 8000 MB | 12-24 h |
| ATLAS Great Lakes Tier 2 (AGLT2) | Michigan | All | currently SL7 only | 2500 MB | approx. 8 h |
| MIT:http://www.cmsaf.mit.edu/condor4web/ | MIT | All + CDF | jobsub defaults are OK; blocked due to very high eviction rate | 2500 MB | unknown |
| Midwest Tier2 | MWT2 | All | jobsub defaults are OK; single-core jobs will take a very long time to run if requesting more than 1920 MB of memory; nodes run custom 3.x Linux kernels, so SL6 software may need the UPS_OVERRIDE environment variable set in order to set up UPS products | tested to 7680 MB | 5 h |
| Red/Sandhills | Nebraska | All | jobsub defaults are OK; some slots are SL6 Docker containers on SL7 hosts | tested to 8000 MB | 48 h |
| NIKHEF | NIKHEF | DUNE only | still commissioning | 2994 MB | 30 h |
| Notre Dame | NotreDame | All | aim for short jobs due to preemption | 32000 MB / 8 CPUs | 24 h |
| Tusker/Crane | Omaha | All | jobsub defaults are OK | 3000 MB | 24 h |
| Ohio Supercomputing Center | OSC | NOvA only | jobsub defaults are OK | estimated 2500 MB | 46 h |
| PIC | PIC | DUNE only | jobsub defaults are OK | 2500 MB | 24 h |
| INFN Pisa | Pisa | g-2 only | single-core only; multicore coming | 2500 MB | 24 h |
| Rutherford Appleton Laboratory T1 | RAL | DUNE only | jobsub defaults are OK; 2 CPUs max | 5120 MB | 59 h |
| Rutherford Appleton Laboratory T2 | SGrid | DUNE only | jobsub defaults are OK; 2 CPUs max | 5120 MB | 59 h |
| University of Edinburgh | SGridECDF | DUNE only | jobsub defaults are OK | 4000 MB | 48 h |
| University of Oxford | SGridOxford | DUNE only | jobsub defaults are OK | 16000 MB | 48 h |
| University of Sheffield | Sheffield | DUNE only | jobsub defaults are OK | 16000 MB | 48 h |
| Southern Methodist University | SMU_HPC | NOvA only | jobsub defaults are OK | 2500 MB | 24 h |
| Stampede (TACC) | Stampede | MINOS only | unknown maximum disk | 32000 MB | unknown |
| Stanford Proclus | HOSTED_STANFORD | All | varies, but jobsub defaults should be OK | 16384 MB | estimated 12 h |
| Syracuse | SU-ITS | All | request --disk=9000MB; 2015/10/15: libXpm.so (which may be required by ROOT) not installed on all nodes | 2500 MB | 46 h |
| Texas Tech | TTU | All but mu2epro and seaquest | jobsub defaults are OK; 2015/11/20: down since OSG software upgrade | unknown | unknown |
| University of Chicago | UChicago | All | linked with MWT2; recommend --memory=1920MB or less per core; many nodes have 3.x kernels, so be sure to set the UPS_OVERRIDE environment variable appropriately | tested to 7680 MB with 4 CPUs | 5 h |
| University of California, San Diego | UCSD | All | jobsub defaults are OK | 4000 MB | 13 h |
| University of Bern | UNIBE-LHEP | uboone only | requires special options with the --lines jobsub option: --lines='+count=N' --lines='+memory=2000' --lines='+runtimeenvironment = "APPS/HEP/UBOONE-MULTICORE-1.0"'; be sure to request a CPU for every 2000 MB of memory requested | 4000 MB | 48 h |
| Grid Lab of Wisconsin (GLOW) | Wisconsin | All | jobsub defaults are OK | 8000 MB | 24 h |
| Western Tier2 (SLAC) | WT2 | uboone only | jobsub defaults are OK | 2500 MB | 10 days |
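
When a site has specific constraints in the table, fold them into the submission command. For illustration only (the experiment name and script path are placeholders), a job targeted at FZU, which asks for --disk=20000MB or less, might be submitted like this:

jobsub_submit -G dune --resource-provides=usage_model=OFFSITE --site=FZU --disk=20000MB --memory=2000MB --expected-lifetime=8h file:///path/to/your_script.sh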