h1. Information about job submission to OSG sites

This page captures some of the known quirks about certain sites when submitting jobs there.

h2. What this page is

Most OSG sites will work with the jobsub default requests of 2000 MB of RAM, 35 GB of disk, and 8 hours of run time, but some sites enforce stricter limits. Additionally, some sites support only certain experiments rather than the entire Fermilab VO. Here we list the OSG sites where users can submit jobs, along with all known cases where either the standard jobsub defaults may not work or the site supports only certain experiments. The information here is provided on a best-effort basis and is subject to change without notice.

h2. What this page is NOT

This page is *NOT* a status board or health monitor of the OSG sites. Just because your submission fits within the guidelines here does not mean that your job will start quickly, nor does this page track downtimes at the remote sites. Its sole purpose is to help you avoid submitting jobs with disk/memory/CPU/site combinations that will never work. Limited offsite monitoring is available from https://fifemon.fnal.gov/monitor/dashboard/db/offsite-monitoring

h2. Organization

The following table lists the available OSG sites, their Glidein_site name (what you should put in the --site option), the experiment(s) each site supports, and finally any known limitations on disk, memory, or CPU.
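
The general submission pattern looks like the sketch below. This is illustrative only: the experiment name, site, and script path are placeholders, and it assumes the standard jobsub_client syntax.

<pre>
# Minimal offsite submission relying on the jobsub defaults
# (2000 MB RAM, 35 GB disk, 8 h run time). "nova", the site, and the
# script path are placeholders; substitute your own experiment and script.
jobsub_submit -G nova \
  --resource-provides=usage_model=OFFSITE \
  --site=BNL \
  file:///path/to/my_script.sh
</pre>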

h2. Important notes and caveats: READ THEM ALL!

*NOTE 1:* In some cases you may be able to request more than the jobsub defaults and be fine. If you do try a site and put in requirements that exceed the jobsub defaults, sometimes a

%{color:red}jobsub_q --better-analyze --jobid=<your job id>%

will give you useful information about why a job doesn't start to run
(e.g. it may recommend lowering the disk or memory requirements to a certain value). Where we have had a successful test above 2000MB, the table lists the largest memory request that worked.
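
As a concrete illustration (the experiment, site, values, and job id below are placeholders, not a tested recipe), a submission that exceeds the defaults and the follow-up check might look like:

<pre>
# Illustrative only: request more memory, disk, and time than the jobsub defaults.
jobsub_submit -G uboone \
  --resource-provides=usage_model=OFFSITE \
  --site=Nebraska \
  --memory=4000MB --disk=40GB --expected-lifetime=24h \
  file:///path/to/my_script.sh

# If the job stays idle, ask why, using the job id printed at submission time:
jobsub_q -G uboone --better-analyze --jobid=12345678.0@jobsub01.fnal.gov
</pre>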

*NOTE 2:* Under supported experiments, "All" means all experiments except for CDF, D0, and LSST. It does include DES and DUNE.

*NOTE 3:* The estimated maximum lifetime is just an estimate based on a periodic sampling of glidein lifetimes. It may change from time to time and it does NOT take into account any walltime limitations of the local job queues at the site itself. *It also does not guarantee that there are resources available at any given moment to start a job with the longest possible lifetime.* You can modify your requested lifetime with the --expected-lifetime option.

*NOTE 4:* Take care to convert appropriately when using the --memory switch with units in jobsub_submit. To stay consistent with HTCondor, *1GB = 1024MB in the --memory option*, not 1000 MB. So --memory=2GB is really --memory=2048MB, and so on. Thus, if you are trying to structure your submission to fit within a certain constraint and you are using GB as your units, remember to convert appropriately. All memory numbers on this page are in MB, the default HTCondor memory unit.
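
For example (the 4000 MB limit here is chosen only for illustration), a site limit quoted in MB can be exceeded by a seemingly equivalent request in GB:

<pre>
# 1 GB = 1024 MB in --memory, so these two requests are NOT the same:
#   --memory=4GB      -> 4 x 1024 = 4096 MB (exceeds a 4000 MB site limit)
#   --memory=4000MB   -> exactly 4000 MB (fits the same limit)
</pre>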

*NOTE 5:* Asking for exactly the estimated maximum time is not a good idea, because you cannot guarantee that your job will match right at the beginning of a glidein's lifetime. If you need close to the maximum time, be sure to ask for slightly under it. Of course, don't ask for the full time if you don't really need it!
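
For instance (site, experiment, and value chosen only for illustration), on a site whose glideins last roughly 24 hours you might leave a few hours of headroom:

<pre>
# Illustrative only: leave headroom below the estimated maximum lifetime.
jobsub_submit -G dune \
  --resource-provides=usage_model=OFFSITE \
  --site=Caltech \
  --expected-lifetime=20h \
  file:///path/to/my_script.sh
</pre>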

|_. Site Name |_. --site option (sorted) |_. Supported Experiments |_. Known limitations |_. Maximum memory |_. Estimated maximum job lifetime |
| Brookhaven National Laboratory | BNL     | All | jobsub defaults are OK | 3000MB | 24 h |
| Caltech T2                     | Caltech | All | jobsub defaults are OK | 4000MB | 25 h |
| CERN Tier 0                    | CERN    | DUNE only | dunepro jobs only | 2500MB | 24 h |
| Clemson                        | Clemson | All | jobsub defaults are OK | 14000MB | 24 h |
| Colorado | Colorado | All | jobsub defaults are OK | 16000MB | 48 h |
| Cornell                        | Cornell | All | jobsub defaults are OK | 2500MB | unknown |
| GPGrid        | FermiGrid | All+CDF+LSST+D0 | jobsub defaults are OK | 16000MB | 95 h |
| FNAL CMS Tier 1  | FNAL  | All | jobsub defaults are OK | 16000MB | 24 h |
| "Czech Academy of Sciences":http://monitor.farm.particle.cz/total_overview.php     | FZU        | DUNE and NOvA only | request --disk=20000MB or less | 2500MB | unknown |
| University of Washington       | Hyak_CE    | All | available resources vary widely | Tested to 7000MB | 3.5 h |
| JINR HTCondor CE               | JINR_CLOUD | NOvA only, mu2e soon | no multicore jobs | 2500MB | 46 h |
| JINR Tier 2                    | JINR_Tier2 | NOvA only | no multicore jobs | 2500MB | 46 h |
| Imperial College London        | London     | DUNE only | no multicore jobs | 2048 MB | 24-48 h |
| University of Manchester       | Manchester | uboone only, DUNE support coming | disk default is OK; ask for one CPU for every 2 GB of memory requested (e.g. 2 CPUs for a memory request between 2 and 4 GB) | Tested to 8000MB | 12-24 h |
| ATLAS Great Lakes Tier 2 (AGLT2) | Michigan | All | jobsub defaults are OK | 2500MB | approx. 8 h |
| "MIT":http://www.cmsaf.mit.edu/condor4web/ | MIT        | All + CDF | jobsub defaults are OK | 2500MB | unknown |
| Midwest Tier2                  | MWT2       | All | jobsub defaults are OK; single-core jobs will take a very long time to run if requesting more than 1920 MB of memory. Nodes run custom 3.x Linux kernels, so SL6 software may need to set the UPS_OVERRIDE environment variable to set up UPS products (see the example after the table) | Tested to 7680MB | 5 h |
| Red/Sandhills                  | Nebraska   | All | jobsub defaults are OK; some slots are SL6 Docker containers on SL7 hosts | Tested to 8000MB | 48 h |
| Notre Dame                     | NotreDame  | All | aim for short jobs due to preemption. | 32000MB / 8 cpu| 24 h |
| Tusker/Crane                   | Omaha      | All | jobsub defaults are OK | 3000MB | 24 h |
| Ohio Supercomputing Center     | OSC        | NOvA only | jobsub defaults are OK | estimated 2500MB | 46 h |
| University of Sheffield        | Sheffield  | DUNE only | jobsub defaults are OK | 16000MB | 48 h |
| Southern Methodist University  | SMU_HPC    | NOvA only | jobsub defaults are OK | 2500MB | 24 h |
| Stampede (TACC) | Stampede | MINOS only | unknown maximum disk | 32000MB | unknown |
| Stanford Proclus | HOSTED_BOSCO_CE | All | varies, but jobsub defaults should be OK | 16384MB | estimated 12 h |
| Syracuse                       | SU-OG      | All | request --disk=9000MB; %{color:red}2015/10/15: libXpm.so not installed on all nodes (may be required by ROOT)% | 2500MB | 46 h |
| %{color:red}Texas Tech%        | TTU        | All but mu2epro and seaquest | jobsub defaults are OK; %{color:red}2015/11/20: down since OSG software upgrade% | unknown | unknown |
| University of Chicago          | UChicago   | All | linked with MWT2; recommend --memory=1920MB or less per core. Many nodes have 3.x kernels, so be sure to set the UPS_OVERRIDE environment variable appropriately (see the example after the table) | Tested to 7680 MB with 4 CPUs | 5 h |
| University of California, San Diego | UCSD  | All |jobsub defaults are OK | 4000MB | 13 h |
| University of Bern             | UNIBE-LHEP | uboone only | Requires special settings via the jobsub --lines option: --lines='+count=N' --lines='+memory=2000' --lines='+runtimeenvironment = "APPS/HEP/UBOONE-MULTICORE-1.0"'. Be sure to request a CPU for every 2000MB of memory requested | 4000MB | 48 h |
| Grid Lab of Wisconsin (GLOW)   | Wisconsin  | All | jobsub defaults are OK | 8000MB | 24 h |
| Western Tier2 (SLAC)           | WT2        | uboone only | jobsub defaults are OK | 2500MB | 10 days |
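
Several rows above (MWT2, UChicago) mention setting UPS_OVERRIDE on nodes with newer 3.x kernels. A minimal sketch of that workaround is below; the flavor string and the setup script path are common conventions rather than site-specific instructions, so verify them with your experiment before relying on them.

<pre>
# Illustrative workaround for nodes whose kernel UPS does not recognize:
# force an SL6-compatible flavor before setting up UPS products.
export UPS_OVERRIDE="-H Linux64bit+2.6-2.12"
source /cvmfs/fermilab.opensciencegrid.org/products/common/etc/setups.sh
setup ifdhc   # example product; set up whatever your job actually needs
</pre>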