Wiki¶
Frequently Asked Questions¶
Getting Started with the Jobsub Client¶
Project Development Team & Mailing Lists¶
- Parag Mhashilkar
- Dennis Box
- Shreyas Bhat
Listserv Mailing List | Description | Accessibility | Traffic |
---|---|---|---|
jobsub-support | Users Discussion mailing list | Open | ? |
jobsub | Internal team mailing list | By Invitation | Low |
jobsub-commit | Jobsub source code commits | Open, By invitation | High |
Statistics & Monitoring¶
Current JobSub Deployment
- JobSub Deployment Servers: https://cdcvs.fnal.gov/redmine/projects/grid_and_cloud_computing_operations/wiki/Fifebatch
Jobsub Experiment Transition & Usage Status¶
http://web1.fnal.gov/scoreboard/jobsub/jobsub-report-latest.html
Ganglia plots¶
http://ganglia.fnal.gov/?r=hour&cs=&ce=&m=&s=by+name&c=FIFE
Meetings¶
Remote Coordinates: Toll-Free Number: 866-740-1260 (U.S. & Canada). Access code: 4036693#
MEETING | FOCUS | WHEN | WHERE | MINUTES |
---|---|---|---|---|
Weekly Developers Meeting | Development & operations issues | Tuesday 10:00 am - 11:00 am | CS Board Room/WH5SW | Weekly Meeting Notes |
Users Meeting | User issues | On demand | TBD | User Meeting Notes |
JobSub Releases¶
Tasks Before Release¶
- Unit test new features & bug fixes
- Before a final release, one or more release candidates (RCs) are released.
- RCs undergo integration testing on the development servers by the development team.
  - Documentation on current Integration_Test_Suite
- After the development team is satisfied with their level of testing, GCSO (operations group) deploys the software on the pre-production machines to perform additional testing.
- After the release
- Send release announcement to <mailing list>
- Open a ticket with GCSO to install the new version on pre-production for wider testing
Release Schedule¶
JobSub releases are made once the features and bug fixes planned for a given release have been tested. In addition, an emergency release may be made to address critical issues or bugs.
Visit the issue tracker https://cdcvs.fnal.gov/redmine/projects/jobsub/issues if you are interested in issues addressed in future releases. Click on the filters (Custom Queries) to the right of the page to look for features in a given release.
Release Notes¶
Release Notes for Jobsub client and server
Jobsub Tools Release Notes (Retired product folded into jobsub server)
Client User Guide¶
Jobsub services on fifebatch infrastructure¶
- How to access the Jobsub Client
- How to submit and manage your job: documentation on all the Jobsub Client commands
- Quick How To Guide for Using the Jobsub Client: documentation on authentication and basic commands that should work for anyone with a Fermilab ID
Old Submission Model (jobsub_tools) vs. New Model (jobsub_client)¶
The previous jobsub_tools client took input arguments and generated a condor job description file (jdf) on a disk shared between the client and the server. It then ssh'ed to the server and executed a condor_submit. This architecture is beginning to reach the limits of its scalability.
The new client does not share disk mounts with its server. Given a couple of extra arguments compared to the old client, it contacts the server through a REST API and passes the arguments through. The server then uses the 'old' jobsub to generate the condor jdf and perform a condor_submit. Log, error, and stdout files are written to a local directory on the server machine, and can be retrieved using jobsub_fetchlog.py as documented below or via curl. Authentication to the new server is done via kx509 certs using the 'getcert' utility, which comes with all Fermi Scientific Linux installations.
- architecture design documents at https://cdcvs.fnal.gov/redmine/documents/672
- API specification at https://cdcvs.fnal.gov/redmine/documents/673
Job id Syntax¶
Longtime jobsub and/or condor users are familiar with the condor job id for their jobs, which looks like a floating point number, e.g. my submitted job is number '19332.0' in the queue. The jobsub_client submits to a DNS alias which can have a number of physical jobsub servers behind it. To avoid confusion, the returned job id will be of the form 'job_number@schedd_name', e.g. '149.0@fifebatch2.fnal.gov'.
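The job id form described above can be split into its parts with a few lines of Python. This is a minimal sketch, not code from the jobsub client: the names `JobId` and `parse_job_id` are hypothetical, and it assumes the part before the '@' is always 'cluster.process' as in the example '149.0@fifebatch2.fnal.gov'.

```python
from typing import NamedTuple


class JobId(NamedTuple):
    """Parts of a jobsub job id (hypothetical helper type)."""
    cluster: int   # condor cluster number, e.g. 149
    process: int   # process number within the cluster, e.g. 0
    schedd: str    # name of the schedd that holds the job


def parse_job_id(job_id: str) -> JobId:
    """Split 'cluster.process@schedd_name' into a JobId.

    Raises ValueError for old-style ids such as '19332.0' that carry
    no schedd name, and for ids whose number part is not 'N.M'.
    """
    number, at, schedd = job_id.strip().partition("@")
    if not at or not schedd:
        raise ValueError(f"job id has no '@schedd_name' part: {job_id!r}")
    cluster, dot, process = number.partition(".")
    if not dot:
        raise ValueError(f"job number is not 'cluster.process': {job_id!r}")
    # int() raises ValueError itself if either piece is not numeric.
    return JobId(int(cluster), int(process), schedd)


if __name__ == "__main__":
    print(parse_job_id("149.0@fifebatch2.fnal.gov"))
```

Rejecting ids without a schedd name is a deliberate choice in this sketch: with several servers behind one DNS alias, a bare '19332.0' does not identify a job uniquely.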
Administrators Corner¶
JobSub Server¶
JobSub Client¶
Supported Platforms
Linux is the only officially supported platform. The client might still work on Mac OS X, but it may be missing some functionality there and is recommended only for more adventurous users.
JobSub Client Dependencies¶
The JobSub client depends on the following being installed and available on the machine:
- Submission Credentials
- A valid x509 proxy or cert/key pair that identifies with the VOMS server is required. You currently have two options for generating a valid proxy:
- Option 1: generate a proxy from a valid kerberos ticket. Use kinit and then kx509 to generate a proxy.
- Option 2: use cigetcert to generate a proxy using your Fermilab Services Account name/password.
- Other institutions will soon be able to authenticate and submit jobs with cigetcert. Details will be linked here when available.
- Software Dependency
- pycurl
- On Linux SL5: available in KITS, i.e. source /cvmfs/fermilab.opensciencegrid.org/products/common/etc/setups ; setup pycurl
  - To install pycurl so it is automatically set up prior to jobsub_client, install it as 'current' with one of the following incantations, depending on which OS/pycurl version you are setting up for.
  - Doing both incantations should not harm anything; 'setup jobsub_client' should pick the right one at set up time:
  - upd install -G "-c" pycurl v7_15_5 -H Linux64bit+2.6-2.5
  - upd install -G "-c" pycurl v7_16_4 -H Linux64bit+2.6-2.12
- On Linux SL6: should be part of your python distribution. If not, yum install pycurl
- kx509: Software to generate x509 proxy from kerberos ticket.
- Fermilab is in the process of switching Certificate providers from Digicert Corporation to Cilogon Corporation. This process started about 6/1/16 and will be completed by 1/1/17, at which time Digicert proxies and the 'old' kx509 will not work.
- Kx509 (Digicert) is provided by the get-cert utility (http://computing.fnal.gov/authentication/kca/getcert-for-linux.html).
- Kx509 (Cilogon) is available via the fermilab-util_kx509.noarch rpm: yum install -y fermilab-util_kx509
- CA Certificates: CA certificates are typically available in /etc/grid-security/certificates and can be installed from OSG (https://twiki.grid.iu.edu/bin/view/Documentation/Release3/InstallCertAuth). The Cilogon Basic CA certificate (https://cilogon.org/cilogon-basic.pem) and, until 1/1/17, the KCA CA (https://fermi.servicenowservices.com/kb_view.do?sysparm_article=KB0010816) and DigiCert CA (https://www.digicert.com/digicert-root-certificates.htm) certificates must be installed on the system. For non-standard installations, set X509_CERT_DIR to point to the directory containing these certificates. If you installed the OSG cacerts and updater scripts correctly, this step should be taken care of automatically.
- Non-Expert User:
- Option 1: source /cvmfs/fermilab.opensciencegrid.org/products/common/etc/setups; setup jobsub_client
- Option 2: For FEF/GCO managed machines, open a SNOW ticket to get the above software installed.
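The credential and CA certificate items above can be checked from a script before submitting. The following is a minimal sketch under stated assumptions, not the client's actual lookup logic: the function names are hypothetical, the X509_USER_PROXY variable and the /tmp/x509up_u<uid> default are the usual grid-tooling conventions rather than anything stated on this page, and only the X509_CERT_DIR variable and the /etc/grid-security/certificates default come from the dependency list.

```python
import os
from typing import Mapping, Optional

# Default CA directory named in the dependency list above.
DEFAULT_CA_DIR = "/etc/grid-security/certificates"


def ca_cert_dir(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return X509_CERT_DIR if set and non-empty, else the default CA directory."""
    env = os.environ if environ is None else environ
    return env.get("X509_CERT_DIR") or DEFAULT_CA_DIR


def proxy_path(environ: Optional[Mapping[str, str]] = None,
               uid: Optional[int] = None) -> str:
    """Return the expected x509 proxy location.

    Assumption: X509_USER_PROXY wins if set; otherwise the proxy is
    expected at /tmp/x509up_u<uid>, the common grid convention.
    """
    env = os.environ if environ is None else environ
    explicit = env.get("X509_USER_PROXY")
    if explicit:
        return explicit
    if uid is None:
        uid = os.getuid()
    return f"/tmp/x509up_u{uid}"


if __name__ == "__main__":
    for label, path in (("CA directory", ca_cert_dir()), ("proxy", proxy_path())):
        status = "found" if os.path.exists(path) else "missing"
        print(f"{label}: {path} ({status})")
```

Passing the environment in as a mapping keeps the lookups easy to exercise without touching the real process environment.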
Developers Corner¶
- New Developers Quick-Start
- Information for JobSub developers and contributors
- Integration Test Suite
- Making JobSub Releases
- Use Cases
Completed APIs¶
- Query list of accounting groups: /jobsub/acctgroups/
- Query list of jobs: /jobsub/acctgroups/<group_id>/jobs/
- Query a single job: /jobsub/acctgroups/<group_id>/jobs/<job_id>/
- Create/Submit a new job: /jobsub/acctgroups/<group_id>/jobs/<job_id>/
- Help for an accounting group: /jobsub/acctgroups/<group_id>/help/
- Download compressed output sandbox for a given job: /jobsub/acctgroups/<group_id>/jobs/<job_id>/sandbox/
- Upload files to Dropbox service: /jobsub/acctgroups/<group_id>/dropbox/
- Download files from Dropbox service: /jobsub/acctgroups/<group_id>/dropbox/<box_id>/<filename>/
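The resource paths listed above follow one pattern, which a small helper can assemble. This is a sketch only: the function names are hypothetical, the server address `https://jobsub.example.org:8443` is a placeholder that does not come from this page, and the group name 'nova' in the usage lines is likewise just an example. The HTTP method and authentication for each call are defined by the API specification and are not modeled here.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical server address, for illustration only.
BASE_URL = "https://jobsub.example.org:8443"


def _url(*parts: str, base: str = BASE_URL) -> str:
    """Join path segments under /jobsub/acctgroups/ with a trailing slash."""
    segments = ["jobsub", "acctgroups", *parts]
    # Keep '@' unescaped so job ids like 149.0@host stay readable in the path.
    path = "/".join(quote(segment, safe="@") for segment in segments)
    return f"{base.rstrip('/')}/{path}/"


def acctgroups_url() -> str:
    """List of accounting groups."""
    return _url()


def jobs_url(group_id: str, job_id: Optional[str] = None) -> str:
    """List of jobs for a group, or a single job when job_id is given."""
    if job_id is None:
        return _url(group_id, "jobs")
    return _url(group_id, "jobs", job_id)


def sandbox_url(group_id: str, job_id: str) -> str:
    """Compressed output sandbox for one job."""
    return _url(group_id, "jobs", job_id, "sandbox")


def help_url(group_id: str) -> str:
    """Help for an accounting group."""
    return _url(group_id, "help")


def dropbox_url(group_id: str, box_id: Optional[str] = None,
                filename: Optional[str] = None) -> str:
    """Dropbox upload path, or a download path when box_id and filename are given."""
    if box_id is None or filename is None:
        return _url(group_id, "dropbox")
    return _url(group_id, "dropbox", box_id, filename)


if __name__ == "__main__":
    print(jobs_url("nova"))
    print(sandbox_url("nova", "149.0@fifebatch2.fnal.gov"))
```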
Child pages of this page¶
- An example of setting PYTHONPATH
- Configure the Server
- Jobsubini
- Always run on grid
- Authentication methods
- Command path root
- Condor exec
- Condor installed in opt
- Condor setup cmd
- Default grid site
- Default voms role
- Desired os
- Dropbox path root
- GROUP
- Group superusers
- Has CVMFS
- Has usage model
- History db
- Ifdh base uri
- Job expected max lifetime default
- Job expected max lifetime long
- Job expected max lifetime medium
- Job expected max lifetime short
- Job lease duration
- Jobsub max cluster procs
- Jobsub max joblog size
- Motd file
- Myproxy server
- New jobsub ini vars by release
- Condor mail notify
- Condor q extra flags
- Default output host
- Downtime constraint
- Enable http cache
- Global superusers
- Hash nondefault proxy
- Http cache duration
- Krbrefresh query format
- Max jobsub log size
- Max logfile cache age
- Num transfer tries
- Requirements base
- Schedd load metric
- Set up ifdh
- Sleep random
- Sub group pattern
- Submit reject threshold
- Supported roles
- Vo constraint
- New jobsub ini vars index
- Output files web browsable allowed types
- Sandbox readable by group
- Sandbox timeout
- Site ignore list
- Supported groups
- Transfer krbcc to job
- Transfer wrapfile
- Voms
- Voms proxy lifetime
- Wn ifdh location
- X509 user proxy
- Jobsubini
- Frequently Asked Questions
- Information for JobSub developers and contributors
- Install the Server
- Integration Test Suite
- Jobsub Tools Release Notes
- Making JobSub Releases
- New Developers Quick-Start
- Obtaining the Client
- Old Configure the Server
- Old Install the Server
- Release Notes
- Use Cases
- User Meeting Notes
- Using the Client
- Weekly Meeting Notes