Submitting jobs that go to GCloud
Execute the "kx509" command on your local machine, then copy the resulting proxy file to cmssrv271 (where nnnn is your numeric user ID, the uid, not the gid):

    scp /tmp/x509up_unnnn gerard1@cmssrv271:/tmp

Then log in:

    ssh -l gerard1 cmssrv271.fnal.gov

Note that this will not work from off-site; you first have to ssh to some other Fermilab Unix machine that is accessible from off-site, fcluigpvm01.fnal.gov for instance.

The subdirectories HepCloud/fuess and HepCloud/amitoj have already been created there. Two files are important:

    KISTI-jdlproto.jdl       (this will submit one job)
    KISTI-jdlproto-100.jdl   (this will submit 100 jobs)

Both have been modified to assume your proxy is in /tmp/x509up_u1229 and /tmp/x509up_u10086 for fuess and amitoj respectively.

Submit the job, then check the queue:

    [gerard1@cmssrv271 timm]$ condor_submit KISTI-jdlproto.jdl
    Submitting job(s).
    1 job(s) submitted to cluster 282.
    [gerard1@cmssrv271 timm]$ condor_q

    -- Submitter: cmssrv271.fnal.gov : <188.8.131.52:9615?sock=28553_eeb9_3> : cmssrv271.fnal.gov
     ID      OWNER      SUBMITTED     RUN_TIME   ST PRI SIZE CMD
     282.0   gerard1    11/12 09:11   0+00:00:00 I  0   0.0  submit.sh dmason_B

    1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

This shows one job submitted and idle. You can keep running the command

    condor_status -pool cmssrv274.fnal.gov

Slots will eventually come back and show as available, about 5 minutes later.
    [gerard1@cmssrv271 timm]$ condor_status -pool cmssrv274.fnal.gov
    Name               OpSys  Arch   State     Activity LoadAv Mem   ActvtyTime

    slot1@glidein_2063 LINUX  X86_64 Unclaimed Idle     0.430  58976 0+00:00:08
    slot1_1@glidein_20 LINUX  X86_64 Claimed   Busy     0.000  1024  0+00:00:07

                 Machines Owner Claimed Unclaimed Matched Preempting
    X86_64/LINUX        2     0       1         1       0          0
           Total        2     0       1         1       0          0

The job will then show status "R" for running:

    [gerard1@cmssrv271 timm]$ condor_q

    -- Submitter: cmssrv271.fnal.gov : <184.108.40.206:9615?sock=28553_eeb9_3> : cmssrv271.fnal.gov
     ID      OWNER      SUBMITTED     RUN_TIME   ST PRI SIZE CMD
     282.0   gerard1    11/12 09:11   0+00:03:58 R  0   0.0  submit.sh dmason_B

    1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

You can look at the following graphs in Grafana:

    https://fifemon.fnal.gov/hcf/dashboard/db/hep-cloud-slots
        (slots available to run HEPCloud jobs from all directions)
    https://fifemon.fnal.gov/hcf/dashboard/db/gcloud-vm-status
        (actual VMs currently running in GCloud)
    https://fifemon.fnal.gov/hcf/dashboard/db/aws-vm-status-by-account
        (AWS VMs)
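If you would rather not re-run condor_q by hand while waiting, the check can be scripted. The following is only a sketch, not part of the existing setup: the function names job_state and wait_for_running are made up for illustration, and the parsing assumes the condor_q column layout shown above (ID, OWNER, SUBMITTED as two fields, RUN_TIME, ST, ...). Other condor_q versions or options may print different columns.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the HepCloud setup).

# job_state ID
# Reads condor_q output on stdin and prints the ST column for job ID.
# Assumes the layout shown above, where ST is the 6th whitespace-separated field.
# Exits non-zero if the job ID does not appear in the input.
job_state() {
    # Concatenating "" forces a string comparison, so 282.1 and 282.10 stay distinct.
    awk -v id="$1" '($1 "") == (id "") { print $6; found = 1 } END { exit !found }'
}

# wait_for_running ID [INTERVAL_SECONDS]
# Polls condor_q until job ID reports state R. Gives up if the job is held
# or is no longer listed in the queue.
wait_for_running() {
    while :; do
        state=$(condor_q | job_state "$1") || {
            echo "job $1 is not in the queue" >&2
            return 1
        }
        case "$state" in
            R) return 0 ;;
            H) echo "job $1 is held" >&2; return 1 ;;
        esac
        sleep "${2:-30}"
    done
}
```

For the example above this would be called as "wait_for_running 282.0 60" on cmssrv271 after condor_submit; it returns once the ST column shows R.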
Submitting jobs to go to AWS
Rather than building a special-case AWS setup, we will simply run a NOvA workflow on AWS for demonstration purposes.
Amitoj and Stu have both been named honorary members of NOvA for purposes of the demo.
Log into novagpvm01.fnal.gov as yourself (this works from off-site), then run:

    source /grid/fermiapp/products/common/etc/setups.sh
    export GROUP=nova
    setup jobsub_client
    jobsub_submit --group nova --resource-provides=usage_model=AWS_HEPCLOUD --memory=1000 --cpu=1 --disk=1000 file:///afs/fnal.gov/files/home/room1/timm/gridsleep.sh
This will submit one job and it will go to AWS.
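If you expect to repeat the submission, the jobsub_submit line can be wrapped in a small function. This is a sketch only: the function name submit_aws and the environment variables MEMORY, CPU, DISK and DRY_RUN are invented here for illustration, while the jobsub_submit options are exactly the ones given above. It assumes setups.sh has already been sourced and jobsub_client set up in the current shell.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of jobsub_client).

# submit_aws SCRIPT_URL
# Submits one job with the AWS_HEPCLOUD usage model, using the same options
# as the command above. MEMORY, CPU and DISK default to the values used there.
# With DRY_RUN=1 the command is printed instead of executed.
submit_aws() {
    # Rebuild the positional parameters as the full command line.
    # "$1" is expanded before set replaces the arguments.
    set -- jobsub_submit --group nova \
        --resource-provides=usage_model=AWS_HEPCLOUD \
        --memory="${MEMORY:-1000}" --cpu="${CPU:-1}" --disk="${DISK:-1000}" \
        "$1"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running it first with DRY_RUN=1, for example "DRY_RUN=1 submit_aws file:///afs/fnal.gov/files/home/room1/timm/gridsleep.sh", prints the command so it can be compared against the one above before anything is submitted.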