HEPCloudKISTISubmit

Submitting jobs that go to GCloud

Run the "kx509" command on your local machine, then copy the resulting proxy file to cmssrv271 with scp:

scp /tmp/x509up_unnnn gerard1@cmssrv271:/tmp

(where nnnn is your numeric uid)
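
The proxy file name can be derived from the uid rather than typed by hand. The following is a minimal sketch, assuming the proxy sits in the default /tmp location; the helper name proxy_path is hypothetical, and the scp command is only printed so that it can be checked before it is run.

```shell
#!/bin/sh
# Build the proxy file name from a numeric uid (defaults to the current user).
# proxy_path is a hypothetical helper name, not part of kx509.
proxy_path() {
    uid="${1:-$(id -u)}"
    printf '/tmp/x509up_u%s\n' "$uid"
}

# Print (rather than run) the copy command so it can be inspected first.
# gerard1@cmssrv271 is the destination used on this page.
echo "scp $(proxy_path) gerard1@cmssrv271:/tmp"
```

Once the printed line looks right, paste it into the terminal to do the actual copy.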

ssh -l gerard1 cmssrv271.fnal.gov

(Note: this will not work from off-site. You must first ssh in to another Fermilab Unix machine
that is accessible from off-site, for instance fcluigpvm01.fnal.gov.)
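
The two hops can probably be combined with the OpenSSH "-J" (ProxyJump) option, available in OpenSSH 7.3 and later. This is a sketch only: it assumes the site permits jumping through fcluigpvm01.fnal.gov and that your credentials are accepted on both hosts, neither of which this page confirms. The helper name jump_cmd and the account name "yourname" are placeholders.

```shell
#!/bin/sh
# Print a one-step login command that jumps through the off-site-accessible host.
# $1: your account name on the jump host (placeholder, not taken from this page).
jump_cmd() {
    printf 'ssh -J %s@fcluigpvm01.fnal.gov gerard1@cmssrv271.fnal.gov\n' "$1"
}

# Print the command for inspection instead of running it.
jump_cmd yourname
```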

The subdirectories HepCloud/fuess and HepCloud/amitoj have already been created there.

Two files are important:

KISTI-jdlproto.jdl  
(this will submit one job)

KISTI-jdlproto-100.jdl
(this will submit 100 jobs)

Both have been modified to assume that your proxy is in /tmp/x509up_u1229 (for fuess) or /tmp/x509up_u10086 (for amitoj).

Submit the job with condor_submit, then check its status with condor_q:

[gerard1@cmssrv271 timm]$ condor_submit KISTI-jdlproto.jdl
Submitting job(s).
1 job(s) submitted to cluster 282.
[gerard1@cmssrv271 timm]$ condor_q

-- Submitter: cmssrv271.fnal.gov : <131.225.207.60:9615?sock=28553_eeb9_3> : cmssrv271.fnal.gov
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
 282.0   gerard1        11/12 09:11   0+00:00:00 I  0   0.0  submit.sh dmason_B

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

This shows one job submitted and idle.

You can then keep running the command

condor_status -pool cmssrv274.fnal.gov

After about 5 minutes it will show slots that have come up and are available:

[gerard1@cmssrv271 timm]$ condor_status -pool cmssrv274.fnal.gov
Name               OpSys      Arch   State     Activity LoadAv Mem    ActvtyTime

slot1@glidein_2063 LINUX      X86_64 Unclaimed Idle      0.430 58976  0+00:00:08
slot1_1@glidein_20 LINUX      X86_64 Claimed   Busy      0.000 1024  0+00:00:07
                     Machines Owner Claimed Unclaimed Matched Preempting

        X86_64/LINUX        2     0       1         1       0          0

               Total        2     0       1         1       0          0
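
Instead of reading the listing by eye, the slot lines can be counted with awk. The sketch below assumes the column layout shown above (State is the fourth whitespace-separated field, and slot names start with "slot"); the function name count_slots is hypothetical.

```shell
#!/bin/sh
# Count slots in a given state from condor_status output read on stdin.
# $1: value of the State column, e.g. Unclaimed or Claimed.
count_slots() {
    awk -v state="$1" '/^slot/ && $4 == state { n++ } END { print n + 0 }'
}

# Intended use on the submit host:
#   condor_status -pool cmssrv274.fnal.gov | count_slots Unclaimed
```

The summary table at the bottom of the output is skipped because its lines do not start with "slot".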

The job will then show status "R" (running):

[gerard1@cmssrv271 timm]$ condor_q

-- Submitter: cmssrv271.fnal.gov : <131.225.207.60:9615?sock=28553_eeb9_3> : cmssrv271.fnal.gov
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
 282.0   gerard1        11/12 09:11   0+00:03:58 R  0   0.0  submit.sh dmason_B

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
[gerard1@cmssrv271 timm]$ 
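
The idle-to-running transition can also be watched from a small script. This sketch assumes the condor_q layout shown above, where ST is the sixth whitespace-separated field; other HTCondor versions may print a different layout. The function names, the 30-second interval, and the 40-try limit are arbitrary choices for illustration.

```shell
#!/bin/sh
# Print the ST column for one job ID from condor_q output read on stdin.
job_state() {
    awk -v id="$1" '$1 == id { print $6 }'
}

# Poll condor_q until the job reports R, giving up after 40 tries (about 20 minutes).
# Returns 0 once the job is running, 1 if the limit is reached.
wait_for_running() {
    tries=0
    while [ "$tries" -lt 40 ]; do
        [ "$(condor_q | job_state "$1")" = "R" ] && return 0
        tries=$((tries + 1))
        sleep 30
    done
    return 1
}

# Intended use on the submit host:
#   wait_for_running 282.0 && echo "job is running"
```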

You can look at the following graphs in Grafana:

https://fifemon.fnal.gov/hcf/dashboard/db/hep-cloud-slots

(slots available to run hepcloud jobs from all directions)

https://fifemon.fnal.gov/hcf/dashboard/db/gcloud-vm-status

(actual VMs currently running in GCloud)

and, for AWS VMs:

https://fifemon.fnal.gov/hcf/dashboard/db/aws-vm-status-by-account

Submitting jobs to go to AWS

Rather than building a special case for AWS, we will simply run a NOvA workflow on AWS for demonstration purposes.
Amitoj and Stu have both been named honorary members of NOvA for the purposes of the demo.

Log into novagpvm01.fnal.gov as yourself (this works from off-site) and run:

source /grid/fermiapp/products/common/etc/setups.sh
export GROUP=nova
setup jobsub_client
jobsub_submit --group nova --resource-provides=usage_model=AWS_HEPCLOUD --memory=1000 --cpu=1 --disk=1000 file:///afs/fnal.gov/files/home/room1/timm/gridsleep.sh

This will submit one job, which will go to AWS.
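
To vary the resource requests without retyping the whole line, the command can be assembled by a small helper. This is a sketch that uses only the options shown above; the function name build_jobsub_cmd is hypothetical, and it prints the command rather than running it.

```shell
#!/bin/sh
# Print a jobsub_submit command line for the AWS demo.
# $1: --memory value, $2: --cpu value, $3: --disk value, $4: job script URL.
build_jobsub_cmd() {
    printf 'jobsub_submit --group nova --resource-provides=usage_model=AWS_HEPCLOUD --memory=%s --cpu=%s --disk=%s %s\n' \
        "$1" "$2" "$3" "$4"
}

# Reproduce the command from this page for inspection.
build_jobsub_cmd 1000 1 1000 file:///afs/fnal.gov/files/home/room1/timm/gridsleep.sh
```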