How it should look

First, get things set up:

novagpvm03$ typeset -f setup_nova
setup_nova ()
{
source /grid/fermiapp/nova/novaart/novasvn/srt/srt.sh;
export EXTERNALS=/nusoft/app/externals;
source $SRT_DIST/setup/setup_novasoft.sh "$@";
PRODUCTS=/grid/fermiapp/products/nova/db:$PRODUCTS;
setup ifdh_art v1_0_rc1 -q nu:e2:debug -k;
setup jobsub_tools
}
novagpvm03$ setup_nova
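
Before submitting, it can help to confirm that the setup actually took. The sketch below is not part of the original setup; check_submit_env is a hypothetical helper name. It only checks the things the rest of this page relies on: the IFDH_ART_DIR variable, the art_sam_wrap.sh script under it, and the jobsub and condor_q commands.

```shell
# check_submit_env: hypothetical helper, a sanity check to run after setup_nova.
# Returns 0 if everything used later on this page looks available, 1 otherwise.
check_submit_env() {
    rc=0
    # The launch script finds art_sam_wrap.sh through $IFDH_ART_DIR.
    if [ -z "${IFDH_ART_DIR:-}" ]; then
        echo "IFDH_ART_DIR is not set" >&2
        rc=1
    elif [ ! -x "$IFDH_ART_DIR/bin/art_sam_wrap.sh" ]; then
        echo "art_sam_wrap.sh not found under $IFDH_ART_DIR/bin" >&2
        rc=1
    fi
    # jobsub submits the job and condor_q watches it; both must be on PATH.
    for tool in jobsub condor_q; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool is not on PATH" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Run it as `setup_nova && check_submit_env`; any missing piece is reported on stderr.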

Then submit a job...

novagpvm03$ cat launch_1
jobsub -g \
-r S12-12-12 \
-N 3 \
--dataset_definition=all-mc-12-11-16 \
$IFDH_ART_DIR/bin/art_sam_wrap.sh \
-X nova \
--dest /nova/data/users/mengel \
--rename uniq \
--limit 1 \
-c /nova/app/users/anorman/NOVA-OFFLINE/cosmictrackjob.fcl
novagpvm03$ sh launch_1
/nova/data/condor-tmp/mengel/art_sam_wrap.sh_20130117_112319_15623_1.dag
submitting....
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 7363441.
 
-----------------------------------------------------------------------
File for submitting this DAG to Condor : /nova/data/condor-tmp/mengel/art_sam_wrap.sh_20130117_112319_15623_1.dag.condor.sub
Log of DAGMan debugging messages : /nova/data/condor-tmp/mengel/art_sam_wrap.sh_20130117_112319_15623_1.dag.dagman.out
Log of Condor library output : /nova/data/condor-tmp/mengel/art_sam_wrap.sh_20130117_112319_15623_1.dag.lib.out
Log of Condor library error messages : /nova/data/condor-tmp/mengel/art_sam_wrap.sh_20130117_112319_15623_1.dag.lib.err
Log of the life of condor_dagman itself : /nova/data/condor-tmp/mengel/art_sam_wrap.sh_20130117_112319_15623_1.dag.dagman.log
-----------------------------------------------------------------------
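
If you want the cluster number for later condor_q calls, you can pull it out of the saved submission output. This is a minimal sketch, assuming the output contains a "submitted to cluster" line in the form shown above; cluster_id_from_log is a hypothetical helper name and submit.log is a hypothetical file name.

```shell
# cluster_id_from_log: hypothetical helper. Reads saved jobsub output on stdin
# and prints the number from the "N job(s) submitted to cluster NNN." line.
cluster_id_from_log() {
    sed -n 's/^.*job(s) submitted to cluster \([0-9][0-9]*\).*$/\1/p'
}
```

For example, `sh launch_1 2>&1 | tee submit.log` keeps a copy of the output, and `condor_q "$(cluster_id_from_log < submit.log)"` then lists only that cluster, which here is the condor_dagman job.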

We can now watch with condor_q.

First the condor_dagman job starts up...

novagpvm03$ condor_q mengel
-- Submitter: gpsn01.fnal.gov : <131.225.67.70:57013> : gpsn01.fnal.gov
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
7364645.0 mengel 1/17 11:43 0+00:00:24 R 0 7.3 condor_dagman
 
1 jobs; 0 idle, 1 running, 0 held

Then it starts the leader job:

novagpvm03$ condor_q mengel
-- Submitter: gpsn01.fnal.gov : <131.225.67.70:57013> : gpsn01.fnal.gov
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
7364645.0 mengel 1/17 11:43 0+00:00:17 R 0 7.3 condor_dagman
7364646.0 mengel 1/17 11:43 0+00:00:00 I 0 0.0 art_sam_wrap.sh_20
 
2 jobs; 1 idle, 1 running, 0 held

After it finishes...

novagpvm03$ condor_q mengel
-- Submitter: gpsn01.fnal.gov : <131.225.67.70:57013> : gpsn01.fnal.gov
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
7364645.0 mengel 1/17 11:43 0+00:00:24 R 0 7.3 condor_dagman
 
1 jobs; 0 idle, 1 running, 0 held

Then your -N jobs (in our case 3) start up:

novagpvm03$ condor_q mengel
-- Submitter: gpsn01.fnal.gov : <131.225.67.70:57013> : gpsn01.fnal.gov
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
7364645.0 mengel 1/17 11:43 0+00:00:32 R 0 7.3 condor_dagman
7364647.0 mengel 1/17 11:43 0+00:00:00 I 0 0.0 art_sam_wrap.sh_20
7364648.0 mengel 1/17 11:43 0+00:00:00 I 0 0.0 art_sam_wrap.sh_20
7364649.0 mengel 1/17 11:43 0+00:00:00 I 0 0.0 art_sam_wrap.sh_20
 
4 jobs; 3 idle, 1 running, 0 held
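
Rather than rerunning condor_q by hand, you can poll it until your jobs are gone. The sketch below assumes the summary line has exactly the "N jobs; N idle, N running, N held" form shown in the listings above (other condor_q versions may print a different summary); queue_counts and wait_for_jobs are hypothetical helper names.

```shell
# queue_counts: hypothetical helper. Reads condor_q output on stdin and prints
# "jobs idle running held" taken from the summary line.
queue_counts() {
    sed -n 's/^\([0-9][0-9]*\) jobs; \([0-9][0-9]*\) idle, \([0-9][0-9]*\) running, \([0-9][0-9]*\) held.*$/\1 \2 \3 \4/p'
}

# wait_for_jobs USER [SECONDS]: hypothetical helper. Polls condor_q for USER
# until the summary line reports 0 jobs; returns 1 if no summary line is found.
wait_for_jobs() {
    user=$1
    interval=${2:-30}
    while :; do
        counts=$(condor_q "$user" | queue_counts)
        if [ -z "$counts" ]; then
            echo "no summary line found in condor_q output" >&2
            return 1
        fi
        # Split "jobs idle running held" into $1..$4.
        set -- $counts
        if [ "$1" -eq 0 ]; then
            return 0
        fi
        echo "jobs=$1 idle=$2 running=$3 held=$4"
        sleep "$interval"
    done
}
```

`wait_for_jobs mengel 60` would print one status line a minute and return once the condor_dagman job and all of its children have left the queue.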

Then go to the station monitor and pick your experiment, your station, and your project. You should see a display like:

Project mengel-art_sam_wrap.sh_20130117_114322_16328

Generated at 2013-01-17 13:05:43

Project Id 12620
Status running
Owner mengel
Start time 2013-01-17 11:43:47
Dataset definition all-mc-12-11-16
Files in snapshot 15695
Files seen 3
Processes 3
Busy processes 0
Finished processes 3
Waiting processes 0
Error processes 0
Mean wait time (per file) 25s
Mean busy time (per file) 1min 7s
Last activity process ended at 2013-01-17 11:48:02

Processes

Each process id links to its own page at http://samweb.fnal.gov:8480/station_monitor/nova/stations/nova/projects/mengel-art_sam_wrap.sh_20130117_114322_16328/processes/ followed by the process id.

Process Id | Node name         | Status    | Description      | Files seen | Last change                                     | Waiting for | Mean wait time (per file) | Mean busy time (per file)
60151      | fnpc5023.fnal.gov | completed | Consumer Process | 1          | 2013-01-17 11:47:22 (process ended - completed) | -           | 26s                       | 53s
60152      | fnpc4022.fnal.gov | completed | Consumer Process | 1          | 2013-01-17 11:47:28 (process ended - completed) | -           | 25s                       | 58s
60153      | fnpc4049.fnal.gov | completed | Consumer Process | 1          | 2013-01-17 11:48:02 (process ended - completed) | -           | 24s                       | 1min 32s
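
In the example above, -N 3 with --limit 1 ended up as 3 processes, 3 finished, 0 errors, and 3 files seen. If you copy the project summary into a text file, a quick check for "everything finished cleanly" could look like the sketch below. It assumes the "Field value" layout shown above, one field per line; project_field, project_done, and summary.txt are hypothetical names, and nothing here talks to the station monitor itself.

```shell
# project_field NAME: hypothetical helper. Reads the saved summary text on
# stdin and prints the value of the first line that starts with "NAME ".
project_field() {
    awk -v name="$1" 'index($0, name " ") == 1 { print substr($0, length(name) + 2); exit }'
}

# project_done FILE: hypothetical helper. Succeeds only if every process in
# the saved summary finished and none ended in error.
project_done() {
    total=$(project_field "Processes" < "$1")
    finished=$(project_field "Finished processes" < "$1")
    errors=$(project_field "Error processes" < "$1")
    [ -n "$total" ] && [ "$finished" = "$total" ] && [ "$errors" = "0" ]
}
```

With the summary saved as summary.txt, `project_done summary.txt && echo "all processes finished"` prints the message for the project shown here.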