Bug #17564

Problem with jobsub_q on new GPGrid

Added by Bruno Coimbra almost 2 years ago. Updated 6 months ago.

Status: Closed
Priority: High
Assignee: -
Category: -
Target version: -
Start date: 08/22/2017
% Done: 0%

Description

While testing jobsub on the new GPGrid, I noticed that jobsub_q is misbehaving.
The new cluster has two schedds, but jobsub_q --jobsub-server=jobsub-dev.fnal.gov only shows the jobs on whichever schedd is answering for the alias at that moment.

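Joe talked to Tony, who mentioned that the right way for jobsub_q to get the proper output would be with "condor_q -nobatch -all -global".

Please see the example below. The first query goes through the alias and returns only the jobs on htcjsdev02; the next two query each schedd directly: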

-bash-4.1$ jobsub_q --group nova --user coimbra --jobsub-server=jobsub-dev.fnal.gov
JOBSUBJOBID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
1415.0@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.1@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.2@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.3@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.4@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.5@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.6@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.7@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.8@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.9@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.10@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.12@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.13@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.14@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
14 jobs; 0 completed, 0 removed, 0 idle, 0 running, 14 held, 0 suspended
-bash-4.1$ jobsub_q --group nova --user coimbra --jobsub-server=htcjsdev01.fnal.gov
JOBSUBJOBID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
1029.0@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.1@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.2@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.3@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.4@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.5@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.6@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.7@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.8@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.9@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.10@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.11@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.12@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.13@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.14@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
15 jobs; 0 completed, 0 removed, 0 idle, 0 running, 15 held, 0 suspended
-bash-4.1$ jobsub_q --group nova --user coimbra --jobsub-server=htcjsdev02.fnal.gov
JOBSUBJOBID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
1415.0@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.1@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.2@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.3@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.4@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.5@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.6@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.7@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.8@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.9@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.10@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.12@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.13@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
1415.14@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
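
To make the expected result concrete, here is a minimal sketch of what a combined listing would contain if the two per-schedd outputs above were merged. This is only an illustration and not how jobsub_q is implemented; the function name `merge_listings` is hypothetical, and the sample text is abbreviated from the session above.

```python
def merge_listings(outputs):
    """Combine several jobsub_q text listings into one list of unique job rows."""
    seen = set()
    rows = []
    for text in outputs:
        for line in text.splitlines():
            fields = line.split()
            # Job rows start with an id of the form <cluster>.<proc>@<schedd>;
            # headers, shell prompts and summary lines do not.
            if not fields or "@" not in fields[0]:
                continue
            job_id = fields[0]
            if job_id not in seen:
                seen.add(job_id)
                rows.append(line.strip())
    return rows


# Abbreviated copies of the per-schedd listings shown in the description.
SCHEDD_01 = """JOBSUBJOBID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
1029.0@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
1029.1@htcjsdev01.fnal.gov coimbra 08/18 10:42 0+00:00:00 H 0 0.0 probe_20170818_104243_1433902_0_1_wrap.sh
"""
SCHEDD_02 = """JOBSUBJOBID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
1415.0@htcjsdev02.fnal.gov coimbra 08/18 10:37 0+00:00:00 H 0 0.0 probe_20170818_103718_1861623_0_1_wrap.sh
"""

merged = merge_listings([SCHEDD_01, SCHEDD_02])
for row in merged:
    print(row)
print(len(merged), "jobs across both schedds")
```

With the full listings above, the merged result would hold the 15 jobs from htcjsdev01 plus the 14 from htcjsdev02, which is what a query through the jobsub-dev alias should report.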

History

#1 Updated by Dennis Box over 1 year ago

  • Status changed from New to Resolved

The problem was that the jobsub version was 1.2.4.rc4. The final 1.2.4 release has a new optional entry in jobsub.ini: 'condor_q_extra_flags'.

If this option is present, its value is appended to the condor_q command. The option is not needed for condor versions earlier than 8.7. On jobsub-dev its value is:

condor_q_extra_flags = -allusers -nobatch
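
As a rough illustration of the behavior described above (an optional ini entry whose value is appended to the condor_q command), here is a minimal sketch. It is not the actual jobsub code: the function name `build_condor_q_command` and the `[jobsub]` section name are assumptions made for the example.

```python
import configparser
import shlex


def build_condor_q_command(ini_text, section="jobsub", base=("condor_q",)):
    """Return the condor_q argument list, with condor_q_extra_flags appended if set."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    cmd = list(base)
    # The entry is optional: fall back to no extra flags when it is absent.
    extra = parser.get(section, "condor_q_extra_flags", fallback="")
    cmd.extend(shlex.split(extra))
    return cmd


# Hypothetical ini contents; only the condor_q_extra_flags line comes from the issue.
INI_WITH_FLAGS = """
[jobsub]
condor_q_extra_flags = -allusers -nobatch
"""
INI_WITHOUT_FLAGS = """
[jobsub]
"""

print(build_condor_q_command(INI_WITH_FLAGS))     # ['condor_q', '-allusers', '-nobatch']
print(build_condor_q_command(INI_WITHOUT_FLAGS))  # ['condor_q']
```

Leaving the entry out keeps the command unchanged, which matches the note that the option is not needed for condor versions earlier than 8.7.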

I had already tested this combination on jobsub-dev (new jobsub.ini entry, jobsub-1.2.4), and even had Nick update puppet because it kept blowing away the new entry. Puppet may be resetting jobsub to rc4 as well; I will check.

#2 Updated by Dennis Box 6 months ago

  • Status changed from Resolved to Closed