Bug #7169

jobsub_history returning different outputs in 2 successive calls

Added by Gerard Bernabeu Altayo about 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Assignee: Parag Mhashilkar
Category: -
Target version:
Start date: 10/16/2014
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

Hi,

While testing a GWMS factory reinstallation, I saw the following behaviour from jobsub_history (note that the commands are identical but the outputs are very different):

<minosgpvm01.fnal.gov> jobsub_history --version
1.0
<minosgpvm01.fnal.gov> jobsub_submit --resource-provides=usage_model=OPPORTUNISTIC --group=fermilab --jobsub-server https://fifebatch-preprod.fnal.gov:8443 file:///bin/hostname
Server response code: 200
Response OUTPUT:
/scratch/uploads/fermilab/gerard1/2014-10-16_091925.190258_4780

/scratch/uploads/fermilab/gerard1/2014-10-16_091925.190258_4780/hostname_20141016_091925_6149_0_1.cmd

Report any problems to the service desk

submitting....

Submitting job(s).

1 job(s) submitted to cluster 424.

JobsubJobId of first job: 424.0@fermicloud391.fnal.gov

Use job id 424.0@fermicloud391.fnal.gov to retrieve output

Remote Submission Processing Time: 0.815547943115 sec
<minosgpvm01.fnal.gov> jobsub_q --jobsub-server https://fifebatch-preprod.fnal.gov:8443 --group=fermilab
Server response code: 200
Response OUTPUT:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
424.0@fermicloud391.fnal.gov gerard1 10/16 09:19 0+00:00:00 I 0 0.0 hostname_20141016_091925_6149_0_1_wrap.sh
456.0@fermicloud383.fnal.gov gerard1 10/16 09:19 0+00:00:00 I 0 0.0 hostname_20141016_091923_27996_0_1_wrap.sh
Remote Listing Processing Time: 0.130522012711 sec
<minosgpvm01.fnal.gov> jobsub_q --jobsub-server https://fifebatch-preprod.fnal.gov:8443 --group=fermilab
Server response code: 200
Response OUTPUT:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
Remote Listing Processing Time: 0.136599063873 sec
<minosgpvm01.fnal.gov> jobsub_history --jobsub-server https://fifebatch-preprod.fnal.gov:8443 --group=fermilab
Server response code: 200
Response OUTPUT:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
290.0@fermicloud393.fnal.gov gerard1 10/16 08:41 0+00:00:00 C 0 0.0 hostname_20141016_084116_25477_0_1_wrap.sh
178.0@fermicloud393.fnal.gov gerard1 10/08 06:02 0+00:00:00 X 0 0.0 hostname_20141008_060241_16931_0_1_wrap.sh
167.0@fermicloud393.fnal.gov gerard1 10/07 11:10 0+00:00:00 X 0 0.0 id_20141007_111036_23839_0_1_wrap.sh
166.0@fermicloud393.fnal.gov gerard1 10/07 10:51 0+00:00:00 X 0 0.0 id_20141007_105148_19881_0_1_wrap.sh
Remote Listing Processing Time: 1.61852884293 sec
<minosgpvm01.fnal.gov> jobsub_history --jobsub-server https://fifebatch-preprod.fnal.gov:8443 --group=fermilab
Server response code: 200
Response OUTPUT:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
456.0@fermicloud383.fnal.gov gerard1 10/16 09:19 0+00:00:00 C 0 0.0 hostname_20141016_091923_27996_0_1_wrap.sh
gerard1 10/08 06:03 0+00:00:00 C 0 0.0 hostname_20141008_060325_13462_0_1_wrap.sh
gerard1 10/07 11:14 0+00:00:00 X 0 0.0 hostname_20141007_111459_23985_0_1_wrap.sh
neha 09/24 10:27 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102739_27530_0_1_wrap.sh
neha 09/24 10:27 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102738_27391_0_1_wrap.sh
neha 09/24 10:27 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102737_27246_0_1_wrap.sh
neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102643_26945_0_1_wrap.sh
neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102642_26802_0_1_wrap.sh
neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102613_26589_0_1_wrap.sh
neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102605_26422_0_1_wrap.sh
neha 09/24 10:21 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102120_24902_0_1_wrap.sh
neha 08/20 22:48 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140820_224823_5582_0_1_wrap.sh
Remote Listing Processing Time: 1.26173210144 sec
<minosgpvm01.fnal.gov>

I fear this may be caused by something not working properly in the HA system...

I guess an incomplete list is better than nothing, but the user should be notified that an error occurred somewhere and that the output is not complete.
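
To illustrate the kind of notification meant here, a minimal Python sketch follows. It is not the jobsub implementation: `merge_history`, `render`, and the per-server fetch callables are hypothetical names introduced only for this example, and it assumes the server side can query each backend separately and knows when one of them failed to answer.

```python
from typing import Callable, Dict, List, Tuple


def merge_history(
    fetchers: Dict[str, Callable[[], List[str]]]
) -> Tuple[List[str], List[str]]:
    """Collect history rows from every backend and record the ones that failed.

    `fetchers` maps a (hypothetical) backend name to a callable returning
    that backend's history rows.
    """
    rows: List[str] = []
    failed: List[str] = []
    for name, fetch in fetchers.items():
        try:
            rows.extend(fetch())
        except Exception:
            # Keep going: a partial list is still useful, but remember the gap.
            failed.append(name)
    return rows, failed


def render(rows: List[str], failed: List[str]) -> str:
    """Build the text shown to the user, flagging incomplete output."""
    lines = list(rows)
    if failed:
        lines.append(
            "WARNING: history is incomplete; no response from: "
            + ", ".join(sorted(failed))
        )
    return "\n".join(lines)
```

With this shape, a failure in one backend no longer produces a silently shorter list: the rows that were retrieved are still printed, followed by an explicit warning naming the backends that did not respond.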

History

#1 Updated by Gerard Bernabeu Altayo about 6 years ago

It's very easy to reproduce: just make repeated calls. It looks very much like an HA issue (is each jobsub server returning only its local history?). In preprod we have 3 servers, and successive calls cycle through three different outputs:

<minosgpvm01.fnal.gov> jobsub_history --jobsub-server https://fifebatch-preprod.fnal.gov:8443 --group=fermilab
Server response code: 200
Response OUTPUT:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
424.0@fermicloud391.fnal.gov gerard1 10/16 09:19 0+00:00:00 C 0 0.0 hostname_20141016_091925_6149_0_1_wrap.sh
301.0@fermicloud391.fnal.gov gerard1 10/07 11:29 0+00:00:00 X 0 0.0 hostname_20141007_112909_7002_0_1_wrap.sh
Remote Listing Processing Time: 7.11388492584 sec
<minosgpvm01.fnal.gov> jobsub_history --jobsub-server https://fifebatch-preprod.fnal.gov:8443 --group=fermilab
Server response code: 200
Response OUTPUT:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
290.0@fermicloud393.fnal.gov gerard1 10/16 08:41 0+00:00:00 C 0 0.0 hostname_20141016_084116_25477_0_1_wrap.sh
178.0@fermicloud393.fnal.gov gerard1 10/08 06:02 0+00:00:00 X 0 0.0 hostname_20141008_060241_16931_0_1_wrap.sh
167.0@fermicloud393.fnal.gov gerard1 10/07 11:10 0+00:00:00 X 0 0.0 id_20141007_111036_23839_0_1_wrap.sh
166.0@fermicloud393.fnal.gov gerard1 10/07 10:51 0+00:00:00 X 0 0.0 id_20141007_105148_19881_0_1_wrap.sh
Remote Listing Processing Time: 1.59629416466 sec
<minosgpvm01.fnal.gov> jobsub_history --jobsub-server https://fifebatch-preprod.fnal.gov:8443 --group=fermilab
Server response code: 200
Response OUTPUT:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
456.0@fermicloud383.fnal.gov gerard1 10/16 09:19 0+00:00:00 C 0 0.0 hostname_20141016_091923_27996_0_1_wrap.sh
332.0@fermicloud383.fnal.gov gerard1 10/08 06:03 0+00:00:00 C 0 0.0 hostname_20141008_060325_13462_0_1_wrap.sh
322.0@fermicloud383.fnal.gov gerard1 10/07 11:14 0+00:00:00 X 0 0.0 hostname_20141007_111459_23985_0_1_wrap.sh
314.0@fermicloud383.fnal.gov neha 09/24 10:27 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102739_27530_0_1_wrap.sh
313.0@fermicloud383.fnal.gov neha 09/24 10:27 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102738_27391_0_1_wrap.sh
312.0@fermicloud383.fnal.gov neha 09/24 10:27 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102737_27246_0_1_wrap.sh
311.0@fermicloud383.fnal.gov neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102643_26945_0_1_wrap.sh
310.0@fermicloud383.fnal.gov neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102642_26802_0_1_wrap.sh
309.0@fermicloud383.fnal.gov neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102613_26589_0_1_wrap.sh
308.0@fermicloud383.fnal.gov neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102605_26422_0_1_wrap.sh
307.0@fermicloud383.fnal.gov neha 09/24 10:21 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102120_24902_0_1_wrap.sh
112.0@fermicloud383.fnal.gov neha 08/20 22:48 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140820_224823_5582_0_1_wrap.sh
Remote Listing Processing Time: 1.29469394684 sec
<minosgpvm01.fnal.gov> jobsub_history --jobsub-server https://fifebatch-preprod.fnal.gov:8443 --group=fermilab
Server response code: 200
Response OUTPUT:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
424.0@fermicloud391.fnal.gov gerard1 10/16 09:19 0+00:00:00 C 0 0.0 hostname_20141016_091925_6149_0_1_wrap.sh
301.0@fermicloud391.fnal.gov gerard1 10/07 11:29 0+00:00:00 X 0 0.0 hostname_20141007_112909_7002_0_1_wrap.sh
Remote Listing Processing Time: 1.65214085579 sec
<minosgpvm01.fnal.gov> jobsub_history --jobsub-server https://fifebatch-preprod.fnal.gov:8443 --group=fermilab
Server response code: 200
Response OUTPUT:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
290.0@fermicloud393.fnal.gov gerard1 10/16 08:41 0+00:00:00 C 0 0.0 hostname_20141016_084116_25477_0_1_wrap.sh
178.0@fermicloud393.fnal.gov gerard1 10/08 06:02 0+00:00:00 X 0 0.0 hostname_20141008_060241_16931_0_1_wrap.sh
167.0@fermicloud393.fnal.gov gerard1 10/07 11:10 0+00:00:00 X 0 0.0 id_20141007_111036_23839_0_1_wrap.sh
166.0@fermicloud393.fnal.gov gerard1 10/07 10:51 0+00:00:00 X 0 0.0 id_20141007_105148_19881_0_1_wrap.sh
Remote Listing Processing Time: 1.54917097092 sec
<minosgpvm01.fnal.gov> jobsub_history --jobsub-server https://fifebatch-preprod.fnal.gov:8443 --group=fermilab
Server response code: 200
Response OUTPUT:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
456.0@fermicloud383.fnal.gov gerard1 10/16 09:19 0+00:00:00 C 0 0.0 hostname_20141016_091923_27996_0_1_wrap.sh
332.0@fermicloud383.fnal.gov gerard1 10/08 06:03 0+00:00:00 C 0 0.0 hostname_20141008_060325_13462_0_1_wrap.sh
322.0@fermicloud383.fnal.gov gerard1 10/07 11:14 0+00:00:00 X 0 0.0 hostname_20141007_111459_23985_0_1_wrap.sh
314.0@fermicloud383.fnal.gov neha 09/24 10:27 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102739_27530_0_1_wrap.sh
313.0@fermicloud383.fnal.gov neha 09/24 10:27 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102738_27391_0_1_wrap.sh
312.0@fermicloud383.fnal.gov neha 09/24 10:27 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102737_27246_0_1_wrap.sh
311.0@fermicloud383.fnal.gov neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102643_26945_0_1_wrap.sh
310.0@fermicloud383.fnal.gov neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102642_26802_0_1_wrap.sh
309.0@fermicloud383.fnal.gov neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102613_26589_0_1_wrap.sh
308.0@fermicloud383.fnal.gov neha 09/24 10:26 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102605_26422_0_1_wrap.sh
307.0@fermicloud383.fnal.gov neha 09/24 10:21 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140924_102120_24902_0_1_wrap.sh
112.0@fermicloud383.fnal.gov neha 08/20 22:48 0+00:00:00 C 0 0.0 testjob-gwms1.sh_20140820_224823_5582_0_1_wrap.sh
Remote Listing Processing Time: 1.27576994896 sec
<minosgpvm01.fnal.gov>

Gerard
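
The pattern in the transcripts above can be checked mechanically. The following Python sketch is only an illustration of that check: the helper names `schedd_hosts` and `looks_partial` are made up for this example, and it assumes each history row starts with a job id of the form `cluster.proc@host`, as in the output pasted above. The sample rows are copied from that output.

```python
import re
from typing import List, Set

# Job ids in the pasted output look like "424.0@fermicloud391.fnal.gov".
JOB_ID = re.compile(r"^\d+\.\d+@(\S+)")


def schedd_hosts(output: str) -> Set[str]:
    """Return the set of hosts named in the job ids of one response."""
    hosts: Set[str] = set()
    for line in output.splitlines():
        match = JOB_ID.match(line.strip())
        if match:
            hosts.add(match.group(1))
    return hosts


def looks_partial(outputs: List[str]) -> bool:
    """True if at least one response omits a host that another response names."""
    per_call = [schedd_hosts(o) for o in outputs]
    all_hosts = set().union(*per_call)
    return any(hosts != all_hosts for hosts in per_call)


# One row from each of three successive calls shown in the ticket.
calls = [
    "424.0@fermicloud391.fnal.gov gerard1 10/16 09:19 0+00:00:00 C 0 0.0 "
    "hostname_20141016_091925_6149_0_1_wrap.sh",
    "290.0@fermicloud393.fnal.gov gerard1 10/16 08:41 0+00:00:00 C 0 0.0 "
    "hostname_20141016_084116_25477_0_1_wrap.sh",
    "456.0@fermicloud383.fnal.gov gerard1 10/16 09:19 0+00:00:00 C 0 0.0 "
    "hostname_20141016_091923_27996_0_1_wrap.sh",
]

if __name__ == "__main__":
    for i, text in enumerate(calls, start=1):
        print(i, sorted(schedd_hosts(text)))
    print("partial responses detected:", looks_partial(calls))
```

Each response names exactly one host, and the three responses together name three different hosts, which is consistent with every call being answered from a single server's local history.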

#2 Updated by Parag Mhashilkar about 6 years ago

  • Assignee set to Dennis Box
  • Target version set to v1.0.2

#3 Updated by Parag Mhashilkar about 6 years ago

  • Assignee changed from Dennis Box to Parag Mhashilkar

#4 Updated by Parag Mhashilkar about 6 years ago

Done and merged to master

#5 Updated by Parag Mhashilkar about 6 years ago

  • Status changed from New to Resolved

#6 Updated by Parag Mhashilkar about 6 years ago

  • Status changed from Resolved to Closed
