
Bug #12642

some submissions ignoring INDOWNTIME attribute

Added by Dennis Box over 4 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 05/16/2016
Due date: -
% Done: 0%
Estimated time: -
First Occurred: -
Occurs In: -
Stakeholders: -
Duration: -

Description

I am back from vacation and I see that the INDOWNTIME attribute was scheduled to be set to True on fifebatch1 on 5/11. In general it seems to work, but two users have managed to submit jobs to fifebatch1 since then.

User kleykamp submitted jobs to fifebatch1 on 5/12 using client v1.1.7; some of them are running.
User zqhong was able to submit some jobs today (5/16) using client v1.2; all of these are idle.
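
For context, the expected behavior is that every submission path on the server checks the INDOWNTIME attribute and refuses new jobs while it is True. The sketch below only illustrates that gate; it is not the jobsub implementation, and the settings-file path, key=value format, and function names are assumptions made purely for illustration.

  # Illustrative sketch only, not the jobsub code. The settings file path,
  # key=value format, and helper names are assumed for this example.

  def read_attribute(settings_path, name, default="False"):
      """Return the value of `name` from a simple key=value settings file."""
      try:
          with open(settings_path) as handle:
              for line in handle:
                  key, _, value = line.strip().partition("=")
                  if key.strip() == name:
                      return value.strip()
      except IOError:
          pass
      return default

  def submission_allowed(settings_path="/opt/jobsub/server.conf"):
      """True unless the server is flagged as being in downtime.

      Every client submission route, old protocol or new, should pass
      through a check like this; the report above suggests some routes
      (clients v1.1.7 and v1.2) are getting around it.
      """
      return read_attribute(settings_path, "INDOWNTIME").lower() != "true"

Whatever the real mechanism is, the point of the sketch is that the check has to live on the server side of every submission route, since the client versions involved clearly differ in what they enforce.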

Sample kleykamp wrap file header:
  /fife/local/scratch/uploads/minerva/kleykamp/2016-05-12_015357.511333_2448/GaudiAnaStage_00117204_000_minerva_20160512_021053_2540795_0_1_wrap.sh
  Automatically generated by:
  jobsub -e JOBSUBJOBSECTION --lines +JobsubJobSection="4716" -e IFDH_DEBUG -e IFDH_CP_MAXRETRIES -e IFDH_VERSION -e IFDHC_VERSION --resource-provides=usage_model=DEDICATED,OPPORTUNISTIC -e IFDH_DEBUG -e IFDH_CP_MAXRETRIES -e IFDH_VERSION -e IFDHC_VERSION --mail_on_error --OS=SL6 --memory=3GB --group=minerva --prefix GaudiAnaStage_00117204_000_minerva -f /pnfs/minerva/mc_reconstructed/mc-reco-pool/mc/v10r8p6/00/11/72/04/SIM_minerva_00117204_4977_Reco_v10r8p4_v10r8p6.root -f /pnfs/minerva/persistent/users/minervadat/production_inputs/ppfx/ppfx_v1r8.tar.gz -d HIST /minerva/data/users/kleykamp/NukeMoreInfoV3/grid/central_value/minerva/hist/v10r8p9/00/11/72/04/ -d POT /minerva/data/users/kleykamp/NukeMoreInfoV3/grid/central_value/minerva/pot/v10r8p9/00/11/72/04/ -d LOG /minerva/data/users/kleykamp/NukeMoreInfoV3/grid/central_value/minerva/logfiles/v10r8p9/00/11/72/04/ -d OPTS /minerva/data/users/kleykamp/NukeMoreInfoV3/grid/central_value/minerva/opts/v10r8p9/00/11/72/04/ -d ANA /minerva/data/users/kleykamp/NukeMoreInfoV3/grid/central_value/minerva/ana/v10r8p9/00/11/72/04/ -r v10r8p9 -i /grid/fermiapp/minerva/software_releases/v10r8p9 -t /minerva/app/users/kleykamp/cmtuser/Minerva_v10r8p9/Tools/ProductionScripts --cmtconfig x86_64-slc6-gcc44-opt -l +JobsubParentJobId="$(DAGManJobId)@fifebatch1.fnal.gov" -l +JobsubJobId="$(CLUSTER).$(PROCESS)@fifebatch1.fnal.gov" -l +Owner="kleykamp" -e JOBSUBPARENTJOBID -l +JobsubServerVersion="1.2.2" -l +JobsubClientVersion="1.1.7" -l +JobsubClientDN="/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=Jeffrey Kleykamp/CN=UID:kleykamp" -l +JobsubClientIpAddress="131.225.67.69" -l +JobsubClientKerberosPrincipal="" ./GaudiRunProcessor.py --opts /minerva/data/users/kleykamp/NukeMoreInfoV3/grid/central_value/minerva/opts_in/v10r8p9/00/11/72/04//MCAna_Run_00117204_Master_v10r8p9-kleykamp_Job000.opts --input /pnfs/minerva/mc_reconstructed/mc-reco-pool/mc/v10r8p6/00/11/72/04/SIM_minerva_00117204_4977_Reco_v10r8p4_v10r8p6.root --ana_tool NukeCCQETwoTrack
Sample zqhong wrap file header:
  /fife/local/scratch/uploads/cdms/zqhong/2016-05-16_095740.521736_2491/run_supersimplot.sh_20160516_095741_1572909_0_1_wrap.sh
  Automatically generated by:
  jobsub -l +JobsubClientDN="/DC=gov/DC=fnal/O=Fermilab/OU=People/CN=Ziqing Hong/CN=UID:zqhong" -l +JobsubClientIpAddress="131.225.202.40" -l +Owner="zqhong" -l +JobsubServerVersion="1.2.2" -l +JobsubClientVersion="1.2" -l +JobsubClientKerberosPrincipal="" --generate-email-summary --email-to= --expected-lifetime=long --resource-provides=usage_model=DEDICATED,OPPORTUNISTIC --OS=SL6 -dSUPERSIMOUT /pnfs/cdms/scratch/zqhong/GridOutput3/noShield_g03_sphereish/plot/ --tar_file_name /fife/local/scratch/dropbox/cdms/zqhong/47c6bd6f1e41a574e0ba897a04dbf3cc8e28dc27/macros.tgz /fife/local/scratch/uploads/cdms/zqhong/2016-05-16_095740.521736_2491/run_supersimplot.sh noShield_g03_sphereish

Investigate why this happened.
We may want to consider setting MAX_JOBS_SUBMITTED=0 on fifebatch1 if this cannot be fixed.
Dennis
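
For reference, the MAX_JOBS_SUBMITTED fallback mentioned above would be a schedd-side HTCondor configuration change along the lines of the fragment below (a sketch only; the local configuration file used on fifebatch1 is an assumption):

  # Local HTCondor configuration on the fifebatch1 schedd (file location assumed).
  # Per the suggestion above: refuse any new job submissions while in downtime.
  MAX_JOBS_SUBMITTED = 0

This would need to be followed by a condor_reconfig (or a schedd restart) on fifebatch1 so the new limit takes effect.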

History

#1 Updated by Neha Sharma over 4 years ago

INDOWNTIME was set on 5/12 (after the kleykamp jobs came in).

Also, another job came in just now:
  6732079.0 dfitz11 5/16 14:28 0+00:00:00 I 0 0.3 condor_dagman -p 0

#2 Updated by Dennis Box over 4 years ago

  • Status changed from New to Resolved

Released client v1_2_1_4_rc7 to /grid/fermiapp as 'test'. This fixes the bug.

If any SNOW (ServiceNow) tickets come in reporting a submission failure with a MAX_JOBS_SUBMITTED error, the user should switch to the above client and re-submit, e.g.:

setup -t jobsub_client       # -t picks up the version declared as 'test' in UPS
jobsub_submit yada yada yada

#3 Updated by Dennis Box over 4 years ago

  • Status changed from Resolved to Closed
