Bug #17604

Provide better error message when condor_q fails

Added by Kevin Retzke almost 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 02/15/2016
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

I'm re-opening #11734, since it appears either never to have been solved or to have returned (see INC000000881322). When the schedd doesn't respond to a condor_q request, users still get a Python traceback followed by a rather unhelpful condor error.

Release notes say this was solved in server v1.2.1:

* Bug      11734  Propagate schedd errors from condor_q back to client without     
                  python exception traceback noise

In addition to not showing users tracebacks, Jobsub should intercept common errors and provide users with a more useful message, e.g. "The Jobsub servers are too busy to process this request right now, please try again in a few minutes."
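One way to picture the interception step is a small lookup that maps known condor_q error text to a user-facing message. This is only a sketch, not JobSub's actual code: the names FRIENDLY_MESSAGES, GENERIC_MESSAGE and friendly_error are hypothetical, the first pattern is taken from the STDERR quoted in this report, and the wording of the fallback message is an assumption.

```python
import re

# Hypothetical table of (pattern, message) pairs. The pattern comes from the
# STDERR shown in this report; the message is the wording suggested above.
FRIENDLY_MESSAGES = [
    (
        re.compile(r"Failed to fetch ads from"),
        "The Jobsub servers are too busy to process this request right now, "
        "please try again in a few minutes.",
    ),
]

# Assumed fallback wording for errors that match no known pattern.
GENERIC_MESSAGE = (
    "The request could not be completed because of a server-side error. "
    "Please try again later."
)


def friendly_error(stderr_text):
    """Return a user-facing message for the given condor_q STDERR text."""
    for pattern, message in FRIENDLY_MESSAGES:
        if pattern.search(stderr_text):
            return message
    return GENERIC_MESSAGE
```

New patterns could be added to the table as other common failures are identified, without touching the lookup function.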

Original issue follows.

From INC000000664793:

<dunegpvm06> jobsub_q --user=lebrun
Traceback (most recent call last):
  File "/opt/jobsub/server/webapp/condor_commands.py", line 217, in ui_condor_q
    jobs, cmd_err = subprocessSupport.iexe_cmd(cmd)
  File "/opt/jobsub/server/webapp/subprocessSupport.py", line 82, in iexe_cmd
    raise CalledProcessError(exitStatus, cmd, output="\nEXITCODE:%s\nSTDOUT:%s\nSTDERR:%s" % (exitStatus, stdoutdata, stderrdata))
CalledProcessError: Command 'condor_q -name fifebatch2.fnal.gov  -format '%-37s' 'regexps("((.+)\#(.+)\#(.+))",globaljobid,"\3@\2 ")'  -format ' %-14s ' Owner   -format ' %-11s ' 'formatTime(QDate,"%m/%d %H:%M")'  -format '%3d+'  'int( ifthenelse(JobStartDate=?=UNDEFINED,0,ifthenelse(CompletionDate==0,ServerTime-JobStartDate,CompletionDate-JobStartDate)) /(3600*24))'  -format '%02d'  'int( ifthenelse(JobStartDate=?=UNDEFINED,0,ifthenelse(CompletionDate==0,ServerTime-JobStartDate,CompletionDate-JobStartDate)) /3600)-int(24*INT( ifthenelse(JobStartDate=?=UNDEFINED,0,ifthenelse(CompletionDate==0,ServerTime-JobStartDate,CompletionDate-JobStartDate)) /(3600*24)))'  -format ':%02d'  'int( ifthenelse(JobStartDate=?=UNDEFINED,0,ifthenelse(CompletionDate==0,ServerTime-JobStartDate,CompletionDate-JobStartDate)) /60)-int(60*INT(INT( ifthenelse(JobStartDate=?=UNDEFINED,0,ifthenelse(CompletionDate==0,ServerTime-JobStartDate,CompletionDate-JobStartDate)) /60)/60))'  -format ':%02d'  ' ifthenelse(JobStartDate=?=UNDEFINED,0,ifthenelse(CompletionDate==0,ServerTime-JobStartDate,CompletionDate-JobStartDate)) -int(60*int( ifthenelse(JobStartDate=?=UNDEFINED,0,ifthenelse(CompletionDate==0,ServerTime-JobStartDate,CompletionDate-JobStartDate)) /60))'  -format ' %-2s'  'ifThenElse(JobStatus==0,"U",ifThenElse(JobStatus==1,"I",ifThenElse(TransferringInput=?=True,"<",ifThenElse(TransferringOutput=?=True,">",ifThenElse(JobStatus==2,"R",ifThenElse(JobStatus==3,"X",ifThenElse(JobStatus==4,"C",ifThenElse(JobStatus==5,"H",ifThenElse(JobStatus==6,"E",string(JobStatus))))))))))'  -format '%3d ' JobPrio   -format ' %4.1f ' ImageSize/1024.0   -format '%-30s ' 'regexps(".*\/(.+)",cmd,"\1")'  -format '\n' Owner     -constraint 'True && Owner=="lebrun" && True && True' ' returned non-zero exit status 1: 
EXITCODE:1
STDOUT:
STDERR:
-- Failed to fetch ads from: <131.225.67.139:9615?addrs=131.225.67.139-9615&noUDP&sock=1554778_02de> : fifebatch2.fnal.gov
SECMAN:2007:Failed to end classad message.

<dunegpvm06>

About 5 minutes later:

<dunegpvm06> jobsub_q --user=lebrun
JOBSUBJOBID                           OWNER           SUBMITTED     RUN_TIME   ST PRI SIZE CMD

<dunegpvm06> 

This is a lot of noise the user shouldn't have to wade through due to a failure in the inner workings (from the user's perspective) of JobSub.
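To illustrate keeping that noise on the server side, the sketch below runs a command, logs the full failure detail, and hands back only a one-line summary. It assumes Python 3 and the standard subprocess module rather than JobSub's own subprocessSupport helper; the function name run_quietly and the logger name are hypothetical, and using the first non-blank STDERR line as the summary is just one possible choice.

```python
import logging
import subprocess

# Hypothetical logger name; the full error detail goes here, not to the user.
log = logging.getLogger("jobsub.sketch")


def run_quietly(cmd):
    """Run cmd (a list of arguments) and return (ok, text).

    On success, text is the command's standard output. On failure, text is a
    single summary line, and the complete detail is written to the log.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode == 0:
        return True, proc.stdout

    # Keep the long command line and the raw STDERR in the server log only.
    log.error(
        "command failed: %r exit=%s stderr=%s",
        cmd, proc.returncode, proc.stderr,
    )

    # Summarize with the first non-blank STDERR line, if there is one.
    lines = [line.strip() for line in proc.stderr.splitlines() if line.strip()]
    summary = lines[0] if lines else "no error output"
    return False, "Request failed (exit status %d): %s" % (proc.returncode, summary)
```

The caller can then show the returned text to the user directly, or pass the raw STDERR through a message lookup first.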


Related issues

Copied from JobSub - Bug #11734: Provide better error message when condor_q fails (Closed, 2016-02-15)

History

#1 Updated by Kevin Retzke almost 2 years ago

  • Copied from Bug #11734: Provide better error message when condor_q fails added

#2 Updated by Dennis Box almost 2 years ago

  • Target version set to v1.2.5

#3 Updated by Dennis Box almost 2 years ago

  • Status changed from New to Resolved
  • Target version changed from v1.2.5 to v1.2.4.1

#4 Updated by Dennis Box almost 2 years ago

  • Status changed from Resolved to Closed

