Bug #11453

Frontend policies do not work when RequestCpus in the classad is a condor expression

Added by Parag Mhashilkar over 4 years ago. Updated about 4 years ago.

Status: Closed
Priority: High
Assignee: Parag Mhashilkar
Category: -
Target version:
Start date: 01/20/2016
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders: CMS, FIFE
Duration:

Description

Frontend policies evaluate RequestCpus after converting it to an int. If RequestCpus is a condor expression in the classad, the value returned by condor_q is a string, not an integer. This causes the frontend to throw an exception.

[2016-01-20 15:38:28,346] DEBUG: glideinFrontendLib:340: There were 1 exceptions in countMatch subprocess. Most recent traceback: ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.6/site-packages/glideinwms/frontend/glideinFrontendLib.py", line 310, in countMatch\n if eval(match_obj):\n', ' File "<string>", line 1, in <module>\n', "ValueError: invalid literal for int() with base 10: 'ifThenElse(JobUniverse != 7,1,1)'\n"]
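
The failure can be reproduced outside the frontend with a minimal sketch. This is not the GlideinWMS code: the `job` dict and the `request_cpus_or_default` helper are hypothetical names used only for illustration, and the helper does not evaluate the condor expression, it only avoids the crash.

```python
# Hypothetical job classad as a plain dict. As described above, RequestCpus
# arrives as an unevaluated expression string rather than an integer.
job = {"RequestCpus": "ifThenElse(JobUniverse != 7,1,1)"}


def request_cpus_or_default(job, default=1):
    """Return RequestCpus as an int, or `default` if it is not an integer literal.

    Illustrative helper only (an assumption, not part of GlideinWMS).
    It does NOT evaluate the condor expression.
    """
    try:
        return int(job.get("RequestCpus", default))
    except (TypeError, ValueError):
        return default


# Direct conversion raises the same ValueError seen in the traceback.
try:
    int(job["RequestCpus"])
    failed = False
except ValueError:
    failed = True
```

A fallback like this only hides the problem; getting the evaluated value still requires something that understands classad expressions.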


Related issues

Blocks GlideinWMS - Feature #11877: More flexible mechanisms for a job to request resources (cpus, memory, ...) (Closed, 03/03/2016)

History

#1 Updated by Parag Mhashilkar over 4 years ago

  • Target version changed from v3_2_12_1 to v3_2_13

#2 Updated by Parag Mhashilkar over 4 years ago

  • Target version changed from v3_2_13 to v3_2_14

#3 Updated by Parag Mhashilkar about 4 years ago

  • Blocks Feature #11877: More flexible mechanisms for a job to request resources (cpus, memory, ...) added

#4 Updated by Parag Mhashilkar about 4 years ago

  • Stakeholders updated (diff)

#5 Updated by Parag Mhashilkar about 4 years ago

  • Status changed from New to Feedback
  • Assignee changed from Parag Mhashilkar to Marco Mascheroni
  • Stakeholders updated (diff)

Code changes to use the htcondor python bindings are in branch v3/11453 and ready for review.

#6 Updated by Marco Mascheroni about 4 years ago

  • Assignee changed from Marco Mascheroni to Parag Mhashilkar

I added a commit to the issue branch that fixes some trailing whitespace (I use a pre-commit hook for that, let me know if you want me to share it).

I also added a TODO in the code about adding some warning messages if the python bindings are not available. Feel free to remove the TODO if you think it is irrelevant/wrong.

What I am not convinced about is the total lack of exception handling in the fetch_using_bindings functions. The equivalent fetch_using_exe throws ExeError (see iexe_cmd), so if you use condor commands you get ExeError, and if you use the python bindings you get (IIRC) RuntimeError. Also, in iexe we print a lot of details about the errors, which are swallowed, for example, in getCondorQConstrained and getCondorStatusConstrained.

My suggestion is to create a base QueryError exception, with ExeError and PBError inheriting from QueryError, and then change condorExe.ExeError to QueryError where it is used (I'd start with getCondorQConstrained and getCondorStatusConstrained, which are the most important ones, but grepping shows there are other places where it is used). We also want to print details about the failures before throwing QueryError.
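
A rough sketch of the hierarchy suggested above, written for a current Python. Only the names QueryError, ExeError and PBError come from this comment; the function signature, the logger name and the way the underlying query is passed in are assumptions for illustration, not the code in the branch.

```python
import logging

log = logging.getLogger("query_sketch")  # hypothetical logger name


class QueryError(Exception):
    """Base class for condor query failures, whatever the backend."""


class ExeError(QueryError):
    """Raised when a condor command-line tool fails."""


class PBError(QueryError):
    """Raised when a python bindings query fails."""


def fetch_using_bindings(query):
    """Run `query` (a zero-argument callable, assumed here) and map
    RuntimeError from the bindings to PBError, logging details first."""
    try:
        return query()
    except RuntimeError as err:
        log.error("python bindings query failed: %s", err)
        raise PBError(str(err)) from err
```

Callers such as getCondorQConstrained could then catch QueryError once and handle both backends the same way.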

#7 Updated by Parag Mhashilkar about 4 years ago

  • Assignee changed from Parag Mhashilkar to Marco Mascheroni

Improved exception handling as per the discussions. Let me know if you find anything else.

#8 Updated by Marco Mascheroni about 4 years ago

  • Assignee changed from Marco Mascheroni to Parag Mhashilkar

#9 Updated by Marco Mascheroni about 4 years ago

A few other comments have been discussed in private.

#10 Updated by Parag Mhashilkar about 4 years ago

  • Status changed from Feedback to Resolved

Fixed a few issues after talking to Marco and more testing. Merged to branch_v3_2.

#11 Updated by Parag Mhashilkar about 4 years ago

  • Status changed from Resolved to Closed

