Bug #2692

condor_q output may have changed in Condor 7.9pre

Added by Burt Holzman over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Low
Assignee: Douglas Strain
Category: -
Target version:
Start date: 05/02/2012
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

We should be proactive on this so we will work with 7.9.x out of the box.

From Derek:
---
Hello,

I'm running a pre-release of condor 7.9. The specific RPM is: https://koji.hep.caltech.edu/koji/buildinfo?buildID=773

I believe something is breaking the 'condor_q' command the frontend runs.

The output of the debug log:
[2012-05-02T11:46:38-05:00 2165299] Failed to retrieve jobs state from the subprocess:
[2012-05-02T11:47:48-05:00 2165299] Failed to retrieve jobs state from the subprocess:
[2012-05-02T11:49:05-05:00 2165299] Failed to retrieve jobs state from the subprocess:
[2012-05-02T11:50:16-05:00 2165299] Failed to retrieve jobs state from the subprocess:

From the info log:
[2012-05-02T11:53:37-05:00 2165299] Iteration at Wed May 2 11:53:37 2012
[2012-05-02T11:53:37-05:00 2165299] Querying schedd, entry, and glidein status using child processes.
[2012-05-02T11:53:46-05:00 2165299] WARNING: Failed to retrieve jobs state information from the subprocess.
[2012-05-02T11:53:46-05:00 2165299] WARNING: Missing schedd, factory entry, and/or current glidein state information. Unable to calculate required glideins, terminating loop.
[2012-05-02T11:53:46-05:00 2165299] Writing stats
[2012-05-02T11:53:46-05:00 2165299] Sleep

The logging is not very helpful. The failure only happens if there is a job for the group. If no job in the queue evaluates the job query expression to true, it seems to work (i.e., it sees 0 jobs and advertises 0 needed jobs to the factory).

-Derek

0001-Adding-logging-for-exceptions-when-running-in-the-su.patch (3.52 KB): Untested patch against branch_v2plus (Derek Weitzel, 05/03/2012 11:31 AM)

History

#1 Updated by Douglas Strain over 7 years ago

Derek and I tracked this down. Apparently, in this development version, condor_q returns XML in which the attribute is named "ProcID" instead of "ProcId". This causes an exception in our parsing.
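To illustrate the failure mode, here is a minimal sketch of how an exact-case attribute lookup breaks when the spelling changes. The XML is a simplified stand-in for condor_q XML output, and `parse_jobs` and `job_key` are hypothetical helpers written for this example, not the frontend's actual parsing code.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed stand-in for condor_q XML output.
# Only the spelling of the "ProcID" attribute matters here.
SAMPLE_XML = """<classads>
  <c>
    <a n="ClusterId"><i>12</i></a>
    <a n="ProcID"><i>0</i></a>
    <a n="JobStatus"><i>1</i></a>
  </c>
</classads>"""


def parse_jobs(xml_text):
    """Return one dict of attribute name -> text value per <c> element."""
    jobs = []
    for classad in ET.fromstring(xml_text).findall("c"):
        attrs = {}
        for attr in classad.findall("a"):
            # The value is the text of the first child, e.g. <i>12</i>.
            attrs[attr.get("n")] = attr[0].text
        jobs.append(attrs)
    return jobs


def job_key(job):
    # Exact-case lookup: raises KeyError when the attribute is spelled "ProcID".
    return (int(job["ClusterId"]), int(job["ProcId"]))


jobs = parse_jobs(SAMPLE_XML)
try:
    job_key(jobs[0])
except KeyError as err:
    print("lookup failed for attribute:", err)
```

If nothing catches and reports that KeyError inside a forked child, the parent only learns that no result came back, which matches the uninformative log lines quoted in the description.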

1) I can't imagine that this was done on purpose. Would it be possible for someone at CondorWeek to mention this to the condor team so it can be fixed before a proper release?

2) Derek also suggested we have more extensive logging in the subprocess code where this forks to do condor_q and condor_status, since, if the forked process fails, there is no reason given, and it is a pain to track down.
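As a rough sketch of what such logging could look like (this is not Derek's patch), the child can catch any exception, send the formatted traceback back through the pipe, and let the parent log it. The function name `fork_and_collect`, the logger name, and the pipe-plus-pickle protocol are assumptions made for this example.

```python
import logging
import os
import pickle
import traceback

log = logging.getLogger("frontend.sketch")  # hypothetical logger name


def fork_and_collect(func, *args):
    """Run func(*args) in a forked child and return its result to the parent.

    If the child raises, the formatted traceback is sent back and logged
    by the parent instead of being silently dropped.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(read_fd)
        try:
            payload = ("ok", func(*args))
        except Exception:
            payload = ("error", traceback.format_exc())
        with os.fdopen(write_fd, "wb") as pipe_out:
            pickle.dump(payload, pipe_out)
        os._exit(0)  # leave without running the parent's cleanup handlers

    # parent process
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe_in:
        status, value = pickle.load(pipe_in)
    os.waitpid(pid, 0)
    if status == "error":
        log.warning("Subprocess running %s failed:\n%s", func.__name__, value)
        raise RuntimeError("subprocess failed; see log for the child traceback")
    return value
```

With something along these lines, a parsing error in the child would show up in the log with its traceback rather than only as "Failed to retrieve jobs state from the subprocess".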

I think that #2 should be done as part of this ticket.

#2 Updated by Douglas Strain over 7 years ago

As part of this ticket, case-sensitivity should also be addressed, at least for condor_q and condor_status parsing.

#4 Updated by Derek Weitzel over 7 years ago

Adding (untested) patch for some simple logging that would have caught this error.

#5 Updated by Douglas Strain over 7 years ago

  • Status changed from Assigned to Resolved

I have committed Derek's patch. I have also made list2dict (the problem function in this case) resistant to changes in capitalization. The changes are in branch_v2plus_2692 and branch_master_2692
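For readers unfamiliar with the idea, a case-insensitive key lookup can be sketched as follows. This is an illustration only: `find_key` and `list2dict_sketch` are hypothetical names, and the committed list2dict may differ in signature and behavior.

```python
def find_key(record, wanted):
    """Return the key in record that matches wanted, ignoring case."""
    if wanted in record:  # fast path: exact spelling
        return wanted
    lowered = wanted.lower()
    for key in record:
        if key.lower() == lowered:
            return key
    raise KeyError(wanted)


def list2dict_sketch(records, key_attrs):
    """Index a list of attribute dicts by the values of key_attrs.

    The key attributes are matched case-insensitively and are left out
    of the stored per-record dict.
    """
    out = {}
    for record in records:
        real_keys = [find_key(record, name) for name in key_attrs]
        index = tuple(record[k] for k in real_keys)
        out[index] = {k: v for k, v in record.items() if k not in real_keys}
    return out


jobs = [
    {"ClusterId": 12, "ProcID": 0, "JobStatus": 1},  # spelling seen in 7.9pre
    {"ClusterId": 12, "ProcId": 1, "JobStatus": 2},  # earlier spelling
]
indexed = list2dict_sketch(jobs, ("ClusterId", "ProcId"))
print(indexed)
```

Both records end up indexed by (ClusterId, ProcId) regardless of how the attribute is capitalized, which is the property the fix is after.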

#6 Updated by Derek Weitzel over 7 years ago

For future reference:
Logging commit:726342ce7
Case insensitive commit:90dd969a5

#7 Updated by Douglas Strain over 7 years ago

Due to problems in the attribution and message, I went ahead and re-committed this patch in a new branch: branch_v2plus_2692_try2. Please use this one for reviewing and merging.

New commit numbers are as follows:
commit:56f5abd2fd9401fd79af81b79d64f2223062f912 list2dict: getting rid of a redundant condition taken care of above
commit:fc308e86234487c7909c34f3cb97060c885fe2bb list2dict now can handle case insensitive requests during xml parsing
commit:255a19a23f5507bd3c83e3af45f5150457e3e8f2 Adding logging for exceptions when running in the subprocesses. From ticket #2692

#8 Updated by Parag Mhashilkar over 7 years ago

  • Target version changed from v2_7_x to v2_6

#9 Updated by Parag Mhashilkar over 7 years ago

  • Status changed from Resolved to Closed
