Bug #2904

Feature #2455: Classification of glidein failure modes

Fix remaining issues with #2455

Added by Igor Sfiligoi over 7 years ago. Updated about 7 years ago.

Status: Closed
Priority: Low
Category: -
Target version:
Start date: 09/14/2012
Due date:
% Done: 0%
Estimated time: (Total: 0.00 h)
Spent time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

There are still some outstanding problems with the implementation of #2455.


Subtasks

Feature #2964: Cleanly separate the XML handling code in glidein_startup.sh (New)

History

#1 Updated by Igor Sfiligoi over 7 years ago

  • % Done changed from 0 to 50

The version we have in v2_6_1 is essentially a no-op; the printout was broken.

I have committed a fix (plus some cleanup) into
branch_v2plus_igor_2904

I plan to clean up a few more error messages before declaring it complete.

commit b52e8c728a57efccc8a21ddf43187fd443d02e6b
Author: Igor Sfiligoi <>
Date: Thu Aug 23 17:24:24 2012 -0700

Properly use error_augment in glidein_startup. Fix a few minor problems with error_gen and error_augment as well

commit 47290a7e2a3b9d7c3cd91a97cae01df40da63926
Author: Igor Sfiligoi <>
Date: Thu Aug 23 15:08:48 2012 -0700

Move multiline handling outside error_gen. The tool should not just blindly interpret any special characters provided by the caller

commit 2c4234ee3d0b4584cd872bf6778454ceece2c9a2
Author: Igor Sfiligoi <>
Date: Thu Aug 23 12:40:33 2012 -0700

Rename xml_parse.sh into error_augment.sh; old name was not representative. Also change the interface to be more useful

#2 Updated by Igor Sfiligoi over 7 years ago

  • Status changed from Assigned to Feedback
  • Assignee changed from Igor Sfiligoi to Anthony Tiradani
  • % Done changed from 50 to 90

Cleaned up the metrics of the validation scripts.

Also improved glidein_startup; now it reports its own failures in the XML format.
It also propagates up the metrics of the last script.

The XML itself is now reported twice:
  • a simplified version, still containing the interesting info, is printed to stdout in clear text
  • the complete XML file is compressed and printed to stderr
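
The split described above could look roughly like the sketch below. The function name report_xml, the two file arguments, and the gzip-plus-base64 encoding of the stderr stream are assumptions made for illustration; the ticket only says that the complete file is compressed.

```shell
#!/bin/sh
# Sketch only: report_xml and its arguments are hypothetical names,
# and gzip + base64 is an assumed way to compress the full report.

report_xml() {
    full_xml="$1"      # path to the complete XML report
    summary_xml="$2"   # path to the simplified XML report

    # The summary goes to stdout in clear text.
    cat "$summary_xml"

    # The complete report goes to stderr, compressed and then
    # text-encoded so the binary data survives in a log file.
    gzip -c "$full_xml" | base64 1>&2
}
```

Under these assumptions, a reader of the stderr log would recover the full report with `base64 -d | gzip -dc`.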

Below are the commits.
I also created the tag
branch_v2plus_igor_2904_v1
that points to this version.

I don't envision more changes in the short term.

Please review.

commit b99191bc19a7728669bdca04b42f0f36d40ad40c
Author: Igor Sfiligoi <>
Date: Sat Aug 25 11:18:29 2012 -0700

Add proper metrics to condor_startup. Propagate them to the final XML produced by glidein_startup

commit a4a9ddfcd6af6881a43f8213b9ec88961b17182b
Author: Igor Sfiligoi <>
Date: Fri Aug 24 22:28:36 2012 -0700

Improve metrics for the standard validation scripts

commit 87e6bf61e1b4bf420c367af09d9495cd1f4eb801
Author: Igor Sfiligoi <>
Date: Fri Aug 24 21:08:33 2012 -0700

Add proper XML reports for file staging errors, too. Had to change the glidein_exit as well, as I now cannot rely on the helper functions ...

commit d9bf6e63fd06b1c4444658297d99a253ece9dcd4
Author: Igor Sfiligoi <>
Date: Fri Aug 24 15:13:30 2012 -0700

Split the XML file between stdout and stderr. Stdout gets the summary version, while the stderr gets the complete version in compressed form...

#3 Updated by Parag Mhashilkar over 7 years ago

  • Target version changed from v2_7_x to v2_6_2

#4 Updated by Igor Sfiligoi over 7 years ago

Jeff asked for a few additional features in cat_XMLResult:
  • Support for multiple files (so he can use it with xargs)
  • Parsing from the stdout (so he can easily get the short version)

Committed as
commit d072bae6783fe9df6e6aa7a1f94672207c6e1acd
Author: Igor Sfiligoi <>
Date: Thu Aug 30 13:05:12 2012 -0700

mmap fails on empty files. Add the needed protection

commit b8b382f53744729f95eabc78634383848d30290e
Author: Igor Sfiligoi <>
Date: Thu Aug 30 12:53:53 2012 -0700

Add missing XML header when aggregating multiple XML files

commit 2e1f0e9f1598b5c6d525e6b7c960b9b55cf729f9
Author: Igor Sfiligoi <>
Date: Thu Aug 30 12:51:06 2012 -0700

Add support for extracting the XML from stdout file, too

commit d72d4c41a5583589bd5ff67b98184ecaa88737b1
Author: Igor Sfiligoi <>
Date: Thu Aug 30 12:32:08 2012 -0700

Add support for multiple files

#5 Updated by Igor Sfiligoi over 7 years ago

We have been running the above at UCSD for some time now.
The OSG gfactory operators have since found two problems:
  • glexec results were missing important bits of information on error
  • the XML output would not be produced if the glidein failed early in its life (e.g. disk problems)

Furthermore, while trying to document the XML output, I noticed that the file names were too generic, and could easily be overwritten by mistake. So I renamed them.

All commits went to branch_v2plus_igor_2904:

commit 57b791dd72fc778dec9eebc699e15f10294c1739
Author: Igor Sfiligoi <>
Date: Fri Sep 7 14:16:05 2012 -0700

Create final XML output even if the glidein fails early on. That required some refactoring of the code

commit 72c8923e0c937d02f1c8a739b4743b8462d1a683
Author: Igor Sfiligoi <>
Date: Thu Sep 6 16:43:07 2012 -0700

Use more significant names for the XML files. Before it was just "output", which is too generic and could have been overwritten by a test by mistake.

commit ed591da1c01d1119a1d10c74ea7d56af7520c2b1
Author: Igor Sfiligoi <>
Date: Thu Aug 30 18:14:13 2012 -0700

glexec_startup: Save the stderr in the XML output on error

#6 Updated by Igor Sfiligoi about 7 years ago

I found out that the XML format did not conform to the proposed OSG formatting.
So I had to tweak the code once more.

Since I was at it, I also added an explicit
exit 0
for all scripts that report success.
This would allow a new FE to talk to an old factory.
(although error reporting would still be less than optimal in that case)
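
As a minimal sketch of the pattern: a shell script that ends without an explicit exit returns the status of its last command, so a success report followed by an unrelated non-zero command could look like a failure to a caller that only checks the exit code. The names report_success, validate, and VERBOSE below are hypothetical stand-ins, not the project's real helpers.

```shell
#!/bin/sh
# Sketch only: report_success stands in for whatever helper
# writes the success report; it is not the project's real tool.

report_success() {
    echo "status=OK"
}

# The body runs in a subshell, so "exit" ends only this check.
validate() (
    report_success

    # A trailing conditional like this returns non-zero when
    # VERBOSE is empty, which would otherwise become the status.
    [ -n "$VERBOSE" ] && echo "verbose output enabled"

    # Explicit success status, independent of the line above.
    exit 0
)
```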

Committed to branch_v2plus_igor_2904 as usual.

Hopefully this is the last time I have to touch the code.

commit df863654f04e8a84177b11fd4a3198a65a94ff46
Author: Igor Sfiligoi <>
Date: Thu Sep 13 14:31:23 2012 -0700

Always follow errorgen -ok with an exit 0

commit 35812d3198725eec3a798f0c5259f35852a76efb
Author: Igor Sfiligoi <>
Date: Thu Sep 13 14:28:13 2012 -0700

Fix the XML output; detail must be on its own, not inside result

#7 Updated by Anthony Tiradani about 7 years ago

  • Status changed from Feedback to Resolved

I talked with Igor. There were 2 potential issues.

1) There is now a conditional dependency on Python. Specifically, if base64 is not installed, then it uses a custom python function to perform the uuencode tasks. (I was under the impression that we only required shell.)

From Igor (paraphrased): we require python to be on the worker so this isn't a problem
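
A conditional fallback of this kind might be sketched as follows. The function name b64_encode and the python3 one-liner are assumptions for illustration; the actual code uses its own custom Python function.

```shell
#!/bin/sh
# Sketch only: b64_encode is a hypothetical name, and the
# python3 one-liner is an assumed form of the fallback.

b64_encode() {
    if command -v base64 >/dev/null 2>&1; then
        # Preferred path: the native tool, reading stdin.
        base64
    else
        # Fallback path: encode stdin with Python's standard library.
        # Unlike some base64 tools, this does not wrap long lines.
        python3 -c 'import base64, sys; sys.stdout.write(base64.b64encode(sys.stdin.buffer.read()).decode("ascii") + "\n")'
    fi
}
```

For example, `printf 'hello' | b64_encode` yields `aGVsbG8=` on either path.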

2) The XML code is intermingled with the logic in glidein_startup.sh. The comment states that this is done because the helper functions may not have been downloaded yet. Why not test for the existence of the helper functions and use them if they are available, otherwise use the hard-coded XML? More importantly for maintenance, I'd like to see the XML pieces stripped out and put into functions. That way, if the schema changes, it is easy to find all the places that need to change.

From Igor (paraphrased): If the XML code has to be there no matter what, why not just use it? - Agreed (Tony)

We both agreed that the function split shouldn't be required to move forward and will happen at a later date.
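
For illustration only, the deferred function split might take a shape like the sketch below. The function names and the element names are hypothetical and do not reproduce the real schema; the only detail taken from this ticket is that the detail element sits next to the result element, not inside it.

```shell
#!/bin/sh
# Sketch only: names and XML layout are illustrative.
# Values are not XML-escaped here; real code would need that.

xml_result() {   # $1 = status string
    printf '  <result>\n    <status>%s</status>\n  </result>\n' "$1"
}

xml_detail() {   # $1 = free-text detail
    printf '  <detail>%s</detail>\n' "$1"
}

xml_report() {   # $1 = status, $2 = detail
    printf '<report>\n'
    xml_result "$1"
    xml_detail "$2"   # sibling of <result>, not nested in it
    printf '</report>\n'
}
```

With the markup confined to a few functions like these, a schema change would touch one place per element.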

#8 Updated by Igor Sfiligoi about 7 years ago

Merged into both branch_v2plus and master.

I expect branch_v2plus to work just fine, since the branchpoint was not too far in the past, and branch_v2plus_igor_2904 was extensively tested.

It is likely master will work as well, but I did not have the chance to test it.

branch_v2plus:
commit 4674fc2208e0a45b29620599fb336e49914d7633
Merge: b77c7bc df86365
Author: Igor Sfiligoi <>
Date: Fri Sep 14 12:50:49 2012 -0700

Merge branch 'branch_v2plus_igor_2904' into branch_v2plus

master:

commit 1578e6e6a30ede2644cb0e7341a5b4e880604c3d
Author: Igor Sfiligoi <>
Date: Fri Sep 14 13:41:09 2012 -0700

Merge "branch_v2plus_igor_2904" into "master" by manual patch.
(this time for real)

commit 12ab34bb810c1236c945445dc9094c7b6a321cef
Author: Igor Sfiligoi <>
Date: Fri Sep 14 13:33:47 2012 -0700

Merge "branch_v2plus_igor_2904" into "master" by manual patch.

#9 Updated by Parag Mhashilkar about 7 years ago

  • Status changed from Resolved to Closed

