Feature #24815

jobsub_lite: feature request - have jobs transfer back stdout and stderr if they go held for easier troubleshooting

Added by Shreyas Bhat 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 08/18/2020
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

Jobs that go held currently lose their stdout and stderr. See if jobsub_lite can implement a feature that transfers these files back before they are lost. More details in:
INC000001102671

---
From Ray Culbertson, a good summary of what they'd like:

In the good old days, on CDF, when a job crashed or went to hold
there was a tarball of everything in the working dir returned.
If the tarball was over 1GB, then the script would drop files starting with the
largest, until a small tarball could be made. If the err/out files themselves were too
big it would return the first and last 10k lines of the log, something like that.
This was all designed and implemented by Joe Boyd and Stephan Lammel in ~2005.
It was quite disappointing to see this technology go backwards when I joined Mu2e.
In the Mu2e case, we don't usually return everything in the dir, but returning
the out and err would be very helpful. Every new user asks how they can get information
on their held/crashed jobs and we have to say you can't. I've been asking for this since 2015
and I include it when FIFE asks for input to a FIFE meeting.
So I also strongly support this request.
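
---
For discussion, the behaviour described above (drop the largest files until the tarball fits, and keep only the head and tail of oversized logs) could be sketched roughly as follows. This is only an illustration under assumptions, not the original CDF script and not existing jobsub_lite code: the names head_and_tail, select_files and pack_workdir are hypothetical, the 1 GB and 10k-line limits are taken from the description above, and uncompressed file sizes are used as a stand-in for the final tarball size.

```python
import tarfile
from pathlib import Path


def head_and_tail(lines, keep=10_000):
    """Return the first and last `keep` lines, with a marker where lines were cut."""
    if len(lines) <= 2 * keep:
        return list(lines)
    dropped = len(lines) - 2 * keep
    return lines[:keep] + [f"[... {dropped} lines omitted ...]\n"] + lines[-keep:]


def select_files(sizes, budget):
    """Drop the largest files first until the total size fits within `budget` bytes.

    `sizes` maps a file name to its size in bytes; returns the kept names, sorted.
    """
    kept = dict(sizes)
    for name in sorted(sizes, key=sizes.get, reverse=True):
        if sum(kept.values()) <= budget:
            break
        del kept[name]
    return sorted(kept)


def pack_workdir(workdir, out_path, budget=1_000_000_000):
    """Tar up the files of `workdir` that fit within `budget` (hypothetical helper).

    Assumption: uncompressed sizes approximate the tarball size, so the result
    is normally smaller than `budget` once gzip is applied.
    """
    workdir = Path(workdir)
    # Sizes are collected before the archive is opened, so an `out_path`
    # inside `workdir` is not picked up as one of its own members.
    sizes = {
        str(p.relative_to(workdir)): p.stat().st_size
        for p in workdir.rglob("*")
        if p.is_file()
    }
    with tarfile.open(out_path, "w:gz") as tar:
        for name in select_files(sizes, budget):
            tar.add(workdir / name, arcname=name)
    return out_path
```

A job wrapper could call something like pack_workdir on exit or on a failure path. Whether anything can still be run and transferred once a job has actually gone held depends on how the batch system handles held jobs, which this sketch does not address.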