jobsub_lite: feature request - have jobs transfer back stdout and stderr if they go held for easier troubleshooting
Jobs that go held currently lose their stdout and stderr. See if jobsub_lite can transfer these files back before that happens, so users can troubleshoot held jobs. More details in:
From Ray Culbertson, a good summary of what they'd like:
In the good old days, on CDF, when a job crashed or went to hold
there was a tarball of everything in the working dir returned.
If the tarball was over 1GB, then the script would drop files starting with the
largest, until a small tarball could be made. If the err/out files themselves were too
big it would return the first and last 10k lines of the log, something like that.
This was all designed and implemented by Joe Boyd and Stephan Lammel in ~2005.
It was quite disappointing to see this technology go backwards when I joined Mu2e.
In the Mu2e case, we don't usually return everything in the dir, but returning
the out and err would be very helpful. Every new user asks how they can get information
on their held/crashed jobs and we have to say you can't. I've been asking for this since 2015
and I include it when FIFE asks for input to a FIFE meeting.
So I also strongly support this request.
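
For discussion, the size-capping behavior described above for CDF (drop files starting with the largest until the result fits under 1 GB, and keep only the first and last 10k lines of an oversized log) could be sketched roughly as follows. This is only an illustration of that description, not the original CDF script and not existing jobsub_lite code. The names `files_to_keep` and `trim_log` are hypothetical, and the sketch assumes the limit is checked against the summed file sizes rather than the size of the compressed tarball.

```python
from typing import Dict, List

SIZE_LIMIT = 1024 ** 3  # the 1 GB cap mentioned in the description
KEEP_LINES = 10_000     # "first and last 10k lines"


def files_to_keep(sizes: Dict[str, int], limit: int = SIZE_LIMIT) -> List[str]:
    """Drop the largest files first until the total size fits under `limit`.

    `sizes` maps a file name to its size in bytes (hypothetical input format).
    Assumption: the summed file sizes stand in for the final tarball size.
    """
    kept = sorted(sizes, key=lambda name: sizes[name])  # smallest first
    total = sum(sizes.values())
    while kept and total > limit:
        total -= sizes[kept.pop()]  # remove the largest remaining file
    return kept


def trim_log(lines: List[str], keep: int = KEEP_LINES) -> List[str]:
    """Return the first and last `keep` lines, with a marker for the gap."""
    if len(lines) <= 2 * keep:
        return list(lines)
    omitted = len(lines) - 2 * keep
    marker = f"[... {omitted} lines omitted ...]\n"
    return lines[:keep] + [marker] + lines[len(lines) - keep:]
```

In the Mu2e case described above, only the second part would be needed: trim the job's out and err files if they are too large and return them when the job goes held.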