Support regex in jobids for certain client commands
Please forward this to firstname.lastname@example.org, if need be. Perhaps the e-mail or contact list for FermiGrid Support overlaps with email@example.com.
I submitted about 100 jobs late yesterday afternoon, and none of them did what I wanted, but the specific failure is not the issue I'd like to discuss in this ticket. Thank God, the failure is, I suspect, the same for all the jobs.
I do not know this for sure, as I now have to tediously run "jobsub_fetchlog" and untar about 100 times. (Tedious is the key word here. Sorry for moaning.)
In the previous incarnation of FermiGrid job submission, all the log files, .err files, .out files, etc. came back in a specific directory, and the user could write a log-sniffer script or program to list all the files in the directory and grep what he needs from them.
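A log-sniffer of the kind described above might look like the following minimal Python sketch. It assumes the logs have already been fetched and untarred into one local directory. The function name `sniff_logs` and the suffix list are illustrative only and are not part of jobsub.

```python
import re
from pathlib import Path

# Illustrative list of file suffixes to scan; adjust to match your job outputs.
SUFFIXES = (".log", ".err", ".out")


def sniff_logs(directory, pattern):
    """Return {filename: [matching lines]} for the log-like files in directory.

    A grep-style scan: every regular file whose suffix is in SUFFIXES is read
    line by line, and lines matching the regular expression are collected.
    Files with no matching lines are left out of the result.
    """
    regex = re.compile(pattern)
    hits = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
        with path.open(errors="replace") as fh:
            matched = [line.rstrip("\n") for line in fh if regex.search(line)]
        if matched:
            hits[path.name] = matched
    return hits
```

For example, `sniff_logs("minerva_logs", r"G4Exception|ERROR")` (directory name hypothetical) would list every warning or error line per file.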
So, I naively ran:
jobsub_fetchlog --group=minerva --jobid=842*.0\@fifebatch1.fnal.gov
since I knew that most of the job IDs started with 842. It did two things. First, it reminded me, very politely, that I could retrieve information about an impressive list of jobs:
For user lebrun, accounting group minerva, the server can retrieve information for these job_ids:
Nice of him! Then, again very politely, he told me that he could not find the jobid 842*\@fifebatch1.fnal.gov.
In a sense, I got what I wanted: the list of unfetched jobs. I can now start writing a script that parses the above output and runs jobsub_fetchlog for each of them.
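Such a script could start from a sketch like this one. It assumes the listing prints job IDs of the form `8421234.0@fifebatch1.fnal.gov` (the actual list is not reproduced in this ticket) and matches them against a shell-style glob on the client side with Python's `fnmatch`. The helper names `matching_jobids` and `fetchlog_commands` are hypothetical, and only the `--group` and `--jobid` options already used above are passed to jobsub_fetchlog.

```python
import fnmatch
import re

# Assumed shape of a job ID in the listing, e.g. "8421234.0@fifebatch1.fnal.gov".
JOBID_RE = re.compile(r"\b\d+\.\d+@[\w.-]+")


def matching_jobids(listing, pattern):
    """Extract job IDs from the listing text and keep those matching a
    shell-style glob such as "842*.0@fifebatch1.fnal.gov".

    Order of first appearance is preserved and duplicates are dropped.
    """
    selected = []
    for jobid in JOBID_RE.findall(listing):
        if fnmatch.fnmatchcase(jobid, pattern) and jobid not in selected:
            selected.append(jobid)
    return selected


def fetchlog_commands(jobids, group):
    """Build one jobsub_fetchlog argument list per exact job ID."""
    return [["jobsub_fetchlog", f"--group={group}", f"--jobid={j}"] for j in jobids]
```

Each argument list could then be run with `subprocess.run(cmd, check=True)`. Because the glob is expanded locally, the server would only ever receive exact job IDs.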
Is there a better way to get all these tar files?
#1 Updated by Parag Mhashilkar over 4 years ago
Additional input from Paul ...
Hi Parag,

Pardon me, but I'd like to have a look at every one of them! Of course, I don't plan to do this by hand. I would write a script that checks for critical phases, completion messages, the final tally, etc., and then discards the error file, or even the log file.

Perhaps we can think of a new jobsub_fetchlogAndCheck script, written in Python, as it seems to be the preferred scripting language nowadays, that would:
(i) Get a list of completed jobs that haven't been fetched yet, for a given user and a given group ID (lebrun, minerva). All of them, not just one or two.
(ii) Fetch them one by one into a working directory (set by the user).
(iii) For each file type (.log, .out, or .err), execute a user-written Python function that greps for specific messages and returns a boolean to jobsub_fetchlogAndCheck, depending on whether or not the job is accepted.
(iv) Tally the number of accepted jobs.

The FIFE crack team would write (i), (ii), and (iv), and the user would write (iii). In my case, I would simply look for a few Geant4 warning or error messages and the final completion message of g4numi (or g4lbne). Note that we copy the detailed output (a ROOT file) elsewhere, but a generic Python routine, again written by the user, would "handle output".

Thanks,
Paul

On 11/12/2014 02:27 PM, Parag Mhashilkar wrote:

Hi Paul,

This is useful feedback. We understand that using fetchlog is not always convenient, and we want to simplify things for the end user in future releases. There is a challenge … Supporting regexes in job IDs can be expensive and can cause extreme loads on the server. In your example, there is a very good chance of human error: --jobid=842*.0\@fifebatch1.fnal.gov could potentially match hundreds to thousands of jobs when you may only need a few of them. If regexes like these were supported, once the command is issued, the server would identify all the available sandboxes matching the pattern and start tarring and gzipping them. Both of these tasks are CPU intensive, and we want to avoid such situations if possible.
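Steps (i) to (iv) of the proposed jobsub_fetchlogAndCheck could be wired together roughly as below. This is a hypothetical sketch, not an existing jobsub tool. The `fetch` callable (assumed to wrap jobsub_fetchlog and return the path of the downloaded tarball) and the `checks` mapping from file suffix to a user-written function are both assumptions.

```python
import tarfile
from pathlib import Path


def fetch_and_check(jobids, workdir, fetch, checks):
    """Hypothetical jobsub_fetchlogAndCheck driver following steps (i)-(iv).

    jobids -- step (i): completed, not-yet-fetched job IDs.
    fetch  -- callable(jobid, workdir) -> path of the job's log tarball
              (assumed wrapper around jobsub_fetchlog; not provided here).
    checks -- step (iii): {suffix: callable(text) -> bool}, user-written.
    Returns (number of accepted jobs, {jobid: accepted}).
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    verdicts = {}
    for jobid in jobids:  # step (ii): fetch and unpack one job at a time
        tarball = fetch(jobid, workdir)
        dest = workdir / jobid.replace("@", "_")
        with tarfile.open(tarball) as tar:
            # Use the safe "data" extraction filter where this Python has it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
        # A job is accepted only if every checked file passes its user check;
        # a job with no files of a checked type counts as accepted.
        verdicts[jobid] = all(
            checks[p.suffix](p.read_text(errors="replace"))
            for p in sorted(dest.rglob("*"))
            if p.is_file() and p.suffix in checks
        )
    return sum(verdicts.values()), verdicts  # step (iv): tally
```

Keeping the fetch per job ID means the user decides exactly which sandboxes the server is asked to tar, rather than relying on a server-side wildcard.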