Feature #7313

Support regex in jobids for certain client commands

Added by Parag Mhashilkar over 5 years ago. Updated about 1 month ago.

Please forward this to , if need be. Perhaps the e-mail or contact list for FermiGrid Support overlaps
I submitted about 100 jobs late yesterday afternoon. None of them did what I wanted, but the specific failure is not the issue I'd like to discuss in this ticket. Thank God, the failure is, I suspect, the same for all the jobs.
I do not know this for sure, as I now have to tediously run "jobsub_fetchlog" and untar about 100 times. (Tedious is the key word here. Sorry for moaning.)

In the previous incarnation of FermiGrid job submission, all the log files, err files, ".out" files, etc. came back in a specific directory, and the user could write a log-sniffer script or program to list all the files in that directory and grep what he needs from them.
So, I naively did
jobsub_fetchlog --group=minerva --jobid=842*.0\

as I knew that most of the jobids started with 842. It did two things. First, it reminded me, very politely, that I could retrieve information about an impressive list of jobs:


For user lebrun, accounting group minerva, the server can retrieve information for these job_ids:

Nice of him! It then told me, again very politely, that he could not find the jobid 842*\

In a sense, I got what I wanted: the list of unfetched jobs. I can now start writing a script that parses the above output and executes a jobsub_fetchlog for each of them.
Is there a better way to get all these tar files?
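
Until something better exists, the wildcard can be expanded on the client side. The following is only a minimal sketch, assuming the job IDs have already been scraped from the listing above into a Python list. The job IDs shown are made up for illustration, the helper names are hypothetical, and the pattern is treated as a shell-style wildcard rather than a true regex. Only the --group and --jobid options that appear in this ticket are used.

```python
import fnmatch
import shlex


def expand_jobid_pattern(pattern, job_ids):
    """Return the job IDs matching a shell-style pattern, in input order."""
    return [jid for jid in job_ids if fnmatch.fnmatchcase(jid, pattern)]


def build_fetch_commands(group, job_ids):
    """Build one jobsub_fetchlog argument list per job ID (nothing is executed)."""
    return [
        ["jobsub_fetchlog", f"--group={group}", f"--jobid={jid}"]
        for jid in job_ids
    ]


# Hypothetical job IDs, standing in for the list printed by the server.
known_ids = ["842101.0", "842102.1", "842205.0", "843000.0"]

for cmd in build_fetch_commands("minerva", expand_jobid_pattern("842*.0", known_ids)):
    # Print the commands for review; pass each list to subprocess.run() to execute.
    print(shlex.join(cmd))
```

Printing the commands first, rather than running them, leaves a chance to spot a pattern that matches far more jobs than intended.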


Related issues

Blocked by JobSub - Necessary Maintenance #22164: Upgrade jobsub code to use argparse instead of optparse (Closed, 03/19/2019)


#1 Updated by Parag Mhashilkar over 5 years ago

Additional input from Paul ...

Hi Parag,

   Pardon me, but I'd like to have a look at every one of them! Of course, I don't plan to do this by hand. I would write a script that checks for critical phases, completion messages, final tally, etc., and then discards the error file, or even the log file.

  Perhaps we can think of a new jobsub_fetchlogAndCheck script, written in Python, as it seems to be the preferred scripting language nowadays,
that would
(i)  Get a list of completed jobs that haven't been fetched yet, for a given user and a given group id (lebrun, minerva). All of them. Not just one or two.
(ii) Fetch them one by one into a working directory (set by the user).
(iii) For each file type, execute a user-written Python function that greps for specific messages in the log (or .out, or .err) and returns a boolean
to jobsub_fetchlogAndCheck, depending on whether or not the job is accepted.
(iv) Tally the number of accepted jobs.

The FIFE crack team would write (i), (ii), and (iv), and the user would write (iii). In my case, I would simply look for a few Geant4 warning or error messages,
and the final completion message of g4numi (or g4lbne).

Note that we copy the detailed output (a root file) elsewhere, but a generic Python routine, again written by the user, would "handle output".
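
Steps (iii) and (iv) of the proposal could look roughly like the sketch below. It assumes each job's sandbox has already been fetched and untarred into its own directory, so steps (i) and (ii) are left out. All names (check_job, tally, accept_out) and the "run completed" marker are hypothetical, not part of jobsub.

```python
import tempfile
from pathlib import Path


def accept_out(text):
    """User-written check (step iii): hypothetical completion marker, no fatal errors."""
    lowered = text.lower()
    return "run completed" in lowered and "fatal" not in lowered


def check_job(job_dir, checkers):
    """Return True if every file with a registered suffix passes its checker."""
    for path in sorted(Path(job_dir).iterdir()):
        checker = checkers.get(path.suffix)
        if checker is not None and not checker(path.read_text(errors="replace")):
            return False
    return True


def tally(job_dirs, checkers):
    """Step (iv): return (accepted, rejected) counts over all job directories."""
    accepted = sum(1 for job_dir in job_dirs if check_job(job_dir, checkers))
    return accepted, len(job_dirs) - accepted


# Small demonstration with two fake sandboxes in a temporary working directory.
with tempfile.TemporaryDirectory() as work:
    good = Path(work, "job_a")
    bad = Path(work, "job_b")
    good.mkdir()
    bad.mkdir()
    (good / "job.out").write_text("setup done\nRun completed\n")
    (bad / "job.out").write_text("setup done\nFATAL: geometry overlap\n")
    accepted, rejected = tally([good, bad], {".out": accept_out})
    print(f"accepted={accepted} rejected={rejected}")
```

Mapping file suffixes to checker functions keeps the user-written part small: one function per file type, each returning a boolean.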



On 11/12/2014 02:27 PM, Parag Mhashilkar wrote:
Hi Paul,

This is useful feedback. We understand that using fetch log is not always convenient and we want to simplify things for the end user in future releases.

There is a challenge: supporting regex in job ids can be expensive and can cause extreme load on the server. Taking just the example below, there is a very good chance of human error, and --jobid=842*.0\ could potentially match hundreds or thousands of jobs when you may only need a few of them.

If regexes like these are supported, then once the command is issued, the server will identify all the available sandboxes matching the pattern and start tarring and gzipping them. Both of these tasks are CPU intensive, and we want to avoid such situations if possible.
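
One way to bound the cost described above would be to refuse a pattern that matches too many sandboxes before any tarring starts. This is a sketch of that idea only, not the jobsub server's behavior: the function name, the exception, and the default limit of 20 are all assumptions, and the pattern is again treated as a shell-style wildcard.

```python
import fnmatch


class TooManyMatches(Exception):
    """Raised when a job ID pattern matches more sandboxes than the allowed limit."""


def match_with_limit(pattern, sandbox_ids, max_matches=20):
    """Return matching IDs, or raise before any expensive work if over the limit."""
    matches = []
    for jid in sandbox_ids:
        if fnmatch.fnmatchcase(jid, pattern):
            matches.append(jid)
            if len(matches) > max_matches:
                # Stop scanning early: the caller should narrow the pattern.
                raise TooManyMatches(
                    f"pattern {pattern!r} matches more than {max_matches} jobs"
                )
    return matches


# Hypothetical sandbox IDs: 100 jobs whose IDs all start with 842.
sandboxes = [f"842{n:03d}.0" for n in range(100)]

try:
    match_with_limit("842*.0", sandboxes)
except TooManyMatches as exc:
    print(f"rejected: {exc}")

print(match_with_limit("84200?.0", sandboxes))
```

Rejecting the request outright, instead of silently truncating the match list, makes the human error visible to the user who typed the pattern.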

#2 Updated by Parag Mhashilkar over 5 years ago

  • Assignee set to Parag Mhashilkar
  • Target version set to v1.2

#3 Updated by Dennis Box about 4 years ago

  • Target version changed from v1.2 to v1.3

#4 Updated by Dennis Box 11 months ago

  • Target version changed from v1.3 to v1.3.2

#5 Updated by Dennis Box 11 months ago

  • Assignee changed from Parag Mhashilkar to Shreyas Bhat

#6 Updated by Shreyas Bhat 8 months ago

#7 Updated by Shreyas Bhat 8 months ago

Added a blocker: we don't want to work on this until the optparse-to-argparse code is in master. The functionality that controls the JID handling lives in jobsubClient:JID_Callback, which was introduced in #22164.

#8 Updated by Dennis Box about 1 month ago

  • Target version deleted (v1.3.2)
  • Status changed from New to Rejected

This is desirable behavior, but we have always been scared to do it because of potential side effects. We are moving to a new architecture where users have direct access to the condor_* commands, so this will not be needed at that point.
