
Feature #6697

cdf wrapper script behavior

Added by Dennis Box about 5 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Category: JobSub Tools
Target version: v1.0
Start date: 07/28/2014
% Done: 100%

Description

Willis answered this question via email; here is what the cdf wrapper script needs to do, from his perspective:

Hi Dennis:

I ran with an input tarball using jobsub_client v0_4_rc3
and they all worked.
-- Willis
Let me summarize what CDF needs and what now works.
1) Kerberos credentials on the farm - I can fcp/scp/rcp back an
output file to fcdflnx6.
2) SamWEB curl getNextFile with X509 credentials from the
jobsub submission on SL6. I hit a bug with curl on SL6 where
chained certificates no longer work because NSS is
used. Robert Illingworth needed a new cert on the SamWEB
server side, and on the jobsub farm node side, these
curl parameters (see the sketch after this list):
-L --retry --cert ${X509_USER_PROXY}
--cacert ${X509_USER_PROXY}
--capath ${X509_CERT_DIR}
3) Side note: the CDF Kerberized dCache was a difficult
nut to crack with the new Kerberos principal. The dCache
config needs to know that cdffana (49118.3200)
has to be mapped to me. After Dmitry Litvintsev added
this, I was able to get data from CDF dCache via
CDF CVMFS cdfsoft2 using Ray's latest SamWEB.
4) The input tarball is now available on the jobsub
farm node.
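
A minimal sketch of the getNextFile fetch from item 2, assuming a hypothetical
SAMWeb URL in SAM_NEXT_FILE_URL (the real endpoint is not given here) and the
proxy variables already set in the job environment; only the TLS options come
from the email above:

  # Hypothetical getNextFile call; SAM_NEXT_FILE_URL is a placeholder.
  # curl's --retry takes a count, so a value of 3 is assumed here.
  SAM_NEXT_FILE_URL="https://samweb-server.example:8483/..."   # placeholder, not the real URL
  curl -L --retry 3 \
       --cert   "${X509_USER_PROXY}" \
       --cacert "${X509_USER_PROXY}" \
       --capath "${X509_CERT_DIR}" \
       "${SAM_NEXT_FILE_URL}"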
CDF CafSubmit features for jobsub: my typical CafSubmit is

MAILTO="--email=$EMAIL_LIST"
FARMDEST='--farm=cdfgrid --os=sl6'
INPUT_ACCESS="--dhaccess=SAM --dataset=$DSAMNAME"
CafSubmit \
  $FARMDEST \
  --tarFile=$GTARBALL \
  --outLocation=${SERVER1}:${OTGZDIR}/${SDSID}-\$.tgz \
  --procType=medium --maxParallelSec=100 $INPUT_ACCESS \
  $MAILTO \
  --start=$BEGSEG --end=$ENDSEG \
  $SCRIPT $STRIPMACRO \$ $INPUT_DATA $SERVER2 $OUTPDIR

The last line is the job script and its input parameters. I use
$SERVER2 $OUTPDIR for output scp's in the run script $SCRIPT,
but they can be hard coded in $SCRIPT.
The jobsub begin/end wrapper should do the following.
1) begin script: unpack the input tarball, and then execute
$SCRIPT. This script must be in the tarball, so jobsub
does not have to send it. The $ parameter should be
provided by the begin script -- this is the section number
of the job. The begin script also needs to prepare the
expected CafSubmit/CafExe CDF environment that is currently
on cdfgrid: pwd is the top-level directory of the tarball,
and USER is set to the job submitter -- not doing so will break
many user scripts, and whatever else is buried in cdfsoft2.
2) end script: tar and gzip up everything in the top directory
(where the input tarball is unpacked), and send it to the
--outLocation. The $ parameter is the job section number.
If this is not specified, send the output to the job
submitter's CDF ICAF directory, using a file name that has
the fifebatch job number and the section number.
Although this is not for the begin/end script, the jobsub
server (head node) needs to deal with $MAILTO when it detects
that all job sections have completed.
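
A minimal sketch of the begin/end behavior described above, not the actual
jobsub wrapper code; the variable names JOB_TARBALL, USER_ARGS, OUT_LOCATION,
and JOB_SUBMITTER are assumptions, as is the ICAF fallback destination:

  # begin script sketch: unpack the tarball and run the user's script from it
  WORKDIR=$(pwd)
  tar xzf "$JOB_TARBALL" -C "$WORKDIR"
  export USER="$JOB_SUBMITTER"                       # CafExe-style environment
  # replace the literal '$' token in the user arguments with the section number
  ARGS="${USER_ARGS//\$/$CAF_SECTION}"
  ./"$SCRIPT" $ARGS                                  # word splitting of $ARGS is intentional

  # end script sketch: tar up the work area and copy it back
  OUTFILE="job-${CAF_JID}-${CAF_SECTION}.tgz"
  tar czf "/tmp/${OUTFILE}" -C "$WORKDIR" .
  if [ -n "$OUT_LOCATION" ]; then
      scp "/tmp/${OUTFILE}" "${OUT_LOCATION//\$/$CAF_SECTION}"
  else
      scp "/tmp/${OUTFILE}" "fcdficaf2:${OUTFILE}"   # ICAF fallback; destination path assumed
  fi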

On Mon, 28 Jul 2014, Dennis Box wrote:

Hi Willis,

The best way to see if you were just lucky is to submit it again a few times in slightly different ways, seeing if you can break it.

We need to finish the cdf wrapper part of this project, the part that makes the *_wrap.sh behave as much like CafExe as possible.
Here is our project ticket on this: https://cdcvs.fnal.gov/redmine/issues/5333 (note the table at the bottom).

Should the wrapper automatically untar the input file? Should the CDF wrapper always be part of a DAG, like CAF jobs are? I need to start working on --maxParallelSessions, --beginSection, and --endSections; they make sense to me as DAG options but may be trickier for non-DAG jobs.

Dennis

On 7/28/14 12:49 PM, Willis Sakumoto wrote:

Hi Lynn and Dennis:

Lynn: thanks - I can set up jobsub_client v0_4_rc3.

Dennis: Good, it works - I got $INPUT_TAR_FILE -- but
was I lucky?
-- Willis

On Mon, 28 Jul 2014, Lynn Garren wrote:

This is now installed on /cdf/code, but it is not on cvmfs.

On 7/28/14 11:30 AM, Dennis Box wrote:

Hi Lynn,

Can you install v0_4_rc3 of jobsub_client in cdfsoft?

Willis, I think this clears up the problems with --tar_file_name; can
you verify?

Thanks
Dennis

History

#1 Updated by Dennis Box about 5 years ago

  • Category set to JobSub Tools
  • Target version set to v1.0

#2 Updated by Dennis Box about 5 years ago

I should have put this email in the ticket last week; it further describes the needed behavior for the cdf wrapper:

Hi Dennis:
The '$' parameter is a token whose value is equal to
CAF_SECTION. The jobsub begin script, which sets up the
CDF environment, needs to check the parameters passed to
the user run script, and if they contain the '$' token,
CAF_SECTION is to be substituted for the token. The
jobsub end script, when tar'ing and gzipping the job
work area into a file for --outLocation, needs to
also substitute any '$' tokens in the output file name
with CAF_SECTION.
We need this feature.
-- Willis

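
A minimal sketch of the substitution described above, using bash parameter
expansion; the variable names USER_ARGS and OUT_NAME are illustrative only:

  # Only the "replace '$' with CAF_SECTION" rule comes from the email above.
  CAF_SECTION=7                               # example section number
  USER_ARGS='$ myinput.dat'                   # run-script arguments containing the '$' token
  OUT_NAME='output-$.tgz'                     # output file name containing the '$' token
  ARGS="${USER_ARGS//\$/$CAF_SECTION}"        # -> '7 myinput.dat'
  OUT="${OUT_NAME//\$/$CAF_SECTION}"          # -> 'output-7.tgz'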

On Fri, 1 Aug 2014, Dennis Box wrote:

I just put a new release of jobsub_tools in KITS that implements some of the items Willis was asking for in ticket 6697.

I assume the release of v0.4 will be going forward with jobsub_tools v1_3_1_3. After the release is complete, can this jobsub_tools be installed on fifebatch-dev for Willis to try?

From the release notes:

v1_3_1_4_rc1 ============================================================
- partial implementation of ticket 6697

All the input flags to CafSubmit are now present for jobsub
when GROUP=cdf. Most of them are no-ops, however.

The input tar file is unwound by the wrapper script, and the
rest of the input arguments are executed in-line in the resulting
directory tree.

CAF_SECTION, CAF_JOB_BEGIN_SECTION, CAF_JOB_END_SECTION, and
CAF_JID are all present in the execution environment. CAF_JOB_BEGIN_SECTION
is always 1 and CAF_JOB_END_SECTION is how many jobs were submitted; this
will be fixed later.

At the end of the job, a tarball named with CAF_SECTION is created
and scp'ed out; the --outLocation value is used when present, and the
output is sent to fcdficaf2 if it is not.

NB I am away all next week on vacation, but will be checking mail occasionally.

A question for Willis: I don't remember how that \$ notation for the running section works and wasn't able to dig it out of the source code. Is CAF_SECTION being set to the same value in the execution environment sufficient? I suspect not but one can hope...

Thanks
Dennis

#3 Updated by Dennis Box about 5 years ago

rc3 is in KITS to fix a typo in rc2 which made --start, --end, and --sections unusable for testing.

Here are release notes for rc2 and rc3:
v1_3_1_4_rc2 =========================================================
- further implementation of ticket 6697

--maxParallelSec generates the same -maxidle and -maxjobs settings for
condor_submit_dag as CafSubmit does (see the sketch after these release notes)
--start, --end, and --sections generate the same section numbers as
when used with CafSubmit
--outLocation works the same as it does for CafSubmit, except the only
copyback option is scp. If no --outLocation is specified the job tries
to copy back to fcdficaf2.
-- the $ variable as an input parameter gets changed to $CAF_SECTION
every place it is used in the CDF wrapper script. If $ appears in
the --outLocation, it gets substituted as ${CAF_SECTION}-${CAF_JID},
which appears to be the same behavior as CafSubmit

v1_3_1_4_rc3 ============================================================
- fixed a typo which prevented --start, --end, and --sections from working;
this obviously should be part of the test suite
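
A minimal sketch of the --maxParallelSec mapping mentioned in the rc2 notes,
assuming the flag's value lands in a hypothetical MAX_PARALLEL_SEC variable
and that the same value feeds both DAGMan throttles; the actual jobsub_tools
code may differ:

  # Hypothetical mapping; only the -maxidle/-maxjobs pairing comes from the notes above.
  MAX_PARALLEL_SEC=100                     # value passed via --maxParallelSec
  DAG_FILE=job.dag                         # DAG produced by the jobsub server (name assumed)
  condor_submit_dag -maxjobs "$MAX_PARALLEL_SEC" \
                    -maxidle "$MAX_PARALLEL_SEC" \
                    "$DAG_FILE"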

#4 Updated by Dennis Box about 5 years ago

  • Status changed from New to Assigned
  • % Done changed from 0 to 70

#5 Updated by Dennis Box about 5 years ago

Another requirement: the wrapper script needs to periodically wake up and do a kinit -r to keep the credentials cache alive.
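
A minimal sketch of such a renewal loop, assuming the ticket is renewable and
picking an arbitrary one-hour interval; note that ticket renewal in MIT
Kerberos is kinit -R (capital R):

  # Background renewal loop; the interval and error handling are assumptions.
  RENEW_INTERVAL=3600                      # seconds between renewal attempts (assumed)
  while true; do
      sleep "$RENEW_INTERVAL"
      kinit -R || echo "kinit -R failed; ticket may no longer be renewable" >&2
  done &
  KRENEW_PID=$!                            # so the wrapper can kill the loop at job end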

#6 Updated by Dennis Box about 5 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 70 to 100

Closing this ticket and merging the code back into main; I am worried that the more changes accumulate, the harder it will be to merge in the end. Willis' test job () seemed to do everything advertised in rc3. If there are unresolved issues I want to open single-issue tickets for them. Having the --email flag send back a summary email like the CAF does would be nice, for instance.

Dennis

#7 Updated by Parag Mhashilkar about 5 years ago

  • Status changed from Resolved to Closed

