Bug #25553

Tarball upload method defaults to PNFS rather than CVMFS

Added by Shreyas Bhat about 2 months ago. Updated about 2 months ago.

Status: Feedback
Priority: High
Assignee:
Category: JobSub Client
Target version:
Start date: 02/24/2021
Due date:
% Done: 0%
Estimated time: (Total: 0.00 h)
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

The new release candidate of jobsub (1.3.3_rc1) has a bug in how it processes the default tarball upload method for an experiment. When no upload method is specified on the command line, PNFS (resilient dCache) is chosen rather than CVMFS (RCDS).

Steps to reproduce

If one tries to run the following jobsub_submit command:

jobsub_submit -G nova --debug --tar_file_name dropbox:///nashome/s/sbhat/nikolay_test/TestDir2.tar --jobsub-server=jobsubdevgpvm01.fnal.gov file:///nashome/s/sbhat/nikolay_test/cvmfs_untar_2.sh

we would expect that the client would choose CVMFS as the upload method for the tarball TestDir2.tar, similar to the --use-cvmfs-dropbox override. Instead, we see that PNFS is chosen (see work notes for output of command).

Root cause

Line numbers below refer to tag v1.3.3.rc1 (hash 3482119). When we submit a job that needs to upload a tarball, client/jobsubClient.py:JobsubClient.dropboxMethod is called to either request the dropbox upload method from the jobsub server or use an override given on the command line. If no override is given, jobsubClient.requestValue is called on line 1415.

In jobsubClient.requestValue, when we retrieve information from a response from the jobsub server, we simply get the value of the 'out' key from the response text (line 940). In the case that the jobsub server has no dropbox method configured, a response with status code 404 gets returned, and the response text has no 'out' key (only 'err'). As a result, doc.get('out') returns None, which we return to the caller on line 945.
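To make the None result concrete, here is a minimal sketch of the lookup described above. It assumes the response body parses as JSON into a dict; the function name extract_out and both sample bodies (including the error message text) are hypothetical and are not taken from jobsubClient.py.

```python
import json


def extract_out(response_text):
    """Hypothetical stand-in for the 'out' lookup in requestValue."""
    doc = json.loads(response_text)
    # dict.get returns None when the key is absent, so an error-only
    # response is indistinguishable from "no value" at this point.
    return doc.get('out')


# Illustrative response bodies (assumed shapes, not captured from a server).
ok_body = json.dumps({'out': 'cvmfs'})
not_found_body = json.dumps({'err': 'dropbox method not configured'})

print(extract_out(ok_body))         # cvmfs
print(extract_out(not_found_body))  # None
```

The HTTP status code never enters the lookup in this sketch, which is why a 404 quietly turns into None.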

JobsubClient.dropboxMethod gets the return value on line 1415. In the current code, the None value is accepted and returned on line 1416 to the caller on line 314. A few lines later, on line 318, we simply check whether the returned dropbox method is "cvmfs"; if it is not, we assume that we need to upload via IFDH. Because the dropbox method is None, the ifdh upload proceeds.
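The fall-through can be illustrated with a short sketch of the comparison described for line 318. The function choose_upload_path and its return labels are hypothetical; only the equality test against "cvmfs" comes from the description above.

```python
def choose_upload_path(dropbox_method):
    """Hypothetical model of the check on line 318."""
    if dropbox_method == "cvmfs":
        return "cvmfs"
    # Anything else, including None from a failed server lookup,
    # is treated as a request for an ifdh (PNFS) upload.
    return "ifdh"


for method in ("cvmfs", "pnfs", None):
    print(method, "->", choose_upload_path(method))
```

Because None is neither "cvmfs" nor an explicit "pnfs", the check cannot tell a missing server configuration from a deliberate choice of PNFS.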

Solution

The solution appears to be simple: if the jobsub server returns an error in response to the request for a dropbox method, the resulting None must be discarded and the default ('cvmfs') returned instead.
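One way the proposed behavior could look is sketched below. This is an illustration under assumptions, not the committed patch: the names DEFAULT_DROPBOX_METHOD, resolve_dropbox_method, override, and request_value are all hypothetical, and the real change in jobsubClient.py may be structured differently.

```python
DEFAULT_DROPBOX_METHOD = 'cvmfs'  # default named in this report


def resolve_dropbox_method(override=None, request_value=lambda: None):
    """Hypothetical sketch: override first, then server value, then default."""
    if override is not None:
        # A command-line override always wins.
        return override
    value = request_value()
    if value is None:
        # The server returned an error (no 'out' key), so fall back.
        return DEFAULT_DROPBOX_METHOD
    return value


print(resolve_dropbox_method())                               # cvmfs
print(resolve_dropbox_method(override='pnfs'))                # pnfs
print(resolve_dropbox_method(request_value=lambda: 'pnfs'))   # pnfs
```

Handling the None inside the resolver keeps the caller's equality check against "cvmfs" unchanged.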


Subtasks

Review Request #25554: Please review commit 0d874538 (Branch 25553) - New - Dennis Box

History

#1 Updated by Shreyas Bhat about 2 months ago

Test command to reproduce the issue (I interrupted this command when I saw that PNFS was being chosen):

-bash-4.2$ jobsub_submit  -G nova  --debug    --expected-lifetime='short' --tar_file_name dropbox:///nashome/s/sbhat/nikolay_test/TestDir2.tar --jobsub-server=jobsubdevgpvm01.fnal.gov file:///nashome/s/sbhat/nikolay_test/cvmfs_untar_2.sh
SERVER_ARGS:  ['--expected-lifetime=short', '--tar_file_name', 'dropbox:///nashome/s/sbhat/nikolay_test/TestDir2.tar', 'file:///nashome/s/sbhat/nikolay_test/cvmfs_untar_2.sh']
Using CA_DIR: /etc/grid-security/certificates
ACTION URL     : https://jobsubdevgpvm01.fnal.gov:8443/jobsub/acctgroups/nova/authmethods/

CREDENTIALS    : {u'cert': u'/tmp/x509up_u10610', u'proxy': u'/tmp/x509up_u10610', u'key': u'/tmp/x509up_u10610'}

/usr/bin/cigetcert -s jobsubdevgpvm01.fnal.gov -n -o /tmp/x509up_u10610
Using CA_DIR: /etc/grid-security/certificates
stdout: Checking if /tmp/x509up_u10610 can be reused ..... yes

stderr:
ACTION URL     : https://jobsubdevgpvm01.fnal.gov:8443/jobsub/scheddload/nova/

CREDENTIALS    : {u'cert': u'/tmp/x509up_u10610', u'proxy': u'/tmp/x509up_u10610', u'key': u'/tmp/x509up_u10610'}

ACTION URL     : https://jobsubdevgpvm01.fnal.gov:8443/jobsub/acctgroups/nova/dropboxlocation/

CREDENTIALS    : {u'cert': u'/tmp/x509up_u10610', u'proxy': u'/tmp/x509up_u10610', u'key': u'/tmp/x509up_u10610'}

ACTION URL     : https://jobsubdevgpvm01.fnal.gov:8443/jobsub/acctgroups/nova/dropboxsize/

CREDENTIALS    : {u'cert': u'/tmp/x509up_u10610', u'proxy': u'/tmp/x509up_u10610', u'key': u'/tmp/x509up_u10610'}

ACTION URL     : https://jobsubdevgpvm01.fnal.gov:8443/jobsub/acctgroups/nova/dropboxmethod/

CREDENTIALS    : {u'cert': u'/tmp/x509up_u10610', u'proxy': u'/tmp/x509up_u10610', u'key': u'/tmp/x509up_u10610'}

calling ifdh_upload
srcpath=/nashome/s/sbhat/nikolay_test/TestDir2.tar destpath=/pnfs/nova/resilient/jobsub_stage/ff7ca6ee1c8fab13f2da15a979bc255da5778e04/TestDir2.tar
ifdh mkdir_p /pnfs/nova/resilient/jobsub_stage/ff7ca6ee1c8fab13f2da15a979bc255da5778e04 attempt:

#2 Updated by Shreyas Bhat about 2 months ago

Testing patch now.

PNFS test - SUCCESS

jobid:

On fermicloud042:

source /cvmfs/fermilab.opensciencegrid.org/products/common/etc/setups.sh
setup jobsub_client # To get the right python
./jobsub/client/jobsub_submit -G nova --debug --expected-lifetime='short' --tar_file_name dropbox:///home/sbhat/pnfs_test/TestDir.tar --use-pnfs-dropbox --jobsub-server=jobsubdevgpvm01.fnal.gov file:///home/sbhat/pnfs_test/test_untar.sh

CVMFS Test - SUCCESS

jobid:

On fermicloud042:

source /cvmfs/fermilab.opensciencegrid.org/products/common/etc/setups.sh
setup jobsub_client # To get the right python
./jobsub/client/jobsub_submit -G nova --debug --expected-lifetime='short' --tar_file_name dropbox:///home/sbhat/nikolay_test/TestDir2.tar  --jobsub-server=jobsubdevgpvm01.fnal.gov file:///home/sbhat/nikolay_test/cvmfs_untar_2.sh

#3 Updated by Shreyas Bhat about 2 months ago

  • Due date set to 02/24/2021

due to changes in a related task: #25554

#4 Updated by Shreyas Bhat about 2 months ago

  • Status changed from New to Feedback

Patch committed to branch 25553, and review request opened to Dennis.
