
Feature #8065

current krb5cc length is too short

Added by Joe Boyd about 6 years ago. Updated almost 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: JobSub Server RPM
Target version:
Start date: 03/11/2015
Due date:
% Done: 0%
Estimated time:
Stakeholders: CDF
Duration:

Description

This line in auth.py:

self.renewableLifetimeHours = 72

creates the kerberos credentials cache with a renewable lifetime of 3 days. These credential caches get renewed, I think, daily, so you're only guaranteed that the kerberos credential sent with the job is good for at least 2 days. Please set the lifetime to 168h (7d) so we don't have to worry about the kerberos credentials cache not being renewable for the full length of the job.
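
For illustration, a minimal sketch of the requested change, assuming the setting lives in an __init__ as the quoted line suggests (the class name here is hypothetical):

class Krb5Credentials:
    def __init__(self):
        # 168 hours (7 days) instead of 72 hours (3 days): with a
        # daily refresh cycle this leaves several days of renewable
        # lifetime on the credential shipped with a long-running job.
        self.renewableLifetimeHours = 168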

History

#1 Updated by Parag Mhashilkar about 6 years ago

  • Status changed from New to Feedback
  • Assignee set to Dennis Box
  • Target version set to v1.1.1

Changes are in branch 8065. Please review and test it.

#2 Updated by Parag Mhashilkar about 6 years ago

  • Stakeholders updated (diff)

#3 Updated by Dennis Box about 6 years ago

  • Assignee changed from Dennis Box to Parag Mhashilkar

Please see branch #8065-8094. I think this actually resolves CDF's issues, but it's a big enough change that it warrants review by other eyes than mine.

This branch has a trivial fix for #8094 as well.

Thanks
Dennis

#4 Updated by Parag Mhashilkar about 6 years ago

  • Assignee changed from Parag Mhashilkar to Dennis Box

Changes are OK. One minor comment on the following code:

import os
import sys
from distutils import spawn

# jobsub, subprocessSupport, and logger are jobsub-server modules
import jobsub
import subprocessSupport
import logger

def copy_user_krb5_caches():
    jobsubConfig = jobsub.JobsubConfig()
    krb5cc_dir = jobsubConfig.krb5ccDir
    cmd = spawn.find_executable('condor_q')
    if not cmd:
        raise Exception('Unable to find condor_q in the PATH')
    # Print each queued job's EncryptInputFiles attribute (the path of
    # its krb5 cache), or an empty string when it is undefined.
    cmd += """ -format '%s\n' 'ifthenelse(EncryptInputFiles =?= UNDEFINED, string(""), string(EncryptInputFiles))' """
    already_processed = ['']
    cmd_out, cmd_err = subprocessSupport.iexe_cmd(cmd)
    if cmd_err:
        logger.log("%s" % cmd_err)
        raise Exception("command %s returned %s" % (cmd, cmd_err))
    lines = cmd_out.split("\n")
    for job_krb5_cache in lines:
        if job_krb5_cache not in already_processed:
            already_processed.append(job_krb5_cache)
            # Cache files are named <prefix>_<username>; copy the freshly
            # renewed system cache over the per-job copy as that user.
            cache_basename = os.path.basename(job_krb5_cache)
            base_parts = cache_basename.split('_')
            username = base_parts[-1]
            system_cache_fname = os.path.join(krb5cc_dir, cache_basename)
            try:
                logger.log('copying %s to %s' % (system_cache_fname, job_krb5_cache))
                jobsub.copy_file_as_user(system_cache_fname, job_krb5_cache, username)
            except Exception:
                logger.log("Error processing %s" % job_krb5_cache)
                logger.log("%s" % sys.exc_info()[1])

If you convert the list to a set, you can drastically reduce the loop time as well as the memory used:

lines = set(cmd_out.split("\n"))

[...]

# don't need the following logic at all
already_processed = ['']
if job_krb5_cache not in already_processed:
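
For illustration, a hedged sketch of how the loop might look after that change; the discard('') call stands in for the empty-string guard, since jobs without an EncryptInputFiles attribute produce empty lines:

# Sketch only: set membership tests are O(1), so the
# already_processed list and its linear scans go away.
lines = set(cmd_out.split("\n"))
lines.discard('')  # drop entries from jobs with no krb5 cache
for job_krb5_cache in lines:
    cache_basename = os.path.basename(job_krb5_cache)
    username = cache_basename.split('_')[-1]
    system_cache_fname = os.path.join(krb5cc_dir, cache_basename)
    try:
        logger.log('copying %s to %s' % (system_cache_fname, job_krb5_cache))
        jobsub.copy_file_as_user(system_cache_fname, job_krb5_cache, username)
    except Exception:
        logger.log("Error processing %s" % job_krb5_cache)
        logger.log("%s" % sys.exc_info()[1])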

#5 Updated by Dennis Box about 6 years ago

Merged the 8065-8094 branch into master. This should take care of Willis' problem, but I'm not marking it resolved just yet; I want to test some more and have Willis et al. test more as well.

#6 Updated by Dennis Box about 6 years ago

  • Status changed from Feedback to Resolved

#7 Updated by Dennis Box about 6 years ago

Note on how this was tested:

  • Submit CDF DAG jobs with a klist -a somewhere in them (see the sketch below).
  • Put them on hold.
  • Wait a day or two.
  • Look at the krbrefresh logs and verify that new krb5cc caches have been put in the user scratch areas.
  • Release the jobs.
  • Look at the klist -a output of the returned jobs.
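
For illustration, a minimal sketch of the kind of payload step used here, assuming the job runs a Python script (any step that shells out to klist -a would do):

# Hypothetical job payload step: record the kerberos cache contents
# in the job's stdout so the renewed credential can be inspected later.
import subprocess

print(subprocess.check_output(['klist', '-a']))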

Willis ran a bunch of independent tests that were variations on the above.

#8 Updated by Parag Mhashilkar almost 6 years ago

  • Status changed from Resolved to Closed
