current krb5cc length is too short
This line in auth.py:
self.renewableLifetimeHours = 72
creates the kerberos credentials cache with a renewable lifetime of 3 days. These credential caches are renewed (I think) daily, so the kerberos credential sent with a job is only guaranteed to be good for at least 2 more days. Please set the lifetime to 168h (7 days) so we don't have to worry about the kerberos credentials cache not being renewable for the full length of the job.
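A minimal sketch of the requested change, expressed in hours. The attribute name renewableLifetimeHours comes from auth.py as quoted above; the surrounding class is hypothetical, for illustration only.

```python
HOURS_PER_DAY = 24

class KerberosAuth(object):
    """Hypothetical wrapper around the auth.py setting discussed above."""
    def __init__(self):
        # Old value: 72 hours (3 days). With roughly daily renewals, a job
        # was only guaranteed a cache renewable for ~2 more days.
        # Requested value: 7 days, so renewal always has headroom.
        self.renewableLifetimeHours = 7 * HOURS_PER_DAY

auth = KerberosAuth()
print(auth.renewableLifetimeHours)  # 168
```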
#3 Updated by Dennis Box about 6 years ago
- Assignee changed from Dennis Box to Parag Mhashilkar
#4 Updated by Parag Mhashilkar about 6 years ago
- Assignee changed from Parag Mhashilkar to Dennis Box
Changes are ok. One minor comment on the following code:
def copy_user_krb5_caches():
    jobsubConfig = jobsub.JobsubConfig()
    krb5cc_dir = jobsubConfig.krb5ccDir
    cmd = spawn.find_executable('condor_q')
    if not cmd:
        raise Exception('Unable to find condor_q in the PATH')
    cmd += """ -format '%s\n' 'ifthenelse (EncrypInputFiles=?=UNDEFINED, string(EncryptInputFiles),string(""))' """
    already_processed = ['']
    cmd_out, cmd_err = subprocessSupport.iexe_cmd(cmd)
    if cmd_err:
        logger.log("%s" % sys.exc_info())
        raise Exception("command %s returned %s" % (cmd, cmd_err))
    lines = cmd_out.split("\n")
    for job_krb5_cache in lines:
        if job_krb5_cache not in already_processed:
            already_processed.append(job_krb5_cache)
            cache_basename = os.path.basename(job_krb5_cache)
            base_parts = cache_basename.split('_')
            username = base_parts[-1]
            system_cache_fname = os.path.join(krb5cc_dir, cache_basename)
            try:
                logger.log('copying %s to %s' % (system_cache_fname, job_krb5_cache))
                jobsub.copy_file_as_user(system_cache_fname, job_krb5_cache, username)
            except:
                logger.log("Error processing %s" % job_krb5_cache)
                logger.log("%s" % sys.exc_info())
If you convert the list to a set, you can drastically reduce the loop time as well as the memory used:
lines = set(cmd_out.split("\n"))
[...]
# don't need the following logic at all:
already_processed = ['']
...
if job_krb5_cache not in already_processed:
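A minimal sketch of the suggested change, assuming cmd_out is the newline-separated condor_q output shown above. The helper name and sample paths are hypothetical; the point is that set membership is O(1), so the growing already_processed list (an O(n) scan per iteration) is no longer needed.

```python
def unique_cache_paths(cmd_out):
    """Deduplicate krb5 cache paths from condor_q output using a set.

    Replaces the already_processed list from the reviewed code: a set
    collapses duplicates up front and gives O(1) membership checks.
    """
    paths = set(cmd_out.split("\n"))
    # Drop the empty string left behind by a trailing newline, which the
    # original code filtered with already_processed = [''].
    paths.discard('')
    return paths

# Example with hypothetical paths: duplicate entries collapse to one.
out = "/scratch/krb5cc_dbox\n/scratch/krb5cc_dbox\n/scratch/krb5cc_willis\n"
print(sorted(unique_cache_paths(out)))
# ['/scratch/krb5cc_dbox', '/scratch/krb5cc_willis']
```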
#7 Updated by Dennis Box about 6 years ago
Note on how this was tested.
Submit cdf DAG jobs, with a klist -a somewhere in them.
Put them on hold.
Wait a day or two.
Look at the krbrefresh logs and verify that new krb5cc caches have been put in the user scratch areas.
Look at the klist -a output in the returned jobs.
Willis ran a bunch of independent tests that were variations on the above.
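The "verify that new krb5cc caches have been put in user scratch areas" step above can be sketched as a small freshness check. This is an assumption-laden illustration: the cache filename prefix, directory layout, and 24-hour refresh window are guesses, not the actual krbrefresh behavior.

```python
import os
import tempfile
import time

def fresh_caches(scratch_dir, max_age_hours=24):
    """Return krb5cc files in scratch_dir modified within max_age_hours.

    Hypothetical check for the manual test described above: after a
    krbrefresh cycle, every user cache in the scratch area should have
    a recent mtime.
    """
    cutoff = time.time() - max_age_hours * 3600
    fresh = []
    for name in os.listdir(scratch_dir):
        path = os.path.join(scratch_dir, name)
        if name.startswith('krb5cc') and os.path.getmtime(path) >= cutoff:
            fresh.append(name)
    return sorted(fresh)

# Demo on a throwaway directory: a just-created cache counts as fresh.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, 'krb5cc_dbox'), 'w').close()
print(fresh_caches(demo_dir))  # ['krb5cc_dbox']
```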