DN of nova production proxy not hashed properly, causing intermittent submission failures
This is fallout from a bug in #14830, "DN in nova production proxy changing according to who last used it".
The chosen implementation of #14830 was to append a hash of the submitting user's DN to the end of the x509_user_proxy file name, so that individual users can be tracked.
In the case of RITM0525157, the submitting user's DN is /DC=org/DC=cilogon/C=US/O=Fermi National Accelerator Laboratory/OU=People/CN=Raphael Schroeter/CN=UID:rschroet. Hashing this results in the file name x509cc_novapro_Production_c5a69c9791d67f8ec95a4df63092735897cd63ae
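A minimal Python sketch of the naming scheme described above, for illustration only. The helper name `proxy_file_name`, the `x509cc_<user>_<role>_` prefix layout, and the use of SHA-1 are all assumptions inferred from the example file name (its 40-hex-character suffix is consistent with a SHA-1 hex digest); this is not the jobsub server's actual code, and the printed name is not claimed to reproduce the suffix above.

```python
import hashlib


def proxy_file_name(user: str, role: str, dn: str) -> str:
    """Build a proxy file name with a hash of the DN appended.

    Hypothetical helper: the prefix layout and the SHA-1 digest are
    assumptions inferred from the example file name, not jobsub's code.
    """
    digest = hashlib.sha1(dn.encode("utf-8")).hexdigest()  # 40 hex chars
    return f"x509cc_{user}_{role}_{digest}"


dn = ("/DC=org/DC=cilogon/C=US/O=Fermi National Accelerator Laboratory"
      "/OU=People/CN=Raphael Schroeter/CN=UID:rschroet")
print(proxy_file_name("novapro", "Production", dn))
```

The key property is determinism: the same DN string always yields the same file name, and any difference in the string, even one extra /CN= component, yields a completely different name.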
The proxy used to submit a job typically has an extra /CN= extension appended to its DN, for example:
/DC=org/DC=cilogon/C=US/O=Fermi National Accelerator Laboratory/OU=People/CN=Raphael Schroeter/CN=UID:rschroet/CN=2715906601
/DC=org/DC=cilogon/C=US/O=Fermi National Accelerator Laboratory/OU=People/CN=Raphael Schroeter/CN=UID:rschroet/CN=1035304600
These /CN= extensions are supposed to be cleaned off before the DN is hashed to create the file name. Unfortunately, there are code paths into the hashing function where this does not happen, producing an incorrect proxy file name. Later in the execution path, when the job is actually submitted, the DN may have been cleaned, so the submission step looks for a different file name and fails.
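The mismatch can be illustrated with a small Python sketch. The function names are hypothetical, SHA-1 is an assumed hash, and the regular expression assumes proxy extensions look like the trailing numeric /CN= components in the examples above (legacy proxies using values such as "proxy" are not handled). It shows one way to make every path agree, by cleaning inside the hash function itself; it is not the actual one-line jobsub fix.

```python
import hashlib
import re

# Assumption: proxy extensions are trailing "/CN=<digits>" components,
# as in the example DNs. "/CN=UID:rschroet" is not numeric, so it is kept.
_PROXY_CN = re.compile(r"(/CN=\d+)+$")


def clean_dn(dn: str) -> str:
    """Strip trailing numeric /CN= proxy extensions from a DN."""
    return _PROXY_CN.sub("", dn)


def dn_hash_buggy(dn: str) -> str:
    # Relies on every caller to clean the DN first; some callers do not.
    return hashlib.sha1(dn.encode("utf-8")).hexdigest()


def dn_hash_fixed(dn: str) -> str:
    # Cleans unconditionally, so every code path hashes the same string.
    return hashlib.sha1(clean_dn(dn).encode("utf-8")).hexdigest()


base = ("/DC=org/DC=cilogon/C=US/O=Fermi National Accelerator Laboratory"
        "/OU=People/CN=Raphael Schroeter/CN=UID:rschroet")
proxy_dn = base + "/CN=2715906601"

# One path hashes the raw proxy DN, another hashes the cleaned DN.
print(dn_hash_buggy(proxy_dn) == dn_hash_buggy(clean_dn(proxy_dn)))  # False
# With cleaning inside the hash function, both paths agree.
print(dn_hash_fixed(proxy_dn) == dn_hash_fixed(base))  # True
```

Putting the cleaning inside the hash function, rather than trusting each caller, means no new code path can reintroduce the mismatch.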
A one-line change in the jobsub server code will fix the problem; I will make an emergency release.
As a workaround until the emergency release is distributed, comment out line 119 of fifebatch's jobsub.ini, changing
hash_nondefault_proxy = True
to
;hash_nondefault_proxy = True
Fixing this is a one-line code change.