Bug #8094

Temporary x509cc files not getting cleaned up in /var/lib/jobsub/tmp

Added by Dave Dykstra about 6 years ago. Updated almost 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 03/16/2015
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders: Operations
Duration:

Description

There are a ton of old zero-length files in /var/lib/jobsub/tmp on fifebatch1 & fifebatch2 that are of the form x509cc_<user>_{Analysis|Production}_<randomstring>. Please make sure these get deleted when they stop being used.

Associated revisions

Revision 9e88b24b (diff)
Added by Dennis Box about 6 years ago

While reviewing #8065 I realized that the lifetime of the original krb5 cache is not the entire story.
The behavior of klist in is_valid_cache() changes when the krb5 cache is owned by the job submitter but klist is run
by a different user. klist always fails in this case, but klist -s only sets a return code, so we didn't
catch it. Changed the behavior to always kinit if the proxy needs refreshing and, importantly, to copy the newly
generated krb5 cache to the user's job submission directory, where the jdf expects it.

Also, the fix for #8094, which was trivial, is included because it's high priority for fixing fifebatch2's problems.

History

#1 Updated by Parag Mhashilkar about 6 years ago

  • Assignee set to Dennis Box
  • Target version set to v1.1.2

#2 Updated by Parag Mhashilkar about 6 years ago

  • Stakeholders updated (diff)

#3 Updated by Dave Dykstra about 6 years ago

By the way, there were so many files (>500k) that we were getting ENOSPC errors even though there was still disk space free.
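
An out-of-space error on a filesystem that still has free blocks is often a sign that inodes have run out, although this ticket does not confirm that as the cause here. A minimal sketch for checking both figures, assuming a POSIX host; the helper name `filesystem_headroom` is hypothetical and not part of jobsub:

```python
import os


def filesystem_headroom(path):
    """Return free-space and inode figures for the filesystem holding `path`.

    Hypothetical helper for illustration only; it wraps os.statvfs (POSIX).
    """
    st = os.statvfs(path)
    return {
        # Bytes available to unprivileged users.
        "bytes_available": st.f_bavail * st.f_frsize,
        # Total and free inodes; some filesystems report 0 for both.
        "inodes_total": st.f_files,
        "inodes_free": st.f_ffree,
    }
```

Calling `filesystem_headroom("/var/lib/jobsub/tmp")` on the affected host would show whether `inodes_free` is near zero while `bytes_available` is still large.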

#4 Updated by Dave Dykstra about 6 years ago

In addition to making sure these x509cc files are removed as soon as they're no longer needed, can you also include a cron job to clean up any remaining old files in the tmp directory, in case of crashes and so on?
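
The cleanup requested above could look roughly like the following sketch, which a cron job might run periodically. It is not the script that was actually deployed; the function name, the one-day default age, and the choice of mtime as the staleness test are all assumptions made for illustration:

```python
import os
import time


def remove_stale_files(directory, prefix="x509cc_",
                       max_age_seconds=24 * 3600, now=None):
    """Delete regular files in `directory` whose names start with `prefix`
    and whose mtime is more than `max_age_seconds` old.

    Hypothetical helper; returns the sorted names of the files removed.
    """
    now = time.time() if now is None else now

    # Collect candidates first so nothing is deleted mid-iteration.
    candidates = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if (entry.name.startswith(prefix)
                    and entry.is_file(follow_symlinks=False)):
                candidates.append(entry.path)

    removed = []
    for path in candidates:
        try:
            if now - os.stat(path).st_mtime > max_age_seconds:
                os.unlink(path)
                removed.append(os.path.basename(path))
        except FileNotFoundError:
            # Another process already removed the file; nothing to do.
            continue
    return sorted(removed)
```

The age threshold should be longer than the longest time a live submission could need its file, so that the sweep never removes a file still in use.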

#5 Updated by Parag Mhashilkar about 6 years ago

  • Target version changed from v1.1.2 to v1.1.1

#6 Updated by Dennis Box about 6 years ago

  • Status changed from New to Feedback

fix is in branch 8065-8094, auth.py starting at line 407:

#7 Updated by Dennis Box about 6 years ago

  • Status changed from Feedback to Resolved

#8 Updated by Dennis Box about 6 years ago

Note on how this was tested.

Verified that regression testing on a v1.1.0 server, or flood testing à la #8138, leaves these files behind.

Verified that after the fix, exercising the server in the same way no longer creates these files.

I do not know how to write a meaningful automated test for this.
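
One possible shape for such a test, offered only as a sketch: snapshot the temporary directory for x509cc files before and after exercising the code, and assert that nothing is left behind even when the operation fails. The names `scratch_cache_file` and `leftover_files` are hypothetical and do not correspond to anything in auth.py; the context manager merely stands in for whatever code creates and removes the file.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def scratch_cache_file(directory, prefix):
    """Create a temporary file and guarantee its removal on exit.

    Hypothetical stand-in for the code under test.
    """
    fd, path = tempfile.mkstemp(prefix=prefix, dir=directory)
    os.close(fd)
    try:
        yield path
    finally:
        # Runs on normal exit and on exceptions alike.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(path)


def leftover_files(directory, prefix="x509cc_"):
    """Return the sorted names in `directory` that start with `prefix`."""
    return sorted(n for n in os.listdir(directory) if n.startswith(prefix))


def test_no_files_left_after_failure():
    with tempfile.TemporaryDirectory() as tmp:
        try:
            with scratch_cache_file(tmp, "x509cc_alice_Analysis_"):
                raise RuntimeError("simulated failure")
        except RuntimeError:
            pass
        # The directory must be clean even though the body raised.
        assert leftover_files(tmp) == []
```

A test like this only covers cleanup within a single process; files orphaned by a hard crash would still need a periodic sweep.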

#9 Updated by Parag Mhashilkar almost 6 years ago

  • Status changed from Resolved to Closed
