Feature #22347

When we create tarfiles, compress them in place rather than making a tempfile and compressing it

Added by Shreyas Bhat 5 months ago. Updated 4 months ago.

Status: New
Priority: Normal
Assignee:
Category: JobSub Client
Target version:
Start date: 04/10/2019
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

Liang Li from gm2 noted that on the gpvms, $TMPDIR (usually /tmp) is limited to 2 GB of space. Thus, if users try to create larger tarballs using the jobsub_client tardir:// feature, they can fill up /tmp before the tarball is actually created.

This is because in client/jobsubClient.py, we create a temp file using tempfile.mktemp(), put the contents of the intended tarball in there, compress the tarball using gzip -n, and then move that file into place. Can we skip making the tempfile, and instead write directly to the final tarfile compressed using something like:

tar = tarfile.open(filename, 'w:gz')
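
To make the suggestion concrete, here is a minimal sketch of writing the compressed tarball straight to its final location. The function name create_tarball_in_place and its arguments are hypothetical and are not taken from jobsubClient.py; only tarfile.open(filename, 'w:gz') comes from the description above.

```python
import os
import tarfile


def create_tarball_in_place(source_dir, tar_path):
    """Write a gzip-compressed tarball of source_dir directly to tar_path.

    Hypothetical helper for illustration; not the jobsub_client code.
    """
    # 'w:gz' compresses the stream as members are added, so no
    # uncompressed intermediate tar file is written to $TMPDIR.
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(source_dir, arcname=top_level)
    return tar_path
```

With this approach the only file written is the final .tar.gz, so the space needed is the compressed size, in whatever directory tar_path points to.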

Here is the original message from Liang:

        I do have one comment, tardir:// option seems to first tar the
designated directory and then compress it, all of which is done at $TMPDIR
(normally "/tmp"), this is actually a problem for gm2 VM --- for some
reason, /tmp space is limited to merely 2GB for all VMs (Adam and I are
starting another discussion about that). This has caused problems when /tmp
is filled up. Apparently /tmp can be easily filled up when tardir:// option
is used (as I explained above, a tar ball is *first* created and then
*compressed*). Of course, one can simply relocate $TMPDIR to circumvent
that. But I just thought that it might be more convenient (and probably more
efficient) for tardir:// option to act like a "tar cfz" command (which
creates tar ball and compresses it at the same time).

History

#1 Updated by Shreyas Bhat 5 months ago

I wonder if we do this because we need to use gzip -n to ensure we ignore timestamps...
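
If the timestamp concern is the blocker, one possible way to get gzip -n style output while still streaming is to wrap the output file in a gzip.GzipFile with an empty filename and mtime=0, and let tarfile write into it. This is a sketch under that assumption; the helper name is hypothetical, and it only normalizes the gzip header, not the member timestamps stored inside the tar archive itself.

```python
import gzip
import os
import tarfile


def create_tarball_no_gzip_timestamp(source_dir, tar_path):
    """Stream a tar archive through gzip without a name or timestamp
    in the gzip header (similar in effect to `gzip -n`).

    Hypothetical helper for illustration; not the jobsub_client code.
    """
    top_level = os.path.basename(os.path.normpath(source_dir))
    with open(tar_path, "wb") as raw:
        # filename="" keeps the original name out of the gzip header;
        # mtime=0 writes a zero timestamp instead of the current time.
        with gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=0) as gz:
            # Plain "w" mode: compression is handled by the GzipFile wrapper.
            with tarfile.open(fileobj=gz, mode="w") as tar:
                tar.add(source_dir, arcname=top_level)
    return tar_path
```

Running this twice on an unchanged directory should give the same gzip header each time, which is the property gzip -n is used for.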

#2 Updated by Shreyas Bhat 5 months ago

Dennis and I discussed how this could be done. We decided to create a new flag called "--tar_output_dir" that users could pass along with the tardir:// URI (given to either the -f or the --tar_file_name flag) to specify the directory in which the tarball is created and then compressed. The value would be passed to the tempfile call (tempfile.mktemp(), per the description) in the jobsubClient.py code where we actually create the tarball (create_tar or something like that?).
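
A rough sketch of how the proposed flag could feed the temp-file location follows. The flag name --tar_output_dir comes from the comment above; everything else (argparse, the helper names, and the use of tempfile.mkstemp in place of tempfile.mktemp) is an assumption for illustration and may not match how jobsubClient.py parses options or creates its temp file.

```python
import argparse
import os
import tempfile


def build_parser():
    """Hypothetical parser showing only the proposed flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--tar_output_dir",
        default=None,
        help="directory in which tardir:// tarballs are created and compressed",
    )
    return parser


def make_scratch_tar_path(tar_output_dir=None):
    """Return a fresh temp-file path inside tar_output_dir.

    With tar_output_dir=None, tempfile falls back to its default
    directory (normally $TMPDIR), i.e. the current behavior.
    """
    # mkstemp is used here instead of mktemp because it creates the
    # file atomically; this is a swap made for the sketch.
    fd, path = tempfile.mkstemp(suffix=".tar", dir=tar_output_dir)
    os.close(fd)
    return path
```

For example, build_parser().parse_args(["--tar_output_dir", "/some/big/disk"]) would yield a tar_output_dir value that can be handed to make_scratch_tar_path; the path here is a placeholder.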

Poll uboone and gm2 to see if this name/behavior works for them.

#3 Updated by Shreyas Bhat 5 months ago

  • Assignee changed from Parag Mhashilkar to Shreyas Bhat

#4 Updated by Dennis Box 4 months ago

  • Target version set to v1.3.1

