Bug #7407

cannot create directory `/fife' in .err files

Added by Christopher Backhouse over 4 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 12/02/2014
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

My .err log files contain this:

mkdir: cannot create directory `/fife': Permission denied
chgrp: cannot access `/fife/local/scratch/uploads/nova/bckhouse': No such file or directory
chmod: cannot access `/fife/local/scratch/uploads/nova/bckhouse': No such file or directory
mkdir: cannot create directory `/fife': Permission denied
chgrp: cannot access `/fife/local/scratch/uploads/nova/bckhouse/2014-12-02_193649.260160_4611': No such file or directory
chmod: cannot access `/fife/local/scratch/uploads/nova/bckhouse/2014-12-02_193649.260160_4611': No such file or directory
mkdir: cannot create directory `/fife': Permission denied
chgrp: cannot access `/fife/local/scratch/uploads/nova/bckhouse': No such file or directory
chmod: cannot access `/fife/local/scratch/uploads/nova/bckhouse': No such file or directory
mkdir: cannot create directory `/fife': Permission denied
chgrp: cannot access `/fife/local/scratch/uploads/nova/bckhouse/2014-12-02_193649.260160_4611': No such file or directory
chmod: cannot access `/fife/local/scratch/uploads/nova/bckhouse/2014-12-02_193649.260160_4611': No such file or directory

Can this be avoided? Even if it's harmless it's likely to distract users trying to debug their own problems.

History

#1 Updated by Dennis Box over 4 years ago

Hi Chris,

This is another ticket I was not aware of until today, sorry.
I strongly suspect there is something in the submitted job that is trying to write to $CONDOR_TMP. If this job used to run on gpsn01 without complaint, it's probably because $CONDOR_TMP is a bluearc directory cross-mounted on gpsn01 and the worker nodes. It will cause an error on fifebatch, where $CONDOR_TMP is on a local disk that the worker nodes can't access.

Dennis

#2 Updated by Parag Mhashilkar over 4 years ago

  • Assignee set to Dennis Box
  • Target version set to v1.1

#3 Updated by Christopher Backhouse over 4 years ago

The batch scripts don't mention $CONDOR_TMP at all. The submission scripts only do so as a hack to get my logs sorted into subdirectories when using jobsub_tools.

${_CONDOR_SCRATCH_DIR} is still the correct destination to copy input files to locally on the batch machines, right?

#4 Updated by Dennis Box over 4 years ago

Hi Chris,

Ugh, for some reason I am still not seeing updates on this ticket, even though it's assigned to me. I added myself as a watcher to see if that helps.

To answer your question, ${_CONDOR_SCRATCH_DIR} is still an OK place to copy your input, but it is not the working directory that your jobs land and execute in, as it usually was on gpsn01. On the fifebatch systems, your jobs land in ${_CONDOR_JOB_IWD}, and ${_CONDOR_SCRATCH_DIR} is set to ${_CONDOR_JOB_IWD}/no_xfer. At one time we were told that it was good practice to create and cd to this directory before executing the user job; now we are told it is not good practice and advised to stop.
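
The directory layout described above can be sketched as a small staging helper. This is only an illustration under stated assumptions: `stage_input` is a hypothetical function name, the fallback to `${_CONDOR_JOB_IWD}/no_xfer` assumes the layout described in this comment, and the `mkdir -p` is a defensive guess in case the scratch directory does not exist yet. The helper copies a file into the scratch area without changing the job's working directory.

```shell
# Hypothetical helper (not part of jobsub_tools): copy one input file into
# the scratch area while staying in the directory the job landed in.
stage_input() {
    src=$1
    # Assumption: if _CONDOR_SCRATCH_DIR is unset, fall back to the
    # ${_CONDOR_JOB_IWD}/no_xfer layout described in the ticket.
    scratch=${_CONDOR_SCRATCH_DIR:-${_CONDOR_JOB_IWD:-$PWD}/no_xfer}
    # Create the scratch directory if needed, then copy; no cd is performed.
    mkdir -p "$scratch" && cp "$src" "$scratch"/
}
```

A job script could call `stage_input /path/to/input` for each input file and then refer to the copies under `${_CONDOR_SCRATCH_DIR}`.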

This is admittedly confusing and we are trying to correct and clean up the documentation as fast as we can.

I believe I have found the source of the error output you reported originally in this ticket.

Line 11 of your user job, mixerjob.sh, is:

source /grid/fermiapp/nova/novaart/novasvn/setup/setup_nova.sh -r S14-10-28 || exit

I tried logging onto novagpvm01 and doing this:

[dbox@novagpvm01 ~]$ export CONDOR_TMP=/fife/local/scratch/uploads/nova/bckhouse/2014-12-02_213218.216151_9051
[dbox@novagpvm01 ~]$ source /grid/fermiapp/nova/novaart/novasvn/setup/setup_nova.sh -r S14-10-28 || exit
mkdir: cannot create directory `/fife': Permission denied
chgrp: cannot access `/fife/local/scratch/uploads/nova/bckhouse': No such file or directory
chmod: cannot access `/fife/local/scratch/uploads/nova/bckhouse': No such file or directory
mkdir: cannot create directory `/fife': Permission denied
chgrp: cannot access `/fife/local/scratch/uploads/nova/bckhouse/2014-12-02_213218.216151_9051': No such file or directory
chmod: cannot access `/fife/local/scratch/uploads/nova/bckhouse/2014-12-02_213218.216151_9051': No such file or directory

Release: S14-10-28
Build: debug

PWD: /afs/fnal.gov/files/home/room2/dbox

It appears that setup_nova.sh is setting up jobsub_tools on the worker node, using a CONDOR_TMP defined on rexbatch1. Setting up jobsub_tools during a grid job on a worker node was never intended. It will probably break as soon as the job is submitted off site, and it pollutes the user job's environment with all kinds of accidental stuff as a side effect.

There is a pending ticket to make sure CONDOR_TMP is not exported to the worker nodes. Had this been done already you would not have noticed any error messages, but the problem would still be out there lurking.

A temporary workaround would be to undefine CONDOR_TMP before sourcing setup_nova.sh in mixerjob.sh. The real fix is to change setup_nova.sh so it doesn't set up jobsub_tools.
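
A minimal sketch of the temporary workaround, assuming the error messages come only from setup code that reacts to a defined CONDOR_TMP. The function name `source_without_condor_tmp` is hypothetical and not part of jobsub_tools or setup_nova.sh. It uses `.`, which behaves like `source` in bash; passing extra arguments such as `-r S14-10-28` through to the sourced script relies on bash.

```shell
# Hypothetical helper: drop the submit-host CONDOR_TMP from the environment,
# then source the given setup script in the current shell.
source_without_condor_tmp() {
    # Remove the variable so the sourced script cannot try to create
    # or chmod a directory that only exists on the submit host.
    unset CONDOR_TMP
    # "$@" is the script path followed by any arguments for it.
    . "$@"
}

# In mixerjob.sh, line 11 could then read (path and release as quoted in the ticket):
# source_without_condor_tmp /grid/fermiapp/nova/novaart/novasvn/setup/setup_nova.sh -r S14-10-28 || exit
```

Because the helper runs in the current shell rather than a subshell, everything the setup script exports still reaches the rest of the job.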

Cheers
Dennis

#5 Updated by Dennis Box over 4 years ago

  • Status changed from New to Assigned

#6 Updated by Christopher Backhouse over 4 years ago

OK, I'll forward this to the relevant people on our side to see if we can get setup_nova.sh changed.

Thanks - Chris

#7 Updated by Dennis Box over 4 years ago

  • Status changed from Assigned to Resolved

#8 Updated by Parag Mhashilkar over 4 years ago

  • Status changed from Resolved to Closed

