Bug #24983

condor_chirp not working

Added by Marco Mambelli about 1 month ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Category: -
Target version:
Start date: 09/17/2020
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

Jobs find the new condor_chirp, but it fails because of an incorrect PYTHONPATH.

Here is the email from Marco Mascheroni:

Hi both,

To recap, since Marco Mambelli was asking me privately: I found the error and reported it to Bruno on Wednesday at the meeting. We looked at it together during the meeting, and Bruno acknowledged there was an issue with Singularity (IIUC he was able to replicate it in his dev instance).

I added a few debug lines [1] to the gwms-python script, and yesterday I was able to get the output (sorry Bruno, it took a few iterations, since the lines had to be added to the factory script and not the frontend, and because for some reason the configuration change to only use CERN ITB was reverted by puppet). Anyway, here is the issue in detail:

https://mmascher.web.cern.ch/mmascher/job_out.4.0.txt
https://mmascher.web.cern.ch/mmascher/job_out.1.0.txt

As you can see both are failing with: 
No module named htchirp
Notice that on some sites this runs with the system python3 (not sure why sometimes we run with the system python, and sometimes with the CMSSW one). It is probably not important, but I thought I'd mention it.
Anyway, three things I noticed (looking at https://mmascher.web.cern.ch/mmascher/job_out.4.0.txt):

1) $my_path (/home/pilcms120/home_cr001_451874600/CREAM451874600/glide_7klZdP/gwms/bin) does not contain the "lib/python" dir that is then added to the PYTHONPATH.
2) The PYTHONPATH dir does not exist (ls: cannot access :/home/pilcms120/home_cr001_451874600/CREAM451874600/glide_7klZdP/gwms/bin/lib/python: No such file or directory)
3) It seems the CRAB job is running inside the "execute/dir_818" directory:
== ENV: PWD=/home/pilcms120/home_cr001_451874600/CREAM451874600/glide_7klZdP/execute/dir_818
while we are adding to the PYTHONPATH a dir from two levels above, namely (/home/pilcms120/home_cr001_451874600/CREAM451874600/glide_7klZdP).

I hope this helps. If you guys point me to a commit, I can apply it and try it directly in ITB!
Cheers,
Marco Mascheroni

[1]
else
    export PYTHONPATH="$PYTHONPATH:$my_path/lib/python" 
    echo "-------------------------" 
    echo my_path
    echo $my_path
    ls $my_path
    echo "-------------------------" 
    echo PYTHONPATH
    echo $PYTHONPATH
    ls $PYTHONPATH
    echo "-------------------------" 
    [ "$PYTHON" != python ] && export PYTHONPATH="$PYTHONPATH:$my_path/lib/$PYTHON" 
    exec $PYTHON "$@" 
    echo "$0: failed to execute $PYTHON" >&2
    exit 1
fi
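
A note on the debug lines in [1]: `ls $PYTHONPATH` hands the whole colon-joined value to ls as a single argument (here it starts with ":", which suggests PYTHONPATH was empty before the export), so the ls error alone does not say which entry is missing. A minimal sketch of a per-entry check follows; `check_pythonpath` is a hypothetical helper name and is not part of gwms-python.

```shell
# Hypothetical helper (not part of gwms-python): report, one entry per line,
# whether each directory in a colon-separated path list exists.
# The body runs in a subshell so the IFS and globbing changes stay local.
check_pythonpath() (
    IFS=:
    set -f                            # no glob expansion while splitting
    for entry in $1; do
        [ -n "$entry" ] || continue   # skip empty fields, e.g. a leading ":"
        if [ -d "$entry" ]; then
            echo "ok: $entry"
        else
            echo "missing: $entry"
        fi
    done
)
```

In the wrapper it could be called as `check_pythonpath "$PYTHONPATH"` in place of `ls $PYTHONPATH`.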

History

#1 Updated by Marco Mambelli about 1 month ago

  • Assignee set to Marco Mambelli
  • Status changed from New to Resolved

Fixed in v36/24983 and merged.
Marco Mascheroni tested the fix.
The problem was that glidein_startup was copying the binary, and the position of the library was different.
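
To illustrate this failure mode (this is not the actual change in v36/24983), here is a rough sketch of a wrapper that looks for the library in more than one place relative to its own directory. The function name `find_python_lib` and the second candidate location (`../lib/python`) are assumptions made for the example; the report does not say where the library actually ended up.

```shell
# Hypothetical sketch, NOT the fix merged in v36/24983.
# Given the directory that holds the wrapper script ($1), print the first
# candidate library directory that exists; return 1 if none does.
# The "../lib/python" candidate is an assumed layout, for illustration only.
find_python_lib() {
    for candidate in "$1/lib/python" "$1/../lib/python"; do
        if [ -d "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

A wrapper could then extend PYTHONPATH only when a directory is found, for example `lib_dir=$(find_python_lib "$my_path") && export PYTHONPATH="$PYTHONPATH:$lib_dir"`, instead of adding a path that may not exist.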

#2 Updated by Marco Mambelli about 1 month ago

  • Status changed from Resolved to Closed

