Support #20749

library problem sometimes when other scripts are using the GWMS HTCondor

Added by Marco Mambelli 8 months ago. Updated 5 months ago.

Status: New
Priority: Normal
Category: -
Target version:
Start date: 09/05/2018
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

GlideinWMS ships reduced condor tarballs for its own use.
The tarball sent should be the one built for the platform of the entry.
All the required libraries are sent as well (all dependencies listed by ldd).
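
As an illustration of what "all dependencies listed by ldd" means, the sketch below extracts the resolved library paths from `ldd` output. The function name `resolved_ldd_paths` is hypothetical, and this is not the actual GWMS packaging code, which is not shown in this ticket.

```shell
#!/bin/bash
# Hypothetical helper (not GWMS code): read `ldd` output on stdin and print
# the absolute path of each library that ldd resolved with "name => /path".
# Entries without a path (e.g. linux-vdso) and "not found" entries are skipped.
resolved_ldd_paths() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Usage against a real binary:
#   ldd sbin/condor_advertise | resolved_ldd_paths
```

Lines such as the dynamic loader, which ldd prints without `=>`, are not reported by this sketch; whether they should be shipped is a separate packaging decision.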

A library problem sometimes occurs when using the HTCondor feature that transfers the sandbox through a squid proxy (the condor curl plugin). The solution depends on HTCondor.

Background:
- The condor tarball can be built from the HTCondor tarball releases or from the system condor (which may also be installed from RPM). When built from a tarball, all executables include both RPATH and RUNPATH. This means that some non-standard directories are searched even without setting LD_LIBRARY_PATH:

# readelf -d sbin/condor_advertise | grep path
 0x0000000f (RPATH)                      Library rpath: [$ORIGIN/../lib:/lib:/usr/lib:$ORIGIN/../lib/condor:/usr/lib/condor]
 0x0000001d (RUNPATH)                    Library runpath: [$ORIGIN/../lib:/lib:/usr/lib:$ORIGIN/../lib/condor:/usr/lib/condor]

- Note that RPATH and RUNPATH are not present in the RPM version of the condor executables. Building from the system condor could perhaps be discouraged.
- GWMS packs all the required libraries in a libexec directory.
- GWMS sets LD_LIBRARY_PATH in condor_startup.sh before starting or using condor. This way the shipped libraries are used instead of the system ones.
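
The RPATH/RUNPATH check shown in the background notes can be scripted. This is a minimal sketch, assuming `readelf` (from binutils) is available on the node; the function name `has_embedded_libpath` is hypothetical and not part of GWMS.

```shell
#!/bin/bash
# Hypothetical helper (not GWMS code): succeed if the `readelf -d` output
# read from stdin contains an RPATH or RUNPATH dynamic-section entry.
has_embedded_libpath() {
    grep -Eq '\((RPATH|RUNPATH)\)'
}

# Usage against a real binary:
#   readelf -d sbin/condor_advertise | has_embedded_libpath \
#       && echo "embedded library search path present"
```

For a tarball-built binary like the one above, the check should succeed; for the RPM binaries described in the notes, it should fail.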

Setting LD_LIBRARY_PATH in glidein_startup.sh instead of condor_startup.sh would allow validation scripts to pick it up, but this is not the best solution.
Moving the libraries to the ../lib directory instead of ../libexec will achieve the same result.

In any case, use of the GWMS-shipped condor should not be encouraged, because problems may arise in a Singularity environment.
Condor is shipped for the host OS/platform (the entry configuration matches the system). When using Singularity, the job may run on a different platform (e.g. RHEL6 instead of the native RHEL7) and the condor binaries and libraries may not work correctly.
So VOs and users can use the GWMS-shipped condor binaries, but should check that they are for the right platform.

TODO in this ticket:
- change the libraries directory to lib (so that LD_LIBRARY_PATH is no longer needed for tarball-distributed binaries)
- investigate further why the library mismatch happened. Was Singularity used?
- add to the GWMS tarball a description file (listing the OS/platform) and maybe a setup file (to source for PATH and LD_LIBRARY_PATH if needed)
- print a warning in the log if there is a platform mismatch
- set LD_LIBRARY_PATH only for binaries that do not include RUNPATH (e.g. from RPM)
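
The last two items could look roughly like the sketch below. Both function names are hypothetical, and how the platform strings are obtained (for example from the proposed description file) is left open here. It only illustrates the intended logic and is not an implementation from GWMS.

```shell
#!/bin/bash
# Hypothetical sketch, not GWMS code. Names and arguments are assumptions.

# Print a warning on stderr and return non-zero if the platform the tarball
# was built for differs from the platform the job is running on.
# $1 = platform of the tarball, $2 = platform detected at run time.
warn_on_platform_mismatch() {
    local tarball_platform="$1" runtime_platform="$2"
    if [ "$tarball_platform" != "$runtime_platform" ]; then
        echo "WARNING: condor tarball built for '$tarball_platform' but running on '$runtime_platform'" >&2
        return 1
    fi
    return 0
}

# Print the LD_LIBRARY_PATH value to use for a binary.
# The shipped library directory is prepended only when the binary has no
# RUNPATH entry (e.g. binaries taken from an RPM installation).
# $1 = output of `readelf -d <binary>`, $2 = shipped library directory,
# $3 = current LD_LIBRARY_PATH value (may be empty).
ld_library_path_for() {
    local dyn_section="$1" libdir="$2" current="$3"
    if printf '%s\n' "$dyn_section" | grep -q '(RUNPATH)'; then
        printf '%s\n' "$current"          # RUNPATH present: leave unchanged
    elif [ -n "$current" ]; then
        printf '%s\n' "$libdir:$current"  # prepend to the existing value
    else
        printf '%s\n' "$libdir"
    fi
}

# Usage (paths are placeholders):
#   LD_LIBRARY_PATH=$(ld_library_path_for "$(readelf -d sbin/condor_master)" \
#       "$PWD/lib" "$LD_LIBRARY_PATH")
#   export LD_LIBRARY_PATH
```

Keeping the decision in a function that takes the `readelf -d` output as an argument makes it easy to test without a real binary.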


Related issues

Related to glideinWMS - Milestone #19515: Roadmap for Singularity support (New, 2018-03-27)

History

#1 Updated by Marco Mambelli 5 months ago

  • Target version changed from v3_5 to v3_5_1

#2 Updated by Lorena Lobato Pardavila 5 months ago

  • Assignee set to Lorena Lobato Pardavila

#3 Updated by Marco Mambelli 2 months ago


