Support #25434

execute/dir problem at UConn

Added by Marco Mambelli 3 months ago. Updated 3 months ago.

Status: Feedback
Priority: Normal
Category: -
Target version: -
Start date: 01/25/2021
Due date:
% Done: 0%
Estimated time:
Stakeholders: OSG
Duration:

Description

Christina Koch (OSG) reported the following problem from user Baris:

WARNING  glidein_config not defined () in singularity_lib.sh. Some functions like advertise and error_gen will be limited.
WARNING  String '/execute/dir_' in X509_USER_PROXY (/var/lib/condor/execute/dir_61396/glide_bisArb/ticket/myproxy), the conversion to run in Singularity may be incorrect
WARNING  String '/execute/dir_' in X509_USER_CERT (/var/lib/condor/execute/dir_61396/glide_bisArb/hostcert.pem), the conversion to run in Singularity may be incorrect
WARNING  String '/execute/dir_' in X509_USER_KEY (/var/lib/condor/execute/dir_61396/glide_bisArb/hostkey.pem), the conversion to run in Singularity may be incorrect
WARNING  String '/execute/dir_' in _CONDOR_EXECUTE (/var/lib/condor/execute/dir_61396/glide_bisArb/execute), the conversion to run in Singularity may be incorrect

This was fixed in https://cdcvs.fnal.gov/redmine/issues/25038

Christina reported that it ran on cn443.storrs.hpc.uconn.edu.
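
For reference, the kind of check that produces these warnings can be sketched as follows. This is only an illustration of the idea, not the code in singularity_lib.sh; the function name warn_execute_dir_paths is hypothetical, and the message format is copied from the log lines above.

```sh
# Hypothetical helper: print a warning for each named variable whose value
# contains '/execute/dir_'. Arguments must be valid shell variable names.
# Returns 0 if no variable matched, 1 otherwise.
warn_execute_dir_paths() {
    _found=0
    for _name in "$@"; do
        # Indirect lookup of the variable called $_name (empty if unset).
        eval "_value=\${$_name-}"
        case $_value in
            */execute/dir_*)
                echo "WARNING  String '/execute/dir_' in $_name ($_value), the conversion to run in Singularity may be incorrect"
                _found=1
                ;;
        esac
    done
    return "$_found"
}

# Example: check the variables named in the report above.
# warn_execute_dir_paths X509_USER_PROXY X509_USER_CERT X509_USER_KEY _CONDOR_EXECUTE
```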

History

#1 Updated by Marco Mambelli 3 months ago

Edita (Factory Ops) did not find that CE, but found the CE cn410.storrs.hpc.uconn.edu for UConn, entry GLUEX_US_UConn-HPC_osgce.
Both Edita and Marco Mascheroni provided logs:
job.4091216.0.out
job.4091216.0.err
job.4094671.0.out
job.4094671.0.err
job.4108625.0.out
job.4108625.0.err

The logs showed no path error.
The reported start directory is also different from the /execute/dir_ paths in the warnings:

Started in /gpfs/gpfs1/condor/spool/2913/0/cluster72913.proc0.subproc0/home_bl_cn410.storrs.hpc.uconn.edu_9619_cn410.storrs.hpc.uconn.edu#72928.0#1611290308
Running in /local/slurm-job-4185209/glide_95EHl2

The LD_PRELOAD error was present, however:

INFO  GWMS Singularity wrapper: PATH is set to /local/slurm-job-4185209/glide_95EHl2//local/slurm-job-4185209/glide_95EHl2/.gwms.d/bin::/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin outside Singularity. This will not be propagated to inside the container instance.
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libcr_run.so' from LD_PRELOAD cannot be preloaded: ignored.

This is solved in [#25428]. As mentioned there, the workaround is to set the string attribute GLIDEIN_CONTAINER_ENV to "clear", either in the Factory or in the Frontend global/group configuration (https://glideinwms.fnal.gov/doc.v3_7_2/factory/custom_vars.html#singularity_vars).
This clears all environment variables that are not needed. It may clear more than you would like, but it is worth testing.
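
To illustrate the idea behind the workaround, here is a minimal sketch of running a command with a variable such as LD_PRELOAD removed from its environment. It is not how GlideinWMS implements GLIDEIN_CONTAINER_ENV; the function name run_without_vars is hypothetical.

```sh
# Hypothetical helper: run a command with the listed variables unset.
# Usage: run_without_vars VAR1 VAR2 ... -- command [args...]
# The function body is a subshell, so the caller's environment is untouched.
run_without_vars() (
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        unset "$1"
        shift
    done
    [ "$#" -gt 0 ] && shift   # drop the "--" separator
    "$@"
)

# Example: start a payload without the inherited LD_PRELOAD.
# run_without_vars LD_PRELOAD -- ./payload.sh
```

Unsetting only the variables known to cause trouble is narrower than clearing everything, which is the trade-off mentioned above.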

#2 Updated by Marco Mambelli 3 months ago

The Factory has been patched to solve the LD_PRELOAD issue.

The /execute/dir_ issue may come from a different site (at this one the path is different).
We should follow up with OSG (Christina/Mats) to identify the site.

#3 Updated by Marco Mambelli 3 months ago

The site actually seems to be ISI. From Christina:

46473?CCBID=192.170.227.251:9797%3faddrs%3d192.170.227.251-9797+[2605-9a00-10-400d-7686-7aff-fedd-d118]-9797%26alias%3dflock.opensciencegrid.org#1220195&PrivNet=mimir.isi.edu&addrs=128.9.44.59-46473&alias=mimir.isi.edu&noUDP>
44577?CCBID=192.170.227.251:9850%3faddrs%3d192.170.227.251-9850+[2605-9a00-10-400d-7686-7aff-fedd-d118]-9850%26alias%3dflock.opensciencegrid.org#1202649&PrivNet=mimir.isi.edu&addrs=128.9.44.59-44577&alias=mimir.isi.edu&noUDP>
46473?CCBID=192.170.227.251:9797%3faddrs%3d192.170.227.251-9797+[2605-9a00-10-400d-7686-7aff-fedd-d118]-9797%26alias%3dflock.opensciencegrid.org#1220195&PrivNet=mimir.isi.edu&addrs=128.9.44.59-46473&alias=mimir.isi.edu&noUDP>
46473?CCBID=192.170.227.251:9797%3faddrs%3d192.170.227.251-9797+[2605-9a00-10-400d-7686-7aff-fedd-d118]-9797%26alias%3dflock.opensciencegrid.org#1220195&PrivNet=mimir.isi.edu&addrs=128.9.44.59-46473&alias=mimir.isi.edu&noUDP>
46473?CCBID=192.170.227.251:9797%3faddrs%3d192.170.227.251-9797+[2605-9a00-10-400d-7686-7aff-fedd-d118]-9797%26alias%3dflock.opensciencegrid.org#1220195&PrivNet=mimir.isi.edu&addrs=128.9.44.59-46473&alias=mimir.isi.edu&noUDP>
46473?CCBID=192.170.227.251:9797%3faddrs%3d192.170.227.251-9797+[2605-9a00-10-400d-7686-7aff-fedd-d118]-9797%26alias%3dflock.opensciencegrid.org#1220195&PrivNet=mimir.isi.edu&addrs=128.9.44.59-46473&alias=mimir.isi.edu&noUDP>
46473?CCBID=192.170.227.251:9797%3faddrs%3d192.170.227.251-9797+[2605-9a00-10-400d-7686-7aff-fedd-d118]-9797%26alias%3dflock.opensciencegrid.org#1220195&PrivNet=mimir.isi.edu&addrs=128.9.44.59-46473&alias=mimir.isi.edu&noUDP>
46473?CCBID=192.170.227.251:9797%3faddrs%3d192.170.227.251-9797+[2605-9a00-10-400d-7686-7aff-fedd-d118]-9797%26alias%3dflock.opensciencegrid.org#1220195&PrivNet=mimir.isi.edu&addrs=128.9.44.59-46473&alias=mimir.isi.edu&noUDP>
46473?CCBID=192.170.227.251:9797%3faddrs%3d192.170.227.251-9797+[2605-9a00-10-400d-7686-7aff-fedd-d118]-9797%26alias%3dflock.opensciencegrid.org#1220195&PrivNet=mimir.isi.edu&addrs=128.9.44.59-46473&alias=mimir.isi.edu&noUDP>

#4 Updated by Marco Mascheroni 3 months ago

That /execute/dir_ issue seems to be widespread:

[0653] gfactory@gfactory-2 ~$ grep "the conversion to run in Singularity may be incorrect" /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/ -r -l | sed 's/.\{17\}$//' | sort | uniq -c
    152 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_CMSHTPC_T1_IT_CNAF_condor_ce01/
    195 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_CMSHTPC_T1_IT_CNAF_condor_ce02/
    199 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_CMSHTPC_T1_IT_CNAF_condor_ce03/
     27 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_CMSHTPC_T2_BR_SPRACE/
    104 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_CMSHTPC_T3_US_NotreDame_deepthought/
      3 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_CMS_T2_US_Nebraska_Red_gw1_whole_op/
      7 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_CMS_T2_US_Nebraska_Red_gw2_whole_op/
      1 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_CMS_T2_US_Nebraska_Red_whole_op/
     88 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_CMS_T3_US_PuertoRico_UPRM/
      7 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_CA_CYBERA_EDMONTON/
   2903 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_GPGrid_ce03_mcore_op/
   2780 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_GPGrid_ce04_mcore_op/
    575 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_OU_OCHEP_SWT2_tier2-01/
   3492 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_T1_US_FNAL_condce_opp1/
    248 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_T2_BR_SPRACE/
     18 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_T2_BR_UERJ_ce2/
     74 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_T2_US_MIT_ce03/
     87 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_T2_US_Nebraska_Red_gw1_whole_op/
     42 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_T2_US_Nebraska_Red_gw2_whole_op/
     69 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_T2_US_Nebraska_Red_whole_op/
    477 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_T3_US_Rutgers_ruhex/
    377 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_T3_US_UMiss_umiss001/
    403 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_US_BNL_gk01/
    308 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_US_BNL_sp01/
    100 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_US_ISI_osg/
   6058 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_US_Michigan_gate02/
   3587 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_US_MWT2_iut2_condce/
    198 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_US_MWT2_iut2_condce_mcore/
   3270 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_US_MWT2_mwt2_condce/
    155 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_US_MWT2_mwt2_condce_mcore/
   3438 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_US_MWT2_uct2_condce/
    188 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_US_MWT2_uct2_condce_mcore/
      3 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_COVID19_US_Wisconsin_osg01_rhel7/
  16500 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Engage_US_MWT2_iut2_condce/
   3741 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Engage_US_MWT2_iut2_condce_mcore/
  16358 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Engage_US_MWT2_uct2_condce/
   3837 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Engage_US_MWT2_uct2_condce_mcore/
    111 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_Syracuse2_condor_gpu/
     30 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_Syracuse2_condor_gpu/j
     50 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_Syracuse3_condor_gpu/
     15 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_Syracuse3_condor_gpu/j
    117 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_Syracuse4_condor_gpu/
   3410 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_Syracuse_condor-ce2/
   8543 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_Syracuse_condor-ce2/j
   3065 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_Syracuse_condor-ce3/
   8665 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_Syracuse_condor-ce3/j
   3021 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_Syracuse_condor-ce4/
   8115 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_Syracuse_condor-ce4/j
    170 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_UCSD_xcache_gpu/
      2 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_Glow_US_UCSD_xcache_gpu/j
    438 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_HCC_US_BNL_gk01/
    143 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_HCC_US_BNL_gk01/j
    587 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_HCC_US_BNL_gk02/
    146 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_HCC_US_Wisconsin_osg01_rhel7/
    724 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_IceCube_US_UCSD_xcache/
     65 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_OSG_CA_CancerComputer_minne/
      4 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_OSG_CA_CYBERA_EDMONTON/
    702 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_OSG_OU_OCHEP_SWT2_tier2-01/
    436 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_OSG_SLATE_US_NMSU_AGGIE_GRID/
   1286 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_OSG_US_ASU-DELL_M420/
     67 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_OSG_US_ISI_osg/
  14429 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_OSG_US_MWT2_mwt2_condce/
    953 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_OSG_US_MWT2_mwt2_condce_mcore/
   1921 /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/entry_OSG_US_UConn_gluskap_op/
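
The command above strips the last 17 characters of each matching path, which is the length of a file name such as job.4091216.0.err. The rows ending in "/j" appear to come from file names that are one character longer, so those counts are split from their entry's main row. A possible variant, sketched here with a hypothetical function name, uses dirname so that the grouping does not depend on the file name length:

```sh
# Hypothetical helper: count the files containing a pattern, grouped by the
# directory that holds them.
# $1: log root directory
# $2: fixed string to search for (defaults to the warning text)
count_warnings_by_dir() {
    grep -r -l -F -e "${2:-the conversion to run in Singularity may be incorrect}" -- "$1" |
        while IFS= read -r _file; do dirname -- "$_file"; done |
        sort | uniq -c
}

# Example:
# count_warnings_by_dir /var/log/gwms-factory/client/user_feosgflock/glidein_gfactory_instance/
```

As in the original command, each file is counted once regardless of how many warning lines it contains.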

#5 Updated by Marco Mambelli 3 months ago

  • Assignee changed from Marco Mambelli to Marco Mascheroni
  • Status changed from Work in progress to Feedback

The changes are in branch v37/25434:
- fixed the "execute/dir_" warning by removing unneeded variables in singularity_wrapper.sh
- also fixed a bug when using default values in environment variables inside Singularity
- also fixed a bug in the config validation of executable files

To patch a Factory with v3.7.2 installed, it is sufficient to copy the version of creation/web_base/singularity_wrapper.sh from GitHub, branch v37/25434, to /var/lib/gwms-factory/web-base/singularity_wrapper.sh:
https://raw.githubusercontent.com/glideinWMS/glideinwms/v37/25434/creation/web_base/singularity_wrapper.sh

This fixes the warnings. The other files can wait for the release.
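
The copy step could be done along these lines. This is a sketch, not an official procedure: the helper name install_with_backup, the .orig backup suffix and the /tmp download location are assumptions, and any further steps the Factory may need after the file changes are not covered here.

```sh
# Hypothetical helper: install a new file over an existing one, keeping a
# copy of the previous version as DEST.orig.
# $1: new file (already downloaded), $2: destination path
install_with_backup() {
    if [ ! -s "$1" ]; then
        echo "refusing to install empty or missing file: $1" >&2
        return 1
    fi
    if [ -e "$2" ]; then
        cp -p -- "$2" "$2.orig" || return 1
    fi
    cp -- "$1" "$2"
}

# Example, run with sufficient privileges on the Factory host:
# curl -fsSL -o /tmp/singularity_wrapper.sh \
#     https://raw.githubusercontent.com/glideinWMS/glideinwms/v37/25434/creation/web_base/singularity_wrapper.sh
# install_with_backup /tmp/singularity_wrapper.sh \
#     /var/lib/gwms-factory/web-base/singularity_wrapper.sh
```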
