Bug #20871

Sinful strings in the collector cannot have spaces

Added by Marco Mascheroni over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Category: -
Target version:
Start date: 09/18/2018
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

I was trying the new shared port feature, and according to the documentation one of the options is: "sinful string can be used for the sock range case (sock=collector$RANDOM_INTEGER(ID1, ID2)"

Notice the space in the documentation. I went ahead and set this in my frontend config:

<collector DN="/DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=fermicloud306.fnal.gov" group="default" node="fermicloud306.fnal.gov:9618?sock=collector$RANDOM_INTEGER(1, 40)" secondary="True"/>

However, after doing that, my pilots were only lasting 20 seconds or so. Checking the logs, I noticed that the collector string was cut after the space:

[mmascher@fermicloud328 ~]$ grep GLIDEIN_Collector /var/log/gwms-factory/client/user_frontend/glidein_gfactory_instance/entry_ITB_FC_CE3/job.17849.1.err
GLIDEIN_Collector fermicloud306.fnal.gov:9618?sock=collector$RANDOM_INTEGER(1,

I think we need:

1) Remove the space from the documentation
2) Make reconfig fail in case of spaces
3) Understand why the string is cut after the space and fix it

History

#1 Updated by Marco Mascheroni over 1 year ago

  • Target version set to v3_4_1

#2 Updated by Lorena Lobato Pardavila over 1 year ago

  • Status changed from New to Resolved

Thanks for the information Marco!

Workflow:
1. First, we have to test whether condor accepts the space (checking both the documentation and the condor_config).
2. If it is not accepted, we have to correct the documentation first and then catch the error in the reconfiguration, as requested.
3. If it is accepted (if it works in condor), we have to understand why the string is split and why the space cannot be kept. Once that is understood, we have to find where the problem is being caused.

Solution:
We have reproduced the problem and also checked that the space is accepted in condor for the RANDOM_INTEGER function. Thus, we've corrected this in our code (shell scripts).

We saw that awk '{print $2}' prints only the second whitespace-separated field, so it stops at the space. A better solution is awk '{$1=""; print $0}', which is now being used for ccb_host, factory_collector_host and collector_host.
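The difference between the two awk invocations can be reproduced outside the glidein scripts. The following is only a minimal sketch: the "key value" line layout is taken from the log excerpt in the description, while the variable names and the extra trimming step are assumptions added for illustration and are not part of the merged change.

```shell
#!/bin/sh
# Sample line in "key value" form, where the value itself contains a space
# (taken from the frontend config in the description).
line='GLIDEIN_Collector fermicloud306.fnal.gov:9618?sock=collector$RANDOM_INTEGER(1, 40)'

# Old behaviour: only the second whitespace-separated field is printed,
# so everything after the space inside the value is lost.
truncated=$(printf '%s\n' "$line" | awk '{print $2}')

# New behaviour: blank the key and print the rebuilt record.
# The whole value survives, but a leading space (the separator that
# followed the emptied first field) is left in front of it.
full=$(printf '%s\n' "$line" | awk '{$1=""; print $0}')

# Optional (assumption, not in the ticket): strip that leading space.
trimmed=$(printf '%s\n' "$line" | awk '{$1=""; sub(/^ /, ""); print}')

# Brackets make leading whitespace visible.
printf '[%s]\n' "$truncated" "$full" "$trimmed"
```

The first result ends at "RANDOM_INTEGER(1,", matching the truncated string seen in the pilot log. Note that assigning to $1 makes awk rebuild the record with single spaces between fields, so any run of several spaces or tabs inside the value would be collapsed to one space.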

Tested and merged into v34/7341_2.

#3 Updated by Marco Mambelli over 1 year ago

  • Status changed from Resolved to Closed

