Bug #16810

Collector using shared port even if COLLECTOR_USES_SHARED_PORT=False

Added by Marco Mambelli almost 3 years ago. Updated over 2 years ago.

Status: Closed
Priority: Urgent
Category: -
Target version:
Start date: 06/12/2017
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

The CollectorLog shows connection errors:

06/12/17 13:14:12 CollectorAd  : Inserting ** "< frontend_service@fermicloud353.fnal.gov >" 
06/12/17 13:14:32 condor_read(): timeout reading 5 bytes from collector fermicloud353.fnal.gov:9618.
06/12/17 13:14:32 IO: Failed to read packet header
06/12/17 13:14:32 SECMAN: no classad from server, failing
06/12/17 13:14:32 ERROR: SECMAN:2007:Failed to end classad message.
06/12/17 13:14:32 Failed to send update to collector fermicloud353.fnal.gov:9618.
06/12/17 13:14:32 Unable to send UPDATE_COLLECTOR_AD to all configured collectors
06/12/17 13:14:32 condor_write(): Socket closed when trying to write 266 bytes to <131.225.155.80:21632>, fd is 9
06/12/17 13:14:32 Buf::write(): condor_write() failed
06/12/17 13:14:32 SECMAN: Error sending response classad to <131.225.155.80:21632>!
SessionDuration = "86400" 
AuthMethods = "FS,GSI" 
Command = 19
Authentication = "OPTIONAL" 
Subsystem = "COLLECTOR" 
Enact = "NO" 
ServerCommandSock = "<131.225.155.80:9615?addrs=131.225.155.80-9615+[--1]-9615&noUDP&sock=6877_62f2>" 
ParentUniqueID = "fermicloud353:6702:1497291250" 
Integrity = "OPTIONAL" 
RemoteVersion = "$CondorVersion: 8.6.3 Jun 01 2017 $" 
CryptoMethods = "3DES,BLOWFISH" 
NewSession = "YES" 
OutgoingNegotiation = "PREFERRED" 
Encryption = "OPTIONAL" 

The log suggests that the collector starts a shared port endpoint even though this is not intended and the configuration contains COLLECTOR_USES_SHARED_PORT=False:

06/12/17 13:14:11 config Macros = 261, Sorted = 261, StringBytes = 21552, TablesBytes = 9508
06/12/17 13:14:11 CLASSAD_CACHING is ENABLED
06/12/17 13:14:11 Daemon Log is logging: D_ALWAYS D_ERROR
06/12/17 13:14:11 SharedPortEndpoint: waiting for connections to named socket 6877_62f2
06/12/17 13:14:11 DaemonCore: non-shared command socket at <131.225.155.80:9618>
06/12/17 13:14:11 Daemoncore: Listening at <0.0.0.0:9618> on TCP (ReliSock) and UDP (SafeSock).
06/12/17 13:14:11 DaemonCore: non-shared command socket at <[::1]:9618>
06/12/17 13:14:11 WARNING: Condor is running on a loopback address
06/12/17 13:14:11          of this machine, and may not visible to other hosts!
06/12/17 13:14:11 Daemoncore: Listening at <[::]:9618> on TCP (ReliSock) and UDP (SafeSock).
06/12/17 13:14:11 DaemonCore: command socket at <131.225.155.80:9615?addrs=131.225.155.80-9615+[--1]-9615&noUDP&sock=6877_62f2>
06/12/17 13:14:11 DaemonCore: private command socket at <131.225.155.80:9615?addrs=131.225.155.80-9615+[--1]-9615&noUDP&sock=6877_62f2>
06/12/17 13:14:11 In ViewServer::Init()

The excerpts are from the Frontend, but the fix should also be applied to the Factory.
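To check other hosts for the same symptom, the two indicators visible in the excerpt above (the SharedPortEndpoint line and the sock= parameter in the command socket address) can be searched for automatically. This is a minimal sketch, not part of the fix: the function name shared_port_evidence and the regular expressions are assumptions derived only from the log lines quoted in this ticket, and other HTCondor versions may word these messages differently.

```python
import re

# Patterns derived from the CollectorLog excerpt in this ticket (assumed, not
# taken from HTCondor documentation).
NAMED_SOCKET = re.compile(
    r"SharedPortEndpoint: waiting for connections to named socket (\S+)"
)
COMMAND_SOCKET = re.compile(
    r"DaemonCore: (?:private )?command socket at <([^>]+)>"
)


def shared_port_evidence(log_lines):
    """Return the set of shared-port socket names mentioned in the log lines."""
    names = set()
    for line in log_lines:
        match = NAMED_SOCKET.search(line)
        if match:
            names.add(match.group(1))
            continue
        match = COMMAND_SOCKET.search(line)
        if match and "?" in match.group(1):
            # Everything after '?' is an '&'-separated parameter list.
            params = match.group(1).split("?", 1)[1].split("&")
            for param in params:
                if param.startswith("sock="):
                    names.add(param[len("sock="):])
    return names


sample = [
    "06/12/17 13:14:11 SharedPortEndpoint: waiting for connections to named socket 6877_62f2",
    "06/12/17 13:14:11 DaemonCore: non-shared command socket at <131.225.155.80:9618>",
    "06/12/17 13:14:11 DaemonCore: command socket at "
    "<131.225.155.80:9615?addrs=131.225.155.80-9615+[--1]-9615&noUDP&sock=6877_62f2>",
]

evidence = shared_port_evidence(sample)
print("shared port in use" if evidence else "no shared port evidence", sorted(evidence))
```

An empty result only means that neither pattern was found in the lines given; it does not prove that shared port is disabled.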

History

#1 Updated by Marco Mambelli almost 3 years ago

  • Status changed from New to Feedback
  • Assignee changed from Marco Mambelli to Parag Mhashilkar
  • Priority changed from Normal to Urgent

Adding the line:

COLLECTOR.USE_SHARED_PORT=False

in both /etc/condor/config.d/01_gwms_collectors.config and 01_gwms_factory_collectors.config
seems to fix the problem. Tested on fermicloud: the error disappeared.

This may be a workaround for a condor bug. In any case, it is needed with 8.6 and is harmless (at most redundant) with other versions.
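A quick way to confirm that both config files carry the workaround is to look for the last assignment of COLLECTOR.USE_SHARED_PORT in each. The sketch below is a deliberately simplified reader, not HTCondor's real config parser: it ignores includes, conditionals, and line continuations, and it assumes that macro names are case-insensitive and that the last assignment wins. The helper names last_value and shared_port_disabled are hypothetical. condor_config_val remains the authoritative way to see what the daemon actually uses.

```python
def last_value(config_text, name):
    """Return the last value assigned to `name`, or None if it is never set.

    Simplified: handles only plain `NAME = value` lines and '#' comments.
    """
    value = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip().lower() == name.lower():
            value = val.strip()
    return value


def shared_port_disabled(config_text):
    """True if the workaround line from this ticket is the effective setting."""
    value = last_value(config_text, "COLLECTOR.USE_SHARED_PORT")
    return value is not None and value.lower() == "false"


example = """\
# collector settings
COLLECTOR_USES_SHARED_PORT=False
COLLECTOR.USE_SHARED_PORT=False
"""

print(shared_port_disabled(example))
```

To check the real files, read each one (for example /etc/condor/config.d/01_gwms_collectors.config) and pass its contents to shared_port_disabled.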

Changes are in v3/16810

#2 Updated by Marco Mambelli almost 3 years ago

Tim Theisen recalled Brian's comments on [#10745], which suggest moving to shared port for everything.

#3 Updated by Parag Mhashilkar almost 3 years ago

  • Assignee changed from Parag Mhashilkar to Marco Mambelli

Looks OK to merge, though I wonder how useful this is if you are going to move to shared port as well.

#4 Updated by Marco Mambelli almost 3 years ago

  • Status changed from Feedback to Resolved

#5 Updated by Marco Mambelli over 2 years ago

  • Status changed from Resolved to Closed

