Bug #2817

Monitor startds don't show up in the user collector anymore

Added by Parag Mhashilkar over 8 years ago. Updated over 8 years ago.

Status: Closed
Priority: High
Assignee: Douglas Strain
Category: -
Target version:
Start date: 07/06/2012
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

Need to investigate why the monitor startds disappeared from the user collector.

History

#1 Updated by Parag Mhashilkar over 8 years ago

Subject: Re: glideins monitor startds
Date: Fri, 6 Jul 2012 21:58:51 -0500

Found the problem: the monitoring startds are crashing with the error message below. Glancing quickly through the v2.6 issues, I suspect that one of the following changes may be causing the crash:

https://cdcvs.fnal.gov/redmine/issues/2544
https://cdcvs.fnal.gov/redmine/issues/2541

07/06/12 21:45:48 (pid:30143) **********************************************
07/06/12 21:45:48 (pid:30143) * condor_startd (CONDOR_STARTD) STARTING UP
07/06/12 21:45:48 (pid:30143) * /usr/local/osg-ce/OSG.DIRS/wn_tmp/glide_B25394/main/condor/sbin/condor_startd
07/06/12 21:45:48 (pid:30143) * SubsystemInfo: name=STARTD type=STARTD class=DAEMON
07/06/12 21:45:48 (pid:30143) * Configuration: subsystem:STARTD local:<NONE> class:DAEMON
07/06/12 21:45:48 (pid:30143) * $CondorVersion: 7.6.2 Jul 14 2011 BuildID: 351672 $
07/06/12 21:45:48 (pid:30143) * $CondorPlatform: x86_64_rhap_5 $
07/06/12 21:45:48 (pid:30143) * PID = 30143
07/06/12 21:45:48 (pid:30143) * Log last touched 7/6 21:45:31
07/06/12 21:45:48 (pid:30143) **********************************************
07/06/12 21:45:48 (pid:30143) Using config source: /usr/local/osg-ce/OSG.DIRS/wn_tmp/glide_B25394/condor_config.monitor
07/06/12 21:45:48 (pid:30143) DaemonCore: command socket at <131.225.41.189:34655?noUDP>
07/06/12 21:45:48 (pid:30143) DaemonCore: private command socket at <131.225.41.189:34655>
07/06/12 21:45:48 (pid:30143) Setting maximum accepts per cycle 4.
07/06/12 21:45:48 (pid:30143) ZKM: Parsing map file.
07/06/12 21:45:48 (pid:30143) ZKM: 1: attempting to map '/DC=org/DC=doegrids/OU=Services/CN=fermicloud090.fnal.gov'
07/06/12 21:45:48 (pid:30143) ZKM: 2: mapret: 0 included_voms: 0 canonical_user: collector1
07/06/12 21:45:48 (pid:30143) ZKM: successful mapping to collector1
07/06/12 21:45:48 (pid:30143) CCBListener: registered with CCB server fermicloud090.fnal.gov:9619 as ccbid 131.225.155.37:9619#2558
07/06/12 21:45:48 (pid:30143) fgets failed
07/06/12 21:45:48 (pid:30143) "/usr/local/osg-ce/OSG.DIRS/wn_tmp/glide_B25394/main/condor/libexec/power_state ad" did not produce any output, ignoring
07/06/12 21:45:48 (pid:30143) VM-gahp server reported an internal error
07/06/12 21:45:48 (pid:30143) VM universe will be tested to check if it is available
07/06/12 21:45:48 (pid:30143) ERROR "Required attribute "PREEMPT" is not defined" at line 462 in file /home/condor/execute/dir_24541/userdir/src/condor_startd.V6/util.cpp
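The fatal line above names the expression the startd could not find. As a minimal sketch (not part of the original report), the following Python scans log lines for that message and collects the attribute names. The function name `find_missing_attributes` and the regular expression are illustrative assumptions based only on the message format shown in this log.

```python
import re

# Pattern for the fatal message seen in the log above. The format is taken
# from this one log excerpt; other Condor versions may word it differently.
MISSING_ATTR = re.compile(r'ERROR "Required attribute "(?P<attr>\w+)" is not defined"')


def find_missing_attributes(log_lines):
    """Return attribute names reported as undefined, in order of first appearance."""
    missing = []
    for line in log_lines:
        match = MISSING_ATTR.search(line)
        if match and match.group("attr") not in missing:
            missing.append(match.group("attr"))
    return missing


# Two lines copied from the log excerpt above.
sample = [
    '07/06/12 21:45:48 (pid:30143) VM universe will be tested to check if it is available',
    '07/06/12 21:45:48 (pid:30143) ERROR "Required attribute "PREEMPT" is not defined" '
    'at line 462 in file /home/condor/execute/dir_24541/userdir/src/condor_startd.V6/util.cpp',
]
print(find_missing_attributes(sample))
```

Run against a full StartdLog, this would give a quick list of expressions to check in the monitor configuration.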

[...]

On Jul 6, 2012, at 7:23 PM, Parag Mhashilkar wrote:

Thanks. I thought so too, but I haven't seen monitor startds for a while during testing. Something seemed amiss, so I wanted to make sure.

I will try to trace it back.

[...]

On Jul 6, 2012, at 5:00 PM, Igor Sfiligoi wrote:

I don't remember that we ever said we would disable them by default.

Igor

On 07/06/2012 02:43 PM, Parag Mhashilkar wrote:

I am having a memory lapse here. Can someone please remind me whether the
monitoring startds should be started by default by glideins?

#2 Updated by Parag Mhashilkar over 8 years ago

  • Assignee changed from Parag Mhashilkar to Douglas Strain

Doug, I did a quick test: after adding the following line to condor_config.monitor.include, the monitor startds started showing up in the user pool again.

PREEMPT = (($(GLIDEIN_HOLD_CONDITION)) || ($(GLIDEIN_PREEMPT_CONDITION)))

It looks like the changes from #2541 and #2544 did not propagate to the monitor startds, and there may be other config changes that you need to make for the monitor as well.
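To catch this kind of gap before a glidein starts, a config fragment could be checked for the expressions the startd insists on. The sketch below is an illustration only: `undefined_attributes` is a hypothetical helper, the `required` list defaults to just `PREEMPT` because that is the only attribute this ticket confirms, and the parser is deliberately naive (it ignores includes, continuation lines, and macro expansion). The `START = True` line is made-up sample content.

```python
import re

# A line of the form "NAME = value"; comment lines are skipped before matching.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=")


def undefined_attributes(config_text, required=("PREEMPT",)):
    """Return the names in `required` that have no assignment in config_text.

    Names are compared case-insensitively in this sketch.
    """
    defined = set()
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = ASSIGNMENT.match(line)
        if match:
            defined.add(match.group(1).upper())
    return [name for name in required if name.upper() not in defined]


# Hypothetical fragment without the fix, then with the line quoted above.
before = "START = True\n"
after = before + "PREEMPT = (($(GLIDEIN_HOLD_CONDITION)) || ($(GLIDEIN_PREEMPT_CONDITION)))\n"

print(undefined_attributes(before))
print(undefined_attributes(after))
```

Passing a longer `required` tuple would extend the check to whatever other expressions the monitor configuration turns out to need.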

#3 Updated by Douglas Strain over 8 years ago

I have added this in commit:a5acbe1 and pushed it to branch_v2plus. I can also confirm that this fixes the monitor startds not starting. Parag, want to quickly review and then cherry-pick over to branch_v2_6?

As requested, I will also verify that things happen correctly in other cases, such as DAEMON_SHUTDOWN, but I will add that in a separate commit if it is necessary.

#4 Updated by Douglas Strain over 8 years ago

  • Status changed from Assigned to Resolved

#5 Updated by Parag Mhashilkar over 8 years ago

  • Status changed from Resolved to Closed
