Bug #6768

Error classad creation code does not always advertise all the relevant attributes

Added by Igor Sfiligoi over 5 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assignee: Igor Sfiligoi
Category: -
Target version:
Start date: 08/08/2014
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders: CMS
Duration:

Description

I have noticed that several glidein attributes are often missing from the error ClassAds,
for example GLIDEIN_Entry_Name and GLIDEIN_Site.

Without them, it is close to impossible to figure out which glidein they are coming from.


Related issues

Related to GlideinWMS - Idea #6787: No good reason for having any default values in condor_vars.lst.entry (Closed, 08/12/2014)

History

#1 Updated by Igor Sfiligoi over 5 years ago

Here are a couple of examples of incomplete Error ClassAds:

$ condor_status glidein_21962@compute-22-32.tier2 -l |grep -e '^Name' -e '^GLIDE' -e 'State' |sort
EnteredCurrentState = 1407537622
GLIDECLIENT_Group = "main" 
GLIDECLIENT_Group_Signature = "08c56fcc31727d0351217045905bce987edd62bf" 
GLIDECLIENT_Name = "UCSD-v1_0.main" 
GLIDECLIENT_ReqNode = "glidein.grid.iu.edu" 
GLIDECLIENT_Signature = "586cb1b2afec861d203ad802514493af15a3cc01" 
GLIDEIN_ADVERTISE_ONLY = 1
GLIDEIN_COLLECTOR_NAME = "glidein-collector-2.t2.ucsd.edu:9747,glidein-collector.t2.ucsd.edu:9747" 
GLIDEIN_CredentialIdentifier = "160364" 
GLIDEIN_Description_File = "description.e79lVc.cfg" 
GLIDEIN_EXIT_CODE = 1
GLIDEIN_Expire = 1407538507
GLIDEIN_Expose_Grid_Env = "True" 
GLIDEIN_Factory = "OSGGOC" 
GLIDEIN_Failed = true
GLIDEIN_FAILURE_REASON = "Glidein failed while running wget. Keeping node busy until 1407538507 (Fri Aug  8 15:55:07 PDT 2014)." 
GLIDEIN_Glexec_Use = "OPTIONAL" 
GLIDEIN_LAST_SCRIPT = "wget" 
GLIDEIN_Max_Idle = 1200
GLIDEIN_MaxMemMBs = 2500
GLIDEIN_Max_Tail = 400
GLIDEIN_Monitoring_Enabled = "False" 
GLIDEIN_Name = "v3_0" 
GLIDEIN_Report_Failed = "ALIVEONLY" 
GLIDEIN_Req_MUPJ_gLExec = "False" 
GLIDEIN_REQUIRED_OS = "rhel5" 
GLIDEIN_Signature = "3070816cada4e1c0b14ed1132d02cfd1abb84e19" 
GLIDEIN_ToDie = 1407538507
GLIDEIN_ToRetire = 1407537622
Name = "glidein_21962@compute-22-32.tier2" 
State = "Drained" 

$ condor_status  glidein_63175@hansen-a060.rcac.purdue.edu -l |grep -e '^Name' -e '^GLIDE' -e 'State' |sort
EnteredCurrentState = 1407538904
GLIDECLIENT_Group = "main" 
GLIDECLIENT_Group_Signature = "08c56fcc31727d0351217045905bce987edd62bf" 
GLIDECLIENT_Name = "UCSD-v1_0.main" 
GLIDECLIENT_ReqNode = "vocms0305.cern.ch" 
GLIDECLIENT_Signature = "586cb1b2afec861d203ad802514493af15a3cc01" 
GLIDEIN_ADVERTISE_ONLY = 1
GLIDEIN_COLLECTOR_NAME = "glidein-collector-2.t2.ucsd.edu:10101,glidein-collector.t2.ucsd.edu:10101" 
GLIDEIN_CredentialIdentifier = "160364" 
GLIDEIN_Description_File = "description.e81h15.cfg" 
GLIDEIN_EXIT_CODE = 1
GLIDEIN_Expire = 1407539195
GLIDEIN_Expose_Grid_Env = "True" 
GLIDEIN_Factory = "CMS-CERN" 
GLIDEIN_Failed = true
GLIDEIN_FAILURE_REASON = "Glidein failed while running wget. Keeping node busy until 1407539195 (Fri Aug  8 19:06:35 EDT 2014)." 
GLIDEIN_Glexec_Use = "OPTIONAL" 
GLIDEIN_LAST_SCRIPT = "wget" 
GLIDEIN_Max_Idle = 1200
GLIDEIN_MaxMemMBs = 2500
GLIDEIN_Max_Tail = 400
GLIDEIN_Monitoring_Enabled = "False" 
GLIDEIN_Name = "v1_0" 
GLIDEIN_Report_Failed = "ALIVEONLY" 
GLIDEIN_Req_MUPJ_gLExec = "False" 
GLIDEIN_REQUIRED_OS = "rhel5" 
GLIDEIN_Signature = "4fc49dc8f5a31d25fa1923b9def726936fd35aa2" 
GLIDEIN_ToDie = 1407539195
GLIDEIN_ToRetire = 1407538904
Name = "glidein_63175@hansen-a060.rcac.purdue.edu" 
State = "Drained" 
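
A check like the one done by eye on the two dumps above can be scripted. The following is a minimal sketch, assuming the input is the plain "Name = value" text printed by condor_status -l. The function names and the EXPECTED list are illustrative only and are not part of GlideinWMS or HTCondor.

```python
# Attributes that should appear on every error ClassAd.
# Assumption: this list only covers the two attributes named in this ticket.
EXPECTED = ("GLIDEIN_Entry_Name", "GLIDEIN_Site")


def parse_classad(text):
    """Parse 'Name = value' lines (condor_status -l style) into a dict."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip lines that are not attribute assignments
            ad[name.strip()] = value.strip()
    return ad


def missing_attributes(text, expected=EXPECTED):
    """Return the expected attribute names that are absent from the ad."""
    ad = parse_classad(text)
    return [name for name in expected if name not in ad]


# A few lines copied from the first dump above.
sample = """\
GLIDEIN_Factory = "OSGGOC"
GLIDEIN_Failed = true
GLIDEIN_LAST_SCRIPT = "wget"
GLIDEIN_Name = "v3_0"
Name = "glidein_21962@compute-22-32.tier2"
"""

print(missing_attributes(sample))  # ['GLIDEIN_Entry_Name', 'GLIDEIN_Site']
```

Feeding it the full output of either condor_status command above should give the same result, since neither dump contains the two attributes.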

#2 Updated by Igor Sfiligoi over 5 years ago

My current guess is that it is due to GLIDEIN_Entry_Name and GLIDEIN_Site being in condor_vars.lst.entry, which is loaded very late in the process.

I thus propose to move everything that is known at glidein startup out of condor_vars.lst.entry and into condor_vars.lst.
This would include GLIDEIN_Entry_Name and CONDORG_CLUSTER, but not GLIDEIN_Site.
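
To illustrate the reasoning, here is a toy model of the guess above, not the actual glidein startup code. It assumes that an attribute can be advertised only after the variable list containing it has been loaded, and that a failure stops the sequence. The stage names, their order, and the placement of GLIDEIN_Factory and GLIDEIN_Name are hypothetical placeholders. Only the proposed move of GLIDEIN_Entry_Name and CONDORG_CLUSTER comes from this ticket.

```python
def advertised(stages, failed_at):
    """Return the attributes known when the glidein fails in stage `failed_at`.

    `stages` is an ordered list of (stage_name, attributes_loaded) pairs.
    Hypothetical model: attributes from stages after the failure are never loaded.
    """
    known = []
    for stage, attrs in stages:
        if stage == failed_at:
            break
        known.extend(attrs)
    return known


# Hypothetical ordering before the change: the entry list is loaded late.
before = [
    ("load condor_vars.lst", ["GLIDEIN_Factory", "GLIDEIN_Name"]),
    ("wget", []),
    ("load condor_vars.lst.entry",
     ["GLIDEIN_Entry_Name", "CONDORG_CLUSTER", "GLIDEIN_Site"]),
]

# Same ordering after moving the startup-time attributes into condor_vars.lst.
after = [
    ("load condor_vars.lst",
     ["GLIDEIN_Factory", "GLIDEIN_Name", "GLIDEIN_Entry_Name", "CONDORG_CLUSTER"]),
    ("wget", []),
    ("load condor_vars.lst.entry", ["GLIDEIN_Site"]),
]

print(advertised(before, "wget"))
print(advertised(after, "wget"))
```

In this model a failure during wget leaves GLIDEIN_Entry_Name out of the error ClassAd before the change and includes it afterwards, while GLIDEIN_Site stays missing in both cases because it is not moved.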

#3 Updated by Igor Sfiligoi over 5 years ago

  • Status changed from New to Feedback
  • Assignee changed from Igor Sfiligoi to Parag Mhashilkar

Change implemented in branch v3/6768.

Please review.

#4 Updated by Parag Mhashilkar over 5 years ago

  • Status changed from Feedback to Resolved
  • Assignee changed from Parag Mhashilkar to Igor Sfiligoi

Reviewed and merged it to the release branch. If you want to move other attributes as well, go ahead.

#5 Updated by Parag Mhashilkar over 5 years ago

  • Status changed from Resolved to Closed

