Bug #2185

Factory not updating proxy if limits hit

Added by Igor Sfiligoi about 8 years ago. Updated about 8 years ago.

Status: Closed
Priority: High
Assignee:
Category: Factory
Target version:
Start date: 11/21/2011
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

I just found out that the factory is not renewing the proxies if the limits are hit, like in this example:
[2011-11-22T00:24:57+00:00 21966] Iteration at Tue Nov 22 00:24:57 2011
[2011-11-22T00:24:57+00:00 21966] Found 1 tasks to work on using existing factory key.
[2011-11-22T00:24:57+00:00 21966] Found 1 total tasks to work on
[2011-11-22T00:24:57+00:00 21966] WARNING: Entry igortest_cabinet-10-10-4 has hit the limit for idle glideins, cannot submit any more, skipping all requests
[2011-11-22T00:24:57+00:00 21966] Writing stats
[2011-11-22T00:24:57+00:00 21966] log_stats written
[2011-11-22T00:24:57+00:00 21966] qc_stats written
[2011-11-22T00:24:57+00:00 21966] rrd_stats written
[2011-11-22T00:24:57+00:00 21966] Sleep 60s

The proxy was renewed 10 mins ago:
2011-11-22 00:23:43 UTC [root@glidein-itb:~/glideinsubmit/glidein_v2_1_test/client_proxies/user_fecms/entry_igortest_cabinet-10-10-4]# ls -lrt
total 16
-rw------- 1 fecms fecms 7123 Nov 15 11:24 x509_igortest_cabinet.minus,10.minus,10.minus,4@v2_1_test@ITBGOC@UCSD.minus,i5_2a.dot,itb_0.proxy.old
-rw------- 1 fecms fecms 7119 Nov 22 00:01 x509_igortest_cabinet.minus,10.minus,10.minus,4@v2_1_test@ITBGOC@UCSD.minus,i5_2a.dot,itb_0.proxy

As a test, I removed the limits in the factory/entry config, and the proxy was immediately renewed:
2011-11-22 00:26:52 UTC [root@glidein-itb:~/glideinsubmit/glidein_v2_1_test/client_proxies/user_fecms/entry_igortest_cabinet-10-10-4]# ls -lrt
total 16
-rw------- 1 fecms fecms 7119 Nov 22 00:01 x509_igortest_cabinet.minus,10.minus,10.minus,4@v2_1_test@ITBGOC@UCSD.minus,i5_2a.dot,itb_0.proxy.old
-rw------- 1 fecms fecms 7139 Nov 22 00:27 x509_igortest_cabinet.minus,10.minus,10.minus,4@v2_1_test@ITBGOC@UCSD.minus,i5_2a.dot,itb_0.proxy

This is a pretty serious problem: glideins could end up held, with no way for the VO to fix it.

Igor

History

#1 Updated by Krista Larson about 8 years ago

  • Assignee set to Krista Larson

#2 Updated by Krista Larson about 8 years ago

  • Status changed from New to Resolved

The factory now updates the proxies even if it can't submit any more glideins because of entry limits.

The factory now treats hitting the limits the same as a downtime: the req idle is set to zero. This means the entry can still do any cleanup the frontend has requested. I also changed the sanitizeGlideins() call so that it runs once per iteration if no work (submitting glideins or removing excess ones) has been done. Previously it was also skipped when there were problems with the limits or malformed classads.

Made the fixes in both v2+ (b57afb5eb3) and master (4c9d338c5) branches.
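
For readers without the commits at hand, the behavior described in this update can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual factory code: the `Request` and `Entry` classes, `run_iteration`, `update_proxy`, `submit`, and the `idle_limit` field are hypothetical names invented for the example. Only `sanitizeGlideins` is a name taken from the ticket, and its body here is a placeholder. The order of operations (proxy update before the limit check) is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """One frontend request (hypothetical shape)."""
    proxy: str      # proxy contents sent by the frontend
    req_idle: int   # idle glideins the frontend asks for


@dataclass
class Entry:
    """Minimal stand-in for a factory entry (hypothetical)."""
    idle_limit: int
    idle_glideins: int = 0
    proxy: str = ""
    log: list = field(default_factory=list)

    def update_proxy(self, proxy):
        self.proxy = proxy
        self.log.append("proxy updated")

    def submit(self, count):
        self.idle_glideins += count
        self.log.append(f"submitted {count}")

    def sanitizeGlideins(self):
        # Name taken from the ticket; the body is a placeholder.
        self.log.append("sanitized")


def run_iteration(entry, requests):
    """Process all requests for one entry in a single iteration."""
    work_done = False
    for req in requests:
        # Refresh the proxy before looking at limits, so a limit cannot block it.
        entry.update_proxy(req.proxy)

        req_idle = req.req_idle
        if entry.idle_glideins >= entry.idle_limit:
            # Limit hit: handle it like a downtime by requesting zero idle glideins.
            req_idle = 0

        room = entry.idle_limit - entry.idle_glideins
        to_submit = max(0, min(req_idle, room))
        if to_submit > 0:
            entry.submit(to_submit)
            work_done = True

    # Clean up once per iteration when nothing was submitted.
    # (Removal of excess glideins, which the ticket also counts as work,
    # is left out of this sketch.)
    if not work_done:
        entry.sanitizeGlideins()
    return work_done
```

In this sketch, an entry that is already at its idle limit still gets its proxy updated and still runs the cleanup step; it just submits nothing.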

#3 Updated by Igor Sfiligoi about 8 years ago

It turns out this was also affecting the monitoring: Jeff noticed that we lost all monitoring when a limit was hit.

It also seems to be new in v2_5_3 (i.e. it was not in v2_5_2).

This is actually quite a big problem. What are the chances of having a release with this fix really soon?

Igor

#4 Updated by Parag Mhashilkar about 8 years ago

  • Status changed from Resolved to Closed

