Bug #20662

Failures when one collector is down

Added by Joe Boyd 7 months ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 08/22/2018
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

When gpcollector04 was down, we were getting a high number of submission and jobsub_q failures. Below is the output of a submission loop that prints the number of seconds each submission took, followed by the time and the jobsub job ID.

You can see that sometimes a submission succeeds after 120 seconds (two 60-second timeouts caused by the down collector), while other times it fails with

HTTP response:0 PyCurl Error 52: Empty reply from server

after just 60 seconds. Something is not dealing well with the reply it is getting.

After Nick took the down collector out of the config, everything started working, so the problem is apparently related to the timeout.

jobsub_submit -G uboone -N 1 --resource-provides=usage_model=DEDICATED,OPPORTUNISTIC --generate-email-summary --disk=1GB --memory=1GB --cpu=1 --expected-lifetime=1h file://sleep_job.sh -f 15 -s 5 -e uboone

Submission took 121 seconds
1 Wed Aug 22 11:16:23 CDT 2018 JobsubJobId of first job:
Submission took 2 seconds
2 Wed Aug 22 11:16:26 CDT 2018 JobsubJobId of first job:
Submission took 122 seconds
3 Wed Aug 22 11:18:29 CDT 2018 JobsubJobId of first job:
Submission took 64 seconds
4 Wed Aug 22 11:19:34 CDT 2018 JobsubJobId of first job:
Submission took 62 seconds
5 Wed Aug 22 11:20:37 CDT 2018 JobsubJobId of first job:
HTTP response:0 PyCurl Error 52: Empty reply from server
Submission took 61 seconds
6 Wed Aug 22 11:21:39 CDT 2018 Submission took 3 seconds
7 Wed Aug 22 11:21:43 CDT 2018 JobsubJobId of first job:
Submission took 2 seconds
8 Wed Aug 22 11:21:46 CDT 2018 JobsubJobId of first job:
Submission took 5 seconds
9 Wed Aug 22 11:21:52 CDT 2018 JobsubJobId of first job:
Submission took 123 seconds
10 Wed Aug 22 11:23:56 CDT 2018 JobsubJobId of first job:
Submission took 1 seconds
11 Wed Aug 22 11:23:58 CDT 2018 JobsubJobId of first job:
Submission took 5 seconds
12 Wed Aug 22 11:24:04 CDT 2018 JobsubJobId of first job:
HTTP response:0 PyCurl Error 52: Empty reply from server
Submission took 61 seconds
13 Wed Aug 22 11:25:06 CDT 2018 HTTP response:0 PyCurl Error 52: Empty reply from server
Submission took 60 seconds
14 Wed Aug 22 11:26:07 CDT 2018 Submission took 1 seconds
15 Wed Aug 22 11:26:10 CDT 2018 JobsubJobId of first job:
Submission took 122 seconds
16 Wed Aug 22 11:28:13 CDT 2018 JobsubJobId of first job:
Submission took 3 seconds
17 Wed Aug 22 11:28:17 CDT 2018 JobsubJobId of first job:
Submission took 11 seconds
18 Wed Aug 22 11:28:29 CDT 2018 JobsubJobId of first job:
Submission took 62 seconds
19 Wed Aug 22 11:29:32 CDT 2018 JobsubJobId of first job:
HTTP response:0 PyCurl Error 52: Empty reply from server
Submission took 60 seconds
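
To make the pattern in the output above easier to see, here is a minimal sketch that groups the reported durations by how many whole 60-second timeouts each submission appears to have waited through. The 60-second timeout comes from the description; the names `TIMEOUT_SECONDS`, `timeouts_hit`, and `bucket_durations` are hypothetical helpers for illustration and are not part of jobsub.

```python
from collections import Counter

# Assumption: each attempt against the down collector costs one
# 60-second timeout, as the description suggests.
TIMEOUT_SECONDS = 60

# Durations in seconds, copied in order from the
# "Submission took N seconds" lines in the output above.
DURATIONS = [121, 2, 122, 64, 62, 61, 3, 2, 5, 123,
             1, 5, 61, 60, 1, 122, 3, 11, 62, 60]


def timeouts_hit(duration, timeout=TIMEOUT_SECONDS):
    """Return how many whole timeouts fit into a submission's duration."""
    return duration // timeout


def bucket_durations(durations, timeout=TIMEOUT_SECONDS):
    """Count submissions by the number of whole timeouts they waited through."""
    return Counter(timeouts_hit(d, timeout) for d in durations)


if __name__ == "__main__":
    for n, count in sorted(bucket_durations(DURATIONS).items()):
        print(f"{n} timeout(s): {count} submissions")
```

With the durations above, this gives 9 submissions with no timeout, 7 with one, and 4 with two. Reading the output, the four "Empty reply from server" errors all appear next to durations of 60 or 61 seconds, so they fall in the one-timeout group, while every submission that took about 120 seconds reported a job ID.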


