Task #14468

Monitoring/OnDemand: Check network rates on squid on way up

Added by Steven Timm over 4 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Start date: 11/10/2016
Due date:
% Done: 90%
Estimated time:
Duration:

Description

Look at the CPU and network rates of the squid servers. When we get to 10, are they pegged, and do we really need more?
Use iftop, and also the network monitoring tools in the Google console.

History

#1 Updated by Neha Sharma over 4 years ago

All squid instance groups were running at their max capacity.

The CPU utilization graphs are here:

https://console.cloud.google.com/compute/instanceGroups/details/us-central1-a/sc-demo-us-central1-a?project=fermilab-poc&graph=GCE_CPU&duration=PT1H
https://console.cloud.google.com/compute/instanceGroups/details/us-central1-b/sc-demo-us-central1-b?project=fermilab-poc&graph=GCE_CPU&duration=PT1H
https://console.cloud.google.com/compute/instanceGroups/details/us-central1-c/sc-demo-us-central1-c?project=fermilab-poc&graph=GCE_CPU&duration=PT1H
https://console.cloud.google.com/compute/instanceGroups/details/us-central1-f/sc-demo-us-central1-c?project=fermilab-poc&graph=GCE_CPU&duration=PT1H

Here is some captured iftop output:

Instance group sc-demo-us-central1-a is resizing: 9 instances, scaling to 10.

[root@sc-demo-us-central1-a-g58f sekhrineha]# hostname
sc-demo-us-central1-a-g58f
[root@sc-demo-us-central1-a-g58f sekhrineha]#

246Mb                  491Mb                  737Mb                  982Mb            1.20Gb
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────
10.128.0.18 => condor-c0ff430d-7907-47b7-8489-984b9a85fa7 46.7Mb 47.0Mb 46.7Mb
<= 184Kb 150Kb 270Kb
10.128.0.18 => condor-d999a297-9953-472d-a567-e489ac50520 39.7Mb 41.4Mb 40.9Mb
<= 453Kb 242Kb 139Kb
sc-demo-us-central1-a-g58f.c.fermilab-poc. => cvmfs.fnal.gov 162Kb 110Kb 129Kb
<= 39.1Mb 27.4Mb 30.6Mb
10.128.0.18 => condor-7ca4bd1b-9ae0-459f-beb0-0920ab525c3 27.0Mb 21.2Mb 5.29Mb
<= 82.3Kb 69.3Kb 17.7Kb
10.128.0.18 => condor-cd148d85-bb84-468d-ad16-27ab0e49c3d 51.8Mb 15.5Mb 3.89Mb
<= 183Kb 57.7Kb 14.4Kb
10.128.0.18 => condor-9312edaa-15c5-4474-91e3-9bb4115bc33 0b 15.5Mb 4.97Mb
<= 0b 40.5Kb 14.2Kb
10.128.0.18 => condor-8624ca52-db0e-4746-b2b6-b0cc393a0f2 0b 12.8Mb 3.21Mb
<= 0b 34.1Kb 8.53Kb
10.128.0.18 => condor-74a240ef-34cc-4d63-b4cc-6536ff83e57 37.0Kb 11.6Mb 2.94Mb
<= 3.99Kb 66.2Kb 18.8Kb
10.128.0.18 => condor-d24c5cc4-2db8-4d58-a094-a962383b068 48.5Mb 9.69Mb 6.30Mb
<= 123Kb 24.6Kb 19.2Kb
10.128.0.18 => condor-aefef505-5ddf-408b-b8e7-5d71ea095b1 7.65Mb 9.10Mb 3.29Mb
<= 27.9Kb 27.5Kb 12.1Kb
10.128.0.18 => condor-d842cd7d-5525-4ea3-ac9a-69c1d18d01e 0b 6.01Mb 17.5Mb
<= 0b 70.7Kb 180Kb
sc-demo-us-central1-a-g58f.c.fermilab-poc. => front15.cern.ch 0b 63.3Kb 205Kb
<= 0b 5.96Mb 17.3Mb
10.128.0.18 => condor-3ac7e2b2-ced0-4754-96af-ce5e607a783 28.8Mb 5.82Mb 6.63Mb
<= 88.1Kb 22.7Kb 26.5Kb
10.128.0.18 => condor-9f4c3ed1-6006-446c-b726-fa2a87e09d4 24.9Mb 5.04Mb 5.94Mb
<= 75.6Kb 21.4Kb 31.1Kb
10.128.0.18 => condor-aaea7e18-af84-4a30-8105-925658b75aa 0b 3.56Mb 2.16Mb
<= 0b 9.21Kb 5.67Kb
10.128.0.18 => condor-6e9f7759-aa70-441c-bbfa-b3df924d479 265Kb 3.26Mb 5.95Mb
<= 17.0Kb 15.4Kb 26.2Kb
10.128.0.18 => condor-c681914f-fed0-44c9-9af5-62b6f158104 0b 1.08Mb 277Kb
<= 0b 10.4Kb 2.61Kb
10.128.0.18 => condor-6ffd331e-adff-41cf-8aa7-3304c32b5ca 160Kb 191Kb 5.14Mb
<= 11.6Kb 12.1Kb 18.7Kb
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
TX: cum: 0.98GB peak: 280Mb rates: 276Mb 209Mb 189Mb
RX: 247MB 185Mb 40.4Mb 34.3Mb 49.1Mb
TOTAL: 1.22GB 439Mb 317Mb 244Mb 238Mb
246Mb                  491Mb                  737Mb                  982Mb            1.20Gb
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────
10.128.0.18 => condor-2acffa68-9a54-4716-9875-0cb48fa0922 0b 13.1Mb 3.27Mb
<= 0b 35.8Kb 8.94Kb
10.128.0.18 => condor-d5196f16-2b77-400f-8faf-88f91b5bda0 0b 10.3Mb 15.2Mb
<= 0b 29.2Kb 736Kb
10.128.0.18 => condor-8624ca52-db0e-4746-b2b6-b0cc393a0f2 0b 5.09Mb 2.23Mb
<= 0b 13.1Kb 5.77Kb
10.128.0.18 => condor-0833a1f1-8cd9-4976-b121-a4438ac626e 0b 4.11Mb 2.01Mb
<= 0b 12.0Kb 6.21Kb
10.128.0.18 => condor-c0ff430d-7907-47b7-8489-984b9a85fa7 0b 3.20Mb 35.1Mb
<= 0b 35.4Kb 380Kb
sc-demo-us-central1-a-g58f.c.fermilab-poc. => cvmfs.fnal.gov 12.1Kb 47.6Kb 160Kb
<= 136Kb 1.12Mb 43.9Mb
10.128.0.18 => condor-d842cd7d-5525-4ea3-ac9a-69c1d18d01e 183Kb 896Kb 2.81Mb
<= 4.89Kb 23.1Kb 30.6Kb
sc-demo-us-central1-a-g58f.c.fermilab-poc. => front15.cern.ch 6.04Kb 30.4Kb 42.3Kb
<= 180Kb 884Kb 2.79Mb
10.128.0.18 => condor-9db6244c-73e6-4cba-b728-5430d335e7b 322Kb 174Kb 2.08Mb
<= 23.7Kb 12.2Kb 14.3Kb
10.128.0.18 => condor-9f4c3ed1-6006-446c-b726-fa2a87e09d4 90.7Kb 171Kb 126Kb
<= 5.89Kb 12.0Kb 8.74Kb
10.128.0.18 => condor-6ffd331e-adff-41cf-8aa7-3304c32b5ca 89.9Kb 139Kb 2.53Mb
<= 5.68Kb 8.99Kb 17.4Kb
10.128.0.18 => condor-6e9f7759-aa70-441c-bbfa-b3df924d479 77.6Kb 79.5Kb 73.8Kb
<= 4.19Kb 7.38Kb 5.58Kb
10.128.0.18 => condor-aefef505-5ddf-408b-b8e7-5d71ea095b1 82.8Kb 75.5Kb 58.3Kb
<= 4.80Kb 5.33Kb 3.90Kb
10.128.0.18 => condor-279e6a74-0c03-41ac-92c3-db90f90a949 0b 59.7Kb 97.9Kb
<= 0b 4.40Kb 7.11Kb
10.128.0.18 => condor-74a240ef-34cc-4d63-b4cc-6536ff83e57 95.1Kb 46.9Kb 51.4Kb
<= 6.49Kb 3.57Kb 4.11Kb
10.128.0.18 => condor-ab2e673b-8d77-4e1c-8bfe-eefe729d314 11.8Kb 46.6Kb 3.30Mb
<= 1.69Kb 3.32Kb 16.6Kb
10.128.0.18 => condor-d8c978df-6ae2-4bdb-8d04-a367943071d 32.9Kb 40.0Kb 57.1Kb
<= 2.30Kb 3.06Kb 4.30Kb
10.128.0.18 => condor-3ac7e2b2-ced0-4754-96af-ce5e607a783 12.1Kb 30.3Kb 84.4Kb
<= 1.69Kb 1.60Kb 5.70Kb
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
TX: cum: 1.83GB peak: 165Mb rates: 1.07Mb 37.7Mb 89.6Mb
RX: 743MB 122Mb 398Kb 2.21Mb 48.1Mb
TOTAL: 2.56GB 263Mb 1.46Mb 39.9Mb 138Mb

[sekhrineha@sc-demo-us-central1-f-f81i ~]$ sudo su
[root@sc-demo-us-central1-f-f81i sekhrineha]# iftop
interface: eth0
IP address is: 10.128.5.249
MAC address is: 42:01:0a:ffffff80:05:fffffff9
[root@sc-demo-us-central1-f-f81i sekhrineha]#

──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────
sc-demo-us-central1-f-f81i.c.fermilab-poc. => cvmfs-srv.fnal.gov 35.9Kb 118Kb 110Kb
<= 509Kb 41.0Mb 27.4Mb
10.128.0.26 => condor-f937e377-2318-40d6-90eb-652e724531f 0b 33.4Mb 30.9Mb
<= 0b 153Kb 194Kb
10.128.0.26 => condor-4600ed45-9a5c-4825-84bf-2653ee47535 285Kb 17.7Mb 7.12Mb
<= 14.4Kb 98.6Kb 61.2Kb
10.128.0.26 => condor-c321b271-84cb-4e01-831e-5c99b8f7df9 0b 10.4Mb 3.47Mb
<= 0b 136Kb 45.3Kb
10.128.0.26 => condor-ccd20b50-1377-4e46-a458-fc634ad8441 52.2Kb 9.39Mb 4.38Mb
<= 5.68Kb 65.9Kb 48.1Kb
10.128.0.26 => condor-d942d30a-12c7-4569-8013-07427bc55d0 29.3Mb 5.87Mb 1.96Mb
<= 59.1Kb 11.9Kb 3.98Kb
10.128.0.26 => condor-b4a252c8-0020-4fff-8743-3ff3839d2aa 0b 5.87Mb 1.96Mb
<= 0b 11.3Kb 3.76Kb
10.128.0.26 => condor-d07f1b08-3117-478a-a265-66125e3aadc 0b 5.52Mb 1.84Mb
<= 0b 65.1Kb 21.7Kb
10.128.0.26 => condor-d120f50e-4e55-4cf3-ad91-640c48d68b3 25.5Mb 5.09Mb 2.98Mb
<= 58.4Kb 11.7Kb 6.74Kb
10.128.0.26 => condor-fc71e4f2-a9fa-47fc-b431-4a3f8182490 24.3Mb 5.03Mb 5.66Mb
<= 77.6Kb 26.7Kb 24.9Kb
10.128.0.26 => condor-d4fdbc68-3652-4b96-859f-0ae2c55fac1 74.1Kb 4.70Mb 3.57Mb
<= 4.40Kb 35.3Kb 32.8Kb
10.128.0.26 => condor-e4210a6f-3e4f-4ca9-8cf8-c2d2411f550 0b 4.13Mb 1.38Mb
<= 0b 40.3Kb 13.4Kb
10.128.0.26 => condor-efa1ea0b-f4e5-4bbc-8dad-3c95840dea6 0b 3.37Mb 8.65Mb
<= 0b 39.2Kb 103Kb
sc-demo-us-central1-f-f81i.c.fermilab-poc. => front15.cern.ch 68.8Kb 68.4Kb 67.7Kb
<= 390Kb 320Kb 360Kb
10.128.0.26 => condor-0d3ddf07-5242-455f-889d-44e24a75725 94.8Kb 181Kb 199Kb
<= 29.6Kb 29.7Kb 29.4Kb
10.128.0.26 => condor-d604d196-fa15-436e-82c9-e8d59ce98a0 328Kb 171Kb 193Kb
<= 30.3Kb 29.4Kb 29.1Kb
10.128.0.26 => condor-1f5cc2e2-eb1c-444d-a7c3-ba37a99037f 0b 49.6Kb 64.7Kb
<= 0b 2.38Kb 4.57Kb
10.128.0.26 => condor-adb43335-fc0e-4cda-b4c6-3ebc7b79eea 82.4Kb 40.3Kb 60.3Kb
<= 6.70Kb 3.44Kb 5.02Kb
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
TX: cum: 385MB peak: 186Mb rates: 80.1Mb 111Mb 103Mb
RX: 107MB 94.5Mb 1.18Mb 42.1Mb 28.5Mb

[root@sc-demo-us-central1-a-skfj sekhrineha]# iftop
interface: eth0
IP address is: 10.128.7.66
MAC address is: 42:01:0a:ffffff80:07:42
[root@sc-demo-us-central1-a-skfj sekhrineha]# hostname
sc-demo-us-central1-a-skfj
[root@sc-demo-us-central1-a-skfj sekhrineha]#

246Mb                      491Mb                      737Mb                      982Mb                1.20Gb
└─────────────────────────┴──────────────────────────┴──────────────────────────┴──────────────────────────┴──────────────────────────
10.128.0.18 => condor-95531532-adcf-4af5-8c52-0bc3278032fa.c.fermil 64.4Mb 69.6Mb 66.4Mb
<= 815Kb 571Kb 441Kb
10.128.0.18 => condor-123995c8-5ae3-4661-8577-befeb708adfa.c.fermil 90.9Mb 56.0Mb 57.0Mb
<= 334Kb 1.44Mb 644Kb
sc-demo-us-central1-a-skfj.c.fermilab-poc.internal => cvmfs.fnal.gov 776Kb 701Kb 649Kb
<= 65.7Mb 42.6Mb 17.3Mb
10.128.0.18 => condor-9d68fd73-1924-4b9f-914a-e6c372276f56.c.fermil 16.7Mb 23.1Mb 5.77Mb
<= 41.5Kb 58.6Kb 14.7Kb
10.128.0.18 => condor-f7736585-a13c-4c49-a7f2-46af891a3203.c.fermil 0b 18.8Mb 4.71Mb
<= 0b 59.6Kb 14.9Kb
10.128.0.18 => condor-9e694d2b-f8fc-4d82-b778-fad229845246.c.fermil 12.2Mb 16.9Mb 4.22Mb
<= 257Kb 120Kb 29.9Kb
10.128.0.18 => condor-728385be-842e-47ee-b46b-7118ace63c3e.c.fermil 75.3Mb 15.1Mb 3.77Mb
<= 431Kb 86.1Kb 21.5Kb
10.128.0.18 => condor-cfbc3cca-eeab-4dcd-869f-412ce5c6a773.c.fermil 73.3Mb 14.7Mb 3.66Mb
<= 380Kb 76.0Kb 19.0Kb
10.128.0.18 => condor-02d2182a-7bb4-4dce-a823-7f7e9da2ba11.c.fermil 43.3Kb 7.46Mb 1.89Mb
<= 4.19Kb 42.8Kb 12.8Kb
10.128.0.18 => condor-b232d4fd-9e60-4d20-9c98-cdce62a50459.c.fermil 0b 1.45Mb 480Kb
<= 0b 977Kb 308Kb
10.128.0.18 => condor-36b79e58-2191-4cab-8c9c-f43c991803cb.c.fermil 0b 1.45Mb 480Kb
<= 0b 977Kb 308Kb
10.128.0.18 => condor-2a4f6ed3-8aae-4332-a731-7ba5f12683ff.c.fermil 1.54Mb 1.58Mb 2.97Mb
<= 231Kb 232Kb 248Kb
10.128.0.18 => condor-5ce5d03a-96d7-4474-8b29-803c0645e865.c.fermil 1.27Mb 1.53Mb 4.64Mb
<= 228Kb 231Kb 249Kb
10.128.0.18 => condor-d847e66e-1170-4cd8-b6c0-cfe234ba644b.c.fermil 452Kb 90.3Kb 22.6Kb
<= 923Kb 185Kb 46.1Kb
10.128.0.18 => condor-2259c964-ca8e-45e8-8773-ed8cbafe211f.c.fermil 109Kb 163Kb 231Kb
<= 29.9Kb 30.1Kb 30.8Kb
sc-demo-us-central1-a-skfj.c.fermilab-poc.internal => front08.cern.ch 32.7Kb 34.2Kb 35.7Kb
<= 92.7Kb 146Kb 214Kb
10.128.0.18 => condor-5acab265-56fc-4500-b3be-b32ee53c3724.c.fermil 0b 21.6Kb 5.41Kb
<= 0b 1.51Kb 386b
10.128.0.18 => condor-054b9cf1-af10-4db5-b4f6-12cf00d6ca36.c.fermil 0b 13.6Kb 3.39Kb
<= 0b 947b 237b
sc-demo-us-central1-a-skfj.c.fermilab-poc.internal => 74.125.72.33 7.98Kb 12.6Kb 10.1Kb
<= 208b 291b 1.90Kb
10.128.0.18 => condor-d47677ab-6090-4f6e-b79a-d78b79adf8df.c.fermil 0b 3.06Kb 784b
<= 0b 1.83Kb 468b
sc-demo-us-central1-a-skfj.c.fermilab-poc.internal => cmssrv244.fnal.gov 0b 3.49Kb 893b
<= 0b 1.32Kb 338b
sc-demo-us-central1-a-skfj.c.fermilab-poc.internal => cmssrv245.fnal.gov 0b 3.47Kb 1.08Kb
<= 0b 1.30Kb 413b
10.128.0.18 => condor-cdbdfc75-f2ec-4c45-ba0c-02253e8d9d6a.c.fermil 0b 2.19Kb 560b
<= 0b 1.29Kb 331b
10.128.0.18 => condor-d842cd7d-5525-4ea3-ac9a-69c1d18d01ed.c.fermil 0b 1.34Kb 343b
<= 0b 866b 217b
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
TX: cum: 1.72GB peak: 568Mb rates: 337Mb 229Mb 174Mb
RX: 553MB 69.4Mb 69.4Mb 47.7Mb 19.9Mb
TOTAL: 2.26GB 575Mb 406Mb 276Mb 194Mb

At this time (~11:26 am), network bytes out is ~64 MB/s and CPU utilization is 2%.
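
To compare the iftop figures above with a console number in MB/s, note that iftop's rate columns are in bits per second by default, while the console figure is in bytes. A minimal sketch of the conversion, assuming decimal prefixes (the function name and the `base` parameter are illustrative and not part of iftop):

```python
import re

def iftop_rate_to_mbytes_per_s(token, base=1000):
    """Convert an iftop rate token such as '337Mb' or '815Kb' to MB/s.

    Assumption: the token is in bits per second (iftop's default) and
    prefixes are decimal; pass base=1024 for binary prefixes instead.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMG]?)b", token)
    if match is None:
        raise ValueError(f"unrecognised iftop rate: {token!r}")
    value = float(match.group(1))
    exponent = {"": 0, "K": 1, "M": 2, "G": 3}[match.group(2)]
    bits_per_s = value * base ** exponent
    # 8 bits per byte, then scale bytes down to megabytes.
    return bits_per_s / 8 / base ** 2

# TX rates (2 s, 10 s, 40 s averages) from sc-demo-us-central1-a-skfj above.
for rate in ("337Mb", "229Mb", "174Mb"):
    print(rate, "->", round(iftop_rate_to_mbytes_per_s(rate), 1), "MB/s")
```

Under that assumption, the 2-second TX rate of 337Mb on sc-demo-us-central1-a-skfj is roughly 42 MB/s, and the 568Mb peak is about 71 MB/s.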

#2 Updated by Steven Timm over 4 years ago

  • Tracker changed from Milestone to Task
  • Status changed from New to Work in progress
  • % Done changed from 0 to 90

#3 Updated by Steven Timm almost 4 years ago

  • Status changed from Work in progress to Resolved

We looked at this. It turned out that our selected scale-up rate was very aggressive; we never came close to saturating the network of any of our on-demand squid servers. We scaled up at 9 MB/s and could have scaled at 5x that and been just fine.
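
For reference, the arithmetic behind "5x that", as a small sketch. The constant and function names are made up for illustration, and the threshold is assumed to be megabytes per second as written above:

```python
# Values quoted in the comment above; names are illustrative only.
SCALE_UP_THRESHOLD_MB_S = 9   # MB/s at which we scaled up
SUGGESTED_FACTOR = 5          # "could have scaled at 5x that"

def mbytes_to_mbits(mbytes_per_s):
    """Convert MB/s to Mb/s (8 bits per byte), for comparison with iftop."""
    return mbytes_per_s * 8

relaxed_threshold = SCALE_UP_THRESHOLD_MB_S * SUGGESTED_FACTOR
print(relaxed_threshold, "MB/s =", mbytes_to_mbits(relaxed_threshold), "Mb/s")
```

That is a threshold of 45 MB/s, or 360 Mb/s in the units iftop displays by default.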

#4 Updated by Steven Timm over 3 years ago

  • Status changed from Resolved to Closed
