Task #8952

Task #8949: Migrate dCache store nodes monitoring from zabbix to check_mk

Prototyping one monitoring migration from zabbix to check_mk

Added by Chih-Hao Huang over 4 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
Normal
Start date:
07/23/2015
Due date:
07/30/2015
% Done:

100%

Estimated time:
8.00 h
Spent time:
Duration: 8

Description

Just pick one to prototype.
This includes understanding zabbix and check_mk, writing the necessary scripts and deploying them.


Related issues

Follows (1 day) CMS dCache - Task #8950: Planning and figuring out the scope (Resolved, 06/12/2015 to 06/17/2015)

Precedes (1 day) CMS dCache - Task #8953: Systematically migrating all alarms from zabbix to check_mk (Accepted, 08/03/2015 to 10/23/2015)

History

#1 Updated by Chih-Hao Huang over 4 years ago

  • Related to Task #8950: Planning and figuring out the scope added

#2 Updated by Chih-Hao Huang over 4 years ago

  • Related to deleted (Task #8950: Planning and figuring out the scope)

#3 Updated by Chih-Hao Huang over 4 years ago

  • Follows Task #8950: Planning and figuring out the scope added

#4 Updated by Chih-Hao Huang over 4 years ago

  • Precedes Task #8953: Systematically migrating all alarms from zabbix to check_mk added

#5 Updated by Chih-Hao Huang over 4 years ago

  • Due date changed from 06/01/2015 to 06/07/2015

#6 Updated by Chih-Hao Huang over 4 years ago

  • Due date changed from 06/26/2015 to 07/30/2015
  • Status changed from New to Assigned
  • Start date changed from 06/19/2015 to 07/23/2015
  • % Done changed from 0 to 20

Read documentation at https://mathias-kettner.de/checkmk.html
Used cmsstor115 as the test host.
Created a few dummy checks.
Learned to re-inventory and activate the new checks.
Literally knocked out "Check_MK Discovery" on cmsstor115 with "UNKNOWN - Invalid line 35 in autochecks file /omd/sites/dcsomon/var/check_mk/autochecks/cmsstor115.mk".
Did the same to cmsstor113.
Stopped here.

#7 Updated by Chih-Hao Huang over 4 years ago

In one of the dummy checks, I had an apostrophe (') in the service name.
I think that is what broke it.
However, fixing it requires server access.
Will ask Tim for help.
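
If the apostrophe was indeed the cause (the note above only suspects it), one way to avoid a repeat is to strip risky characters from the service name before a local check prints it. A minimal sketch; the function name sanitize_item and the allowed character set are assumptions for illustration, not something check_mk prescribes:

```shell
#!/bin/sh
# Hypothetical helper: replace every character outside a conservative
# whitelist (letters, digits, underscore, dot, hyphen) with an underscore,
# so quotes and spaces never reach the service name.
sanitize_item() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9_.-' '_'
}

# Example: a dummy service name containing an apostrophe and a space.
item=$(sanitize_item "dummy's check")

# Print one local-check style line using the cleaned name.
printf '0 %s - sanitized example\n' "$item"
```

The whitelist is deliberately strict; it can be widened once it is known which characters the autochecks file tolerates.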

#8 Updated by Chih-Hao Huang over 4 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 20 to 100

Cleaned up the mess created last time.
Tried one script that has multiple outputs, and it works.
Prototyped a check for dcache-service on cmsstor115; basically, it is simply:

dcache status | grep -v DOMAIN | awk '{if ($2 == "running") print 0,$1,"-",$2; else print 1,$1,"-",$2}'
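
For reference, a check_mk local check prints one line per service in the form "status item-name performance-data text", with "-" standing in for empty performance data. The one-liner can be sketched as a small script under that format. The function name dcache_lines_to_local_check and the sample input are hypothetical; the sample only mimics the layout the one-liner expects (a header line containing DOMAIN, then the domain name in column 1 and its state in column 2).

```shell
#!/bin/sh
# Convert "dcache status"-style lines read from stdin into local check lines.
# Status 0 (OK) when the state column says "running", otherwise 1 (WARN),
# mirroring the one-liner in this note.
dcache_lines_to_local_check() {
    grep -v DOMAIN | awk '{
        if ($2 == "running") print 0, $1, "-", $2
        else                 print 1, $1, "-", $2
    }'
}

# Hypothetical sample input; real output of "dcache status" has more columns.
sample_output=$(dcache_lines_to_local_check <<'EOF'
DOMAIN        STATUS
poolDomain    running
adminDomain   stopped
EOF
)

printf '%s\n' "$sample_output"
```

On a real node the function would be fed from the command itself, as in "dcache status | dcache_lines_to_local_check".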

Could have a generic check based on "service --status-all" to cover all services.
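
A rough sketch of what such a generic check might look like. The output of "service --status-all" differs between distributions, so this assumes init scripts that report lines such as "NAME (pid N) is running..." and "NAME is stopped"; anything else is ignored. The function name service_lines_to_local_check, the "svc_" prefix, and the sample input are all hypothetical.

```shell
#!/bin/sh
# Turn "service --status-all"-style lines read from stdin into local check
# lines. Only lines containing " is running" or " is stopped" are handled;
# the first word of the line is taken as the service name.
service_lines_to_local_check() {
    awk '
        / is running/ { print 0, "svc_" $1, "-", "running"; next }
        / is stopped/ { print 1, "svc_" $1, "-", "stopped"; next }
    '
}

# Hypothetical sample input in the assumed format.
generic_output=$(service_lines_to_local_check <<'EOF'
crond (pid  1234) is running...
sshd is stopped
some line in another format
EOF
)

printf '%s\n' "$generic_output"
```

In practice a stopped service is not always a problem, so a real check would probably need a list of services that are expected to run.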


