Bug #11461

HPC: add check_mk monitoring for logwatch et al to all nodes

Added by Gerard Bernabeu Altayo over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Start date:
01/21/2016
Due date:
% Done:

0%

Estimated time:
Duration:

Description

I am experimenting with bc1 and lfs7 (access from pi0s) to see what the content of /etc/check-mk-agent/logwatch.cfg should really be.

Then I will have to do the following to enable logwatch:

I intend to enable the "netstat, smart and mk_logwatch" plugins on all nodes currently running check_mk. I will do that by running:

rpm -q check-mk-agent-1.2.6p9-1.el6.x86_64 && (for i in netstat smart mk_logwatch; do /bin/cp -fp /usr/share/check-mk-agent/available-plugins/$i /usr/share/check-mk-agent/plugins/; done; /bin/cp -pf /usr/share/check_mk/agents/cfg_examples/logwatch.cfg /etc/check-mk-agent/)

I know of at least one server with a different version of check_mk (dslustre21) that I want to upgrade (to the same RPM version as all other nodes); I'd like to upgrade any others I find.
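For reference, a minimal logwatch.cfg in mk_logwatch's format lists a logfile path followed by indented patterns prefixed with a state letter (C=critical, W=warning, I=ignore, O=ok). The files and patterns below are illustrative only, not the settings being worked out on bc1/lfs7:

```
/var/log/messages
 C panic
 C Oops
 W error
 I registered panic notifier
/var/log/secure
 C FAILED LOGIN
```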

History

#1 Updated by Gerard Bernabeu Altayo over 3 years ago

The following could/should all be done as a Puppet manifest, but for now I'm doing it as a quick-and-dirty script that installs check_mk:

#!/bin/bash

base=/project/charmonium/checkmk
statdir="$base/statdir"

# If check_mk is not installed yet, install it, adding the plugins and the proper config file.
/bin/rpm -vUh $base/check-mk-agent*.rpm
for i in netstat smart mk_logwatch; do /bin/cp -fp /usr/share/check-mk-agent/available-plugins/$i /usr/share/check-mk-agent/plugins/; done
/bin/cp -pf $base/logwatch.cfg /etc/check-mk-agent/

# Central symlinks so the cron entry and extra sensors can be changed in one place (-f so reruns don't fail).
ln -sf /project/charmonium/checkmk/crondentry /etc/cron.d/check_mk_cron
ln -sf /project/charmonium/checkmk/checkmk_healthcomplement.pl /usr/share/check-mk-agent/local/checkmk_healthcomplement.pl

To check that the content is being updated, one can look at the $HOSTNAME file's mtime or run:

[root@ds0101 ~]# grep -A1 '<<<kernel>>>' /project/charmonium/checkmk/statdir/$HOSTNAME | tail -1
1453849793
That should be within 3600 (1h) of the output of `date +%s`.
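That age computation can be scripted. The snippet below is a sketch of the same check, run against a stand-in status file created on the spot (the real path would be /project/charmonium/checkmk/statdir/$HOSTNAME):

```shell
# Stand-in for statdir/$HOSTNAME: agent output with a <<<kernel>>> section.
statfile=$(mktemp)
printf '<<<kernel>>>\n%s\n' "$(date +%s)" > "$statfile"

# Extract the kernel-section timestamp and compare it to the current time.
stamp=$(grep -A1 '<<<kernel>>>' "$statfile" | tail -1)
age=$(( $(date +%s) - stamp ))

# 3600 s tolerance: anything older means the cron job missed a run.
if [ "$age" -le 3600 ]; then echo FRESH; else echo "STALE (${age}s old)"; fi
rm -f "$statfile"
```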

Note that I am not properly handling updates of the logwatch.cfg file, but that can be done from the crontab entry in the next post.

#2 Updated by Gerard Bernabeu Altayo over 3 years ago

I am setting this up in a slightly unusual way because I don't have configuration management and this way it's easy to control. Just ds0101 for now:

[root@ds0101 ~]# cat /project/charmonium/checkmk/crondentry 
*/10 * * * * root /usr/bin/check_mk_agent > /project/charmonium/checkmk/statdir/$HOSTNAME
[root@ds0101 ~]# 
[root@ds0101 ~]# ln -s /project/charmonium/checkmk/crondentry /etc/cron.d/check_mk_cron

#3 Updated by Gerard Bernabeu Altayo over 3 years ago

I've used rgang to install the RPM and create the link on all DSG nodes:

[root@ds2 ~]# rgang dsgall /project/charmonium/checkmk/install.checkmk

Then I used Ed's script plus some awk magic to add the nodes to check_mk:

[root@ds1 ~]# for i in `ls /project/charmonium/checkmk/statdir/`; do getent hosts $i | grep dsg | awk '{system("/project/charmonium/checkmk/server_add_host.py  --server ecfmon1.fnal.gov --site hpcmon --host "$2" --ipaddress="$1" --folder \"staging\" --user automation --apikey ---------")}'; done
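I don't have the source of server_add_host.py here, but it presumably talks to the Check_MK Web-API (webapi.py, action=add_host). The sketch below is a guess at the request it builds; the function name and all argument values are placeholders, and it only constructs the URL and POST body rather than sending anything:

```python
import json
from urllib.parse import urlencode

def build_add_host_request(server, site, user, secret, host, ipaddress, folder):
    """Build the URL and POST body for a Check_MK Web-API add_host call.
    (A sketch of what server_add_host.py likely sends; not the actual script.)"""
    url = "http://%s/%s/check_mk/webapi.py?%s" % (
        server, site,
        urlencode({"action": "add_host", "_username": user, "_secret": secret}))
    payload = {"hostname": host,
               "folder": folder,
               "attributes": {"ipaddress": ipaddress}}
    return url, "request=" + json.dumps(payload)

url, body = build_add_host_request(
    "ecfmon1.fnal.gov", "hpcmon", "automation", "SECRET",
    "ds0101", "131.225.1.2", "staging")
```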

#4 Updated by Gerard Bernabeu Altayo over 3 years ago

Now I'm adding the DS WN:


Adding them with the API:
[root@ds1 ~]# for i in `cat /usr/local/etc/farmlets/dsall`; do getent hosts $i | awk '{system("/project/charmonium/checkmk/server_add_host.py --server ecfmon1.fnal.gov --site hpcmon --host "$2" --ipaddress="$1" --folder \"staging\" --user automation --apikey ------")}'; done

#5 Updated by Gerard Bernabeu Altayo over 3 years ago

The nodes were flapping; I had to fix the dslustre21.fnal.gov ssh setup to allow more connections.

Amitoj wants to run more performance testing. Until that happens I am removing all DS and DSG worker nodes from check_mk; I've removed them from the interface and am now removing the cron:

[root@ds2 ~]# rgang dsall rm -f /etc/cron.d/check_mk_cron
[root@ds2 ~]# rgang dsgall rm -f /etc/cron.d/check_mk_cron

Also removing the entries from dslustre21:

[root@dslustre21 ~]# rm -f /projectzfs/charmonium/checkmk/statdir/ds*

#6 Updated by Gerard Bernabeu Altayo over 3 years ago

Adding all Sergey nodes (5:20):

[root@ds1 ~]# pbsnodes  | grep -B 4 sergey | grep ^ds | wc -l
130
[root@ds1 ~]# pbsnodes  | grep -B 4 sergey | grep ^ds  > /tmp/dswn.sergey.gba
[root@ds1 ~]# wc -l /tmp/dswn.sergey.gba
130 /tmp/dswn.sergey.gba
[root@ds1 ~]# rgang /tmp/dswn.sergey.gba /project/charmonium/checkmk/install.checkmk

Now adding them in check_mk with the API:

[root@ds1 ~]# for i in `cat /tmp/dswn.sergey.gba`; do getent hosts $i | awk '{system("/project/charmonium/checkmk/server_add_host.py  --server ecfmon1.fnal.gov --site hpcmon --host "$2" --ipaddress="$1" --folder \"staging\" --user automation --apikey -------")}'; done
False
Failed to add node (probably).  Exiting due to errors.
False
Failed to add node (probably).  Exiting due to errors.
True
True
True
True
True
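The script prints True/False per node, and a couple of adds failed above and had to be redone. A hypothetical retry wrapper for that pattern could look like this (`add_with_retry` is mine, not part of the ticket's tooling; the demo call uses `echo` as a stand-in for the real add command):

```shell
# Re-run an "add host" command up to 3 times until it prints "True".
add_with_retry() {
  for attempt in 1 2 3; do
    out=$("$@")
    [ "$out" = "True" ] && { echo "True"; return 0; }
    sleep 1   # a real run might back off longer between attempts
  done
  echo "giving up after 3 attempts: $*" >&2
  return 1
}

add_with_retry echo True   # prints "True"
```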

#7 Updated by Gerard Bernabeu Altayo over 3 years ago

I have updated the installation script to create the extra link that adds the additional 'hpc_wn' sensors I made:

[root@ds1 ~]# cat /project/charmonium/checkmk/install.checkmk 
#!/bin/bash

base=/project/charmonium/checkmk
statdir="$base/statdir"

# If check_mk is not installed yet, install it, adding the plugins and the proper config file.
/bin/rpm -vUh $base/check-mk-agent*.rpm
for i in netstat smart mk_logwatch; do /bin/cp -fp /usr/share/check-mk-agent/available-plugins/$i /usr/share/check-mk-agent/plugins/; done
/bin/cp -pf $base/logwatch.cfg /etc/check-mk-agent/

# Create the link for the extra sensors (central, so we can change it easily while testing)
ln -sf /project/charmonium/checkmk/checkmk_healthcomplement.pl /usr/share/check-mk-agent/local/checkmk_healthcomplement.pl
# Create the link to the cron entry (central, so we can change the frequency et al. easily while testing)
ln -sf /project/charmonium/checkmk/crondentry /etc/cron.d/check_mk_cron

# Run the agent once so discovery and addition of the node will work right away.
/usr/bin/check_mk_agent > $statdir/$HOSTNAME

[root@ds1 ~]# 

Applied the change to all nodes.

#8 Updated by Gerard Bernabeu Altayo over 3 years ago

Installing on the CDMS nodes:

#!/bin/bash

server=dslustre21.fnal.gov
base=/projectzfs/charmonium/checkmk

# If check_mk is not installed yet, install it, adding the plugins and the proper config file.
pushd /tmp
scp $server:$base/check-mk-agent*.rpm .
/bin/rpm -vUh check-mk-agent*.rpm
for i in netstat smart mk_logwatch; do /bin/cp -fp /usr/share/check-mk-agent/available-plugins/$i /usr/share/check-mk-agent/plugins/; done
scp $server:$base/logwatch.cfg /etc/check-mk-agent/
scp $server:/etc/xinetd.d/check-mk-agent /etc/xinetd.d/check-mk-agent
popd

Then I had to add iptables rules:

[root@cdmsmicro ~]# grep check_mk /etc/sysconfig/iptables
-A RH-Firewall-1-INPUT -s 131.225.161.42/32 -p tcp -m multiport --dports 6556 -m comment --comment "125 check_mk ecfmon1.fnal.gov" -j ACCEPT 
-A RH-Firewall-1-INPUT -s 131.225.240.87/32 -p tcp -m multiport --dports 6556 -m comment --comment "125 check_mk ecfmon2.fnal.gov" -j ACCEPT 
[root@cdmsmicro ~]# 
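A quick sanity check is to count how many ACCEPT rules cover the agent port (6556), which should be one per monitoring server. The sketch below runs against an inlined copy of the two rules above instead of the live /etc/sysconfig/iptables:

```shell
# Sample saved rules (on a real node: iptables-save or /etc/sysconfig/iptables).
rules='-A RH-Firewall-1-INPUT -s 131.225.161.42/32 -p tcp -m multiport --dports 6556 -j ACCEPT
-A RH-Firewall-1-INPUT -s 131.225.240.87/32 -p tcp -m multiport --dports 6556 -j ACCEPT'

# Count ACCEPT rules that open port 6556 ("--" stops grep parsing the pattern as a flag).
n=$(printf '%s\n' "$rules" | grep -c -- '--dports 6556.*ACCEPT')
[ "$n" -eq 2 ] && echo "check_mk port open for both servers" || echo "missing rule(s)"
```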

#9 Updated by Gerard Bernabeu Altayo over 3 years ago

To do it in bulk I used rgang:

[gerard1@ds1 ~]$ rgang --rsh=ssh -l root cdms.servers.list 'rpm -q check-mk-agent || yum install -y check-mk-agent-1.2.6p9-1.el5.x86_64.rpm '

[gerard1@ds1 ~]$ rgang --rsh=ssh -l root cdms.servers.list 'rpm -q check-mk-agent'
cdmsz3= check-mk-agent-1.2.6p9-1.el6.x86_64
cdmsz2.cdms-soudan.org= check-mk-agent-1.2.6p9-1.el6.x86_64
cdmsz1= check-mk-agent-1.2.6p9-1.el6.x86_64
cdmstera2= check-mk-agent-1.2.6p9-1.el6.x86_64
cdmstera1= check-mk-agent-1.2.6p9-1.el6.x86_64
Warning: No xauth data; using fake authentication data for X11 forwarding.
cdmsbitsy= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmsitsy.cdms-soudan.org= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmsgiga= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmsmega.cdms-soudan.org= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmsmicro= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmsmini.cdms-soudan.org= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmspico= check-mk-agent-1.2.6p9-1.el5
cdmsnano= check-mk-agent-1.2.6p9-1.el5
cdmsatto= check-mk-agent-1.2.6p9-1.el5
[gerard1@ds1 ~]$ 

Now all the nodes we care about are alive:

UP     cdmsbitsy                   0     0     0     0     82
UP     cdmstera1                   0     0     0     0     61
UP     cdmstera2                   0     0     0     0     62
UP     cdmsz1                      0     0     0     0     84
UP     cdmsz2.cdms-soudan.org      0     0     0     0     33
UP     cdmsz3                      0     0     0     0     39

#10 Updated by Gerard Bernabeu Altayo over 3 years ago

  • Status changed from New to Resolved

Opened https://cdcvs.fnal.gov/redmine/issues/11969 within the HPC Redmine project to track this; closing.


