Bug #11461
HPC: add check_mk monitoring for logwatch et al to all nodes
0%
Description
I am experimenting with bc1 and lfs7 (access from pi0s) to see what the content of /etc/check-mk-agent/logwatch.cfg should really be.
Then I will have to do the following to enable logwatch:
I intend to enable the "netstat, smart and mk_logwatch" plugins on all nodes currently running check_mk. I will do that by running:
rpm -q check-mk-agent-1.2.6p9-1.el6.x86_64 && (for i in netstat smart mk_logwatch; do /bin/cp -fp /usr/share/check-mk-agent/available-plugins/$i /usr/share/check-mk-agent/plugins/; done; /bin/cp -pf /usr/share/check_mk/agents/cfg_examples/logwatch.cfg /etc/check-mk-agent/)
I know of at least one server with a different version of check_mk (dslustre21) that I want to upgrade to the same RPM version as all other nodes. I'd like to upgrade any others I find as well.
History
#1 Updated by Gerard Bernabeu Altayo about 5 years ago
The following could (and should) all be done as a Puppet manifest, but I'm writing it as a quick-and-dirty script to install check_mk:
#!/bin/bash
base=/project/charmonium/checkmk
statdir="$base/status"
#If checkmk does not exist, install it, adding plugins and the proper config file.
/bin/rpm -vUh $base/check-mk-agent*.rpm
for i in netstat smart mk_logwatch; do /bin/cp -fp /usr/share/check-mk-agent/available-plugins/$i /usr/share/check-mk-agent/plugins/; done
/bin/cp -pf $base/logwatch.cfg /etc/check-mk-agent/
ln -s /project/charmonium/checkmk/crondentry /etc/cron.d/check_mk_cron
ln -s /project/charmonium/checkmk/checkmk_healthcomplement.pl /usr/share/check-mk-agent/local/checkmk_healthcomplement.pl
To check that the content is being updated, one can check the mtime of the $HOSTNAME file, or:
[root@ds0101 ~]# grep -A1 '<<<kernel>>>' /project/charmonium/checkmk/statdir/$HOSTNAME | tail -1
1453849793
That should be within 3600 (1h) of the output of `date +%s`.
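The freshness check above can be wrapped in a small function. This is only a sketch: the function name `agent_dump_is_fresh`, the optional third argument (a "now" value, handy for testing) and the return code 2 for an unreadable timestamp are my own choices, not part of check_mk.

```shell
#!/bin/bash
# Sketch: succeed if the agent dump in $1 was written within $2 seconds
# (default 3600). Names and defaults here are illustrative assumptions.
agent_dump_is_fresh() {
    local file=$1 max_age=${2:-3600} now=${3:-$(date +%s)}
    local stamp
    # The line right after the <<<kernel>>> header is the agent's epoch time.
    stamp=$(grep -A1 '<<<kernel>>>' "$file" 2>/dev/null | tail -1)
    # Anything that is not a plain integer means the dump is missing or broken.
    case $stamp in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ $(( now - stamp )) -le "$max_age" ]
}
```

Usage would be something like `agent_dump_is_fresh /project/charmonium/checkmk/statdir/$HOSTNAME || echo "stale or missing"`.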
Note that I am not properly handling updates of the logwatch.cfg file, but that can be done from the crontab entry in the next post.
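One way the logwatch.cfg update could be handled is to copy the central file over the local one only when they differ. A minimal sketch follows; the helper name `sync_config` and its "updated"/"unchanged" output are assumptions for illustration, and nothing like it exists in the install script yet.

```shell
#!/bin/bash
# Sketch: refresh a local config from the central copy only when it changed.
# sync_config is a hypothetical helper, not part of check_mk.
sync_config() {
    local src=$1 dst=$2
    if [ ! -r "$src" ]; then
        echo "missing source: $src" >&2
        return 1
    fi
    # cmp -s is quiet and returns non-zero if the files differ or dst is absent.
    if cmp -s "$src" "$dst" 2>/dev/null; then
        echo "unchanged"
    else
        cp -pf "$src" "$dst" && echo "updated"
    fi
}
```

It could be called from the cron entry or the install script, for example `sync_config /project/charmonium/checkmk/logwatch.cfg /etc/check-mk-agent/logwatch.cfg`.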
#2 Updated by Gerard Bernabeu Altayo about 5 years ago
I am setting this up in a slightly weird way because I don't have configuration management, and this way it's easy to control. Just for ds0101 for now:
[root@ds0101 ~]# cat /project/charmonium/checkmk/crondentry
*/10 * * * * root /usr/bin/check_mk_agent > /project/charmonium/checkmk/statdir/$HOSTNAME
[root@ds0101 ~]#
[root@ds0101 ~]# ln -s /project/charmonium/checkmk/crondentry /etc/cron.d/check_mk_cron
#3 Updated by Gerard Bernabeu Altayo about 5 years ago
I've used rgang to install the RPM and create the link on all DSG nodes:
[root@ds2 ~]# rgang dsgall /project/charmonium/checkmk/install.checkmk
Then used Ed's script plus some awk magic to add the nodes in checkmk:
[root@ds1 ~]# for i in `ls /project/charmonium/checkmk/statdir/`; do getent hosts $i | grep dsg | awk '{system("/project/charmonium/checkmk/server_add_host.py --server ecfmon1.fnal.gov --site hpcmon --host "$2" --ipaddress="$1" --folder \"staging\" --user automation --apikey ---------")}'; done
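Before running a loop like the one above against the real server, it can help to print the commands first and review them. The sketch below does only that: it reads `getent hosts` style lines ("IP name [aliases]") and prints one server_add_host.py command per host without executing anything. The function name `add_host_cmds` and the `$APIKEY` placeholder are my assumptions; the flags are copied from the command above.

```shell
#!/bin/bash
# Sketch: dry run for the host-addition loop. Prints commands, runs nothing.
# add_host_cmds and the literal "$APIKEY" placeholder are illustrative.
add_host_cmds() {
    local server=$1 site=$2 folder=$3
    local script=${4:-/project/charmonium/checkmk/server_add_host.py}
    local ip name _
    # Input format: "IP canonical-name [aliases...]", as printed by getent hosts.
    while read -r ip name _; do
        [ -n "$ip" ] && [ -n "$name" ] || continue
        printf '%s --server %s --site %s --host %s --ipaddress=%s --folder "%s" --user automation --apikey "$APIKEY"\n' \
            "$script" "$server" "$site" "$name" "$ip" "$folder"
    done
}
```

For example, `getent hosts $i | grep dsg | add_host_cmds ecfmon1.fnal.gov hpcmon staging` prints the command for one node. Once the list looks right, the output can be piped to `sh` with APIKEY exported.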
#4 Updated by Gerard Bernabeu Altayo about 5 years ago
Now I'm adding the DS WN:
I've used rgang to install the RPM and create the link on all DSG nodes:
[root@ds2 ~]# rgang dsgall /project/charmonium/checkmk/install.checkmk
Then used Ed's script plus some awk magic to add the nodes in checkmk:
[root@ds1 ~]# for i in `ls /project/charmonium/checkmk/statdir/`; do getent hosts $i | grep dsg | awk '{system("/project/charmonium/checkmk/server_add_host.py --server ecfmon1.fnal.gov --site hpcmon --host "$2" --ipaddress="$1" --folder \"staging\" --user automation --apikey -----")}'; done
And adding them with the API:
[root@ds1 ~]# for i in `cat /usr/local/etc/farmlets/dsall`; do getent hosts $i | awk '{system("/project/charmonium/checkmk/server_add_host.py --server ecfmon1.fnal.gov --site hpcmon --host "$2" --ipaddress="$1" --folder \"staging\" --user automation --apikey ------")}'; done
#5 Updated by Gerard Bernabeu Altayo about 5 years ago
The nodes were flapping, so I had to fix the dslustre21.fnal.gov ssh setup to allow more connections.
Amitoj wants to run more performance testing. Until that happens I am removing all DS and DSG worker nodes from check_mk. I've done it from the interface and am now removing the cron entry:
[root@ds2 ~]# rgang dsall rm -f /etc/cron.d/check_mk_cron
[root@ds2 ~]# rgang dsgall rm -f /etc/cron.d/check_mk_cron
Also removing the entries from dslustre21:
[root@dslustre21 ~]# rm -f /projectzfs/charmonium/checkmk/statdir/ds*
#6 Updated by Gerard Bernabeu Altayo about 5 years ago
Adding all Sergey nodes (5:20):
[root@ds1 ~]# pbsnodes | grep -B 4 sergey | grep ^ds | wc -l
130
[root@ds1 ~]# pbsnodes | grep -B 4 sergey | grep ^ds > /tmp/dswn.sergey.gba
[root@ds1 ~]# wc -l /tmp/dswn.sergey.gba
130 /tmp/dswn.sergey.gba
(reverse-i-search)`rgang': ^Cang --rshto=20 --rcpto=20 --rsh=/usr/bin/remsh --rcp=/usr/bin/rcp -C --nway=2 dsall /etc/group /etc/group
[root@ds1 ~]# rgang /tmp/dswn.sergey.gba /project/charmonium/checkmk/install.checkmk
Now adding them in check_mk with the API:
[root@ds1 ~]# for i in `cat /tmp/dswn.sergey.gba`; do getent hosts $i | awk '{system("/project/charmonium/checkmk/server_add_host.py --server ecfmon1.fnal.gov --site hpcmon --host "$2" --ipaddress="$1" --folder \"staging\" --user automation --apikey -------")}'; done
False
Failed to add node (probably).
Exiting due to errors.
False
Failed to add node (probably).
Exiting due to errors.
True
True
True
True
True
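With 130 nodes the True/False lines scroll by quickly, so a tally is useful. The sketch below counts them from the loop's output. It assumes each result is printed as a bare `True` or `False` line, as in the session above; the function name `summarize_add_results` is made up for this note.

```shell
#!/bin/bash
# Sketch: count successful and failed host additions from the loop output.
# Assumes one bare "True" or "False" line per host, as seen above.
summarize_add_results() {
    awk '$0 == "True"  { ok++ }
         $0 == "False" { fail++ }
         END { printf "added=%d failed=%d\n", ok, fail }'
}
```

Appending `| summarize_add_results` after the `done` of the loop gives a one-line summary. It does not say which hosts failed; for that the loop would need to echo the host name before each call.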
#7 Updated by Gerard Bernabeu Altayo about 5 years ago
I have updated the installation script to create the extra link to add the additional 'hpc_wn' sensors I made:
[root@ds1 ~]# cat /project/charmonium/checkmk/install.checkmk
#!/bin/bash
base=/project/charmonium/checkmk
statdir="$base/status"
#If checkmk does not exist, install it, adding plugins and the proper config file.
/bin/rpm -vUh $base/check-mk-agent*.rpm
for i in netstat smart mk_logwatch; do /bin/cp -fp /usr/share/check-mk-agent/available-plugins/$i /usr/share/check-mk-agent/plugins/; done
/bin/cp -pf $base/logwatch.cfg /etc/check-mk-agent/
#Create the link for the extra sensors created (it is central so that we can change it easy while in testing)
ln -s /project/charmonium/checkmk/checkmk_healthcomplement.pl /usr/share/check-mk-agent/local/checkmk_healthcomplement.pl
#Create the link to the cron entry (it is central so we can change frequency et al easy while in testing)
ln -s /project/charmonium/checkmk/crondentry /etc/cron.d/check_mk_cron
#Make a run of the script. This way discovery and addition of the node will work right away.
/usr/bin/check_mk_agent > /project/charmonium/checkmk/statdir/$HOSTNAME
[root@ds1 ~]#
Applied the change to all nodes.
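Since the install script gets re-run on nodes that already have the links, the plain `ln -s` calls will complain with "File exists" on the second run. A sketch of an idempotent alternative is below; `ensure_link` is a hypothetical helper name, and `ln -sfn` is the standard way to replace an existing link in place.

```shell
#!/bin/bash
# Sketch: create or repoint a symlink so that re-running the installer is safe.
# ensure_link is an illustrative helper, not something in install.checkmk.
ensure_link() {
    local target=$1 link=$2
    # Link already points at the right place: nothing to do.
    if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
        return 0
    fi
    # -f replaces an existing entry, -n avoids descending into a linked directory.
    ln -sfn "$target" "$link"
}
```

In the script this would read, for example, `ensure_link /project/charmonium/checkmk/crondentry /etc/cron.d/check_mk_cron`.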
#8 Updated by Gerard Bernabeu Altayo almost 5 years ago
Installing on the CDMS nodes:
#!/bin/bash
server=dslustre21.fnal.gov
base=/projectzfs/charmonium/checkmk
statdir="$base/status"
#If checkmk does not exist, install it, adding plugins and the proper config file.
pushd /tmp
scp $server:$base/check-mk-agent*.rpm .
/bin/rpm -vUh check-mk-agent*.rpm
for i in netstat smart mk_logwatch; do /bin/cp -fp /usr/share/check-mk-agent/available-plugins/$i /usr/share/check-mk-agent/plugins/; done
scp $server:$base/logwatch.cfg /etc/check-mk-agent/
scp $server:/etc/xinetd.d/check-mk-agent /etc/xinetd.d/check-mk-agent
popd
Then I had to add iptables rules:
[root@cdmsmicro ~]# grep check_mk /etc/sysconfig/iptables
-A RH-Firewall-1-INPUT -s 131.225.161.42/32 -p tcp -m multiport --dports 6556 -m comment --comment "125 check_mk ecfmon1.fnal.gov" -j ACCEPT
-A RH-Firewall-1-INPUT -s 131.225.240.87/32 -p tcp -m multiport --dports 6556 -m comment --comment "125 check_mk ecfmon2.fnal.gov" -j ACCEPT
[root@cdmsmicro ~]#
#9 Updated by Gerard Bernabeu Altayo almost 5 years ago
To do it in bulk I used rgang:
[gerard1@ds1 ~]$ rgang --rsh=ssh -l root cdms.servers.list 'rpm -q check-mk-agent || yum install -y check-mk-agent-1.2.6p9-1.el5.x86_64.rpm '
[gerard1@ds1 ~]$ rgang --rsh=ssh -l root cdms.servers.list 'rpm -q check-mk-agent'
cdmsz3= check-mk-agent-1.2.6p9-1.el6.x86_64
cdmsz2.cdms-soudan.org= check-mk-agent-1.2.6p9-1.el6.x86_64
cdmsz1= check-mk-agent-1.2.6p9-1.el6.x86_64
cdmstera2= check-mk-agent-1.2.6p9-1.el6.x86_64
cdmstera1= check-mk-agent-1.2.6p9-1.el6.x86_64
Warning: No xauth data; using fake authentication data for X11 forwarding.
cdmsbitsy= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmsitsy.cdms-soudan.org= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmsgiga= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmsmega.cdms-soudan.org= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmsmicro= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmsmini.cdms-soudan.org= check-mk-agent-1.2.6p9-1.el5.x86_64
cdmspico= check-mk-agent-1.2.6p9-1.el5
cdmsnano= check-mk-agent-1.2.6p9-1.el5
cdmsatto= check-mk-agent-1.2.6p9-1.el5
[gerard1@ds1 ~]$
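To spot nodes that are not on the expected agent version, the `rpm -q` listing can be filtered. This sketch assumes the output has one `host= package` line per node, as it appears in the listing above; the function name `hosts_not_at_version` is made up here. Lines without the `= ` separator (such as the xauth warning) are ignored, and a node reporting that the package is not installed is listed too.

```shell
#!/bin/bash
# Sketch: list hosts whose check-mk-agent package is not the wanted version.
# Assumes "host= package-name" lines; hosts_not_at_version is illustrative.
hosts_not_at_version() {
    local want=$1
    awk -F'= ' -v want="$want" '
        # Exactly two fields means a "host= output" line; skip everything else.
        NF == 2 && index($2, "check-mk-agent-" want) != 1 { print $1 }'
}
```

For example, piping the second rgang command above into `hosts_not_at_version 1.2.6p9-1` should print nothing when every node is at 1.2.6p9-1, regardless of the el5/el6 suffix.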
Now all the nodes we care about are alive:
UP cdmsbitsy 0 0 0 0 82
UP cdmstera1 0 0 0 0 61
UP cdmstera2 0 0 0 0 62
UP cdmsz1 0 0 0 0 84
UP cdmsz2.cdms-soudan.org 0 0 0 0 33
UP cdmsz3 0 0 0 0 39
#10 Updated by Gerard Bernabeu Altayo almost 5 years ago
- Status changed from New to Resolved
Opened https://cdcvs.fnal.gov/redmine/issues/11969 within the HPC redmine project to track this; closing.