Idea #3384
Add plugin for APFMon
0%
Description
We've talked in the past about making pilot logs available to the VO.
ATLAS does this and publishes information upstream to a central monitoring service called APFMon.
We could potentially do this; the API is very simple:
https://svnweb.cern.ch/trac/panda/browser/panda-autopyfactory/current/autopyfactory/plugins/APFMonitorPlugin.py
History
#1 Updated by Igor Sfiligoi almost 8 years ago
Do we have an idea of what it takes to run the "APFMon server"?
#2 Updated by Burt Holzman almost 8 years ago
No idea. I've asked Peter Love and John Hover for a link to the source -- I would want to have our own monitoring server configured and deployed before any work was even started on this.
#3 Updated by Burt Holzman almost 8 years ago
I haven't updated this in a while, but most of the work is done. There are a few tasks left:
1. Bundle our own json module (maybe just copy simplejson); we don't want to introduce a new external dependency.
2. Remove two more hard-coded bits (the APFMon server destination and the Factory URL that we register with).
3. Redirect the curl stdout/stderr to /dev/null when the glidein registers
4. Document the new config options to enable this
5. Revert the logs back to condor_activity_* so the stats don't get messed up
6. Remove the link to the individual pilot log on the APFMon server
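Tasks 1 and 3 could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper names `build_curl_command` and `run_quietly` are hypothetical, the idea that the registration is a JSON POST is an assumption rather than something taken from the APFMon API, and the actual call made by the glidein may well be a shell one-liner instead.

```python
import os
import subprocess

try:
    import json  # part of the standard library since Python 2.6
except ImportError:
    import simplejson as json  # assumption: a bundled copy used as a fallback


def build_curl_command(url, payload):
    """Return a curl argv that sends `payload` as JSON to `url`.

    Both the destination URL and the payload layout are placeholders;
    in the factory they would come from configuration (task 2).
    """
    return [
        "curl", "--silent",
        "--request", "POST",
        "--header", "Content-Type: application/json",
        "--data", json.dumps(payload, sort_keys=True),
        url,
    ]


def run_quietly(argv):
    """Run argv with stdout and stderr sent to /dev/null; return the exit code."""
    with open(os.devnull, "w") as devnull:
        return subprocess.call(argv, stdout=devnull, stderr=devnull)
```

Discarding both streams keeps curl's progress output and any server response out of the glidein's own stdout/stderr logs.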
For 5/6, I thought I could combine some JavaScript + mod_rewrite to grep through the condor_activity log for a given job, but I don't know if this is worth it.
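The "grep for a given job" part of that idea could look something like the sketch below, written in Python rather than JavaScript. It assumes the condor_activity log uses the usual HTCondor user-log layout, where each event starts with a header such as `000 (123.000.000) ...` and ends with a line containing `...`; the function name `events_for_job` is hypothetical.

```python
import re

# Event header in an HTCondor user log: "NNN (cluster.proc.subproc) ..."
EVENT_HEADER = re.compile(r"^\d{3} \((\d+)\.(\d+)\.\d+\)")


def events_for_job(lines, cluster, proc=0):
    """Yield every log line that belongs to an event of job cluster.proc.

    Lines that are not event headers (event bodies and the "..." terminator)
    are attributed to the most recent header seen.
    """
    keep = False
    for line in lines:
        match = EVENT_HEADER.match(line)
        if match:
            job = (int(match.group(1)), int(match.group(2)))
            keep = job == (cluster, proc)
        if keep:
            yield line
```

Something like `list(events_for_job(open(path), 123))` would then give the per-pilot view without having to split the logs per job, which is what would keep the stats intact.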
#4 Updated by Igor Sfiligoi almost 8 years ago
What about security?
I am not sure I am comfortable exporting the raw logs to the whole world.
#5 Updated by Burt Holzman almost 8 years ago
APFMon just gets links to the factory webserver, so it's factory webserver security that determines how available the logs are.
(For my test farm I just made a simple rewrite rule, but it's Apache, so we can put a better rule in place.)
That does bring up another APFMon problem server-side -- the protocol is inherently insecure: there's zero authentication. Anyone can register a factory under any URL, and any IP can change the pilot state for any pilot. I'm surprised this has passed ATLAS scrutiny so far.
#6 Updated by Igor Sfiligoi almost 8 years ago
- Having world-readable logs is most likely a privacy problem for some OSG VOs (current and/or future) using the common factories.
- Short term, we should limit log access to FE admins only. The GF already has the DNs of the FEs, so we just need a tool to automate proper https configuration of Apache.
- Longer term, it would be nice to give sites access to the logs of jobs that ran on their site(s). There is no obvious, easy-to-handle solution at this point.
#7 Updated by Burt Holzman over 7 years ago
- Target version changed from v3_1 to v3_x
#8 Updated by Burt Holzman over 7 years ago
I think what's left to do here is to have the factory write an appropriate httpd.conf file.
I thought I wrote a script for this but I guess not. Here's something like what it could look like:
# This is the httpd conf file
# GlideinWMS VOFrontend web configuration
Alias /factory /var/lib/gwms-factory/web-area
<Directory /var/lib/gwms-factory/web-area>
    Order allow,deny
    Allow from all
</Directory>

Listen 8320 https
<VirtualHost *:8320>
    DocumentRoot /var/log/gwms-factory/client
    Alias /factory/logs /var/log/gwms-factory/client
    <Directory /var/log/gwms-factory/client>
        Order deny,allow
        Deny from all
    </Directory>

    SSLEngine On
    SSLCertificateFile /etc/grid-security/hostcert.pem
    SSLCertificateKeyFile /etc/grid-security/hostkey.pem
    SSLCACertificatePath /etc/grid-security/certificates

    RewriteEngine On
    RewriteRule (.*)/GLIDEIN/(.*) $1/glidein_gfactory_instance/$2 [R=301,L]

    <Directory /var/log/gwms-factory/client/user_frontend>
        SSLVerifyClient Require
        SSLVerifyDepth 5
        SSLRequireSSL
        SSLOptions +FakeBasicAuth +StdEnvVars
        SSLRequire (%{SSL_CLIENT_S_DN} =~ m#^\/DC\=org\/DC\=doegrids\/OU\=People\/CN\=Burt\ Holzman\ 380062#)
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
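A script that writes the per-frontend part of such a file might look roughly like this. It is a sketch, not the missing script: the function names and the shape of the `frontends` mapping (log subdirectory name to the one DN allowed to read it) are assumptions, and it only reproduces the `<Directory>` stanza from the example above.

```python
def escape_dn(dn):
    """Backslash-escape every non-alphanumeric character of a DN,
    matching the escaping style used in the example SSLRequire line."""
    return "".join(c if c.isalnum() else "\\" + c for c in dn)


def directory_stanza(log_dir, dn):
    """Return one <Directory> block that restricts log_dir to a single DN."""
    lines = [
        "<Directory " + log_dir + ">",
        "    SSLVerifyClient Require",
        "    SSLVerifyDepth 5",
        "    SSLRequireSSL",
        "    SSLOptions +FakeBasicAuth +StdEnvVars",
        "    SSLRequire (%{SSL_CLIENT_S_DN} =~ m#^" + escape_dn(dn) + "#)",
        "    Order allow,deny",
        "    Allow from all",
        "</Directory>",
    ]
    return "\n".join(lines) + "\n"


def frontend_stanzas(frontends, log_root="/var/log/gwms-factory/client"):
    """Build the stanzas for all frontends.

    `frontends` is a hypothetical dict mapping a log subdirectory name
    (e.g. "user_frontend") to the DN that may read it.
    """
    return "\n".join(
        directory_stanza(log_root + "/" + subdir, dn)
        for subdir, dn in sorted(frontends.items())
    )
```

The output would still need to be placed inside the `<VirtualHost>` block. The directives mirror the example as written (`Order`/`Allow` are Apache 2.2 style access control); a newer Apache would likely want `Require`-based rules instead.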
#9 Updated by Burt Holzman over 7 years ago
- Target version changed from v3_x to v3_2_x
#10 Updated by Marco Mambelli over 2 years ago
- Target version changed from v3_2_x to v3_4_x
#11 Updated by Marco Mambelli over 2 years ago
- Target version changed from v3_4_x to v3_5_x
#12 Updated by Marco Mambelli over 1 year ago
- Target version changed from v3_5_x to v3_7_x