Idea #3384

Add plugin for APFMon

Added by Burt Holzman over 7 years ago. Updated 10 months ago.

Factory Monitoring

We've talked in the past about making pilot logs available to the VO.
ATLAS does this and publishes information upstream to a central monitoring service called APFMon.

We could potentially do this; the API is very simple:

Related issues

Related to GlideinWMS - Feature #2454: Advertise classad in case of glidein failure (Closed, 04/18/2013)


#1 Updated by Igor Sfiligoi over 7 years ago

Do we have an idea of what it takes to run the "APFMon server"?

#2 Updated by Burt Holzman over 7 years ago

No idea. I've asked Peter Love and John Hover for a link to the source -- I would want to have our own monitoring server configured and deployed before any work was even started on this.

#3 Updated by Burt Holzman over 7 years ago

I haven't updated this in a while, but most of the work is done. There are a few tasks left:

1. Bundle our own json module (maybe just copy simplejson); we don't want to introduce this dependency.
2. Remove two more hard-coded bits (the APFMon server destination and the Factory URL that we register with).
3. Redirect the curl stdout/stderr to /dev/null when the glidein registers.
4. Document the new config options to enable this.
5. Revert the logs back to condor_activity_* so the stats don't get messed up.
6. Remove the link to the individual pilot log on the APFMon server.

For 5/6, I thought I could combine some javascript + mod_rewrite to grep through the condor_activity log for a given job, but I don't know if this is worth it.
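
For reference, the per-job filtering could also be done server-side. Below is a rough sketch in Python (not the javascript + mod_rewrite approach mentioned above). It assumes the condor_activity log follows the usual HTCondor user log layout, where each event starts with a header line like "NNN (cluster.proc.subproc) ..." and ends with a line containing only "...". The function name and the sample log lines are made up for illustration.

```python
import re

# Assumed event header layout: "001 (123.000.000) 04/18 10:01:00 Job executing ..."
EVENT_HEADER = re.compile(r"^\d{3} \((\d+)\.(\d+)\.\d+\) ")


def events_for_job(lines, cluster, proc):
    """Return the log events (as strings) that belong to job cluster.proc."""
    selected = []
    current = None  # lines of the event being read, or None between events
    keep = False    # whether the event being read belongs to the wanted job
    for raw in lines:
        line = raw.rstrip("\n")
        if current is None:
            match = EVENT_HEADER.match(line)
            if match is None:
                continue  # skip anything outside a recognizable event
            current = [line]
            keep = (int(match.group(1)), int(match.group(2))) == (cluster, proc)
        elif line == "...":
            # End-of-event marker: store the event if it is for our job
            if keep:
                selected.append("\n".join(current))
            current = None
        else:
            current.append(line)
    return selected


# Made-up sample, only to show the expected shape of the input
SAMPLE = """\
000 (123.000.000) 04/18 10:00:00 Job submitted from host: <10.0.0.1:9618>
...
000 (124.000.000) 04/18 10:00:05 Job submitted from host: <10.0.0.1:9618>
...
001 (123.000.000) 04/18 10:01:00 Job executing on host: <10.0.0.2:9618>
...
""".splitlines()

job_events = events_for_job(SAMPLE, 123, 0)
```

With something like this behind a small CGI or WSGI handler, the APFMon link could point at a per-job view while the logs themselves stay in condor_activity_*.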

#4 Updated by Igor Sfiligoi over 7 years ago

What about security?

I am not sure I am comfortable exporting the raw logs to the whole world.

#5 Updated by Burt Holzman over 7 years ago

APFMon just gets links to the factory webserver, so it's factory webserver security that determines how available the logs are.
(For my test farm I just made a simple rewrite rule, but it's Apache, so we can put a better rule in place).

That does bring up another APFMon problem server-side -- the protocol is inherently insecure; there's zero authentication. Anyone can register a factory under
any URL, and any IP can change the pilot state for any pilot. I'm surprised this has passed ATLAS scrutiny so far.

#6 Updated by Igor Sfiligoi over 7 years ago

Summary of a chat between Igor and Burt:
  • Having world-readable logs is most likely a privacy problem for some OSG VOs (current and/or future) using the common factories.
  • Short term, we should limit log access to FE admins only. The GF already has the DNs of the FEs, so we just need a tool to automate proper https configuration of Apache.
  • Longer term, it would be nice to give sites access to the logs of pilots that ran on their site(s). There is no obvious, easy-to-handle solution at this point.

#7 Updated by Burt Holzman about 7 years ago

  • Target version changed from v3_1 to v3_x

#8 Updated by Burt Holzman almost 7 years ago

I think what's left to do here is to have the factory write an appropriate httpd.conf file.

I thought I wrote a script for this but I guess not. Here's something like what it could look like:

# This is the httpd conf file
# GlideinWMS VOFrontend web configuration

Alias /factory /var/lib/gwms-factory/web-area

<Directory /var/lib/gwms-factory/web-area>
    Order allow,deny
    Allow from all
</Directory>

Listen 8320 https
<VirtualHost *:8320>
    DocumentRoot /var/log/gwms-factory/client
    Alias /factory/logs /var/log/gwms-factory/client

    # Deny access to the client logs by default
    <Directory /var/log/gwms-factory/client>
        Order deny,allow
        Deny from all
    </Directory>

    SSLEngine On
    SSLCertificateFile /etc/grid-security/hostcert.pem
    SSLCertificateKeyFile /etc/grid-security/hostkey.pem
    SSLCACertificatePath /etc/grid-security/certificates

    RewriteEngine On
    RewriteRule (.*)/GLIDEIN/(.*) $1/glidein_gfactory_instance/$2 [R=301,L]

    # Open one frontend's log directory only to clients presenting the listed DN
    <Directory /var/log/gwms-factory/client/user_frontend>
        SSLVerifyClient Require
        SSLVerifyDepth 5
        SSLOptions +FakeBasicAuth +StdEnvVars
        SSLRequire (%{SSL_CLIENT_S_DN} =~ m#^\/DC\=org\/DC\=doegrids\/OU\=People\/CN\=Burt\ Holzman\ 380062#)

        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
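
A script to write the per-frontend part could look roughly like the sketch below. It is only an illustration: the mapping from log directory name to allowed DNs, the function names, and the example DN are all hypothetical, and the factory would have to fill in the real values from its own configuration. It also anchors the DN match at both ends, which is a bit stricter than the hand-written rule above.

```python
import re

LOG_ROOT = "/var/log/gwms-factory/client"

# One block per frontend log directory; doubled braces are literal braces
DIRECTORY_TEMPLATE = """\
<Directory {log_root}/{dirname}>
    SSLVerifyClient Require
    SSLVerifyDepth 5
    SSLOptions +FakeBasicAuth +StdEnvVars
    SSLRequire ({condition})

    Order allow,deny
    Allow from all
</Directory>
"""


def dn_condition(dns):
    """Build an SSLRequire expression that accepts any of the given DNs."""
    # re.escape is used as an approximation of the escaping Apache expects
    clauses = [
        "%{{SSL_CLIENT_S_DN}} =~ m#^{}$#".format(re.escape(dn)) for dn in dns
    ]
    return " or ".join(clauses)


def frontend_directories(allowed_dns, log_root=LOG_ROOT):
    """Render one <Directory> block per log directory, in sorted order."""
    blocks = []
    for dirname in sorted(allowed_dns):
        blocks.append(
            DIRECTORY_TEMPLATE.format(
                log_root=log_root,
                dirname=dirname,
                condition=dn_condition(allowed_dns[dirname]),
            )
        )
    return "\n".join(blocks)


# Hypothetical input: log directory name -> DNs allowed to read it
EXAMPLE_DNS = {
    "user_frontend": ["/DC=org/DC=example/OU=Services/CN=frontend.example.org"],
}

conf_fragment = frontend_directories(EXAMPLE_DNS)
```

The output is meant to be dropped inside the VirtualHost section, after the default "Deny from all" block.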

#9 Updated by Burt Holzman almost 7 years ago

  • Target version changed from v3_x to v3_2_x

#10 Updated by Marco Mambelli about 2 years ago

  • Target version changed from v3_2_x to v3_4_x

#11 Updated by Marco Mambelli almost 2 years ago

  • Target version changed from v3_4_x to v3_5_x

#12 Updated by Marco Mambelli 10 months ago

  • Target version changed from v3_5_x to v3_7_x
