Feature #23091

Reliable, flexible and secure logging system for distributed workflows

Added by Marco Mambelli 10 months ago. Updated about 2 months ago.

Work in progress

High-throughput computing workflows run thousands of jobs on a variety of resources, from commercial and on-premises clouds to high-performance computing centers and remote or local clusters. The goal of this project is to provide an additional communication channel to retrieve information from these resources and increase the reliability of the infrastructure. The channel will be added to GlideinWMS, a workflow manager that leverages the HTCondor software framework to provision resources for scientific computing. It will benefit all the collaborations using GlideinWMS, including the LHC experiment CMS, the FIFE experiments at Fermilab, the HEPCloud portal and the Open Science Grid.
GlideinWMS project:

This includes the following activities:
  1. Getting familiar with distributed computing and GlideinWMS
  2. Survey of the state of the art and evaluation of remote application logging solutions (frameworks, libraries, formats)
  3. Critical review of the current format of the Glidein stdout/err
  4. Design a format for an additional logging stream that can be used by glidein_startup and other scripts within the Glidein (text, file forwarding)
  5. Build a simple system duplicating and transmitting stdout and stderr from the Glideins
  6. Design a system for many-to-many Glidein logging
    • Multiple Glideins send messages, and multiple subscribers may be interested in them
    • Globally unique Glidein ID (to identify updates of the same files)
    • Useful metadata (e.g. factory/entry_set/entry, frontend/group) to identify who could be interested
    • Security considerations: authenticated messages, ...
  7. Development and integration related to distributed computing software for Grids, Clouds, and Supercomputers
  8. Testing on High-Performance Computers and clouds
  9. Integration in production
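
As an illustration of items 4 and 6, a log record could carry a globally unique Glidein ID, routing metadata and an authentication tag. The following is only a sketch under stated assumptions: the field names, the use of a random UUID as the Glidein ID, and HMAC-SHA256 with a pre-shared key are all hypothetical choices, not the GlideinWMS design.

```python
import hashlib
import hmac
import json
import time
import uuid


def new_glidein_id():
    """Return a globally unique ID for one Glidein (random UUID4, an assumption)."""
    return str(uuid.uuid4())


def build_record(glidein_id, metadata, stream, message, key):
    """Build one log record and sign it with HMAC-SHA256 (hypothetical format)."""
    record = {
        "glidein_id": glidein_id,
        "metadata": metadata,      # e.g. factory, entry, frontend, group
        "stream": stream,          # "stdout", "stderr" or a custom stream name
        "timestamp": time.time(),
        "message": message,
    }
    # Canonical serialization so sender and receiver sign the same bytes
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_record(envelope, key):
    """Return the decoded record if the signature is valid, else None."""
    expected = hmac.new(key, envelope["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["signature"]):
        return None
    return json.loads(envelope["payload"])
```

A receiver would call verify_record with the same key and discard any envelope for which it returns None. How keys would be distributed to the Glideins is left open here.
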
Some shortfalls of the current Glideins logging:
  • Reports only stdout and stderr
  • Missing stdout/err for some Glideins (especially killed ones)
  • Information only at the end (flush)
  • Not reporting to multiple listeners
  • Confusing or missing information from indirect and multi-job submissions
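
Two of the shortfalls above (information only at the end, and output lost for killed Glideins) come from buffering. A possible mitigation, sketched here in Python purely for illustration, is to duplicate each complete line to a second sink and flush it immediately. The class name TeeStream and the choice of sink are hypothetical; the Glidein scripts themselves are shell scripts.

```python
import io


class TeeStream:
    """Write every line to the original stream and to a second sink, flushing at once."""

    def __init__(self, original, sink):
        self.original = original
        self.sink = sink
        self._buffer = ""

    def write(self, text):
        self.original.write(text)
        self.original.flush()
        self._buffer += text
        # Forward only complete lines so the sink never sees a partial record
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            self.sink.write(line + "\n")
            self.sink.flush()
        return len(text)

    def flush(self):
        self.original.flush()
        self.sink.flush()

    def close(self):
        # Forward whatever is left if the output did not end with a newline
        if self._buffer:
            self.sink.write(self._buffer)
            self._buffer = ""
        self.flush()


if __name__ == "__main__":
    original, sink = io.StringIO(), io.StringIO()
    tee = TeeStream(original, sink)
    tee.write("starting glidein\npartial")
    print(repr(sink.getvalue()))  # 'starting glidein\n'
```

In a real process the sink could be a file or a network connection, and sys.stdout could be replaced by a TeeStream wrapping the original sys.stdout.
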

Consider also providing a critique of the current GlideinWMS software and suggestions to improve it, e.g. adding unit tests, linting, or using specific libraries. Some of this is mentioned in #20901.


Feature #23117: Additional logging channel (Work in progress, Leonardo Lai)

Feature #23265: Add security mechanisms to the glidein logging channel (Work in progress, Leonardo Lai)

Feature #24140: Implement and deploy a first version of the secure logging channel (Closed, Marco Mambelli)


#1 Updated by Marco Mambelli 10 months ago

Some more details about item 2 above (logging solutions)

Some links about remote logging (solutions, discussions):

The streaming of the information and handling publishers and subscribers are central parts of logging, handled by streaming platforms and message queues:
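
To make the publisher/subscriber idea concrete, here is a toy in-memory broker. It is only a sketch: the class name LogBroker and the use of a metadata string such as "frontend/group" as the topic are assumptions for illustration, and a production system would rely on an existing streaming platform or message queue instead.

```python
from collections import defaultdict


class LogBroker:
    """Toy in-memory broker: routes each message to the subscribers of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register callback to receive every message published on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to every subscriber of topic; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```

Many Glideins can publish on the same topic and many listeners can subscribe to it, which is the many-to-many relationship described in the issue.
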

Here is a shell fragment that sends custom logs from the pilot; the receiving web server handles the upload with PHP:

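function send_logs {
   # Upload the HTCondor stdout/stderr only when DEBUG_PILOT is set in the glidein config
   debug_pilot_enabled=$(grep '^DEBUG_PILOT' "$glidein_config" | awk '{print $2}')
   if [ -n "$debug_pilot_enabled" ]; then
       cp "$PWD/../_condor_stderr" "$PWD/${pilot_id}_condor_stderr"
       cp "$PWD/../_condor_stdout" "$PWD/${pilot_id}_condor_stdout"
       # The upload URL is missing from the original fragment; $upload_url is a placeholder
       curl -F "file=@$PWD/${pilot_id}_condor_stderr" "$upload_url"
       curl -F "file=@$PWD/${pilot_id}_condor_stdout" "$upload_url"
   fi
}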

Here is the PHP script on the frontend:

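   // Strip any directory component from the client-supplied file name
   $filename = basename($_FILES['file']['name']);
   // Look for a site-name pattern inside the file name
   preg_match('/T[0-9]+_[A-Z]+_[A-Z0-9]+/', $filename, $matches);
   // Assumption: the original fragment does not show how $site_name and $timestamp are set
   $site_name = isset($matches[0]) ? $matches[0] : "unknown";
   $timestamp = time();
   $uploaddir = "/var/www/html/si_stuffs/debug_pilots/uploads/".$site_name."/";
   if (!file_exists($uploaddir)) {
       mkdir($uploaddir, 0755, true);
   }
   $uploadfile = $uploaddir.$filename."_".$timestamp;
   move_uploaded_file($_FILES['file']['tmp_name'], $uploadfile);
   $log = "Uploading: ".$uploadfile."\n";
   // $log_file (path of the server-side log) is not defined in the original fragment
   file_put_contents($log_file, $log, FILE_APPEND);
   #curl -F file=@$PWD/$LOG_FILE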
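
For comparison, the naming logic of the receiving script can be sketched in Python. This is not part of the original setup: the function name destination_path, the "unknown" fallback directory and the integer timestamp are assumptions made for the example. It reuses the regular expression from the PHP fragment and assumes POSIX-style paths.

```python
import os
import re

# Same site-name pattern as in the PHP fragment
SITE_PATTERN = re.compile(r"T[0-9]+_[A-Z]+_[A-Z0-9]+")


def destination_path(upload_root, client_filename, timestamp):
    """Return the path where an uploaded log file would be stored."""
    # Drop any directory part sent by the client to prevent path traversal
    filename = os.path.basename(client_filename)
    match = SITE_PATTERN.search(filename)
    # Hypothetical fallback when the file name carries no site name
    site_name = match.group(0) if match else "unknown"
    return os.path.join(upload_root, site_name, "%s_%d" % (filename, timestamp))
```

Keeping the path computation in a small pure function like this makes it easy to unit test separately from the upload handling.
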

#2 Updated by Marco Mambelli 10 months ago

  • Description updated (diff)

#3 Updated by Leonardo Lai 10 months ago

  • Start date changed from 08/08/2019 to 08/14/2019
  • Due date set to 08/14/2019

due to changes in a related task: #23117

#4 Updated by Leonardo Lai 9 months ago

  • Start date changed from 08/14/2019 to 09/12/2019
  • Due date set to 09/12/2019

due to changes in a related task: #23117

#5 Updated by Leonardo Lai 8 months ago

  • Status changed from New to Work in progress

#6 Updated by Marco Mambelli 8 months ago

  • Target version changed from v3_4_7 to v3_6_1

#7 Updated by Marco Mambelli 7 months ago

  • Target version changed from v3_6_1 to v3_7

#8 Updated by Marco Mambelli 3 months ago

  • Start date changed from 09/12/2019 to 03/06/2020
  • Due date set to 03/06/2020

due to changes in a related task: #23117

#9 Updated by Marco Mambelli 3 months ago

  • Target version changed from v3_7 to v3_7_1
