Feature #22866

Create dynamic pages serving the Glideins stdout, stderr and included content

Added by Marco Mambelli about 2 months ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 07/04/2019
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

From preliminary observation, it seems that:
The Factory stores Glidein log files in a directory tree starting at /var/log/gwms-factory/client/
The tree has a folder per Frontend user, each containing a folder per Factory ID.
Each of these folders contains multiple folders named entry_AAA, each holding the log files for the Glideins sent to the corresponding AAA entry.
These include HTCondor logs (condor_activity... , one per day) and the stdout and stderr of the Glideins (job.NNN.MM.out and job.NNN.MM.err, where NNN and MM are the HTCondor cluster and process IDs of the job). The cluster ID is a counter that is always incremented for a given schedd, so the numbers within one entry are unique but not consecutive.
The stdout/err files contain structured information, including test results, the Unix environment, and the HTCondor log files.
Tools to extract the HTCondor logs are available in GWMS: factory/tools/gwms-logcat.sh, cat_logs.py, cat_MasterLog.py, cat_... , cat_XMLResult.py
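
As a starting point for the dynamic pages, the layout described above could be mapped to structured fields with something like the following. This is only a sketch: it assumes the path is exactly root/<Frontend user>/<Factory ID>/entry_AAA/job.NNN.MM.{out,err} with no intermediate folders, and the names GlideinLog and parse_glidein_log_path are hypothetical, not part of GWMS.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# Root of the tree, as stated in the description.
LOG_ROOT = PurePosixPath("/var/log/gwms-factory/client")

# job.NNN.MM.out / job.NNN.MM.err (NNN = cluster ID, MM = process ID)
JOB_FILE_RE = re.compile(r"^job\.(?P<cluster>\d+)\.(?P<process>\d+)\.(?P<stream>out|err)$")


class GlideinLog(NamedTuple):
    """Hypothetical record describing one Glidein stdout/stderr file."""
    frontend_user: str
    factory_id: str
    entry: str
    cluster: int
    process: int
    stream: str  # "out" or "err"


def parse_glidein_log_path(path: str) -> Optional[GlideinLog]:
    """Return the fields encoded in a log path, or None if it does not match.

    Assumes the layout root/<user>/<factory id>/entry_<name>/<file>.
    """
    try:
        parts = PurePosixPath(path).relative_to(LOG_ROOT).parts
    except ValueError:
        return None  # not under the log root
    if len(parts) != 4 or not parts[2].startswith("entry_"):
        return None
    match = JOB_FILE_RE.match(parts[3])
    if match is None:
        return None  # e.g. a condor_activity... file
    return GlideinLog(
        frontend_user=parts[0],
        factory_id=parts[1],
        entry=parts[2][len("entry_"):],
        cluster=int(match["cluster"]),
        process=int(match["process"]),
        stream=match["stream"],
    )
```

Files that do not follow the job.NNN.MM naming (such as the daily HTCondor logs) return None, so a page listing Glidein output could simply skip them.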

The tree should be copied to a new location, both to avoid crowding the Factory drive and because the Factory periodically purges the files.
Compressing the stdout/err files seems effective and should be considered (e.g., a tar.gz file could be created with the stdout/err of each job).
Decompression and decoding of the stdout/err files should happen client-side to keep the server load low.
The files should be served securely, e.g., by authenticating users with x509 certificates, username/password, or SSO.
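
The per-job compression idea could look roughly like the sketch below, which bundles the stdout and stderr of one job into a single tar.gz in a separate archive directory. The function name archive_job_logs, the archive naming scheme, and the destination directory are assumptions made for illustration; nothing here is an existing GWMS tool, and copying to another machine (e.g. with rsync) is not covered.

```python
import tarfile
from pathlib import Path
from typing import Optional


def archive_job_logs(entry_dir, archive_dir, cluster: int, process: int) -> Optional[Path]:
    """Bundle job.<cluster>.<process>.out/.err into one tar.gz (hypothetical helper).

    Returns the path of the archive, or None if neither file exists.
    """
    entry_dir = Path(entry_dir)
    archive_dir = Path(archive_dir)

    # Collect whichever of the two streams is present for this job.
    candidates = [entry_dir / f"job.{cluster}.{process}.{ext}" for ext in ("out", "err")]
    members = [p for p in candidates if p.is_file()]
    if not members:
        return None

    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"job.{cluster}.{process}.tar.gz"

    # "w:gz" writes a gzip-compressed tar; arcname drops the directory prefix
    # so the archive contains only the bare file names.
    with tarfile.open(target, "w:gz") as tar:
        for path in members:
            tar.add(path, arcname=path.name)
    return target
```

Keeping one archive per job means a page can serve a single compressed file per Glidein and leave unpacking to the client, in line with the goal of keeping the server load low.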


Related issues

Blocks glideinWMS - Feature #22848: Improve Glidein monitoring and troubleshooting | New | 2019-06-28

History

#1 Updated by Marco Mambelli about 2 months ago

  • Blocks Feature #22848: Improve Glidein monitoring and troubleshooting added

#2 Updated by Marco Mascheroni about 2 months ago

The tree should be copied in a new location to avoid crowding the Factory drive and because the factory periodically purges the files.

By this you mean rsync to another machine, right? IMHO, an rsync to a new location on the same machine would just add load.


