Feature #22848
Improve Glidein monitoring and troubleshooting
Start date:
06/28/2019
Due date:
% Done:
0%
Estimated time:
Stakeholders:
Description
Enhance the monitoring of GlideinWMS Glideins. Glideins send back to the Factories information that is essential for troubleshooting problems and understanding the performance of the system. Easy access to this information lets service operators efficiently manage hundreds of thousands of resources on the Grid and in the Cloud. Secure access avoids disclosing experiments' information to unauthorized users.
This activity includes devising the best framework to export and present the log files and statistics provided by the Glideins in a secure and efficient manner, and developing a secure User Interface and a REST API to present the information.
- Evaluation, system integration and development using open source Web frameworks
- Should only log files be served?
- What is a fast and efficient storage for the log files? Should they be compressed on the fly?
- What should happen on the client side and what on the server side (e.g. decompression and HTCondor log extraction)?
- Who can access? How can it be authenticated?
- Plan a RESTful API
- Web development related to HTML, CSS, and JavaScript
- Developments related to distributed computing software for Grids, Clouds and Supercomputers
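As an illustration of the compression and client-side/server-side questions above, here is a minimal sketch that uses only the Python standard library. It assumes the log files are stored gzip-compressed and that the server looks at the client's Accept-Encoding header to decide whether to send the compressed bytes as they are or to decompress them first. The function names compress_log and build_response are hypothetical and not part of GlideinWMS, and the header parsing is deliberately naive.

```python
import gzip
from typing import Dict, Tuple


def compress_log(text: str) -> bytes:
    """Gzip-compress the text of a log file for storage (hypothetical helper)."""
    return gzip.compress(text.encode("utf-8"))


def build_response(stored_blob: bytes, accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for a stored, gzip-compressed log.

    If the client advertises gzip support, the stored bytes are sent as they
    are and the client decompresses them. Otherwise the server decompresses.
    Note: this naive parsing ignores q-values such as "gzip;q=0".
    """
    encodings = {
        part.split(";")[0].strip().lower()
        for part in accept_encoding.split(",")
    }
    headers = {"Content-Type": "text/plain; charset=utf-8"}
    if "gzip" in encodings:
        # Client-side decompression: no CPU cost on the server.
        headers["Content-Encoding"] = "gzip"
        return headers, stored_blob
    # Server-side decompression for clients that do not accept gzip.
    return headers, gzip.decompress(stored_blob)


if __name__ == "__main__":
    blob = compress_log("glidein startup line\n" * 1000)
    headers, body = build_response(blob, "gzip, deflate")
    print(headers.get("Content-Encoding"), len(body))
```

Storing the logs compressed and passing them through unchanged keeps the server work low. Whether that suits the Factory log format and access patterns is one of the points this activity would need to evaluate.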
There are already some efforts by Factory operations that should be included or coordinated. ATLAS provides some similar features in its PanDA monitoring (see also attached files):
- https://bigpanda.cern.ch/
- https://twiki.cern.ch/twiki/bin/view/PanDA/PanDA
- https://twiki.cern.ch/twiki/bin/view/PanDA/PandaMonitor
- https://twiki.cern.ch/twiki/bin/view/PanDA/BigPanDAmonitoring
- https://github.com/PanDAWMS (repos panda-bigmon-core and panda-bigmon-atlas)
- https://bigpanda.cern.ch//media/filebrowser/3af9d8d7-80c7-42d3-9f28-3891d93c6b19/panda/tarball_PandaJob_4401159482_ANALY_CYF/pilotlog.txt
Related issues
History
#1 Updated by Marco Mambelli over 1 year ago
- Related to Milestone #22673: Summer interns 2019 added
#2 Updated by Marco Mambelli over 1 year ago
- Blocked by Feature #22866: Create dynamic pages serving the Glideins stdout, stderr and included content added
#3 Updated by Marco Mambelli over 1 year ago
- Target version changed from v3_5_x to v3_6_1
#4 Updated by Marco Mambelli about 1 year ago
- Target version changed from v3_6_1 to v3_6_2
#5 Updated by Marco Mambelli about 1 year ago
- Category set to GlideinMonitor
#6 Updated by Marco Mambelli 9 months ago
- Target version changed from v3_6_2 to v3_6_3
#7 Updated by Marco Mambelli 8 months ago
- Target version changed from v3_6_3 to v3_6_4
#8 Updated by Marco Mambelli 4 months ago
- Target version changed from v3_6_4 to v3_6_5
#9 Updated by Marco Mambelli 3 months ago
- Target version changed from v3_6_5 to v3_6_6
#10 Updated by Marco Mambelli about 1 month ago
- Target version changed from v3_6_6 to v3_6_7