Improve Glidein monitoring and troubleshooting
Enhance the monitoring of Glideins in GlideinWMS. Glideins send back to the Factories information that is essential for troubleshooting problems and understanding the performance of the system. Easy access to this information allows service operators to efficiently manage hundreds of thousands of resources on the Grid and in the Cloud. Secure access prevents experiments' information from being disclosed to unauthorized users.
This activity includes devising the best framework to export and present, securely and efficiently, the log files and statistics provided by the Glideins, and developing a secure user interface and a REST API to present this information.
- Evaluation, system integration and development using open source Web frameworks
- Should only log files be served?
- What is fast and efficient storage for the log files? Should they be compressed on the fly?
- What should happen on the client side and what on the server side (e.g. decompression and HTCondor log extraction)?
- Who can access the information, and how are users authenticated?
- Plan a RESTful API
- Developments related to distributed computing software for Grids, Clouds and Supercomputers
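The storage, compression, client/server, access-control and RESTful API questions above can be made concrete with a small sketch. The following Python WSGI application is a minimal illustration under hypothetical assumptions, not part of GlideinWMS: logs are stored gzip-compressed as `<log_root>/<entry>/<job_id>.log.gz`, the URL scheme `/api/v1/entries/<entry>/glideins/<job_id>/log` is invented for illustration, and authentication uses static bearer tokens mapped to the Factory entries their holders may read. A production service would rely on the chosen Web framework and a real identity provider instead.

```python
"""Minimal sketch of a read-only REST endpoint for Glidein log files.

Hypothetical assumptions (not GlideinWMS APIs): logs are stored gzip-compressed
as <log_root>/<entry>/<job_id>.log.gz and clients send a static bearer token.
"""
import gzip
import json
import re
from pathlib import Path

# Hypothetical token -> set of Factory entries its holder may read.
TOKENS = {"secret-token-vo1": {"entry_a"}}

# Hypothetical URL scheme; names may not start with "." (blocks "..").
NAME = r"[A-Za-z0-9_][\w.-]*"
ROUTE = re.compile(rf"^/api/v1/entries/({NAME})/glideins/({NAME})/log$")


def make_app(log_root):
    """Return a WSGI application serving logs below log_root."""
    root = Path(log_root)

    def app(environ, start_response):
        def reply(status, body, headers):
            start_response(status, [("Content-Length", str(len(body))), *headers])
            return [body]

        def error(status, message, extra=()):
            body = json.dumps({"error": message}).encode()
            return reply(status, body, [("Content-Type", "application/json"), *extra])

        if environ.get("REQUEST_METHOD") != "GET":
            return error("405 Method Not Allowed", "only GET is supported",
                         [("Allow", "GET")])
        match = ROUTE.match(environ.get("PATH_INFO", ""))
        if match is None:
            return error("404 Not Found", "unknown resource")
        entry, job_id = match.groups()

        # Authentication: "Authorization: Bearer <token>".
        auth = environ.get("HTTP_AUTHORIZATION", "")
        token = auth[len("Bearer "):] if auth.startswith("Bearer ") else None
        if token not in TOKENS:
            return error("401 Unauthorized", "missing or invalid token",
                         [("WWW-Authenticate", "Bearer")])
        # Authorization: the token must cover the requested entry.
        if entry not in TOKENS[token]:
            return error("403 Forbidden", "token not valid for this entry")

        path = root / entry / f"{job_id}.log.gz"
        if not path.is_file():
            return error("404 Not Found", "no such log")
        compressed = path.read_bytes()

        headers = [("Content-Type", "text/plain"), ("Vary", "Accept-Encoding")]
        # Simplified check: ignores q-values such as "gzip;q=0".
        if "gzip" in environ.get("HTTP_ACCEPT_ENCODING", ""):
            # Client decompresses: send the stored bytes as-is, no server CPU.
            return reply("200 OK", compressed, headers + [("Content-Encoding", "gzip")])
        # Client cannot decompress: do it on the server.
        return reply("200 OK", gzip.decompress(compressed), headers)

    return app
```

Storing logs compressed and passing the bytes through unchanged with `Content-Encoding: gzip` moves decompression to the client (browsers do this transparently), while clients that do not advertise gzip support receive plain text decompressed on the server. The app can be tried locally with `wsgiref.simple_server.make_server("", 8000, make_app("/path/to/logs")).serve_forever()`.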
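On the client-side versus server-side question, HTCondor log extraction is itself a small step. The sketch below assumes, hypothetically, that a Glidein's stderr file embeds each HTCondor log as a named section of base64-encoded gzip data between `=== BEGIN <name> ===` and `=== END <name> ===` lines. These markers are placeholders, not the exact format GlideinWMS writes, so a real parser must match what the Glideins actually emit.

```python
import base64
import gzip
import re

# Placeholder section markers (hypothetical): adapt to the real Glidein output.
SECTION = re.compile(
    r"^=== BEGIN (?P<name>\w+) ===\n(?P<data>.*?)^=== END (?P=name) ===$",
    re.MULTILINE | re.DOTALL,
)


def extract_condor_logs(stderr_text):
    """Return {log_name: decoded text} for every embedded section found."""
    logs = {}
    for m in SECTION.finditer(stderr_text):
        # Drop line wrapping/whitespace, then base64-decode and gunzip.
        raw = base64.b64decode("".join(m.group("data").split()))
        logs[m.group("name")] = gzip.decompress(raw).decode("utf-8", errors="replace")
    return logs
```

Running this on the server lets a client download only the log it needs (for example a `StartdLog`), while running it in the client keeps the server a plain file server that spends no CPU on extraction.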
Some efforts by Factory operations already exist and should be included or coordinated with:
- https://github.com/PanDAWMS (repos panda-bigmon-core and panda-bigmon-atlas)