Improve monitoring stats in glidefactory and glidefactoryclient classads
glidefactory and glidefactoryclient ClassAds contain monitoring information coming from the Factory schedd (condor_q query).
This is stored as:
GlideinMonitorStatus... (in GFC for the specific client)
GlideinMonitorTotalStatus... (in GF, summary for the entry)
When there is interaction w/ clients, the partial stats are calculated via a subquery and the total is calculated by summing the partials.
When there is no interaction w/ clients, the total is calculated from the list of jobs returned by condor_q for the entry (condorQ), see [#21525]
It may be convenient to calculate all the partial and total stats in a single pass over the list of all the glideins in the entry.
This way all the monitoring info will be fresh and evaluated the same way.
Furthermore, the current method may leave some stale info if only some clients are interacting w/ one entry.
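A minimal sketch of the single-pass idea, not the Factory code. It assumes each glidein is available as a dict of ClassAd attributes, that the client name is in a job attribute (called "GlideinClient" here, an assumption to verify), and that only the standard HTCondor JobStatus codes 1 (Idle), 2 (Running) and 5 (Held) are counted by name. The function name single_pass_stats is hypothetical.

```python
from collections import Counter, defaultdict

# Standard HTCondor JobStatus codes: 1 = Idle, 2 = Running, 5 = Held.
STATUS_NAMES = {1: "Idle", 2: "Running", 5: "Held"}


def single_pass_stats(jobs, client_attr="GlideinClient"):
    """Return (per_client, total) status counts from one pass over jobs.

    client_attr is an assumed attribute name; adjust it to whatever the
    glidein job classad really carries.
    """
    per_client = defaultdict(Counter)
    total = Counter()
    for job in jobs:
        status = STATUS_NAMES.get(job.get("JobStatus"), "Other")
        client = job.get(client_attr, "unknown")
        # Partial (per client) and total counters are updated together,
        # so both are derived from the same snapshot of the queue.
        per_client[client][status] += 1
        total[status] += 1
    return dict(per_client), total


example_jobs = [
    {"GlideinClient": "fe1", "JobStatus": 1},
    {"GlideinClient": "fe1", "JobStatus": 2},
    {"GlideinClient": "fe2", "JobStatus": 2},
    {"JobStatus": 5},  # no client attribute: counted under "unknown"
]
per_client_stats, total_stats = single_pass_stats(example_jobs)
```

Since every client seen in the queue gets an entry in the same pass, no partial stats are left over from an earlier cycle for clients that did not interact.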
Some considerations before implementing:
- consider whether the client name is fully available in the job (glidein) classad, without the need to check glidefactoryclient classads
- consider if the information is used within the same process
- evaluate the use of parallel workers
- think about the memory footprint
- do a benchmark to compare performance:
  - trigger 1000 or more glideins, store the list of classads (will be useful also for unittests)
  - calculate the stats w/ subqueries + total
  - calculate all the stats in the new way
  - compare memory usage and time
- evaluate the checks on the client names
- pay attention to the 2 stats dictionaries: client_stats (w/ client_int_name) and qc_stats (w/ client_log_name)
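The benchmark could be prototyped along these lines. This is only a sketch: it uses synthetic job dicts instead of stored glidein classads, the "GlideinClient" attribute name is an assumption, and stats_by_subquery is a simplified stand-in for the current subquery logic, not the actual Factory code. Time is measured with time.perf_counter and peak memory with tracemalloc, both from the Python standard library.

```python
import random
import time
import tracemalloc
from collections import Counter, defaultdict


def make_jobs(n, clients, seed=0):
    """Build n synthetic glidein records (stand-in for stored classads)."""
    rng = random.Random(seed)
    return [
        {"GlideinClient": rng.choice(clients), "JobStatus": rng.choice([1, 2, 5])}
        for _ in range(n)
    ]


def stats_by_subquery(jobs, clients):
    """Stand-in for the current way: one filter per client, then sum."""
    per_client = {}
    for client in clients:
        subset = [j for j in jobs if j["GlideinClient"] == client]
        per_client[client] = Counter(j["JobStatus"] for j in subset)
    total = Counter()
    for counts in per_client.values():
        total.update(counts)
    return per_client, total


def stats_single_pass(jobs):
    """Proposed way: partial and total counters filled in one loop."""
    per_client = defaultdict(Counter)
    total = Counter()
    for j in jobs:
        per_client[j["GlideinClient"]][j["JobStatus"]] += 1
        total[j["JobStatus"]] += 1
    return dict(per_client), total


def measure(func, *args):
    """Return (result, elapsed seconds, peak traced bytes) for one call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


clients = [f"client{i}" for i in range(10)]
jobs = make_jobs(1000, clients)
(sub_pc, sub_total), sub_time, sub_peak = measure(stats_by_subquery, jobs, clients)
(one_pc, one_total), one_time, one_peak = measure(stats_single_pass, jobs)
print(f"subqueries:  {sub_time:.6f}s, peak {sub_peak} bytes")
print(f"single pass: {one_time:.6f}s, peak {one_peak} bytes")
```

tracemalloc adds overhead to the timings, so time and memory may be better measured in separate runs. Checking that both methods return the same totals on the stored list also doubles as a unittest.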