Add a Collector for glidein monitoring to the factory
Currently the factory does not have a good idea of which glideins are running and which are not;
it relies only on Condor-G information, which we all know is not very reliable.
So the proposal for this ticket is to add a factory-specific collector (tree) that all the glideins would report to.
This should be separate from the one used for classad exchange between the GF and the FEs.
I would also recommend installing it on a completely different node.
And it should be optional.
One of the reasons we never did this in the past was authentication;
we do not want to track all the DNs that the FEs use for their pilots.
So the proposal is to use shared secret authentication for this collector.
Now, it will be pretty much impossible to deliver the shared secret to the WNs (worker nodes) in a secure way (e.g. GT2 does not provide any such guarantee);
the best we can hope for is making it non-trivial to discover.
However, the information in the Collector is not really that sensitive; it is just for monitoring, and as long as the people/services using this information understand that, it is still better than the current situation.
The reason to have any authentication at all is mostly to keep random scanning services/script kiddies from compromising it.
#1 Updated by Igor Sfiligoi over 7 years ago
The exact nature of advertising to this collector should be carefully thought through.
Just pointing the current startd to it is likely not a good idea.
The collector is normally the most trusted service in a condor pool, so there is (or may be) some information the startd sends that is not appropriate for the low-security collector I am proposing.
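To illustrate the kind of restriction meant here, below is a minimal sketch of an allowlist filter applied to a startd ad before anything is sent to the monitoring collector. It is not existing glideinWMS or HTCondor code: the function name monitoring_ad, the choice of allowed attributes, and the sample values are all assumptions for illustration only. Which attributes are actually safe to publish is exactly what still needs to be decided.

```python
# Hypothetical allowlist of attributes considered safe for a low-security
# monitoring collector. The real list is an open question in this ticket.
# ClassAd attribute names are case-insensitive, so compare in lowercase.
ALLOWED_ATTRS = frozenset(
    name.lower()
    for name in ("Name", "Machine", "State", "Activity", "GLIDEIN_Entry_Name")
)


def monitoring_ad(full_ad, allowed=ALLOWED_ATTRS):
    """Return a copy of full_ad containing only the allowlisted attributes."""
    return {
        attr: value
        for attr, value in full_ad.items()
        if attr.lower() in allowed
    }


# Made-up sample ad, represented as a plain dict for the sake of the sketch.
example_ad = {
    "Name": "glidein_1234@wn042.example.org",
    "Machine": "wn042.example.org",
    "State": "Claimed",
    "Activity": "Busy",
    "GLIDEIN_Entry_Name": "example_entry",
    "RemoteOwner": "user@example.org",      # identifies the user; dropped
    "ClientMachine": "schedd.example.org",  # identifies the submitter; dropped
}

filtered_ad = monitoring_ad(example_ad)
print(sorted(filtered_ad))
```

An allowlist is used rather than a blocklist so that any attribute added to the startd ad later stays private by default until someone explicitly decides it is fine to expose.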
#8 Updated by Igor Sfiligoi almost 6 years ago
I have created a sub-task (#6770) that will only deal with the glidein configuration part.
Once that's done, the GF admins can start using it by manually maintaining the security of the GF collector.
Of course we do want automatic management of the security, but the two do not need to be implemented at the same time.