Adding interactive properties to the factory <-> frontend protocol
In my experience with factory operations, I am often asked to give frontend admins some information held by the factory,
e.g. the log of a completed glidein.
It would be nice if the factory service itself could provide such information.
Now, in theory, one could implement such a request through the existing channels:
the user posts a ClassAd with the request to the factory collector, a factory process reads it and posts the result back.
There are three problems with this:
1) latency... it may take a few minutes to get the answer back.
Could this be considered acceptable?
(e.g. it is still faster than email)
2) missing acknowledgement... neither the client nor the server knows if and when the other side has read the posted ClassAd.
Could this be solved by simply having reasonable timeouts?
(or maybe we should look at how UDP-heavy network apps work)
3) correlation between request and reply... all requests in the current protocol are supposed to overwrite the previous ones, so a reply cannot easily be matched to the request that triggered it.
Would adding the timestamp of the (original) request solve the problem?
(what are the side effects?)
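To make the timeout idea in 2) and the timestamp idea in 3) concrete, here is a minimal Python sketch. It does not use Condor at all: the collector is modelled as a plain dictionary (hypothetical), with ads keyed by sender so that a newer ad overwrites the old one, and the attribute names Query and ReqTimestamp are made up for illustration. It only shows the correlation and timeout logic, not how ClassAds are actually advertised or queried.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for the factory collector: ads are keyed by
# (ad kind, sender), so a newer ad from the same sender overwrites the
# old one, mimicking the overwrite semantics of the current protocol.
Collector = Dict[Tuple[str, str], dict]


def post_request(collector: Collector, client: str, query: str, now: float) -> float:
    """Client side: advertise a request ad stamped with its creation time."""
    collector[("request", client)] = {"Query": query, "ReqTimestamp": now}
    return now


def serve_requests(collector: Collector, lookup: Callable[[str], Optional[str]]) -> None:
    """Factory side (one cycle): answer every request ad, echoing its timestamp."""
    for (kind, client), ad in list(collector.items()):
        if kind != "request":
            continue
        collector[("reply", client)] = {
            "Result": lookup(ad["Query"]),
            "ReqTimestamp": ad["ReqTimestamp"],  # ties the reply to its request
        }


def get_reply(collector: Collector, client: str, req_ts: float,
              now: float, timeout: float) -> Optional[str]:
    """Client side: accept only the reply to *this* request; give up after timeout."""
    reply = collector.get(("reply", client))
    if reply is not None and reply["ReqTimestamp"] == req_ts:
        return reply["Result"]
    if now - req_ts > timeout:
        # The timeout bounds the wait, but cannot tell us whether the
        # factory ever read the request (problem 2 remains).
        raise TimeoutError("no reply to this request within the timeout")
    return None  # not answered yet, or only a stale reply is present


if __name__ == "__main__":
    c: Collector = {}
    logs = {"glidein 41": "log of glidein 41", "glidein 42": "log of glidein 42"}
    post_request(c, "frontend_A", "glidein 41", now=100.0)
    serve_requests(c, logs.get)                                  # factory cycle
    ts2 = post_request(c, "frontend_A", "glidein 42", now=200.0)  # overwrites request
    print(get_reply(c, "frontend_A", ts2, now=210.0, timeout=600.0))  # -> None
    serve_requests(c, logs.get)                                  # next factory cycle
    print(get_reply(c, "frontend_A", ts2, now=300.0, timeout=600.0))  # -> log of glidein 42
```

The sketch also hints at some side effects of using the request timestamp as the correlation key: two requests from the same client with the same timestamp cannot be told apart (a counter or random ID would avoid that), and if a client posts a second request before the factory's next cycle, the first one is silently overwritten and never answered, so the client can only find out by timing out.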
Bottom line: do you guys think that we can implement this with the current mechanisms,
or do we HAVE TO go back to the drawing board and come up with a completely new mechanism for this?
(with all the potential new complexities and problems that it may bring to the table)
#1 Updated by Burt Holzman almost 8 years ago
- Priority changed from Normal to Low
- Target version set to v3_1
I think before we get into a detailed (and interesting) implementation discussion, we should understand a bit better the use case.
I imagine frontend admins are interested in a few different classes of log information depending on what sort of problems they are seeing.
1. Access to logs where our validation scripts failed within a certain time window (or as far back as we have). On-demand? Or optionally always stream these logs back to the frontend?
2. Access to logs for pilots that never matched anything
3. Access to the logs of a particular pilot, whether it ran or failed to start
Anything else? It seems we should consider using a reliable, transactional protocol rather than the (already overloaded) collector as the information store. I think there are mature open-source, off-the-shelf solutions that can do this without us having to reinvent the wheel, but we should think about it.
#2 Updated by Igor Sfiligoi almost 8 years ago
The drawback of using anything but Condor is that the factory admins will have to maintain the security infrastructure of yet another product.
So one more thing that can go wrong.
Not saying we must stay with Condor, but we have to evaluate pros vs cons before we make a decision.
PS: I disagree that we need use cases before we decide on the protocol; I don't see how the use cases would influence the decision.
But I don't want to fight over this.