Speeding up the FE matchmaking - pre-clustering
This is a proposal for speeding up the matchmaking in the FE.
Currently we are matching every job independently.
Instead, the proposal is to pre-cluster them, get the count, and just match the clusters.
(where cluster == jobs with the same values for the attributes used in matchmaking)
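To illustrate the idea only, here is a minimal Python sketch of pre-clustering. It is not the code from the patch: the function names (`cluster_jobs`, `count_matches`), the attribute names (`site`, `memory`), and the match function are all hypothetical, and it assumes each job is a plain dict whose matchmaking attribute values are hashable.

```python
from collections import Counter


def cluster_jobs(jobs, match_attrs):
    """Group jobs by the values of the attributes used in matchmaking.

    Returns a Counter mapping each distinct tuple of attribute values
    (one cluster) to the number of jobs that share it.
    """
    return Counter(tuple(job.get(attr) for attr in match_attrs) for job in jobs)


def count_matches(jobs, match_attrs, entries, match_fn):
    """Count matching jobs per entry, evaluating match_fn once per cluster."""
    clusters = cluster_jobs(jobs, match_attrs)
    counts = {}
    for entry_name, entry in entries.items():
        total = 0
        for values, nr_jobs in clusters.items():
            # Rebuild a job-like dict holding only the matchmaking attributes.
            job_view = dict(zip(match_attrs, values))
            if match_fn(job_view, entry):
                # One evaluation stands in for every job in the cluster.
                total += nr_jobs
        counts[entry_name] = total
    return counts


# Hypothetical data: "owner" is not used in matchmaking, so it is ignored
# when clustering and the first two jobs fall into the same cluster.
jobs = [
    {"site": "T2_US", "memory": 2000, "owner": "a"},
    {"site": "T2_US", "memory": 2000, "owner": "b"},
    {"site": "T2_IT", "memory": 4000, "owner": "a"},
]
entries = {
    "entry_us": {"site": "T2_US", "memory": 2500},
    "entry_it": {"site": "T2_IT", "memory": 4096},
}


def match_fn(job, entry):
    # Hypothetical match expression: same site and enough memory.
    return job["site"] == entry["site"] and job["memory"] <= entry["memory"]


clusters = cluster_jobs(jobs, ["site", "memory"])
counts = count_matches(jobs, ["site", "memory"], entries, match_fn)
```

With this approach the match expression is evaluated once per (cluster, entry) pair rather than once per (job, entry) pair, so the saving grows with the number of jobs that share the same attribute values.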
#4 Updated by Igor Sfiligoi over 6 years ago
I think I have found and fixed the problem.
My last patch was partial; only half of the function was clusterized :(
I have now finished the work, and clusterized what is missing.
Will commit to git tomorrow (strict firewall right now), but attaching the patch.
PS: I have applied the patch to the CMS AnaOps FE, and the matching time went from almost an hour to about 2 minutes ;)
#6 Updated by Douglas Strain over 6 years ago
- Status changed from New to Feedback
- Assignee changed from Douglas Strain to Igor Sfiligoi
If I understand this correctly, the code looks correct. My only functional question is: why multiply by nr_schedds on lines 275 and 280? I'm not sure I understand what's going on there.
However, this code badly needs some code comments, since it is extremely confusing and difficult to understand.
I think there should be some more comments, such as the data type of each of these variables, with examples. I think once that is done, it is okay to merge.
#8 Updated by Igor Sfiligoi over 6 years ago
I sympathize with the request for more comments, but don't have time right now to re-comment a large portion of the code.
If you have a request for a specific section of the patch that needs to be commented, I can attempt that... but not much more in the short term, sorry.
#11 Updated by Douglas Strain over 6 years ago
I must be in the Christmas spirit. I have gone in and commented most of this code.
Igor, can you do the following:
1) Quickly review my comments for correctness
2) Anywhere I have a "???", add in the relevant details.
2a) Fill in "???" for the undocumented parameters/types in the epydoc docstring below the function def
2b) Towards the middle of the function, I sort of lost track of the logic. All the manipulation of data structures lost me. Why do we need list_of_all_jobs, which turns into (outvals_cl,jrange_cl)? Why is it needed in that format? Can you put in a comment or two elaborating on what's going on here? Thanks.
Hopefully, this should be less time-consuming than commenting all the code. Just fill in the blanks.
Once this is done, I think we can merge it.