Anna M, Brian Y, Marc M, Robert I, Brandon W, Vladimir, Yuyi, Margherita V.
- Feedback after POMS v2_2_0 release
1. Feedback after POMS v2_2_0 release and next
No complaints after the new release.
In the meantime Marc has already found 3 bugs, one of which is already fixed.
1. The first problem is that we generate a new object every time we check on a job,
which causes us to run out of memory.
2. Another problem is with bulk updates. We should lock on the task id, and we should
order by task id so that the updates are always applied in the same order.
3. Another bug: jobs were declared as "Located" when they were not.
Interestingly, NOvA did not report that problem.
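The reasoning behind item 2 can be sketched as follows. Presumably the point of ordering by task id is that two bulk updates touching the same tasks always take their locks in the same sequence, so they cannot block each other. This is only an illustration: `task_locks` and `bulk_update` are hypothetical names, and in-process `threading.Lock` objects stand in for whatever database row locks POMS actually uses.

```python
import threading

# Hypothetical stand-in for database row locks, one per task id.
task_locks = {task_id: threading.Lock() for task_id in (101, 102, 103)}


def bulk_update(task_ids, apply_update):
    """Lock every task in ascending id order, update, then release.

    `apply_update` is a hypothetical callable taking one task id.
    """
    ordered = sorted(set(task_ids))  # same order for every caller
    acquired = []
    try:
        for task_id in ordered:
            task_locks[task_id].acquire()
            acquired.append(task_id)
        for task_id in ordered:
            apply_update(task_id)
    finally:
        # Release in reverse order of acquisition.
        for task_id in reversed(acquired):
            task_locks[task_id].release()
    return ordered
```

Two callers asking for tasks (101, 103) and (103, 101) both lock 101 first, so neither can end up holding one lock while waiting for the other.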
The other problem reported by GM2, the crontab issue, is fixed.
Opened a request to the Jobsub team to increase the memory size for jobs running on the GRID.
It would be helpful if we could do that ourselves inside POMS.
We could increase memory and then release jobs that are on hold.
DUNE, GM2 and Mu2e would be interested in having this recovery procedure.
Memory usage is also shown in the Kibana plots. That information could be used to adjust
parameters for the next runs (e.g. MCC10), and also to increase memory every time we see
a job being held.
We could try a first run and, if the job is held, increase memory and try again.
Another approach would be to kill the held jobs and restart with bigger memory.
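A minimal sketch of the "increase memory and try again" idea, under stated assumptions: `submit` is a hypothetical callable that runs one attempt with the given memory request and returns a status string such as "completed" or "held". It is not a POMS or Jobsub API, and the starting size, growth factor and cap are placeholder values.

```python
def run_with_memory_retry(submit, start_mb=2000, factor=1.5, max_mb=8000):
    """Resubmit a held job with a larger memory request.

    Returns the final status and the list of memory sizes tried.
    Stops when the job is no longer held or the next request
    would exceed max_mb.
    """
    memory_mb = start_mb
    attempts = []
    while True:
        attempts.append(memory_mb)
        status = submit(memory_mb)
        next_mb = int(memory_mb * factor)
        if status != "held" or next_mb > max_mb:
            return status, attempts
        memory_mb = next_mb
```

The same loop covers both approaches discussed: `submit` could release a held job after raising its memory request, or kill it and start a fresh one. The cap keeps a job that is held for some other reason from being resubmitted indefinitely.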
Sometimes we will need more time for processing complicated events.
For example, for DUNE reconstruction we already know some events can be weird; the same goes for Mu2e.
The matter of recovery types is very sensitive for experiments.
We should have different recovery types that experiments can use.
Another thing to think about would be to split datasets in a smart way.
It is my understanding that Ray (Mu2e) is going to change the flow to use more SAM features.
I have a list of features to be discussed, but not now. First we need to take care of bug fixes.
An ongoing DUNE effort is to process data with a keepup process in POMS.
We will talk about that in the next meetings.
What is the status of the POMS client?
It's ready for testing, and it would be good if someone could run some
tests since there have been many changes.
Marc should send us a set of tests we could submit to test the client side.
Another topic: what about using Docker to install the POMS development environment?
It's not really supported by the Division.
It would be an extra layer,
and we are not sure we would benefit from it.
Docker could be useful if more developers wanted to join POMS, but that is not our case.