August 15th

Date:
07/15/2017

Attendees:
Anna, Brandon, Brian, Marc, Margherita, Vladimir, Robert, Joe, Yuyi

Agenda:
- v2_2_0 Status (All)
- New layout (Vladimir)
- FIFEMON/KIBANA plots/stats (Joe)
- New NOvA request (Anna)
- MU2E request to be on-board (Anna)
- AOB

Discussion:

ANNA:
Welcome to Yuyi, who is joining the POMS team!

1. v2_2_0 Status

ANNA:
The deployment is scheduled for a week from now, but because of the many changes
I'd like to run it by the experiments first.
Please update task statuses, since the version only shows 52% done.

YUYI:
Started working on some old and new issues.

JOE:
Updated Grafana plots.
Some campaigns still show problems.

ANNA:
Show the different sites where jobs run.
Make sure there is a link on the campaign page that goes to Kibana to show campaign metrics. Not sure if we want to maintain that.
Also add a Grafana link on the page.

VLADIMIR:
About the link to SNOW: it should be easy to add.

ANNA:
DUNE will have a separate portal, so its link could be different from the rest of the experiments.

2. New layout

VLADIMIR:
The new layout for campaigns is done (showing slides).
The main page now shows Name, Active status, Creator and Created. Added filtering and sorting.
Clicking on a campaign takes you to the page with full details, as before.
Also added the ability to mark filtered campaigns as active/inactive.

ANNA:
What if experiments are not happy with the changes?
We know that these changes improve performance and ease of access, so we need to push for them; they are needed.

MARC:
Maybe we could also add tags to the display so experiments could filter on them.

ANNA:
Yes, and maybe we can add the possibility to show statistics for a group of selected campaigns, not only for one as in the current implementation.
Can this be done for the next release?
Let's check at the next meeting.
So far, no more proxy errors since Steve added more web servers.

3. New NOvA request

ANNA:
Run time and Transfer time histograms (SNOW request 582786).

MARC:
Run time should be easy. Transfer time might be harder because it could be unreliable.

ROBERT:
Should this be done in Landscape?

MARC:
We can try to do it first and then ask the Landscape people to do it.

ANNA:
Looking into a tool that Alice is using. It's a tool where the production group can follow file transfers
and their status; something like the FTS page but more condensed and with a summary view.
This will NOT be in our next release. We will need a separate meeting to go over the details.

4. MU2E request to be on-board

ANNA:
Met with Ray Culbertson to show him POMS and how to compose campaigns. Their workflow seems to be
similar to ProtoDUNE. Ray wants to submit to OSG, not only GPGrid.
They also need the recovery feature. We talked about recovery types and the ongoing discussion with the Distributed Computing team on increasing
memory size and releasing jobs; it seems to be a feature they like. I did this manually for MCC9. For MCC8 it was not possible because there were a lot (thousands)
of held jobs on the grid, so I had to remove them after agreement with DUNE.
Ray will start doing some tests.

MARC:
We need to figure out permissions and who can run jobs.

5. AOB

ANNA:
There is something we discussed a long time ago: having a link to the FIFEMON Calendar for outages, etc.
Also include the SNOW Calendar.

YUYI:
Working on updating documentation on POMS installation.

JOE:
There is a new cluster for Elasticsearch; POMS will need to update the URL.

MARC:
There is a SNOW ticket: the issue is keeping up with job updates. One of the problems was getting
duplicate key errors. Maybe we need to lock some related table (like tasks) while we are doing updates.
Operations are serialized, so we are not getting good performance.
We needed to parallelize the processing based on task id. Created 8 separate buckets, and this should speed up
the process.
This will be available in the next release.
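
Editor's sketch of the bucketing idea above, for illustration only. The minutes do not say how task ids are assigned to the 8 buckets; the modulo assignment, the thread pool, and all names here (bucket_for, partition, process, apply_one) are assumptions and not the actual POMS code. The point it shows: updates for the same task always land in the same bucket and are applied in order by a single worker, while different buckets run in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 8  # the number of buckets mentioned in the minutes


def bucket_for(task_id: int) -> int:
    """Assumed assignment rule: task id modulo the bucket count."""
    return task_id % NUM_BUCKETS


def partition(updates):
    """Group job updates by bucket, keeping their arrival order."""
    buckets = defaultdict(list)
    for upd in updates:
        buckets[bucket_for(upd["task_id"])].append(upd)
    return buckets


def apply_bucket(bucket_updates, apply_one):
    """Apply one bucket's updates serially; returns how many were applied."""
    for upd in bucket_updates:
        apply_one(upd)
    return len(bucket_updates)


def process(updates, apply_one):
    """Run the buckets in parallel, one worker per bucket."""
    buckets = partition(updates)
    with ThreadPoolExecutor(max_workers=NUM_BUCKETS) as pool:
        futures = [
            pool.submit(apply_bucket, bucket, apply_one)
            for bucket in buckets.values()
        ]
        return sum(f.result() for f in futures)
```

Here apply_one stands in for whatever writes a single job update to the database. Because no two workers ever touch the same task id, updates to one task cannot race each other, which is one way to avoid concurrent inserts for the same key without locking a whole table.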