September 26th

Date:
09/26/2017

Attendees:
Anna M, Brian Y, Steve W, Marc M, Robert I, Brandon W, Vladimir P, Yuyi G, Margherita VR.

Agenda:
- POMS v2_3_0 status
- Caching/Archiving (I hope Steve can attend this time)
- AOB

Discussion:

1. POMS v2_3_0 status

Feature #12195: Brainstorming about having a "test" button.

ANNA:
When a production group wants to submit a bunch of jobs it would be good to
be able to submit 1 job to make sure all is configured correctly.
Also, another idea: should this type of test be done automatically when a large number
of jobs is submitted? If we run the test and it fails, we should send an email
to the production group (this should be configurable).

YUYI:
Preference would be to have a "pilot" job, with the test done automatically.

STEVE:
Need to think about how to create the "test" job.

YUYI:
We could have a limit: if there are many jobs, run the test; otherwise don't.

ANNA:
Need to keep in mind that many jobs of a campaign have the same type of config parameters.

MARC:
If the config is the same, we don't need to run the test every time.

ANNA:
Can we have the test button for the next release? The date has not been decided yet.

MARC:
We think so: we can have the popup window with submission info and we can change the default
number of jobs to run the test. This should be enough for what we want to do.
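
The limit Yuyi mentions and the "same config" point from Anna and Marc can be combined into one check. The following is only a minimal sketch to make the idea concrete, not an agreed design: the names `PILOT_THRESHOLD`, `needs_pilot` and `record_pilot_success`, the threshold value, and the idea of fingerprinting the config are all assumptions made for illustration.

```python
import hashlib
import json

# Hypothetical limit: submissions below this size skip the pilot job.
PILOT_THRESHOLD = 100


def config_fingerprint(config):
    """Return a stable hash of the config, independent of key order."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_pilot(n_jobs, config, validated, threshold=PILOT_THRESHOLD):
    """Decide whether a single test job should run before the full submission.

    `validated` is a set of fingerprints of configs whose pilot already passed.
    """
    if n_jobs < threshold:
        return False  # small submission: no test needed
    return config_fingerprint(config) not in validated


def record_pilot_success(config, validated):
    """Remember that this config passed its pilot, so it is not re-tested."""
    validated.add(config_fingerprint(config))
```

In this sketch a large submission triggers a pilot only the first time a given config is seen; once `record_pilot_success` has been called for it, later submissions with the same parameters go straight through.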

ANNA:
Related to this: an idea to have a "hold" button. When many jobs are submitted, have the possibility
to hold the jobs for a certain amount of time, for example when resources are not available.

MARC:
We have a general hold.

ANNA:
Could we have a hold for a certain campaign?

MARC:
We need more details on what is held and on what we do when we release the jobs. We need to
know what to release and what not to release.

ANNA:
"Show campaign page": have a hold button there.

MARC:
We want to allow experiments to configure what to hold for, for example if dCache is full.
Need to design what to do and how to do it: we watch some status elements. We could use this
information to automatically hold the campaign.
Email should be sent so the experiment will know.

ANNA:
Need to discuss a few cases and decide whether to allow automatic and/or manual intervention.

MARC/STEVE:
Need to have a flag in the DB to identify the campaign ID for the hold.
We should have a proposal for this and talk about it at another time.
Steve will do this in two weeks and Yuyi will work with him.

MARC:
First step is to decide what the interface should look like, then decide the implementation.
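
To make the hold discussion concrete, here is a small sketch of a per-campaign hold flag with an automatic trigger. It is an illustration under stated assumptions, not a proposal: the `Campaign` fields, the status element name `dcache_full`, and the functions `apply_auto_hold` and `release` are all hypothetical, and the returned text only stands in for the email to the experiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Campaign:
    campaign_id: int
    experiment: str
    held: bool = False                 # the per-campaign hold flag
    hold_reason: Optional[str] = None
    # Status elements the experiment chose to hold on (hypothetical names).
    hold_on: List[str] = field(default_factory=list)


def apply_auto_hold(campaign: Campaign, status: Dict[str, bool]) -> Optional[str]:
    """Hold the campaign if a watched status element is raised.

    Returns the notification text when the campaign was newly held,
    otherwise None.
    """
    if campaign.held:
        return None  # already held, do not notify again
    triggered = [name for name in campaign.hold_on if status.get(name, False)]
    if not triggered:
        return None
    campaign.held = True
    campaign.hold_reason = ", ".join(triggered)
    return (f"Campaign {campaign.campaign_id} ({campaign.experiment}) "
            f"held: {campaign.hold_reason}")


def release(campaign: Campaign) -> None:
    """Manual release: clear the flag and the recorded reason."""
    campaign.held = False
    campaign.hold_reason = None
```

A manual hold from the campaign page would set the same flag with a reason such as "manual", so the release logic would not need to distinguish the two cases.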


Feature #17741: another button to tag campaign(s).

ANNA:
The "Add tag" feature is already in the campaign page,
but having it in the summary page would allow tagging several campaigns at the same time.

STEVE:
We should tag the campaign when we create it. The issue with doing it on the summary page is that
we might have too many actions on the page.

YUYI:
Maybe have a unique tag by default if people don't create one.

ANNA:
The tag button was not there at the beginning of the MCC9 campaign, so now there are many MCC9 campaigns not tagged.


Feature #17774

ANNA:
Simple, already discussed with Steve. In the menu we have 2 links; open them in another tab.

STEVE:
Already done in the devel version.

ANNA/STEVE:
We should have a submenu, called for example "external links", and put the links
to the ECL logbook and SNOW under it.


Feature #14835 About Emails

ANNA:
Send reports to the experiments if they ask for them, and maybe be able to set the level (warnings, etc.).
We already have something in the CI for people to be added to email lists (check with Vladimir).

MARC:
Two levels of notifications: jobsub already does it; do we also need POMS to do that?

ANNA:
The question is "what do we want to notify users about?" Not sure we can add that in projects.py; if we do that,
then what else could we add in POMS that is not in jobsub?
We can merge emails at all kinds of levels. If an experiment wants an email when jobs are completed, then POMS
will use jobsub information. When running MCC9 it was hard to know that individual steps were done.
It would be nice to have emails throughout the workflow.
We need to decide on the different types of email, and we need to check with the experiments to know
what they want.


Feature #15938

ANNA:
Caching/archiving: we have discussed this to be developed in the future.

STEVE:
Nothing to say yet; we do need this because of the amount of data.

MARC:
Maybe for stats we can define a JSON blob so that we could extend it with extra info if needed.

STEVE:
If using blobs we would need to parse them.

MARC:
Need to decide best way to do this.
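
The trade-off between Marc's and Steve's points can be seen in a small sketch: a JSON blob lets each row carry extra fields without a schema change, but every read has to parse it. This is only an illustration under assumptions: the table `campaign_stats`, its columns, the field names, and the helpers `save_stats` and `load_stats` are hypothetical, and `sqlite3` from the standard library stands in for the real database.

```python
import json
import sqlite3

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_stats ("
    " campaign_id INTEGER PRIMARY KEY,"
    " stats TEXT NOT NULL)"  # JSON blob stored as text
)


def save_stats(conn, campaign_id, stats):
    """Serialize the stats dict and store it as one blob per campaign."""
    conn.execute(
        "INSERT OR REPLACE INTO campaign_stats (campaign_id, stats) VALUES (?, ?)",
        (campaign_id, json.dumps(stats)),
    )


def load_stats(conn, campaign_id):
    """Fetch and parse the blob; return an empty dict if nothing is stored."""
    row = conn.execute(
        "SELECT stats FROM campaign_stats WHERE campaign_id = ?",
        (campaign_id,),
    ).fetchone()
    return json.loads(row[0]) if row else {}


# Two campaigns with different sets of fields: no schema change is needed.
save_stats(conn, 42, {"jobs_total": 1000, "jobs_failed": 12})
save_stats(conn, 43, {"jobs_total": 50, "jobs_failed": 0, "cpu_hours": 7.5})
```

The extra `cpu_hours` field on the second campaign costs nothing at write time, but any query that filters or aggregates on it has to go through `load_stats` (or a database-side JSON function) rather than a plain column.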

ANNA:
So far it's OK, no alarms, but when more experiments are running there will be even more data, so we need to
be prepared for that. DUNE users will be submitting a lot of jobs, MC jobs; it won't be a steady
number of jobs.

YUYI:
Question about what "archiving" really means: compressing data?

STEVE:
Compress data, store stats info and then remove the data. Also the ability to store log files.

MARC:
There is an initial proposal for archiving on the wiki page.

STEVE:
We need to do this only if really needed; it's a lot of work, also if experiments want to restore data.
First step is to have the stats info stored in another table. If they want details, job files are
just kept for 2 weeks but we will keep

3. AOB

ANNA:
Successful test: jobs run with the configuration "MCC9" between CERN and FNAL.

Action Items:

- Steve will prepare a proposal of hold cases and present it in two weeks.