Anna M, Brian Y, Steve W, Marc M, Robert I, Brandon W, Vladimir P, Yuyi G, Margherita VW.
- v2_3_0 update (All)
- v2_0_0 update (All)
1) Update on tasks for version v2_3_0
Please update the status of your tasks in Redmine.
18176: about adding new user: almost fixed
18194: about “hold reason from condor queue”: fixed in devel
18300: about file stats: Worked on ‘Delivered’, still need to work on “unknown”.
In the file counts, we need to add an “unknown” count; files that have an “unknown” status in SAM are currently not accounted for in our stats.
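A minimal sketch of the proposed “unknown” count, assuming per-file statuses arrive as plain strings. The function name `count_file_statuses` and the status names other than “delivered” and “unknown” are hypothetical and not taken from POMS or SAM.

```python
from collections import Counter

# Hypothetical set of statuses the stats already account for.
KNOWN_STATUSES = {"delivered", "consumed", "skipped"}


def count_file_statuses(statuses):
    """Count files per status, keeping an explicit "unknown" bucket."""
    counts = Counter()
    for status in statuses:
        key = (status or "").strip().lower()
        # Assumption: anything not recognised is counted as "unknown"
        # rather than silently dropped, so the totals always add up.
        if key not in KNOWN_STATUSES:
            key = "unknown"
        counts[key] += 1
    return dict(counts)


sample = ["delivered", "unknown", None, "consumed", "delivered"]
totals = count_file_statuses(sample)
# The per-status counts reconcile with the number of files seen.
assert sum(totals.values()) == len(sample)
```

With a bucket like this, the sum of all counts matches the number of files, which is the property the stats are currently missing.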
Issue about jobs in the queue: some go from “hold” to “complete”. When jobs are removed from the queue, they should be reported. Otherwise it is confusing when looking at the stats later on, say a week later, because job stats are missing. We need to have the right count even later.
“Running” is an actual condition. “Held” is a temporary condition, and we might miss held jobs if they are removed.
How often do we check on status?
Hard to tell. A job gets marked as removed from the condor queue, and a flag is set at that point. If we don’t see a job in the queue, we assume it has completed, but it could actually have been removed. We need more information to tell the difference.
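One way to tell the two cases apart, sketched under assumptions: compare consecutive queue snapshots and treat a vanished job as completed only when nothing suggests otherwise. The snapshot dictionaries, the `final_states` lookup (standing in for whatever record of removals is available), and the label names are all hypothetical.

```python
def classify_vanished_jobs(previous, current, final_states=None):
    """Label jobs present in the previous queue snapshot but not the current one.

    previous, current: dict mapping job id -> last seen status string.
    final_states: optional dict mapping job id -> confirmed final status
                  (hypothetical; e.g. filled from a record of removed jobs).
    """
    final_states = final_states or {}
    labels = {}
    for job_id, last_status in previous.items():
        if job_id in current:
            continue  # still queued, nothing to decide yet
        if job_id in final_states:
            labels[job_id] = final_states[job_id]
        elif last_status == "held":
            # A held job that vanishes may well have been removed.
            labels[job_id] = "needs_review"
        else:
            labels[job_id] = "presumed_completed"
    return labels


previous = {"1.0": "running", "2.0": "held", "3.0": "held", "4.0": "idle"}
current = {"4.0": "idle"}
labels = classify_vanished_jobs(previous, current, {"3.0": "removed"})
```

Keeping a separate “needs_review” label avoids counting a removed job as completed when the only evidence is that it left the queue.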
18361: another bug, about reset split sequence: Is this fixed?
17741: about tag/untag: Done.
17774: about SNOW link: Done.
17779: about campaign type: No work yet. Need to talk again with Marc.
Related to workflow for campaigns that depend on each other.
For some, the output files of one campaign are used as input for another.
We want to distinguish a campaign that is really “done” from, for example,
Need to discuss in detail; will have a separate meeting.
17859: about Roles in POMS: No update yet since last discussion.
Yuyi has been very busy with CMS, which is her first priority.
We may need to get some help for Yuyi; otherwise the deployment of this version will be delayed.
17894: about split list, which is done:
Used this feature when mimicking the file flow from CERN.
Split data sets based on number of files over 4 hours.
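As an illustration only, splitting a data set by number of files can be sketched as below. The function name and the slice size are hypothetical, and the sketch does not model the 4-hour aspect mentioned above.

```python
def split_by_file_count(files, files_per_slice):
    """Split a list of file names into consecutive slices of at most files_per_slice."""
    if files_per_slice < 1:
        raise ValueError("files_per_slice must be at least 1")
    return [files[i:i + files_per_slice]
            for i in range(0, len(files), files_per_slice)]


slices = split_by_file_count(["f1", "f2", "f3", "f4", "f5"], 2)
```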
17895: about link for campaign info. Not done yet.
We would like to have more stats for the campaign.
17931: about hold/rel jobs button: Mostly done.
18365: about split type: working on it. (This was based on a request from MicroBooNE.)
Anything else for this release that is not in Redmine?
The only other question is about the ability to release held jobs with more memory.
They were going to give us ssh access to the machine where the jobs are running.
It would be best to have a POMS account for this access.
Need to discuss this with Ed Simmonds. Steve will contact him.
2) Update on v2_0_0, POMS Client
Any news on this?
Still needs some testing.
Got feedback from users, and some of it is contradictory:
a) ProtoDUNE: they think POMS is OK for production, but for users it is another layer for submitting jobs and more work, so they just use project.py.
b) DUNE: they are happy to use it, we have done mcc9, mcc9.1, mcc9.2 and dc1.
Part of the problem is that there is a misunderstanding about what POMS is and what it can do for the user, which is not simply submitting jobs.
It would be great if we could further simplify the setup for templates and job submission.
For production, templates can be re-used, so it’s OK. But for the normal jobs a user submits, we need to think about whether we could simplify.
Then there is the problem with dCache: persistent dCache is almost full all the time. Users use persistent dCache, where there is a lot of old data.
Is there a quota per experiment?
No quota; some have dedicated space, some share dCache.
We could hold jobs if we know there is no space.
And we could use the HOLD button in POMS to stop new submissions.
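The idea of holding submissions when space runs out could be sketched as a simple threshold check. The function name, the byte counts, and the 5% threshold are assumptions for illustration; how free space would actually be obtained from dCache is not covered here.

```python
def should_hold_submissions(free_bytes, total_bytes, min_free_fraction=0.05):
    """Return True when free space falls below the given fraction of the total."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction
```

For example, `should_hold_submissions(1, 100)` signals a hold, while `should_hold_submissions(50, 100)` does not.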
- Separate discussion on campaign types.
- Separate discussion on email types of subscriptions.
- Steve will contact Ed Simmonds.