How detailed should this requirements document be? Everyone is encouraged to contribute to the requirements wiki, which is broken down into sections.
We need a state diagram for the jobs, with separate states at two levels:
-campaign level
-job (task) level
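One way to draft such a state diagram is as a transition table that code can enforce. The sketch below shows this for the job level. The state names (PENDING, RUNNING, SUCCEEDED, FAILED) and the allowed transitions are assumptions for illustration, not agreed requirements. A second table of the same shape could cover the campaign level.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical job-level states; the real set is still to be decided.
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed transitions: current state -> set of states it may move to.
# FAILED -> PENDING models a retry (automatic or manual).
TRANSITIONS = {
    JobState.PENDING: {JobState.RUNNING},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED},
    JobState.FAILED: {JobState.PENDING},
    JobState.SUCCEEDED: set(),
}


def transition(current: JobState, target: JobState) -> JobState:
    """Return the target state if the move is allowed, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Because the diagram lives in a single table, it can be reviewed on its own and rendered into a picture, and illegal moves fail loudly instead of silently corrupting job state.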
A campaign can have n batches. Keepup processing is ongoing, so technically a keepup campaign may never complete. If a batch fails, it should be retried automatically; a manual retry should also be available.
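The sketch below shows the batch retry behaviour and the keepup rule. It assumes a fixed automatic-retry budget (the `max_auto_retries` default of 3 is hypothetical). Once the budget is used up, the batch is parked as "failed" until an operator triggers a manual retry. The status strings are also placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Batch:
    batch_id: int
    status: str = "pending"      # "pending", "running", "done", or "failed"
    attempts: int = 0            # how many times the batch has been started
    max_auto_retries: int = 3    # hypothetical limit; not from the requirements

    def start(self) -> None:
        self.attempts += 1
        self.status = "running"

    def record_success(self) -> None:
        self.status = "done"

    def record_failure(self) -> None:
        # Automatic retry: requeue until the retry budget is used up,
        # then park the batch as "failed" so an operator can act.
        if self.attempts <= self.max_auto_retries:
            self.status = "pending"
        else:
            self.status = "failed"

    def manual_retry(self) -> None:
        # Operator-triggered retry: reset the counter and requeue.
        self.attempts = 0
        self.status = "pending"


@dataclass
class Campaign:
    name: str
    keepup: bool = False
    batches: List[Batch] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A keepup campaign keeps receiving new batches, so it never completes.
        if self.keepup:
            return False
        return all(b.status == "done" for b in self.batches)
```

With the default budget, a batch is tried once and then retried automatically up to three more times. After that, only a manual retry requeues it.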
Sites may be associated with a particular category of jobs.
Each campaign has a job type.
A useful metric would be which campaigns run successfully on which sites.
A software type is attached to each campaign.
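Here is a minimal sketch of the "which campaigns succeed on which sites" metric. It assumes each finished job is recorded with its campaign, job type, software, site, and outcome. The `JobRecord` shape and its field names are hypothetical. The function reports the fraction of successful jobs for each (campaign, site) pair.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class JobRecord:
    # Hypothetical record shape; field names are illustrative only.
    campaign: str
    job_type: str
    software: str
    site: str
    succeeded: bool


def success_rate_by_campaign_site(
    jobs: Iterable[JobRecord],
) -> Dict[Tuple[str, str], float]:
    """Fraction of successful jobs for each (campaign, site) pair."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    successes: Dict[Tuple[str, str], int] = defaultdict(int)
    for job in jobs:
        key = (job.campaign, job.site)
        totals[key] += 1
        if job.succeeded:
            successes[key] += 1
    return {key: successes[key] / totals[key] for key in totals}
```

Because job type and software are kept on each record, the same loop can be keyed on (job_type, site) or (software, site) instead.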
For monitoring, perhaps show:
-errors grouped by CPU type
-errors grouped by day of the week
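Both monitoring views are simple counts over error events. The sketch below assumes each error is recorded with a timestamp and a CPU type label. `ErrorEvent` and its fields are hypothetical. Weekday names come from a fixed tuple rather than locale-dependent formatting.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


@dataclass(frozen=True)
class ErrorEvent:
    # Hypothetical error record; field names are illustrative only.
    timestamp: datetime
    cpu_type: str
    message: str


def errors_by_cpu_type(events: Iterable[ErrorEvent]) -> Counter:
    """Count errors per CPU type label."""
    return Counter(e.cpu_type for e in events)


def errors_by_weekday(events: Iterable[ErrorEvent]) -> Counter:
    """Count errors per day of the week."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return Counter(WEEKDAYS[e.timestamp.weekday()] for e in events)
```

`Counter.most_common()` then gives the worst offenders first, which is a natural ordering for a dashboard.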
Integrate the system with GUMS (Grid User Management System) for account mapping:
take a user's certificate and find out which account it is mapped to. This is not an easy problem to solve.
Experiments have their own prelaunch checks, for example whether the dataset count is greater than 0 and whether the software package exists in CVMFS.
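Here is a minimal sketch of how per-experiment prelaunch checks could be plugged in. Each check is a callable that returns a pass flag and a message, and a launch proceeds only if the list of failures is empty. The two check builders mirror the examples above. How the dataset count is obtained, and the exact package path inside a CVMFS repository (mounted under `/cvmfs/`), are experiment-specific. Both are passed in as hypothetical parameters here.

```python
import os
from typing import Callable, List, Tuple

# A check returns (passed, message).
Check = Callable[[], Tuple[bool, str]]


def dataset_not_empty(dataset_count: int) -> Check:
    def check() -> Tuple[bool, str]:
        return dataset_count > 0, f"dataset count is {dataset_count}"
    return check


def software_in_cvmfs(package_path: str) -> Check:
    # package_path is a hypothetical directory under /cvmfs/; the real
    # layout depends on each experiment's repository.
    def check() -> Tuple[bool, str]:
        ok = os.path.isdir(package_path)
        status = "found" if ok else "missing"
        return ok, f"software package {package_path!r} {status}"
    return check


def run_prelaunch(checks: List[Check]) -> List[str]:
    """Run every check and return the messages of those that failed."""
    failures = []
    for check in checks:
        ok, message = check()
        if not ok:
            failures.append(message)
    return failures
```

Running every check, rather than stopping at the first failure, lets the submitter see all problems with a campaign in one pass.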