jobsub constraint commands should respect --group even if --constraint doesn't include it: INCLUDING GLOBAL SUPERUSERS
In #17074, we requested that when --group and --user are passed to a jobsub client command along with --constraint, the jobsub server respect the former two and build a constraint for condor that includes them. The changelog from that ticket's branch is here:
As we can see, --user is always respected, but if the user running the command is a global superuser, --group is not combined into --constraint. This means that a global superuser can run a command like jobsub_rm -G mygroup --constraint '"Only production jobs!"', and "Only production jobs!" will get applied across ALL jobsub groups. This happened yesterday afternoon (2019-03-07 16:53 on jobsub01 and jobsub02). The jobsub command was:
jobsub_rm --group=fermilab --constraint=POMS4_CAMPAIGN_STAGE_ID=1
The resultant condor command was:
condor_rm -totals -name jobsub01.fnal.gov -pool gpcollector03.fnal.gov -constraint POMS4_CAMPAIGN_STAGE_ID=1
Because --group was dropped, and because of the typo in the constraint (the "=" should have been "=="), every single job in every jobsub group that had the classad attribute "POMS4_CAMPAIGN_STAGE_ID" defined was deleted.
We want to beef this up in three ways:
1) When any user runs a jobsub_rm command, ask for confirmation. Perhaps we should have a force flag so that tools like POMS can skip this.
2) Default to respecting --group even if a jobsub user is a global superuser. If they say "jobsub_rm -G nova" it should only affect nova jobs UNLESS...
3) We should provide a special group (like "admin") so that a global superuser must consciously say that they do want to act on jobs across the board. So, let's say we want to remove all held jobs across all VOs for some reason. Currently, if the following were entered:
jobsub_rm -G nova --constraint "JobStatus=?=5"
this would remove EVERYONE's held jobs if the user running it were a global superuser. What we propose is something like:
jobsub_rm -G admin --constraint "JobStatus=?=5"
The jobsub server would naturally have to check to see if the user is actually a global superuser, and return an error if not.
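A rough sketch of what points 2 and 3 could look like on the server side, in Python 3. This is not the existing jobsub code: the function name build_constraint, the classad attribute name Jobsub_Group, and the way the superuser check is passed in are all assumptions made for illustration.

```python
ADMIN_GROUP = "admin"  # special group name from the proposal above


def build_constraint(group, user_constraint, is_global_superuser):
    """Combine --group with the user's --constraint (hypothetical helper).

    The group clause is always added, even for global superusers,
    unless the caller explicitly asks for the special admin group.
    """
    if not user_constraint:
        raise ValueError("a --constraint expression is required")

    clauses = []
    if group == ADMIN_GROUP:
        # Acting across all groups must be a conscious choice, and
        # only a global superuser is allowed to make it.
        if not is_global_superuser:
            raise PermissionError("only global superusers may use -G admin")
    else:
        # ASSUMPTION: jobs carry their group in a Jobsub_Group attribute.
        clauses.append('Jobsub_Group =?= "%s"' % group)

    # Parenthesize the user's expression so it cannot absorb the group clause.
    clauses.append("(%s)" % user_constraint)
    return " && ".join(clauses)
```

With this, "jobsub_rm -G nova --constraint 'JobStatus=?=5'" would be limited to nova jobs no matter who runs it, and only "-G admin" run by a global superuser would produce a constraint with no group clause.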
#1 Updated by Shreyas Bhat 3 months ago
Our confirmation message should be something like "You are about to delete 8000 jobs. Are you sure you want to do this? (Y/n)"
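A minimal sketch of such a prompt in Python 3, for discussion. The function name confirm_removal and the force and ask parameters are hypothetical; force stands in for whatever flag we settle on so that tools like POMS can skip the prompt.

```python
def confirm_removal(job_count, force=False, ask=input):
    """Return True if the removal should go ahead (hypothetical helper).

    force -- skip the prompt entirely (for non-interactive tools)
    ask   -- function used to read the answer; replaceable for testing
    """
    if force:
        return True
    prompt = ("You are about to delete %d jobs. "
              "Are you sure you want to do this? (Y/n) " % job_count)
    answer = ask(prompt).strip().lower()
    # ASSUMPTION: only an explicit "y" or "yes" confirms the removal.
    return answer in ("y", "yes")
```

The sketch requires an explicit "y" before anything is removed. Whether an empty answer should count as yes, as the capital Y in "(Y/n)" would conventionally suggest, is left open here.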
A group "admin" might be confusing when we really mean "act globally". Perhaps it should be "global". We can discuss that in Tuesday's meeting.
#2 Updated by Shreyas Bhat about 1 month ago
The working document on this is here:
Two important changes:
- The flag for superusers is going to be --superuser
- You will NEVER be able to act on jobs in more than one group. If you pass -G (which is required), it will always be respected. If a superuser wants to execute commands on more than one group, they have to either type the same command multiple times with different groups, or just log on to the schedd and use condor commands. This is putting "safety" over convenience.
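To make the two changes above concrete, here is a sketch in Python 3 of how the server might assemble the condor_rm call under the new rules. It is illustrative only: the function name condor_rm_args, its parameters, the Jobsub_Group attribute name, and the choice to restrict non-superuser calls to the caller's own jobs via Owner are assumptions, not taken from the working document. The condor_rm options are the ones shown in the command in the description.

```python
def condor_rm_args(group, user_constraint, schedd, pool, caller,
                   superuser_flag=False, caller_is_superuser=False):
    """Build the condor_rm argument list (hypothetical helper)."""
    if not group:
        raise ValueError("-G/--group is required")
    if superuser_flag and not caller_is_superuser:
        raise PermissionError("--superuser requested by a non-superuser")

    # The group clause is ALWAYS present, superuser or not.
    # ASSUMPTION: jobs carry their group in a Jobsub_Group attribute.
    clauses = ['Jobsub_Group =?= "%s"' % group]
    if not superuser_flag:
        # ASSUMPTION: without --superuser, only the caller's own jobs match.
        clauses.append('Owner =?= "%s"' % caller)
    if user_constraint:
        clauses.append("(%s)" % user_constraint)

    return ["condor_rm", "-totals", "-name", schedd, "-pool", pool,
            "-constraint", " && ".join(clauses)]
```

Under these assumptions, the command from the incident would have been sent to condor with a group clause in front of the user's expression, so even the mistyped constraint could not have reached jobs outside the fermilab group.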