Remove multiple glideins at a time
Right now we remove glideins one at a time. This seems quite inefficient
for a busy factory. After talking to Dan (see below), I think we can
make this efficient.
On 11/30/11 7:55 PM, Parag Mhashilkar wrote:
Thanks a lot for the detailed explanation.
So it looks like the outcome depends on whether the transaction succeeds: if
marking any job as removed fails, even for a single job, the transaction will
fail and the exit status will be non-zero.
So I can still change the removal to cover multiple glideins with one command,
and if it exits with non-zero I can revert to removing one job at a time.
Also, one key factor seems to be "marked for removal" vs. "actually removed".
A job marked for removal can still fail to be removed and go back to the held
state. What happens in such scenarios? I am guessing the behavior would be:
one (or all) jobs are marked for removal, condor_rm returns 0, but the actual
removal of one of the jobs fails and only that particular job goes to the
held state. Is my assumption correct?
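The fallback approach described in this message could be sketched roughly as follows. This is only an illustration, not the factory's actual code: the function name remove_glideins and the injectable run parameter are hypothetical, and the only condor_rm usage assumed is the "condor_rm 4 5 6" form quoted in the thread. Note that a zero exit status only means the jobs were marked for removal, not that the actual removal succeeded.

```python
import subprocess


def remove_glideins(job_ids, run=subprocess.call):
    """Try one condor_rm call for all jobs; on failure, retry job by job.

    `run` takes an argv list and returns the exit status. It defaults to
    subprocess.call and is injectable so the logic can be tested without
    a Condor installation (hypothetical design, not from the factory).
    Returns the list of job ids whose individual condor_rm call failed.
    """
    job_ids = [str(j) for j in job_ids]
    if not job_ids:
        return []

    # Bulk attempt: marking happens in one transaction, so a non-zero
    # exit status means no job was marked for removal.
    if run(["condor_rm"] + job_ids) == 0:
        return []

    # Fallback: revert to removing one job at a time.
    failed = []
    for job_id in job_ids:
        if run(["condor_rm", job_id]) != 0:
            failed.append(job_id)
    return failed
```

A caller would pass the glidein job ids, for example remove_glideins([4, 5, 6]), and treat any returned ids as jobs that still need attention.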
From: Dan Bradley
Sent: Wednesday, November 30, 2011 4:52 PM
To: Parag Mhashilkar
Subject: Re: condor_rm question
On 11/30/11 3:55 PM, Parag Mhashilkar wrote:
Right now we remove glideins one Condor-G job at a time with condor_rm.
I am wondering if doing a bulk remove has any side effects. Is
condor_rm 4 5 6
condor_rm -constraint "clusterid>3 && clusterid<7"
condor_rm -constraint "clusterid==3 || clusterid==5 || clusterid==6"
In the above cases, if removing 5 fails, what happens to the status of 4&
The setting of job status to 'removed' happens in a transaction, so
either all jobs that can be removed will be marked for removal or none
of them will be. However, jobs that do not exist or which are already
marked for removal will be ignored.
Also what about the condor_rm exit status? How are constraints
with || when condor can remove jobs with certain constraints and
The exit status is 0 if at least one job was successfully removed. If
there is a communication failure or other failure to commit the
transaction, then no jobs will be successfully removed, so the exit
status will be non-zero. If the transaction is successfully committed
and this is communicated successfully to condor_rm, then the exit
status will be 0 and all matching jobs that existed and were not
already marked for removal will have been marked for removal. There is
no special treatment of constraints containing || clauses. All that
matters is whether the transaction was successfully committed, marking
all matching jobs for removal.
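Since constraints with || clauses get no special treatment, a bulk removal could build one constraint from an explicit list of cluster ids, in the same form as the third example in the question. The sketch below is an assumption-laden illustration: the helper names cluster_constraint and condor_rm_argv are hypothetical, and only the "-constraint" option and the "clusterid==N || ..." expression shape shown in the thread are used.

```python
def cluster_constraint(cluster_ids):
    """Build an expression like 'clusterid==3 || clusterid==5'.

    Ids are converted with int() so that only plain integers end up in
    the expression, and duplicates are dropped. (Hypothetical helper.)
    """
    ids = sorted(set(int(c) for c in cluster_ids))
    if not ids:
        raise ValueError("no cluster ids given")
    return " || ".join("clusterid==%d" % c for c in ids)


def condor_rm_argv(cluster_ids):
    """Return the argv list for a single bulk condor_rm call."""
    return ["condor_rm", "-constraint", cluster_constraint(cluster_ids)]
```

Passing the argv as a list (rather than a shell string) avoids having to quote the constraint expression by hand.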