jobsub_q prints wrong info, no error diagnostic
As mentioned by Ray in INC000000592035, jobsub_q output can silently omit information from a server. If a server cannot be contacted, jobsub_q must print a warning or error message and exit with a non-zero status. The jobsub_q output led me to believe that all my jobs had completed, when in fact it had simply failed to show the running jobs.
See below for an example of how to reproduce.
mu2egpvm05 ~$ while true; do jobsub_q > tmp.txt; st=$?; grep -q fifebatch2 tmp.txt && echo worked st=$st || echo broke st=$st; sleep 10; done
worked st=0
broke st=0
worked st=0
worked st=0
worked st=0
worked st=0
worked st=0
worked st=0
worked st=0
worked st=0
worked st=0
broke st=0
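The loop above distinguishes only "worked" from "broke". A small variant can also flag the case this issue is about: the server is missing from the output but the exit status is still 0. This is a sketch; the function name classify_run is hypothetical, and it assumes (as the loop above does) that the server name appears in the standard output whenever that server's jobs are listed. Only stdout is searched, so a warning on stderr that mentions the server name does not count as success.

```shell
#!/bin/sh
# Sketch (hypothetical helper): run a query command once and classify the result.
#   $1   = server name expected in the output (e.g. fifebatch2)
#   $2.. = the query command and its arguments (e.g. jobsub_q)
classify_run() {
    server=$1
    shift
    out=$("$@")   # capture stdout only; stderr still reaches the terminal
    st=$?         # exit status of the query command
    if printf '%s\n' "$out" | grep -q -- "$server"; then
        echo "worked st=$st"
    elif [ "$st" -eq 0 ]; then
        # Server missing from the output, yet status 0: the silent failure.
        echo "SILENT FAILURE st=$st"
    else
        echo "reported failure st=$st"
    fi
}
```

Usage, equivalent to the loop above: `while true; do classify_run fifebatch2 jobsub_q; sleep 10; done`. With the behavior reported here, the "broke st=0" lines would show up as "SILENT FAILURE st=0".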
#4 Updated by Dennis Box almost 4 years ago
Fix is on branch 9976. I have been testing it on fifebatch-dev by killing a schedd manually and then querying it with jobsub_q. The problem with condor_q -g is that it exits with the status of the last schedd it checks, and I am not sure how it determines that order. On the server, jobsub_q now checks each schedd in turn with condor_q -name 'schedd_name', and throws an error and exits with a non-zero status if any of them fails.
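The per-schedd approach can be sketched as a shell function that queries each schedd with condor_q -name and remembers whether any query failed. This is an illustration, not the code on branch 9976: the function name query_all_schedds and the QUERY_CMD override (which defaults to condor_q and exists only so the logic can be tried without a pool) are hypothetical, and the schedd names are supplied by the caller.

```shell
#!/bin/sh
# Sketch (hypothetical helper): query every schedd individually so that one
# unreachable schedd cannot be masked by the status of another.
# QUERY_CMD is a hypothetical override; it defaults to condor_q.
query_all_schedds() {
    rc=0
    for schedd in "$@"; do
        # condor_q -name <schedd> queries that one schedd; its stdout passes through.
        if ! ${QUERY_CMD:-condor_q} -name "$schedd"; then
            echo "WARNING: could not query schedd $schedd" >&2
            rc=1
        fi
    done
    return "$rc"   # non-zero if any schedd query failed
}
```

One design choice differs from the description above: the server throws on the first bad schedd, while this sketch keeps going so that every unreachable schedd gets its own warning before the non-zero status is returned.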