Feature #20505: Consider if the blacklisting script from FIFE can be added as GlideinWMS frontend feature
Troubleshoot FIFE blacklist and period attribute
FIFE folks have a periodic script that checks, as often as its period is set, whether the node is in a blacklist hosted on the central web server that Fermilab runs.
They have this script set with period=3600 (seconds), so it is supposed to run every hour.
All scripts are executed once before the HTCondor glidein starts. Periodic scripts are also invoked later, repeatedly, according to the specified period (in seconds).
It seems the script is being executed only once (at the beginning), so we were asked to take a look and figure out what is happening.
Frontend configuration of the script:
<file absfname="/etc/gwms-frontend/scripts/blacklist.sh" after_entry="True" after_group="False" const="True" executable="True" period="3600" untar="False" wrapper="False">
    <untar_options cond_attr="TRUE"/>
</file>
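The period semantics described above can be pictured with a simplified, sequential model. This is a sketch under an explicit assumption, not how GlideinWMS or HTCondor actually schedule periodic scripts: each invocation must return before the next one can start. The helper run_periodically and its parameters are hypothetical. Under that assumption, a periodic invocation that never returns would make the script appear to have run only once.

```shell
#!/bin/bash
# Simplified, sequential model of the documented semantics (an assumption for
# illustration, NOT the real GlideinWMS/HTCondor scheduler): run the script
# once up front, then once more after every PERIOD seconds. Each call must
# return before the next one can start.

# run_periodically SCRIPT PERIOD EXTRA_RUNS   (hypothetical helper)
#   SCRIPT      command or function to invoke
#   PERIOD      seconds between invocations (period="3600" in the config)
#   EXTRA_RUNS  number of periodic invocations after the initial one
run_periodically() {
    local script=$1 period=$2 extra_runs=$3
    local i

    # Initial invocation, before the glidein starts HTCondor.
    "$script"

    for ((i = 0; i < extra_runs; i++)); do
        sleep "$period"
        # If this call blocks forever (e.g. curl without --max-time),
        # the loop never advances and no later invocation happens.
        "$script"
    done
}
```

For example, `run_periodically /etc/gwms-frontend/scripts/blacklist.sh 3600 24` would model one initial run followed by a day of hourly runs.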
#3 Updated by Lorena Lobato Pardavila over 1 year ago
- Target version set to v3_5
- Assignee set to Lorena Lobato Pardavila
It seems the script was hanging here, in what looked like an infinite loop, specifically while executing the curl command and "sleep 60":
# Max 5 retries
n=0
until [ "$n" -ge 5 ]
do
    # Replace the next line with the real webserver/blacklist file
    curl -s --insecure "$blacklist_url" > "$TMPFILE" && break
    n=$((n + 1))
    sleep 60
done
After several tests, I believe that to make it work, the curl command should be given a maximum time allowed for the operation (it worked for me with --max-time 60 and sleep 60). This prevents batch jobs from hanging for hours due to slow networks or links going down.
We also suspect the behavior may be affected by some interaction between the script and HTCondor. We'll ask the HTCondor team about it during the next meeting.
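The fix discussed above can be sketched as a version of the retry loop in which every curl attempt is bounded by --max-time. This is a minimal sketch, not the FIFE script: fetch_blacklist, MAX_TRIES, RETRY_SLEEP and CURL_MAX_TIME are hypothetical names, and the defaults mirror the values in this ticket (5 tries, --max-time 60, sleep 60). Adding --fail is an extra assumption, so that HTTP errors such as 404 count as failed attempts; --insecure is kept from the original and disables TLS certificate verification.

```shell
#!/bin/bash
# Retry loop with a per-attempt curl timeout (hypothetical names; defaults
# mirror the ticket: 5 tries, --max-time 60, sleep 60 between tries).
MAX_TRIES=${MAX_TRIES:-5}
RETRY_SLEEP=${RETRY_SLEEP:-60}
CURL_MAX_TIME=${CURL_MAX_TIME:-60}

# fetch_blacklist URL OUTFILE
# Returns 0 as soon as one download succeeds, 1 after MAX_TRIES failures.
fetch_blacklist() {
    local url=$1 outfile=$2
    local n=0
    while [ "$n" -lt "$MAX_TRIES" ]; do
        # --max-time caps the whole operation, so a stalled transfer makes
        # curl give up (exit code 28) instead of hanging indefinitely.
        # --fail (an added assumption) treats HTTP errors as failures.
        if curl -s --fail --insecure --max-time "$CURL_MAX_TIME" "$url" > "$outfile"; then
            return 0
        fi
        n=$((n + 1))
        # Do not sleep after the last failed attempt.
        [ "$n" -lt "$MAX_TRIES" ] && sleep "$RETRY_SLEEP"
    done
    return 1
}
```

A call could look like `fetch_blacklist "$blacklist_url" "$TMPFILE" || echo "could not download blacklist" >&2`. With these defaults the worst case is bounded: 5 attempts of at most 60 s each plus 4 sleeps of 60 s, about 540 s (9 minutes), instead of an open-ended hang. Skipping the sleep after the final failure is a small deviation from the original loop.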
#4 Updated by Lorena Lobato Pardavila about 1 year ago
- Status changed from Work in progress to Resolved
Confirmed with the HTCondor team that there is no hidden interaction between HTCondor and the curl command that could play a role in this case. They agreed that adding --max-time, as suggested by our tests, could help keep the curl command from hanging.
I'm resolving the ticket, as no changes are needed on our side.