Pilot should reload grid environment between jobs
On Sept 6, 2013, OASIS went offline for ~24 hours. During that time, any pilots that had started before the outage and were requested on behalf of VOs using OASIS potentially acted as black-hole nodes. (Jobs failed and restarted continuously.)
To mitigate problems like these, we should either:
a) reload the grid environment between jobs (e.g. this would allow a site to change the OSG_APP link from OASIS to something local without all jobs failing for the remaining lifetime of the pilot), or
b) run all the validation scripts periodically, specifically checking OSG_APP availability, and fail the pilot when something goes wrong.
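
As a rough illustration of the OSG_APP check in option (b), here is a minimal sketch of a test the pilot could run before accepting each job. The function name osg_app_available and the idea of passing the environment in as a dict are assumptions made for this example, not existing pilot code. Reading the variable on every call also touches on option (a): a changed OSG_APP value is picked up on the next check, provided the pilot's environment has been refreshed.

```python
import os


def osg_app_available(env=None):
    """Return True if OSG_APP is set and points to a listable directory.

    Hypothetical helper: `env` defaults to the current process environment,
    so the value is re-read on every call rather than cached at pilot start.
    """
    env = os.environ if env is None else env
    path = env.get("OSG_APP")
    if not path:
        # Variable unset or empty: the job cannot locate VO software.
        return False
    try:
        # Listing the directory forces a real access, which catches
        # dangling links and unreachable mounts.
        os.listdir(path)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # A non-zero exit status lets a wrapper script fail the pilot.
    raise SystemExit(0 if osg_app_available() else 1)
```

A check like this would stop a pilot from repeatedly pulling jobs that are bound to fail. It does not cover the case where the filesystem hangs instead of returning an error, which would need a timeout around the call.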