Feature #2454: Advertise classad in case of glidein failure
Error classads still not working in v3_1
Jeff's tests show that the error classads still do not work.
The code committed for #2454 has major bugs in it.
#1 Updated by Igor Sfiligoi about 7 years ago
- Status changed from Assigned to Feedback
- Assignee changed from Igor Sfiligoi to Parag Mhashilkar
Fixed in branch_v3_2plus_igor_4657.
Basically, we were missing a few attributes early on in the glidein setup that were needed for error classad generation.
Please review, and I will merge it back to v3_2 and master.
#4 Updated by Igor Sfiligoi about 7 years ago
We have a major architectural problem here: there is no good place to load the Condor binaries before all the FE scripts are run!
See the load procedure:
$ grep lst job.8397.0.err | grep file | grep Sign
Signature OK for main:file_list.d95ng5.lst.
Signature OK for client:preentry_file_list.d6afkj.lst.
Signature OK for client_group:preentry_file_list.d9bh1Q.lst.
Signature OK for client:aftergroup_preentry_file_list.d2phku.lst.
Signature OK for entry:file_list.d95ng5.lst.
Signature OK for client:file_list.d2phku.lst.
Signature OK for client_group:file_list.d2phku.lst.
Signature OK for client:aftergroup_file_list.d2phku.lst.
Signature OK for main:after_file_list.d95ng5.lst.
The factory main scripts and files are loaded only first and last.
In the first section, I don't have the necessary info yet, since the Condor version is often entry-specific.
And the last section is obviously too late if we want to return useful info to the FE.
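To make the ordering problem concrete, here is an illustrative Python sketch (not glideinWMS code; all function names are hypothetical) that parses the "Signature OK" lines from the grep output above and reports where the factory main lists fall relative to the entry list. It assumes, as stated above, that entry-specific information such as the Condor version is only known once the entry file list has been loaded.

```python
from __future__ import annotations

import re

# "Signature OK" lines copied from the grep output of job.8397.0.err.
LOG = """\
Signature OK for main:file_list.d95ng5.lst.
Signature OK for client:preentry_file_list.d6afkj.lst.
Signature OK for client_group:preentry_file_list.d9bh1Q.lst.
Signature OK for client:aftergroup_preentry_file_list.d2phku.lst.
Signature OK for entry:file_list.d95ng5.lst.
Signature OK for client:file_list.d2phku.lst.
Signature OK for client_group:file_list.d2phku.lst.
Signature OK for client:aftergroup_file_list.d2phku.lst.
Signature OK for main:after_file_list.d95ng5.lst.
"""

# Captures the owner (main, entry, client, client_group) and the file-list
# name; the sentence-ending period is not part of the name.
SIG_RE = re.compile(r"Signature OK for (\w+):(\S+)\.$")


def load_order(log: str) -> list[tuple[str, str]]:
    """Return (owner, file_list) pairs in the order the glidein verified them."""
    order = []
    for line in log.splitlines():
        match = SIG_RE.search(line.strip())
        if match:
            order.append((match.group(1), match.group(2)))
    return order


def positions(order: list[tuple[str, str]], owner: str) -> list[int]:
    """Indices at which file lists belonging to `owner` are loaded."""
    return [i for i, (who, _) in enumerate(order) if who == owner]


def first_main_slot_after_entry(order: list[tuple[str, str]]) -> int | None:
    """First factory-main slot at which entry-specific info is already known."""
    entry_idx = positions(order, "entry")[0]
    return next((i for i in positions(order, "main") if i > entry_idx), None)


if __name__ == "__main__":
    order = load_order(LOG)
    for i, (owner, flist) in enumerate(order):
        print(f"{i}: {owner:12} {flist}")
    print("main slots:", positions(order, "main"))
    print("first main slot after entry:", first_main_slot_after_entry(order))
```

With this log, the only main slot after the entry list is index 8, the very last one, which comes after every client and client_group list has already been processed.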
#8 Updated by Parag Mhashilkar about 7 years ago
- Assignee changed from Parag Mhashilkar to Burt Holzman
I reviewed it, and it looks OK. Burt, you wanted to try this out on your setup first? Assigning it to you before merging. Feel free to assign it back to me when you are done, and I will take care of merging and tagging rc4.
#10 Updated by Burt Holzman about 7 years ago
- Subject changed from Error classads still not workin gin v3_1 to Error classads still not working in v3_1
- Assignee changed from Burt Holzman to Parag Mhashilkar
I tested this branch (branch_v3_2plus_igor_4657, commit:8b82cd4) with all four locations (FE inside and outside group, factory inside and outside entry).
Glidein failed while running entry/factory-bad-validation-script.sh. Keeping node busy until 1380047320 (Tue Sep 24 18:28:40 UTC 2013).
Glidein failed while running client_group/factory-bad-validation-script.sh. Keeping node busy until 1380047746 (Tue Sep 24 18:35:46 UTC 2013).
Glidein failed while running main/factory-bad-validation-script.sh. Keeping node busy until 1380047093 (Tue Sep 24 18:24:53 UTC 2013).
Glidein failed while running client/factory-bad-validation-script.sh. Keeping node busy until 1380047950 (Tue Sep 24 18:39:10 UTC 2013).
Looks good to me.
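The four failure messages above can also be checked mechanically. Below is a minimal Python sketch, assuming the message wording shown in the test stays stable (the regex and function names are hypothetical, not glideinWMS code). It extracts the failing location and script from each message and confirms that the epoch after "Keeping node busy until" matches the UTC date printed next to it.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone

# The four failure messages reported in the test above.
MESSAGES = [
    "Glidein failed while running entry/factory-bad-validation-script.sh. Keeping node busy until 1380047320 (Tue Sep 24 18:28:40 UTC 2013).",
    "Glidein failed while running client_group/factory-bad-validation-script.sh. Keeping node busy until 1380047746 (Tue Sep 24 18:35:46 UTC 2013).",
    "Glidein failed while running main/factory-bad-validation-script.sh. Keeping node busy until 1380047093 (Tue Sep 24 18:24:53 UTC 2013).",
    "Glidein failed while running client/factory-bad-validation-script.sh. Keeping node busy until 1380047950 (Tue Sep 24 18:39:10 UTC 2013).",
]

# Hypothetical pattern matching the message format seen above.
FAIL_RE = re.compile(
    r"Glidein failed while running (\w+)/(\S+)\. "
    r"Keeping node busy until (\d+) \((.+)\)\.$"
)


def parse_failure(msg: str) -> tuple[str, str, int, datetime]:
    """Return (location, script, epoch, printed UTC time) for one message."""
    match = FAIL_RE.match(msg)
    if match is None:
        raise ValueError(f"unrecognized message: {msg!r}")
    location, script, epoch, stamp = match.groups()
    # %a and %b assume the default C locale, which matches the English log text.
    printed = datetime.strptime(stamp, "%a %b %d %H:%M:%S UTC %Y").replace(
        tzinfo=timezone.utc
    )
    return location, script, int(epoch), printed


def check(messages: list[str]) -> dict[str, bool]:
    """Map each failing location to whether its epoch agrees with its date."""
    result = {}
    for msg in messages:
        location, _, epoch, printed = parse_failure(msg)
        result[location] = datetime.fromtimestamp(epoch, tz=timezone.utc) == printed
    return result


if __name__ == "__main__":
    print(check(MESSAGES))
```

For these messages, all four locations (entry, client_group, main, client) appear, and each printed date agrees with its epoch.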