project-recovery returns incorrect dimension query
In sam_web_client v1_8 from /grid/fermiapp/products/common, project-recovery returns an incorrect dimension query. My project (janzirn-UMN_JM_Transfer-20140805_1334) had 23 failures, and all of the affected files were correctly marked as skipped. After stopping the project, I tried to create a new definition from the output of the project-recovery tool.
samweb -e nova project-recovery janzirn-UMN_JM_Transfer-20140805_1334 returned:
(snapshot_id 17017 minus (project_name janzirn-UMN_JM_Transfer-20140805_1334 and consumed_status consumed)) or consumer_process_id in (1089013,1089014,1089024,1089026,1089029,1089031,1089032,1089035,1089037,1089039,1089042,1089048,1089050,1089055,1089076,1089093,1089095,1089097,1089098,1089107,1089112,1089116,1089117)
Those are the 23 consumer_process_ids that threw errors, but when I created the definition and listed its files, it contained far too many. I then created another definition, changing "or consumer_process_id" to "and consumer_process_id", so the dimension query was:
(snapshot_id 17017 minus (project_name janzirn-UMN_JM_Transfer-20140805_1334 and consumed_status consumed)) and consumer_process_id in (1089013,1089014,1089024,1089026,1089029,1089031,1089032,1089035,1089037,1089039,1089042,1089048,1089050,1089055,1089076,1089093,1089095,1089097,1089098,1089107,1089112,1089116,1089117)
and it returned the correct 23 files.
Seems like a logic error and should be a quick fix.
#1 Updated by Robert Illingworth over 5 years ago
Actually, this is by design. The default assumption is that if something goes wrong, then the entire process is bad and should be redone. This makes sense for grid jobs, where failures often mean losing all your output, but not so much for other tasks. So you can turn that off:
$ samweb -e nova count-files $(samweb -e nova project-recovery --useProcessStatus=0 --useFileStatus=0 janzirn-UMN_JM_Transfer-20140805_1334)
23
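To illustrate why the default query returns more than the skipped files, here is a toy model of the two dimension queries using Python sets. This is only a sketch of the set logic, not SAM code: the file names, process ids, and the deliveries table are all made up for the example.

```python
# Toy model of the dimension queries using Python sets (not SAM code).
# All file names and process ids below are hypothetical.
snapshot = {"f1", "f2", "f3", "f4", "f5", "f6"}

# process id -> {file name: consumed_status}
deliveries = {
    101: {"f1": "consumed", "f2": "consumed"},  # process finished cleanly
    102: {"f3": "consumed", "f4": "skipped"},   # process failed on f4
    103: {"f5": "consumed", "f6": "skipped"},   # process failed on f6
}
failed_processes = {102, 103}

# Files with consumed_status consumed in this project.
consumed = {
    name
    for files in deliveries.values()
    for name, status in files.items()
    if status == "consumed"
}

# Every file handed to a failed process, whatever its status.
seen_by_failed = {name for pid in failed_processes for name in deliveries[pid]}

# "snapshot minus (project and consumed_status consumed)"
not_consumed = snapshot - consumed

# Default recovery query: "... or consumer_process_id in (...)"
default_recovery = not_consumed | seen_by_failed

# Hand-edited query: "... and consumer_process_id in (...)"
edited_query = not_consumed & seen_by_failed

print(sorted(default_recovery))
print(sorted(edited_query))
```

In this model the "or" form also picks up f3 and f5, which the failed processes had already consumed. That matches the default assumption that everything a failed process touched should be redone. The "and" form gives the same files as the plain "minus" term, which is what you get with both status options turned off.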
In this case you could skip the project-recovery command and create the definition by hand with something like
defname: UMNTransferJM minus (project_name janzirn-UMN_JM_Transfer-20140805_1334 and consumed_status consumed)
All the command ends up doing for you is looking up the snapshot id and putting that in the query. In fact, if you are using suitably named projects, you can wildcard it and keep re-running projects using the same definition until it returns no files:
defname: UMNTransferJM minus (project_name janzirn-UMN_JM_Transfer-% and consumed_status consumed)
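The re-run loop with a wildcarded definition can be sketched as a small simulation. Again this is a toy model with Python sets, not SAM code: the dataset, the "transfer-" project prefix, and the per-run failures are hypothetical.

```python
# Toy simulation of re-running projects against a wildcarded definition.
# The dataset, project names, and failures are hypothetical.
dataset = {"a", "b", "c", "d", "e"}

# Files that get skipped on each successive run (made up for the example).
skipped_on_run = [{"b", "d"}, {"d"}, set()]

# project name -> files that project consumed
consumed_by_project = {}


def remaining(prefix):
    """Model of: dataset minus (project_name <prefix>% and consumed_status consumed)."""
    done = set()
    for name, files in consumed_by_project.items():
        if name.startswith(prefix):
            done |= files
    return dataset - done


runs = 0
while remaining("transfer-"):
    todo = remaining("transfer-")
    # Each new project consumes whatever is left, except the files it skips.
    consumed_by_project[f"transfer-{runs}"] = todo - skipped_on_run[runs]
    runs += 1

print(runs, sorted(remaining("transfer-")))
```

Because every project name matches the same prefix, each run only sees files that no earlier run consumed, and the loop stops once the definition is empty.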
#2 Updated by Jan Zirnstein over 5 years ago
Aha, thanks for the quick update. That makes perfect sense; it's just that I encountered the "other tasks" first.
I'm assuming the last part is what's referred to as a draining data-set?
This issue can be closed, since it wasn't an issue in the first place, just user ignorance.