Allow extra stage(s) in DAG for --dataset_definition
Currently the --dataset_definition flag specifies that we run a DAG with
a startproject job, then N workers, and an endproject job.
I would like an option to insert another stage in the DAG which
would run a specified script before the endproject job.
This could, for example, use htadd to sum up the histogram files
generated by the workers.
Something like --extra_dag_stage=scriptname
which would run that script. I'm currently working on a standalone
script to run DAGs like this, but it would be better to build it in.
This can certainly wait for the client/server version.
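To make the requested DAG shape concrete, here is a rough sketch of the stage ordering, written as a small Python helper that emits HTCondor DAGMan-style JOB and PARENT/CHILD lines. It only illustrates the idea and is not how jobsub actually builds its DAG: the function name build_dag, the node names, and the submit file names (startproject.sub, worker.sub, endproject.sub, merge.sub) are all hypothetical.

```python
def build_dag(n_workers, extra_stage_submit=None):
    """Return DAG text for startproject -> N workers -> [extra stage] -> endproject.

    All node names and submit file names are hypothetical placeholders.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")

    workers = [f"worker_{i}" for i in range(n_workers)]
    worker_list = " ".join(workers)

    # Declare every node in the DAG.
    lines = ["JOB startproject startproject.sub"]
    lines += [f"JOB {w} worker.sub" for w in workers]
    if extra_stage_submit is not None:
        lines.append(f"JOB extra_stage {extra_stage_submit}")
    lines.append("JOB endproject endproject.sub")

    # Workers start only after the project has been started.
    lines.append(f"PARENT startproject CHILD {worker_list}")

    if extra_stage_submit is not None:
        # The extra stage waits for all workers, and endproject waits for it.
        lines.append(f"PARENT {worker_list} CHILD extra_stage")
        lines.append("PARENT extra_stage CHILD endproject")
    else:
        # Current behaviour: endproject runs directly after the workers.
        lines.append(f"PARENT {worker_list} CHILD endproject")

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_dag(3, extra_stage_submit="merge.sub"))
```

With an option like --extra_dag_stage=scriptname, the submit file for the extra node would wrap the user's script; without the option the DAG is unchanged.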
#4 Updated by Dennis Box almost 5 years ago
Hi Marc, Parag,
I see this request is 9 months old, so things may have changed since the original request. It appears to me to be a request to run a user-defined POST script on the server. I could see this being a potential security issue.
Marc, I have 2 questions:
1) Do you still want this?
2) Would a user-defined DAG job that runs on a worker node after the user's SAM jobs be a better solution?