Feature #12037

Decouple the jobsub server from the condor schedd

Added by Kevin Retzke about 3 years ago. Updated about 1 month ago.

Status: Assigned
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 03/23/2016
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

As we discovered yesterday, shutting down a jobsub server while leaving the associated condor schedd running causes problems,
since jobsub apparently expects there to be a server for each schedd. That is, even if a jobsub_submit request goes to fifebatch2, if
the jobsub server decides the job should go to fifebatch1, it will go through the fifebatch1 jobsub server rather than directly to the fifebatch1 schedd.

Decoupling the jobsub server from the schedd would support future scaling out, high availability (HA), and potentially even group-specific schedds.
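
As a rough illustration of the requested behavior, the sketch below builds a condor_submit command line that targets the chosen schedd directly instead of forwarding the request to a jobsub server on that host. It is only a sketch under assumptions: the function name, the submit file name, and the use of bare host names as schedd names are hypothetical and are not taken from the jobsub code. The only HTCondor feature it relies on is the -remote option of condor_submit.

```python
def build_submit_command(submit_file, target_schedd, local_schedd):
    """Return an argv list for condor_submit (hypothetical helper).

    If the target schedd is the local one, submit normally. Otherwise use
    condor_submit's -remote option to reach the target schedd directly,
    so no jobsub server needs to be running on the target host.
    """
    cmd = ["condor_submit"]
    if target_schedd != local_schedd:
        # Assumption: the schedd can be addressed by this name.
        cmd += ["-remote", target_schedd]
    cmd.append(submit_file)
    return cmd


if __name__ == "__main__":
    # A request arrives at fifebatch2 but the job should run on fifebatch1.
    print(build_submit_command("job.cmd", "fifebatch1", "fifebatch2"))
```

The command is returned as a list rather than executed, so it could be passed to subprocess.run or logged for inspection. Whether the user's credentials permit remote submission is outside the scope of this sketch.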

History

#1 Updated by Dennis Box over 2 years ago

  • Status changed from New to Assigned
  • Assignee set to Dennis Box
  • Target version set to v1.2.4

This issue is also RITM0423108; I am going to close that request with a pointer to here.

I have worked out most of the details of how this could be done, with the exception of DAGs. It appears the server-side implementation of jobsub_submit_dag will need some rewriting, along the lines suggested in the last few sentences of this excerpt from the condor_submit_dag man page:

-r schedd_name
Submit condor_dagman to a remote machine, specifically the condor_schedd daemon on that machine. The condor_dagman job will not run on the local condor_schedd (the submit machine), but on the specified one. This is implemented using the -remote option to condor_submit. Note that this option does not currently specify input files for condor_dagman, nor the individual nodes to be taken along! It is assumed that any necessary files will be present on the remote computer, possibly via a shared file system between the local computer and the remote computer. It is also necessary that the user has appropriate permissions to submit a job to the remote machine; the permissions are the same as those required to use condor_submit's -remote option. If other options are desired, including transfer of other input files, consider using the -no_submit option, modifying the resulting submit file for specific needs, and then using condor_submit on that.
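
The two-step approach suggested at the end of that excerpt might look roughly like the sketch below. It assumes that condor_submit_dag -no_submit writes the condor_dagman submit file next to the DAG file as <dagfile>.condor.sub; the function name and the example file names are hypothetical, and the step of modifying the submit file so that input files are transferred is deliberately left out.

```python
def build_remote_dag_commands(dag_file, schedd_name):
    """Return the two argv lists for a remote DAG submission (hypothetical helper).

    Step 1 generates the condor_dagman submit file without submitting it.
    Step 2 submits that file to the remote schedd with condor_submit -remote.
    Any edits to the generated submit file (for example, adding input files
    to transfer) would happen between the two steps and are not shown here.
    """
    generate = ["condor_submit_dag", "-no_submit", dag_file]
    # Assumption: the generated submit file is named <dagfile>.condor.sub.
    dagman_submit_file = dag_file + ".condor.sub"
    submit = ["condor_submit", "-remote", schedd_name, dagman_submit_file]
    return [generate, submit]


if __name__ == "__main__":
    for argv in build_remote_dag_commands("workflow.dag", "fifebatch1"):
        print(" ".join(argv))
```

The commands are only constructed, not run, so the sketch stays independent of any particular HTCondor installation.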

#2 Updated by Dennis Box about 2 years ago

  • Target version changed from v1.2.4 to v1.2.5

#3 Updated by Dennis Box over 1 year ago

  • Target version changed from v1.2.5 to v1.2.8

#4 Updated by Dennis Box 9 months ago

  • Target version changed from v1.2.8 to v1.3

#5 Updated by Dennis Box about 1 month ago

  • Target version changed from v1.3 to v1.3.3

