Partition notes

Partition notes 2 - initially started 05/25/2018

As a first pass, partitions can be used to satisfy all of the use cases mentioned below.
The ultimate "without having to think much about it" experience can be achieved by setting the environment
variable ARTDAQ_DAQINTERFACE_PARTITION in one's .bash_profile file.
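
As an illustration of how a tool might pick up that variable, here is a minimal Python sketch. The function name current_partition and the fallback to partition 0 when the variable is unset are assumptions made for this example (the fallback follows the "default partition should be 0" note); this is not DAQInterface code.

```python
import os


def current_partition(environ=os.environ):
    """Return the partition number from ARTDAQ_DAQINTERFACE_PARTITION.

    Assumption for this sketch: an unset variable means partition 0.
    """
    raw = environ.get("ARTDAQ_DAQINTERFACE_PARTITION", "0")
    partition = int(raw)  # raises ValueError for non-numeric input
    if partition < 0:
        raise ValueError("partition numbers are 0-indexed and must be >= 0")
    return partition


if __name__ == "__main__":
    print("Using partition", current_partition())
```

A user who puts `export ARTDAQ_DAQINTERFACE_PARTITION=3` in .bash_profile would then get partition 3 in every new login shell without passing anything on the command line.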

"Partitions" should be 0 indexed -- the default partition should be 0.
"Subsystem" number can start with 1 and be the lower 8 bits of a multicast address.

Port classes:

Type       Description               Multiplicity                      Algorithm
TCP        DaqInterface listen port  1 per "artdaq system"             10000 + p * 1000
UDP        Message Facility Viewer   1                                 10005 + p * 1000 (optional multicast config)
TCP        Routing Tokens            1 per Routing Master/Domain       10010 + p * 1000 + s
UDP        Routing Table Acks        1 per Routing Master/Domain       10030 + p * 1000 + s
TCP        XMLRPC                    1 per artdaq process              10100 + p * 1000 + r
TCP        TCP Socket Transfer       1 per process using               10500 + p * 1000 + r
Multicast  Request Messages          1 per Subsystem / Request Domain  227.128.(p).(s + 128) : 3001
Multicast  Routing Tables            1 per Routing Master/Domain       227.129.(p).(s + 128) : 3001
Multicast  Multicast Transfer        1 per process using               227.130.14.(p + 128) : (r + 1024)

  • p = partition, r = rank, s = subsystem/routing master instance. These formulas assume that rank is small (on the order of the number of processes, N)
  • Most systems will have "spare" ports in their 1000-port range (i.e. not all ports will be used in contiguous blocks)
  • The assumption is that a large system has < ~300 processes and < ~10 subsystems/routing domains. If larger numbers of processes are needed, configuration overrides will be necessary.
  • Rank for TCP Socket transfer is DESTINATION rank
  • Rank for multicast transfer is SOURCE rank
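
The port-class formulas can be written out as small Python functions. This is only a sketch of the arithmetic in the table; the function names are invented for the example and are not part of artdaq or DAQInterface.

```python
# p = partition, r = rank, s = subsystem / routing master instance.
# All names below are illustrative, not an existing API.

PORTS_PER_PARTITION = 1000


def _port(offset, p, extra=0):
    """Common form of the TCP/UDP formulas: offset + p * 1000 + extra."""
    return offset + p * PORTS_PER_PARTITION + extra


def daqinterface_port(p):
    return _port(10000, p)


def message_viewer_port(p):
    return _port(10005, p)


def routing_token_port(p, s):
    return _port(10010, p, s)


def routing_table_ack_port(p, s):
    return _port(10030, p, s)


def xmlrpc_port(p, r):
    return _port(10100, p, r)


def tcp_transfer_port(p, destination_rank):
    # For TCP Socket Transfer the rank is the DESTINATION rank.
    return _port(10500, p, destination_rank)


def request_multicast(p, s):
    return (f"227.128.{p}.{s + 128}", 3001)


def routing_table_multicast(p, s):
    return (f"227.129.{p}.{s + 128}", 3001)


def multicast_transfer(p, source_rank):
    # For multicast transfer the rank is the SOURCE rank.
    return (f"227.130.14.{p + 128}", source_rank + 1024)
```

For example, xmlrpc_port(2, 5) evaluates to 12105 and request_multicast(1, 3) to ("227.128.1.131", 3001).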

For a system with 100 processes, approximately 250 ports would be required.

A partition can be allocated 1000 ports, with the default (partition 0) base being 10000.
So, 20 partitions would allocate ports in the range of 10000 to 29999.
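
A short Python sketch of the per-partition block arithmetic follows. The function names are hypothetical. The limits in fits_default_layout are derived from the offsets in the port-class table (Routing Tokens at 10, Routing Table Acks at 30, XMLRPC at 100, TCP Socket Transfer at 500), not from any artdaq code, and they are looser than the "< ~300 processes, < ~10 subsystems" working assumption.

```python
def partition_port_range(p, base=10000, width=1000):
    """Return (first_port, last_port), inclusive, for partition p."""
    first = base + p * width
    return first, first + width - 1


def fits_default_layout(rank, subsystem):
    """True if rank/subsystem offsets stay inside their own slot.

    Routing Tokens (offset 10 + s) run into Routing Table Acks (offset 30)
    once s reaches 20; XMLRPC (offset 100 + r) runs into TCP Socket
    Transfer (offset 500) once r reaches 400.
    """
    return 0 <= rank < 400 and 0 <= subsystem < 20


if __name__ == "__main__":
    for p in (0, 19):
        print(p, partition_port_range(p))
```

With these defaults, partition 0 covers 10000 through 10999 and partition 19 covers 29000 through 29999.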

Port Conflicts - with other applications

We will provide at least one way to override ports. This could be for an individual port, a group of ports or all the ports in a partition; this is TBD.
Initially, if there is a conflict, one could try a different partition.
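
The "try a different partition" fallback could look roughly like the Python sketch below. It only probes the DAQInterface listen port (10000 + p * 1000) on the local host, which is an assumption made to keep the example short; a real check would need to cover every port the partition uses. The function names are hypothetical.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_partition(is_free=port_is_free, max_partitions=20):
    """Return the lowest partition whose DAQInterface port is free, or None."""
    for p in range(max_partitions):
        if is_free(10000 + p * 1000):
            return p
    return None
```

Note that a bind test is inherently racy: another process can take the port between the check and the actual launch, so this reduces conflicts rather than preventing them.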

Port Conflicts - overriding to create system-wide (unique) boardreader resources

In the demo environment, we do not need unique resources, but real systems will usually have resources whose
allocation to (or use in/by) multiple partitions is an error and should cause a conflict.
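
One way to picture such a conflict is a system-wide table that maps each unique resource to the partition that owns it. The Python sketch below is purely illustrative: the ResourceConflict class, the claim function and the resource name are all invented for the example, and how the table would actually be shared between partitions is left open.

```python
class ResourceConflict(Exception):
    """Raised when a unique resource is already owned by another partition."""


def claim(claims, resource, partition):
    """Record that `partition` uses `resource`; fail if another one holds it.

    `claims` is a dict mapping resource name -> owning partition.
    """
    owner = claims.get(resource)
    if owner is not None and owner != partition:
        raise ResourceConflict(
            f"{resource} is already allocated to partition {owner}"
        )
    claims[resource] = partition


if __name__ == "__main__":
    claims = {}
    claim(claims, "boardreader-hw-0", 0)  # hypothetical resource name
    try:
        claim(claims, "boardreader-hw-0", 1)
    except ResourceConflict as err:
        print("conflict:", err)
```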

Partition notes 1 - 03/21/2018

Currently:

launch daqInterface
1) attempts to launch on an unused port (via ps) -- not foolproof
2) redefine get_highest_port to return a fixed number

currently daqInterface ports for different users/partitions

related - TCP socket transfer port

has a base port -- old default 6300
every connection -- add 10

tcp_base_port

don't separate daqinterface and rest of artdaq system

Currently (2018-03-20), daqinterface runs ps to determine a port (to listen on), but does not do any
automated processing to determine ports for artdaq processes (including the tcp socket transfer plugin).

Suggest:
Hopes: that this would work for a distributed environment -- i.e. mu2edaq01 - 11, where people could start on different nodes,
but use any of the other nodes.

a file that has partition allocation information.


A few use cases that come to mind (KAB), some of which have already been mentioned:

  1. It would be great if multiple users could each run a single instance of DAQInterface+artdaq from their own account on the same computer, without having to think much about it.
    • this use case is multiple instances on the same computer, each from a different user account
  2. It would also be nice to be able to run multiple instances of an "artdaq system" (DAQInterface+artdaq processes) from a single account on the same computer. The running of such systems may not be as auto-magical as the first use case, but that's OK. If someone, or a group of people, is running multiple partitions from a single account, it seems reasonable that they know a little about the details of the environments for those partitions.
  3. Same as previous, but spread across a cluster of computers.
  4. Is it worth considering a scenario in which a single user would like to be gently prevented from running multiple partitions? We can imagine a new user starting an artdaq system multiple times without remembering to shut down the system each time. In this case, it is probably the right choice to tell the user "you've already got a system running" when he/she tries it the second, third, etc. time instead of blithely creating a new partition each time.
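
Use case 4 could be approximated with a per-account lock file, sketched below in Python. The function names and the idea of writing the PID into the file are assumptions for this example; nothing here describes existing DAQInterface behavior.

```python
import os


def try_start(lock_path):
    """Create the lock file atomically; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # "you've already got a system running"
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(str(os.getpid()))
    return True


def stop(lock_path):
    """Release the lock when the system is shut down."""
    os.remove(lock_path)
```

A lock file left behind by a crashed system would block later starts, so a real implementation would also need some way to detect and clear stale locks.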

Whatever we come up with needs to work on a variety of systems:

  • mu2e Pilot cluster, protoDUNE NP04 cluster
  • woof
  • NOvA and DUNE GPVM nodes
  • CERN lxplus

In some of these cases, we may not have the ability to manage things cluster-wide (e.g. GPVM nodes or lxplus).
Maybe whatever local 'file' we come up with indicates whether the port management is local to one computer or cluster-wide. (After thinking about it for a few minutes, it might make sense for that sort of user-account-local file to be named something like ~/.artdaq.)
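
To make the idea of a partition allocation file concrete, here is a Python sketch that parses one possible layout. The JSON format, the "scope" and "partitions" keys, the owner strings and the function names are all hypothetical; the notes above do not specify any of them.

```python
import json

VALID_SCOPES = ("local", "cluster")


def parse_allocation(text):
    """Parse a hypothetical allocation file; return (scope, owners).

    `owners` maps partition number -> free-form owner string.
    """
    data = json.loads(text)
    scope = data.get("scope", "local")
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    # JSON object keys are strings, so convert partition numbers to int.
    owners = {int(p): owner for p, owner in data.get("partitions", {}).items()}
    return scope, owners


def lowest_unallocated(owners, max_partitions=20):
    """Return the lowest partition number with no owner, or None."""
    for p in range(max_partitions):
        if p not in owners:
            return p
    return None


if __name__ == "__main__":
    example = (
        '{"scope": "cluster", '
        '"partitions": {"0": "alice@mu2edaq01", "1": "bob@mu2edaq02"}}'
    )
    scope, owners = parse_allocation(example)
    print(scope, lowest_unallocated(owners))
```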