Partition notes 2 - initially started 05/25/2018¶
As a first pass, partitions can be used to satisfy all of the use cases mentioned below.
The ultimate "without having to think much about it" experience can be achieved by setting the environment
variable ARTDAQ_DAQINTERFACE_PARTITION in one's .bash_profile file.
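As a rough illustration of how a tool might pick up that variable, here is a minimal Python sketch. The function name `current_partition` and its error handling are assumptions made for this example, not existing DAQInterface code. Only the variable name and the default of partition 0 come from these notes.

```python
import os


def current_partition(environ=os.environ, default=0):
    """Return the partition number from ARTDAQ_DAQINTERFACE_PARTITION.

    Falls back to `default` (partition 0) when the variable is unset or empty.
    Hypothetical helper for illustration only.
    """
    raw = environ.get("ARTDAQ_DAQINTERFACE_PARTITION")
    if raw is None or raw.strip() == "":
        return default
    partition = int(raw)  # raises ValueError for non-numeric values
    if partition < 0:
        raise ValueError("partition must be >= 0, got %d" % partition)
    return partition
```

With `export ARTDAQ_DAQINTERFACE_PARTITION=3` in .bash_profile, this helper would return 3; with nothing set it would return 0.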
"Partitions" should be 0 indexed -- the default partition should be 0.
"Subsystem" number can start with 1 and be the lower 8 bits of a multicast address.
|Protocol||Use||Count||Port / Address|
|TCP||DAQInterface listen port||1 per "artdaq system"||10000 + p * 1000|
|UDP||Message Facility Viewer||1||10005 + p * 1000 (Optional multicast config)|
|TCP||Routing Tokens||1 per Routing Master/Domain||10010 + p * 1000 + s|
|UDP||Routing Table Acks||1 per Routing Master/Domain||10030 + p * 1000 + s|
|TCP||XMLRPC||1 per artdaq process||10100 + p * 1000 + r|
|TCP||TCP Socket Transfer||1 per process using||10500 + p * 1000 + r|
|Multicast||Request Messages||1 per Subsystem / Request Domain||227.128.(p).(s + 128) : 3001|
|Multicast||Routing Tables||1 per Routing Master/Domain||227.129.(p).(s + 128) : 3001|
|Multicast||Multicast Transfer||1 per process using||227.130.14.(p + 128) : (r + 1024)|
- p = partition, r = rank, s = subsystem/routing master instance. These assume rank is small (~N processes)
- Most systems will have "spare" ports in their 1000-port range (i.e. not all ports will be used in contiguous blocks)
- Assumption is that a large system is < ~300 processes, < ~10 subsystems/routing domains. If larger numbers of processes are needed, configuration overrides will be necessary.
- Rank for TCP Socket transfer is DESTINATION rank
- Rank for multicast transfer is SOURCE rank
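The formulas in the table can be written out as a small Python sketch. The function and constant names are made up for this illustration and are not taken from artdaq or DAQInterface. Only the arithmetic comes from the table above.

```python
BASE_PORT = 10000           # base port of partition 0
PORTS_PER_PARTITION = 1000  # size of each partition's port block


def _base(p):
    """First port of partition p."""
    return BASE_PORT + p * PORTS_PER_PARTITION


def daqinterface_port(p):
    return _base(p)


def message_viewer_port(p):
    return _base(p) + 5


def routing_token_port(p, s):
    return _base(p) + 10 + s


def routing_table_ack_port(p, s):
    return _base(p) + 30 + s


def xmlrpc_port(p, r):
    return _base(p) + 100 + r


def tcp_transfer_port(p, destination_rank):
    # Rank here is the DESTINATION rank.
    return _base(p) + 500 + destination_rank


def request_multicast(p, s):
    return ("227.128.%d.%d" % (p, s + 128), 3001)


def routing_table_multicast(p, s):
    return ("227.129.%d.%d" % (p, s + 128), 3001)


def multicast_transfer(p, source_rank):
    # Rank here is the SOURCE rank; it selects the port, not the address.
    return ("227.130.14.%d" % (p + 128), source_rank + 1024)
```

For example, rank 7 in partition 2 would receive TCP socket transfers on port 12507, and subsystem 1 in partition 1 would use 227.128.1.129:3001 for request messages.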
For a system with 100 processes approx. 250 ports would be required.
A partition can be allocated 1000 ports, with the default (partition 0) base being 10000.
So, 20 partitions would allocate ports in the range of 10000 to 29999.
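The block arithmetic can be checked with a few lines of Python. The helper name `partition_port_range` is hypothetical. The base of 10000 and the block size of 1000 are the values stated above.

```python
def partition_port_range(p, base=10000, size=1000):
    """Return (first_port, last_port) of the block allocated to partition p."""
    first = base + p * size
    return first, first + size - 1


# Partitions are 0-indexed, so 20 partitions are numbered 0 through 19.
first_port = partition_port_range(0)[0]
last_port = partition_port_range(19)[1]
```

Here `first_port` is 10000 and `last_port` is 29999, so port 30000 would be the base of a 21st partition.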
Port Conflicts - with other applications¶
We will provide at least one way to override ports. This could be for an individual port, a group of ports or all the ports in a partition; this is TBD.
Initially, if there is a conflict, one could try a different partition.
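One possible way to automate the "try a different partition" step is sketched below. It is an assumption for illustration, not an existing DAQInterface feature. It only probes TCP ports on the local host, and the check is inherently racy: a port may be taken between the probe and its real use. The names `port_is_free` and `first_usable_partition` are hypothetical.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_usable_partition(offsets, max_partitions=20, base=10000, size=1000):
    """Return the lowest partition whose ports at the given offsets are all free.

    `offsets` are positions within a partition's 1000-port block, e.g.
    0 for the DAQInterface listen port or 100 for the XMLRPC port of rank 0.
    Returns None if no partition qualifies.
    """
    for p in range(max_partitions):
        if all(port_is_free(base + p * size + off) for off in offsets):
            return p
    return None
```

A caller might use `first_usable_partition([0, 100, 500])` to look for a partition whose listen port and first XMLRPC and TCP transfer ports are not in use by another application.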
Port Conflicts - overriding to create system wide (unique) boardreader resources¶
In the demo environment, we do not need unique resources, but real systems will usually have resources whose
allocation to (or use in/by) multiple partitions is an error and should cause a conflict.
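The intended behavior could look something like the following Python sketch, in which a second partition claiming the same resource triggers an error. The mechanism is still open in these notes, so the class names, the in-memory registry, and the resource label are all assumptions made purely for illustration.

```python
class ResourceConflict(Exception):
    """Raised when a unique resource is claimed by a second partition."""


class ResourceRegistry:
    """In-memory map from a unique resource name to the partition that owns it."""

    def __init__(self):
        self._owners = {}

    def claim(self, resource, partition):
        # setdefault records the first claimant and returns the current owner.
        owner = self._owners.setdefault(resource, partition)
        if owner != partition:
            raise ResourceConflict(
                "%s is already in use by partition %d" % (resource, owner)
            )

    def release(self, resource, partition):
        # Only the owning partition may release the resource.
        if self._owners.get(resource) == partition:
            del self._owners[resource]
```

A real system would need the registry to be visible to every partition (for example through a shared file or service), which this sketch does not attempt.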
Partition notes 1 - 03/21/2018¶
1) attempts to launch on unused port (via ps) -- not foolproof
2) redefine get_highest_port to return fixed number
currently daqInterface ports for different users/partitions
related - TCP socket transfer port¶
has a base port -- old default 6300
every connection -- add 10
don't separate daqinterface and rest of artdaq system
Currently (2018-03-20), DAQInterface runs ps to determine a port to listen on, but does no
automated processing to determine ports for the artdaq processes (including the TCP socket transfer plugin).
Hopes: that this would work for a distributed environment, i.e. mu2edaq01 - 11, where people could start on different nodes
but use any of the other nodes.
a file that has partition allocation information.
A few use cases that come to mind (KAB), some of which have already been mentioned:
- It would be great if multiple users could each run a single instance of DAQInterface+artdaq from their own account on the same computer, without having to think much about it.
- this use case is multiple instances in the same computer, each from a different user account
- It would also be nice to be able to run multiple instances of an "artdaq system" (DAQInterface+artdaq processes) from a single account on the same computer. The running of such systems may not be as auto-magical as the first use case, but that's OK. If someone, or a group of people, are running multiple partitions from a single account, it seems reasonable that they know a little about the details of the environments for those partitions.
- Same as previous, but spread across a cluster of computers.
- Is it worth considering a scenario in which a single user would like to be gently prevented from running multiple partitions? We can imagine a new user starting an artdaq system multiple times without remembering to shut down the system each time. In this case, it is probably the right choice to tell the user "you've already got a system running" when he/she tries it the second, third, etc. time instead of blithely creating a new partition each time.
Whatever we come up with needs to work on a variety of systems:
- mu2e Pilot cluster, protoDUNE NP04 cluster
- NOvA and DUNE GPVM nodes
- CERN lxplus
In some of these cases, we may not have the ability to manage things cluster-wide (e.g. GPVM nodes or lxplus).
Maybe whatever local 'file' that we come up with indicates whether the port managing is local to one computer or cluster wide. (After thinking about it for a few minutes, it might make sense for that sort of user-account-local file to be named something like ~/.artdaq.)
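To make the idea of such a file more concrete, here is one possible shape for it, sketched in Python. Everything about the format is an assumption for illustration: the JSON layout, the "scope" and "partitions" keys, and the function names are not decided anywhere in these notes. The sketch also covers the "you've already got a system running" use case by refusing a second partition for the same owner unless explicitly allowed.

```python
import json
import os


def load_allocations(path):
    """Read the allocation file, or return an empty record if it does not exist."""
    if not os.path.exists(path):
        return {"partitions": {}}
    with open(path) as f:
        return json.load(f)


def claim_partition(path, partition, owner, scope="local", allow_multiple=False):
    """Record that `owner` is running `partition`; raise RuntimeError on conflict.

    `scope` ("local" or "cluster") is stored only when the file is first
    created and says whether the ports are managed per computer or cluster wide.
    """
    data = load_allocations(path)
    data.setdefault("scope", scope)
    partitions = data["partitions"]
    running = sorted(p for p, o in partitions.items() if o == owner)
    if running and not allow_multiple:
        raise RuntimeError(
            "you've already got a system running on partition(s) " + ", ".join(running)
        )
    key = str(partition)  # JSON object keys are strings
    if key in partitions:
        raise RuntimeError(
            "partition %s is already allocated to %s" % (key, partitions[key])
        )
    partitions[key] = owner
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
    return data
```

The `path` argument could be `os.path.expanduser("~/.artdaq")` for the per-account case. Users who deliberately run several partitions from one account would pass `allow_multiple=True`. The sketch does no file locking, so concurrent writers could still collide.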