Feature #11454

Add union types

Added by Richard Neswold over 4 years ago. Updated 26 days ago.

Sometimes a message field ought to hold a value of one of several possible types. For instance, DPM has eight reply messages which are all essentially the same except that the data field is of a different type in each. We could have created one reply message with eight optional fields, but we like the type safety that the current approach provides. What we'd really like to do, however, is something like this (syntax may change):

enum DataType {
    ...
}

reply data {
    int64 ref_id;
    int64 timestamp;
    int64 cycle;
    DataType data;
}

In this example, there is only one message that describes a data reply. The data field is of the DataType enumeration and it can only hold one of the five types of data. Each enumerated value is associated with a value of a specified type. So, for instance, if the message had a RawVal value, it would also contain a binary value.

This can be emulated (poorly) with the current protocol compiler by using an optional field for each case and only populating one at any given time. This, however, is error prone: receiving code has to handle the case where more than one field is present, since that is a possible input. Optional fields are analogous to checkbox widgets in a user interface. What this issue proposes is the analog of radio buttons. Our enumeration type gets us close to this, but falls short. What we would like is for each enumeration case to be associated with zero or more values.
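The checkbox versus radio-button distinction can be sketched in C++17. This is only an illustration: the struct names, the field names, and the three data types are hypothetical and are not output of the protocol compiler. `std::optional` stands in for optional fields and `std::variant` stands in for the proposed union type.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Emulation with optional fields (hypothetical names): nothing prevents
// zero fields, or several fields, from being populated at once.
struct ReplyWithOptionals {
    std::optional<std::vector<std::uint8_t>> raw;
    std::optional<double> scaled;
    std::optional<std::string> text;
};

// Counts the populated fields. A receiver has to reject any message for
// which this is not exactly one.
inline int populatedCount(ReplyWithOptionals const& r) {
    return int(r.raw.has_value()) + int(r.scaled.has_value()) +
           int(r.text.has_value());
}

// Union-type version: the variant holds exactly one alternative by
// construction, so the "more than one field" case cannot be expressed.
using ReplyData = std::variant<std::vector<std::uint8_t>, double, std::string>;

struct ReplyWithUnion {
    ReplyData data;
};
```

With the optional-field struct, the invalid states have to be detected at run time; with the variant, they cannot be built in the first place.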

Here's a union type (representing ACNET node addresses) where each case is associated with a different number of values. This example is used later in this document when showing how it is represented in the target languages:

enum TargetNode {
    Multicast,
    Name(string),
    TrunkNode(int16, int16)
}

Here Multicast means the target is the ACNET multicast address. The Name instance contains the alphabetical name of the target. Lastly, TrunkNode holds the trunk and node addresses.


Network Layout

Currently, enumerations are encoded as integers with an '8' tag. The encoded integer is the hash of the enumeration value's name appended to the type name. For instance, if an enumeration value hashed to 0x123456, it would be encoded as:

83 12 34 56

The new layout would treat enumerations as a heterogeneous array with the tag '8' instead of '5'. The lower half of the tag byte indicates how many items are associated with the enumeration. The lower half can express at most 15 items and the enumeration value itself counts as one of them, so a case can carry at most 14 associated items. If someone needs more, one of the items could be a structure. The tag byte will never be less than 0x81 because there will always be an enumeration value. Using the TargetNode example above and assuming "Multicast", "Name", and "TrunkNode" hash to 0x1234, 0x5678, and 0x9abc, respectively, then Multicast would encode as

81 12 12 34

Name("CLX1") would encode as

82 12 56 78 31 04 'C' 'L' 'X' '1'

and TrunkNode(14, 73) would encode as

83 12 9a bc 11 0e 11 49
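The three encodings above can be reproduced with a small encoder. This is a minimal sketch rather than the protocol compiler's actual serializer: the function names are hypothetical, the hash constants are the ones assumed in the text, and the item prefixes (0x12 for the two-byte hash, 0x11 for a one-byte integer, 0x31 plus a length byte for the name) are copied from the example bytes. It only handles integers that fit in one signed byte.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hash values assumed in the text for the three TargetNode cases.
constexpr std::uint16_t HASH_MULTICAST = 0x1234;
constexpr std::uint16_t HASH_NAME      = 0x5678;
constexpr std::uint16_t HASH_TRUNKNODE = 0x9abc;

// Writes the tag byte (0x8N, where N counts the hash plus its arguments)
// followed by the hash as a two-byte integer (prefix 0x12).
inline Bytes startEnum(std::uint16_t hash, unsigned argCount) {
    assert(argCount <= 14);  // the hash itself uses one of the 15 slots
    return Bytes{static_cast<std::uint8_t>(0x80 | (argCount + 1)),
                 0x12,
                 static_cast<std::uint8_t>(hash >> 8),
                 static_cast<std::uint8_t>(hash & 0xff)};
}

// Appends an integer as a one-byte value (prefix 0x11), as in the
// TrunkNode example. Larger values are outside this sketch.
inline void putSmallInt(Bytes& out, std::int16_t v) {
    assert(v >= -128 && v <= 127);
    out.push_back(0x11);
    out.push_back(static_cast<std::uint8_t>(v));
}

// Appends a short string using the byte layout of the Name example:
// 0x31, a one-byte length, then the characters.
inline void putString(Bytes& out, std::string const& s) {
    assert(s.size() <= 255);
    out.push_back(0x31);
    out.push_back(static_cast<std::uint8_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

inline Bytes encodeMulticast() { return startEnum(HASH_MULTICAST, 0); }

inline Bytes encodeName(std::string const& name) {
    Bytes out = startEnum(HASH_NAME, 1);
    putString(out, name);
    return out;
}

inline Bytes encodeTrunkNode(std::int16_t trunk, std::int16_t node) {
    Bytes out = startEnum(HASH_TRUNKNODE, 2);
    putSmallInt(out, trunk);
    putSmallInt(out, node);
    return out;
}
```

A decoder would read the tag byte first, take the lower half as the item count, decode the hash to pick the case, and then decode the remaining items for that case.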


How would this map to target languages? For Java, C++, and Objective-C, an enum of this type would probably become a simple class hierarchy. JavaScript and Python might use class hierarchies, too, or we may find an alternate, better mapping.


If all enumerated values lack an argument, then the C++ generator can define the enumeration using the standard enum keyword. If any value takes an argument, then the entire enumeration needs to use an alternate representation. It was originally thought we could use a union to map the various enumerations on top of each other. A type code field would indicate which data was present. But this is horribly error prone. Instead, we'll create a class that provides union type behavior through its API.

The idea is that the class has enough private fields to handle any of the enumerations. A tag field indicates which enumeration is represented. Instances are immutable; once created, they can't be changed. Static factory methods will be defined to create any of the enumerations.

Given the above enumeration:

class TargetNode_e {
 public:
    enum class Tag { Multicast, Name, TrunkNode };

 private:
    Tag const tag;
    std::string fld1;
    int16_t fld2 = 0;
    int16_t fld3 = 0;

    // Private: values are only created through the factory methods.
    explicit TargetNode_e(Tag t) : tag(t) {}

 public:
    Tag getTag() const { return tag; }

    // Accessors throw if the value holds a different enumeration.
    std::string getName_0() const { return tag == Tag::Name ? fld1 : throw TypeError(); }
    int16_t getTrunkNode_0() const { return tag == Tag::TrunkNode ? fld2 : throw TypeError(); }
    int16_t getTrunkNode_1() const { return tag == Tag::TrunkNode ? fld3 : throw TypeError(); }

    static TargetNode_e mk_Multicast();
    static TargetNode_e mk_Name(std::string const&);
    static TargetNode_e mk_TrunkNode(int16_t, int16_t);
};

To build a value:

    TargetNode_e temp = TargetNode_e::mk_Multicast();

    TargetNode_e temp = TargetNode_e::mk_Name("CLXSRV");

    TargetNode_e temp = TargetNode_e::mk_TrunkNode(13, 10);

To get a value:

    try {
        std::string name = fld.getName_0();

        // ...
    } catch (TypeError const& e) {
        // fld does not hold the Name enumeration
    }
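The factory methods are only declared above. One possible way to complete the class is sketched here so that it compiles on its own. The `TypeError` definition and the four-argument private constructor are assumptions made for this sketch; the text names `TypeError` but does not define it.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical exception type; the proposal uses the name without defining it.
struct TypeError : std::logic_error {
    TypeError() : std::logic_error("wrong enumeration case") {}
};

class TargetNode_e {
 public:
    enum class Tag { Multicast, Name, TrunkNode };

    Tag getTag() const { return tag; }

    // Accessors throw TypeError when a different case is held.
    std::string getName_0() const { return tag == Tag::Name ? fld1 : throw TypeError(); }
    std::int16_t getTrunkNode_0() const { return tag == Tag::TrunkNode ? fld2 : throw TypeError(); }
    std::int16_t getTrunkNode_1() const { return tag == Tag::TrunkNode ? fld3 : throw TypeError(); }

    // Each factory fills in only the fields its case uses.
    static TargetNode_e mk_Multicast() { return TargetNode_e(Tag::Multicast, "", 0, 0); }
    static TargetNode_e mk_Name(std::string const& n) { return TargetNode_e(Tag::Name, n, 0, 0); }
    static TargetNode_e mk_TrunkNode(std::int16_t t, std::int16_t n) {
        return TargetNode_e(Tag::TrunkNode, "", t, n);
    }

 private:
    // Private constructor (an assumption of this sketch): values can only
    // be made through the factories, and every field is const afterwards.
    TargetNode_e(Tag t, std::string s, std::int16_t a, std::int16_t b)
        : tag(t), fld1(std::move(s)), fld2(a), fld3(b) {}

    Tag const tag;
    std::string const fld1;
    std::int16_t const fld2;
    std::int16_t const fld3;
};
```

Making every field const keeps the "can't be changed" property: the class stays copyable, but assignment is disabled.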

C++17 introduces the std::variant<> class template in the standard library, which provides this support. If the protocol compiler is asked to target C++17 or later, we can use it instead of the hand-written class.
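As a rough idea of what a std::variant-based mapping could look like, here is a sketch. The per-case struct names and the `describe` helper are hypothetical; the generator's real output may be organized differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// One small struct per enumeration case (hypothetical names).
struct Multicast {};
struct Name { std::string name; };
struct TrunkNode { std::int16_t trunk; std::int16_t node; };

using TargetNode = std::variant<Multicast, Name, TrunkNode>;

// Visits whichever case is held; 'if constexpr' selects the branch for
// the concrete type at compile time.
inline std::string describe(TargetNode const& target) {
    return std::visit([](auto const& v) -> std::string {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, Multicast>) {
            return "multicast";
        } else if constexpr (std::is_same_v<T, Name>) {
            return "name " + v.name;
        } else {
            return "trunk " + std::to_string(v.trunk) +
                   ", node " + std::to_string(v.node);
        }
    }, target);
}
```

Compared with the hand-written class, accessing the wrong case is caught through `std::get` (which throws `std::bad_variant_access`) or avoided entirely with `std::visit` and `std::get_if`.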


Right now, the Erlang generator defines enums as a type that is simply a list of atoms. The generator would, for enums with values, define a tuple where the first item is the atom and the remaining items are the parameters. So the above enumeration would become:

-type target_node_enum() :: 'Multicast'
                          | {'Name', string()}
                          | {'TrunkNode', -32768..32767, -32768..32767}.


The Java mapping will be similar to C++'s.


For JavaScript, the mapping is expressed in TypeScript. If all enumerated values are argument-less, the mapping will use the TypeScript enum definition where each value is initialized to the hashed value of the symbol's name. If any enumeration takes an argument, then a TypeScript discriminated union is used:

enum TargetNodeKind {
    Multicast,
    Name,
    TrunkNode
}

interface TargetNodeMulticast {
    kind: TargetNodeKind.Multicast;
}

interface TargetNodeName {
    kind: TargetNodeKind.Name;
    arg1: string;
}

interface TargetNodeTrunkNode {
    kind: TargetNodeKind.TrunkNode;
    arg1: number;
    arg2: number;
}

type TargetNode = TargetNodeMulticast | TargetNodeName | TargetNodeTrunkNode;


Like the Erlang mapping, the Python implementation of union types would use tuples. If the enumeration has no associated data, it's just an integer value (a symbol will be defined for it, so no hard-coded values will appear in the code). Enumerations associated with n parameters will become an (n+1)-tuple where the first element is the enumeration value and the rest are the data.


To be written...


For OCaml, this is an easy mapping (this feature was inspired by OCaml's and Haskell's algebraic data types, so it's no wonder this is the easiest to express):

type targetnode = Multicast
                | Name of string
                | TrunkNode of int * int


Rust's enumerations are similar to OCaml and Haskell's algebraic types, so this mapping is easy:

enum TargetNode {
    Multicast,
    Name(String),
    TrunkNode(i16, i16)
}


#1 Updated by Richard Neswold over 4 years ago

  • Description updated (diff)

Fix the Erlang section. I said enums with values "would be pairs", but they should be tuples since they could have more than one parameter.

#2 Updated by Richard Neswold about 4 years ago

  • Description updated (diff)

Add a possible C++ mapping.

#3 Updated by Richard Neswold about 4 years ago

  • Description updated (diff)

Add Python mapping. Add more content to C++ mapping.

#4 Updated by Richard Neswold about 4 years ago

  • Description updated (diff)

Add color-syntax highlighting for the C++ code.

#5 Updated by Richard Neswold about 4 years ago

  • Description updated (diff)

Add final comments to C++ mapping.

#6 Updated by Richard Neswold about 4 years ago

  • Description updated (diff)

Fix some typos.

#7 Updated by Richard Neswold almost 4 years ago

  • Description updated (diff)

Hopefully improved the wording.

#8 Updated by Richard Neswold 12 months ago

  • Description updated (diff)

Explain the network format better by giving examples.

#9 Updated by Richard Neswold 12 months ago

  • Description updated (diff)

Mention that, in the C++ mapping, an enumeration where all values are argumentless can still be represented by a C++ enum.

#10 Updated by Richard Neswold 12 months ago

  • Description updated (diff)

Add placeholders for other language mappings. Hopefully others can write up mappings for these languages (e.g., Java and Objective-C).

#11 Updated by Richard Neswold 12 months ago

  • Description updated (diff)

Propose a JavaScript implementation using TypeScript's discriminated unions.

#12 Updated by Richard Neswold 12 months ago

  • Description updated (diff)

Mention how enumeration values are hashed.

#13 Updated by Richard Neswold 12 months ago

  • Description updated (diff)

Mention C++17 defines the std::variant<> templated class which is, essentially, union types.


#14 Updated by Richard Neswold 11 months ago

  • Description updated (diff)

Add Rust mapping.

#15 Updated by Richard Neswold 11 months ago

  • Description updated (diff)

Add links to language features.

#16 Updated by Richard Neswold about 1 month ago

  • Description updated (diff)

Document new, simpler mapping for C++ and Java.

#17 Updated by Richard Neswold 26 days ago

  • Description updated (diff)

Mention max limit of items in a union-type enumeration using our encoding.
