Feature #11454

Add union types

Added by Richard Neswold almost 4 years ago. Updated 5 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 01/20/2016
% Done: 0%

Description

Synopsis

Sometimes it makes sense that a field in a message ought to contain one of several possible types. For instance, DPM has eight reply messages which are all essentially the same except the data field is of different types. We could have created one reply message with eight optional fields, but we like the type-safety that the current approach provides. What we'd really like to do, however, is something like this (syntax may change):

enum DataType {
    RawVal(binary),
    ScalarVal(float),
    TextVal(string),
    ScalarArray(float[]),
    TextArray(string[])
}

reply data {
    int64 ref_id;
    int64 timestamp;
    int64 cycle;
    DataType data;
}

In this example, there is only one message that describes a data reply. The data field is of the DataType enumeration and it can only hold one of the five types of data. Each enumerated value is associated with a value of a specified type. So, for instance, if the message had a RawVal value, it would also contain a binary value.

This can be emulated (poorly) with the current protocol compiler by using an optional field for each case and populating only one at any given time. This, however, is error-prone: since a message with more than one field present is a possible input, code would have to handle that case. Optional fields are analogous to checkbox widgets in a user interface. What this issue proposes is the analog of radio buttons. Our enumeration type gets us close to this, but falls short. What we would like is for each enumeration case to be associated with zero or more values.
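
To make the cost of the optional-field emulation concrete, here is a minimal C++17 sketch. It is not output of the protocol compiler: `EmulatedReply`, `populated_count`, `is_well_formed`, and `scalar_of` are hypothetical names, only three of the five cases are shown, and `std::optional` merely stands in for however a generator represents optional fields.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical hand-written stand-in for a reply that emulates the union
// with one optional field per case (three cases shown for brevity).
struct EmulatedReply {
    std::optional<std::string> raw_val;     // stands in for RawVal(binary)
    std::optional<float>       scalar_val;  // stands in for ScalarVal(float)
    std::optional<std::string> text_val;    // stands in for TextVal(string)
};

// Counts the populated fields. A well-formed reply has exactly one, but the
// type itself allows any number from zero to three.
inline int populated_count(EmulatedReply const& r) {
    return int(r.raw_val.has_value()) + int(r.scalar_val.has_value())
         + int(r.text_val.has_value());
}

inline bool is_well_formed(EmulatedReply const& r) {
    return populated_count(r) == 1;
}

// Every consumer has to re-check the invariant before trusting a field.
inline float scalar_of(EmulatedReply const& r) {
    assert(is_well_formed(r) && r.scalar_val.has_value());
    return *r.scalar_val;
}
```

Nothing in the type prevents a sender from filling in two fields or none, so the "exactly one" rule lives only in run-time checks like `is_well_formed`.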

Here's a union type (representing ACNET node addresses) where each case is associated with a different number of values. This example is used later in this document when showing how it is represented in the target languages:

enum TargetNode {
    Multicast,
    Name(string),
    TrunkNode(int16, int16)
}

Here Multicast means the target is the ACNET multicast address. The Name instance contains the alphabetical name of the target. Lastly, TrunkNode holds the trunk and node addresses.

Implementation

Network Layout

Currently enumerations are encoded as integers with an '8' tag. The value of the enumeration is the hashed value of the enumeration value name appended to the type name. For instance, if an enumeration had the hashed value 0x123456, it gets encoded as:

83 12 34 56

The new layout would treat enumerations as a heterogeneous array tagged '8' instead of '5'. The tag byte will never be less than 0x81 because there will always be an enumeration value. Using the TargetNode example above and assuming "Multicast", "Name", and "TrunkNode" hash to 0x1234, 0x5678, and 0x9abc, respectively, Multicast would encode as

81 12 12 34

Name("CLX1") would encode as

82 12 56 78 31 04 'C' 'L' 'X' '1'

and TrunkNode(14, 73) would encode as

83 12 9a bc 11 0e 11 49
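
The three byte sequences above can be reproduced with a short C++ sketch. This is only a reading of the examples, not the protocol compiler's encoder: all helper names are hypothetical, the low nibble of the leading byte is assumed to be the element count, the hash is assumed to always occupy two bytes, and integers and string lengths are assumed to fit in one byte.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Byte  = std::uint8_t;
using Bytes = std::vector<Byte>;

// Leading byte: tag '8' in the high nibble, element count in the low nibble.
inline void put_enum_header(Bytes& out, unsigned count) {
    assert(count >= 1 && count <= 15);  // assumption: count fits in one nibble
    out.push_back(static_cast<Byte>(0x80 | count));
}

// Hashed case name, as in the examples: 0x12, then two bytes, high byte first.
inline void put_hash(Bytes& out, std::uint16_t hash) {
    out.push_back(0x12);
    out.push_back(static_cast<Byte>(hash >> 8));
    out.push_back(static_cast<Byte>(hash & 0xff));
}

// Small integer, as in the examples: 0x11, then one byte.
inline void put_small_int(Bytes& out, int value) {
    assert(value >= 0 && value <= 127);  // assumption: only this range is handled
    out.push_back(0x11);
    out.push_back(static_cast<Byte>(value));
}

// Short string, as in the examples: 0x31, one length byte, then the characters.
inline void put_short_string(Bytes& out, std::string const& s) {
    assert(s.size() <= 255);  // assumption: length fits in one byte
    out.push_back(0x31);
    out.push_back(static_cast<Byte>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

inline Bytes encode_multicast() {
    Bytes out;
    put_enum_header(out, 1);   // the hash is the only element
    put_hash(out, 0x1234);
    return out;
}

inline Bytes encode_name(std::string const& name) {
    Bytes out;
    put_enum_header(out, 2);   // hash + one string
    put_hash(out, 0x5678);
    put_short_string(out, name);
    return out;
}

inline Bytes encode_trunk_node(int trunk, int node) {
    Bytes out;
    put_enum_header(out, 3);   // hash + two integers
    put_hash(out, 0x9abc);
    put_small_int(out, trunk);
    put_small_int(out, node);
    return out;
}
```

Under these assumptions, `encode_name("CLX1")` and `encode_trunk_node(14, 73)` yield the sequences shown above. A real encoder would also need to pick integer and length widths based on the value.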

Mapping

How would this map to target languages? For Java, C++, and Objective-C, an enum of this type would probably become a simple class hierarchy. JavaScript and Python might be class hierarchies, too, or we may find an alternate, better mapping.

C++

If all enumerated values lack an argument, then the C++ generator can define the enumeration using the standard enum keyword. If any value takes an argument, then the entire enumeration needs to use an alternate representation. It was originally thought we could use a union to map the various enumerations on top of each other, with a type code field indicating which data was present. But this is horribly error-prone. So instead, we'll create a class hierarchy:

#include <stdint.h>
#include <memory>
#include <string>

struct TargetNode_Base_e {
    virtual ~TargetNode_Base_e() {}
    virtual TargetNode_Base_e* dup() const = 0;
};

struct TargetNode_Multicast_e : TargetNode_Base_e {
    virtual TargetNode_Base_e* dup() const { return new TargetNode_Multicast_e; }
};

struct TargetNode_Name_e : TargetNode_Base_e {
    std::string param1;

    explicit TargetNode_Name_e(std::string const& p1) : param1(p1) {}
    TargetNode_Name_e(TargetNode_Name_e const& o) :
        param1(o.param1) {}

    virtual TargetNode_Base_e* dup() const { return new TargetNode_Name_e(*this); }
};

struct TargetNode_TrunkNode_e : TargetNode_Base_e {
    int16_t param1;
    int16_t param2;

    TargetNode_TrunkNode_e(int16_t p1, int16_t p2) : param1(p1), param2(p2) {}
    TargetNode_TrunkNode_e(TargetNode_TrunkNode_e const& o) :
        param1(o.param1), param2(o.param2) {}

    virtual TargetNode_Base_e* dup() const { return new TargetNode_TrunkNode_e(*this); }
};

class TargetNode_e {
    // Owns the active case. (std::auto_ptr was removed in C++17;
    // std::unique_ptr is its replacement.)
    std::auto_ptr<TargetNode_Base_e> ptr;

    TargetNode_e();   // private: values are built through the make_* functions

 public:
    ~TargetNode_e() { }

    static TargetNode_e make_Multicast();
    static TargetNode_e make_Name(std::string const&);
    static TargetNode_e make_TrunkNode(int16_t, int16_t);

    // Returns a pointer to the active case if it is a T, otherwise 0.
    template <class T>
    T* get() const { return dynamic_cast<T*>(ptr.get()); }
};

Of course, the above isn't complete and needs to be fleshed out so values are copied properly, memory is freed properly, etc.

To build a value:

    TargetNode_e mcast = TargetNode_e::make_Multicast();

    TargetNode_e named = TargetNode_e::make_Name("CLXSRV");

    TargetNode_e addr = TargetNode_e::make_TrunkNode(13, 10);

To get a value:

    TargetNode_e fld = msg.enum_field;

    if (TargetNode_Name_e* const temp = fld.get<TargetNode_Name_e>()) {
        // fld holds a Name; temp->param1 is the node name
    }

There may be a way to simplify this mapping. Using namespaces or nested classes may shorten the names. Maybe someone has a completely different way to accomplish this in C++. Whatever the solution, it must be 100% type-safe, RAII compliant, and exception-safe.

C++17 introduces the std::variant<> templated class in the standard library, which provides this support. If the protocol compiler is asked to target C++17, or later, we can use this.
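
As a rough illustration of what a C++17 mapping could look like, the TargetNode example might be expressed with `std::variant` as sketched below. The `target_node` namespace, the member names (`name`, `trunk`, `node`), and the `describe` helper are assumptions made for this sketch, not a proposed generator output.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

namespace target_node {
    struct Multicast {};
    struct Name      { std::string name; };
    struct TrunkNode { std::int16_t trunk; std::int16_t node; };
}

// Exactly one alternative is held at any time.
using TargetNode = std::variant<target_node::Multicast,
                                target_node::Name,
                                target_node::TrunkNode>;

// Dispatches on the active case. If a new alternative is added to TargetNode
// without a branch here, the static_assert stops the build.
inline std::string describe(TargetNode const& n) {
    return std::visit([](auto const& v) -> std::string {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, target_node::Multicast>) {
            return "multicast";
        } else if constexpr (std::is_same_v<T, target_node::Name>) {
            return "name " + v.name;
        } else {
            static_assert(std::is_same_v<T, target_node::TrunkNode>,
                          "unhandled TargetNode case");
            return "trunk " + std::to_string(v.trunk) +
                   " node " + std::to_string(v.node);
        }
    }, n);
}

// Usage: build each case, then recover the payload without any casts.
inline void target_node_usage() {
    TargetNode a = target_node::Multicast{};
    TargetNode b = target_node::Name{"CLXSRV"};
    TargetNode c = target_node::TrunkNode{13, 10};

    assert(std::holds_alternative<target_node::Multicast>(a));
    assert(std::get_if<target_node::Name>(&b)->name == "CLXSRV");
    assert(std::get_if<target_node::TrunkNode>(&b) == nullptr);  // wrong case: null pointer
    assert(std::get<target_node::TrunkNode>(c).node == 10);
}
```

Compared with the class hierarchy, this version needs no heap allocation or `dup()`: copying, destruction, and exception safety come from `std::variant` itself.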

Erlang

Right now, the Erlang generator defines enums as a type that is simply a list of atoms. The generator would, for enums with values, define a tuple where the first item is the atom and the remaining items are the parameters. So the above enumeration would become:

-type target_node_enum() :: 'Multicast'
                          | {'Name', string()}
                          | {'TrunkNode', -32768..32767, -32768..32767}.

Java

To be written...

JavaScript

If all enumerated values are argument-less, the mapping will use the TypeScript enum definition where each value is initialized to the hashed value of the symbol's name. If any enumeration value takes an argument, then a TypeScript discriminated union is used:

enum TargetNodeKind {
    Multicast,
    Name,
    TrunkNode
}

interface TargetNodeMulticast {
    kind: TargetNodeKind.Multicast;
}

interface TargetNodeName {
    kind: TargetNodeKind.Name;
    arg1: string;
}

interface TargetNodeTrunkNode {
    kind: TargetNodeKind.TrunkNode;
    arg1: number;
    arg2: number;
}

type TargetNode = TargetNodeMulticast | TargetNodeName | TargetNodeTrunkNode;

Python

Like the Erlang mapping, the Python implementation of union types would use tuples. If the enumeration has no associated data, it's just an integer value (a symbol will be defined, so no hard-coded values will appear in the code). Enumerations associated with n parameters will become an (n+1)-tuple where the first element is the enumeration value and the rest are the data.

Objective-C

To be written...

OCaml

This is an easy mapping (this feature was inspired by OCaml's and Haskell's algebraic data types, so it's no wonder this is the easiest to express):

type targetnode = Multicast
                | Name of string
                | TrunkNode of int * int

Rust

Rust's enumerations are similar to OCaml and Haskell's algebraic types, so this mapping is easy:

enum TargetNode {
    Multicast,
    Name(String),
    TrunkNode(i16, i16)
}

History

#1 Updated by Richard Neswold almost 4 years ago

  • Description updated (diff)

Fix the Erlang section. I said enums with values "would be pairs", but they should be tuples since they could have more than one parameter.

#2 Updated by Richard Neswold over 3 years ago

  • Description updated (diff)

Add a possible C++ mapping.

#3 Updated by Richard Neswold over 3 years ago

  • Description updated (diff)

Add Python mapping. Add more content to C++ mapping.

#4 Updated by Richard Neswold over 3 years ago

  • Description updated (diff)

Add color-syntax highlighting for the C++ code.

#5 Updated by Richard Neswold over 3 years ago

  • Description updated (diff)

Add final comments to C++ mapping.

#6 Updated by Richard Neswold over 3 years ago

  • Description updated (diff)

Fix some typos.

#7 Updated by Richard Neswold over 3 years ago

  • Description updated (diff)

Hopefully improved the wording.

#8 Updated by Richard Neswold 6 months ago

  • Description updated (diff)

Explain the network format better by giving examples.

#9 Updated by Richard Neswold 6 months ago

  • Description updated (diff)

Mention that, in the C++ mapping, an enumeration where all values are argumentless can still be represented by a C++ enum.

#10 Updated by Richard Neswold 6 months ago

  • Description updated (diff)

Add placeholders for other language mappings. Hopefully others can write up mappings for these languages (i.e. Java and Objective-C.)

#11 Updated by Richard Neswold 6 months ago

  • Description updated (diff)

Propose a JavaScript implementation using TypeScript's discriminated unions.

#12 Updated by Richard Neswold 6 months ago

  • Description updated (diff)

Mention how enumeration values are hashed.

#13 Updated by Richard Neswold 6 months ago

  • Description updated (diff)

Mention C++17 defines the std::variant<> templated class [1] which is, essentially, a union type.

[1] https://en.cppreference.com/w/cpp/utility/variant

#14 Updated by Richard Neswold 5 months ago

  • Description updated (diff)

Add Rust mapping.

#15 Updated by Richard Neswold 5 months ago

  • Description updated (diff)

Add links to language features.


