Protocol Compiler

The Protocol Compiler is a command line tool that converts a protocol specification into one of several target languages. The generated source file can be built and included in an application allowing it to marshal and unmarshal messages. The messages are implemented using the target languages' native data types.


The Fermilab Control System has dozens of protocols carried by the ACNET transport layer. These protocols are responsible for acquiring real-time data, reporting alarm conditions, reporting changes to the central device database, broadcasting Tevatron clock events, and announcing state changes, among others. The network representation of these protocols has been based upon the layout of a C-language structure generated by a specific C compiler targeted for a little-endian machine. Given the historical choices of hardware at Fermilab, this has worked well for many years.

As the control system evolved, however, this approach became error prone. Our instrumentation layer includes both big-endian and little-endian processors, so applications must make sure bytes are ordered correctly. With the addition of Java, Erlang, and Python to our control system, we can no longer incorporate the raw C structures that describe the protocol and instead need to process the data piece by piece. Lastly, newer systems use 64-bit operating systems, so we need to carefully specify the sizes of integer fields if we still want to communicate with 32-bit systems.
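The byte-order and integer-size problems described above can be illustrated with a short sketch. It uses Python's standard struct module with explicit format codes; the value and the helper name decode_int32 are made up for illustration and are not part of any ACNET protocol.

```python
import struct

value = 0x12345678

# Explicit format codes fix both the byte order and the field width,
# regardless of the machine the code runs on.
little = struct.pack("<i", value)  # 32-bit signed, little-endian
big = struct.pack(">i", value)     # 32-bit signed, big-endian


def decode_int32(buf: bytes, byte_order: str) -> int:
    """Decode a 4-byte signed integer with an explicit byte order ('<' or '>')."""
    if len(buf) != 4:
        raise ValueError("expected exactly 4 bytes")
    return struct.unpack(byte_order + "i", buf)[0]


# Reading little-endian bytes as big-endian silently yields a different number.
misread = decode_int32(little, ">")

print(little.hex(), big.hex(), hex(misread))
```

Handling this by hand for every field of every message is the kind of detail that is easy to get wrong, which is the motivation for generating the code instead.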

This methodology can clearly be improved.

Auto-Generate the Code

What would be considered a better system? What details could we automate to make programmers more efficient?

First, we take the protocol specification away from any given programming language; defining protocol messages using the C language is deprecated. Protocols are now formally defined in a language-neutral source file format. The protocol compiler verifies the specification and then generates the target source files. Since the compiler is generating code, we make sure it not only writes code that marshals and unmarshals messages, but also performs sanity checks on incoming messages to avoid buffer overruns and other problems that can cause crashes. These are details that programmers tend to ignore but which are essential for reliable communications.
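As a rough idea of the sanity checks mentioned above, the sketch below reads a length-prefixed string from a buffer and refuses to read past its end. The wire format used here (a 2-byte big-endian length followed by UTF-8 bytes) and the names MessageError and read_string are assumptions made for the example; this is not the layout the protocol compiler emits.

```python
import struct


class MessageError(Exception):
    """Raised when an incoming buffer is malformed."""


def read_string(buf: bytes, offset: int):
    """Read a string stored as a 2-byte big-endian length plus UTF-8 bytes.

    Hypothetical wire format, for illustration only.
    Returns (text, next_offset).
    """
    if offset < 0 or offset + 2 > len(buf):
        raise MessageError("truncated length prefix")
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    end = start + length
    # Never trust the declared length: check it against the real buffer.
    if end > len(buf):
        raise MessageError("declared length exceeds buffer")
    try:
        text = buf[start:end].decode("utf-8")
    except UnicodeDecodeError as exc:
        raise MessageError("string is not valid UTF-8") from exc
    return text, end


text, next_offset = read_string(b"\x00\x03abc", 0)
print(text, next_offset)
```

A buffer whose length prefix claims more bytes than were received raises MessageError instead of reading out of bounds.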

Next, we make sure the messages are easy to manipulate within the target language. In C++, messages are structs and the fields are primitive C++ types. In Java, a message is a class and its data members are the fields of the message. Erlang uses records as its native representation. By using a natural representation, we avoid forcing programmers to learn an API for traversing some generic data structure (e.g. a DOM).
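To show what a "natural representation" means in practice, here is a hand-written sketch of a message as a plain Python dataclass. The Reading message and its fields are hypothetical, and this is not the output of the Python generator; the actual mappings are described in the language-specific Wiki pages.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reading:
    """Hypothetical message: every protocol field is an ordinary attribute."""
    device: str
    value: float
    tags: List[str] = field(default_factory=list)  # an array field
    units: Optional[str] = None                    # an optional field


r = Reading(device="example_device", value=1.5)
r.tags.append("demo")

# Fields are read and written directly; no tree-walking API is involved.
summary = f"{r.device}={r.value}"
print(summary)
```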


usage: pc [-V] [-I PATH] [-a] [-v] [-q] [-W] [-r] \
          [-l LANGUAGE [...language modifiers...]] file [...]

    -a           generate code for all structs -- even if not used
    -c, --client
                 only generate code for protocol clients
    -I PATH      adds a search path to find proto files
    -l LANGUAGE  generates source code for a given language
                 (supported languages are c++, java, erlang, objc,
                 ocaml, javascript, and python)
    -q           removes verbosity from the output
    -r           validate 'returns' statements
    -s, --server
                 only generate code for protocol servers
    -t PATH      output target path
    -v           adds verbosity to the output
    -V           print the version number
    -W           turns on more warnings
    file         the proto file containing the protocol description

  Language modifiers:
                 use pre-C++11 language features -- only for use with
                 older, outdated compilers. THIS IS THE DEFAULT AT

                 use C++11 language features: std::unique_ptr<> is used
                 instead of std::auto_ptr<> for optional fields, range-based
                 for-loops are used when iterating through containers, and
                 strongly typed enumerations are used for enumerations.

                 use C++11 features along with new C++14 features:

                 use C++14 features along with new C++17 features:

                 use C++14 features along with new C++17 features:
                 Optional fields use std::experimental::optional<T>.

                 only the C++ header is generated

                 only the C++ source file is generated

                 the Erlang generator will encode/decode strings in
                 the protocol as binaries

                 allows the generated Erlang code to be compiled as
                 byte-code (this option is not recommended as it
                 will greatly slow down serialization)

    --java-use-pkg PACKAGE
                 place generated Java classes in the specified package

                 generate code using the Java I/O streams interface

                 generate code using the Java ByteBuffers interface

    --java-gwt
                 creates Java overlay classes for use in Google
                 Web Toolkit (GWT) based applications

                 writes a source file containing functions common to
                 all protocols. The created file is called 'proto_lib.js'

                 Messages are unmarshalled into a reused instance of
                 the message.

                 generate objective-c code using older retain/release
                 memory model

                 requests and reply messages are specified using
                 polymorphic variant types rather than just variant

    --python-v3
                 target Python v3.x

Option Descriptions

This section provides more detailed information on some of the options.

Additional Command-line Option Information

Command Option    Description

-a                The code generators only emit marshaller and unmarshaller
                  functions for messages and structures that are being used.
                  With this option, all messages and structures are supported,
                  even if they're not used. This option is mainly used for
                  debugging the protocol compiler and isn't useful for
                  operational code.

-c or --client    Only generates code needed by clients of the protocol. This
                  means the generated module can only marshal requests and
                  unmarshal replies. It reduces the generated code size, since
                  unused code isn't generated (plus, the client code can't
                  accidentally call the wrong functions).

-I PATH           Adds a directory, PATH, to the search path. If the protocol
                  source file isn't in the current directory, the compiler
                  looks through each directory in the search path to find it.
                  This option can be specified multiple times to build up a
                  larger search path.

-l LANGUAGE       Specifies which code generator to use. The generated code
                  uses the most natural representation of data for the target
                  language. There are nine target languages (itemized below)
                  and the mappings for each are described in the corresponding
                  Wiki page.

-r                Ensures the protocol file has "return specifications" for
                  every message. Currently this option isn't useful because
                  the code generators don't (yet) take advantage of the
                  information provided by return specifications.

-s or --server    Only generates code needed by servers using the protocol.
                  This means the generated module can only marshal replies and
                  unmarshal requests. It reduces the generated code size, since
                  unused code isn't generated (plus, the server code can't
                  accidentally call the wrong functions).

-t PATH           Specifies the directory in which to place the generated
                  code. If this option isn't provided, the current directory
                  is used.

-V                Prints the version number of the protocol compiler and the
                  version number of the network layout (SDD). The SDD version
                  number rarely changes. Code generated for different SDD
                  versions is guaranteed to be incompatible. For protocol
                  compiler versions, changes to the minor number are backward
                  compatible and changes to the major number aren't. The
                  Version History section below describes the revision
                  history.
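The options above combine as in the following sketch, which assembles a pc command line from Python. Only flags documented on this page are used; the helper name pc_command, the file name example.proto, and the directory names are hypothetical. The sketch only builds and prints the argument list and does not run the compiler.

```python
import shlex
from typing import List, Optional, Sequence


def pc_command(proto_file: str,
               language: str,
               out_dir: Optional[str] = None,
               include_dirs: Sequence[str] = (),
               side: Optional[str] = None) -> List[str]:
    """Build an argument list for the protocol compiler (hypothetical helper)."""
    args = ["pc"]
    for path in include_dirs:
        args += ["-I", path]      # -I may be given several times
    if out_dir is not None:
        args += ["-t", out_dir]   # defaults to the current directory
    if side == "client":
        args.append("-c")
    elif side == "server":
        args.append("-s")
    elif side is not None:
        raise ValueError("side must be 'client', 'server', or None")
    # Language modifiers, if any, would follow the -l LANGUAGE pair.
    args += ["-l", language, proto_file]
    return args


cmd = pc_command("example.proto", "python",
                 out_dir="gen", include_dirs=["protos"], side="client")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run on a machine where pc is installed.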

We define "compatibility" to mean a currently running system can communicate with a system built with a newer version.
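The versioning rules stated for -V can be written down as a small check. This is a sketch of the rules as described on this page, not code taken from the protocol compiler; the function names are invented for the example.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Return (major, minor) from a string such as '1.3' or '1.3.7'."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def can_interoperate(running_pc: str, newer_pc: str,
                     running_sdd: int, newer_sdd: int) -> bool:
    """Apply the compatibility rules described above.

    Different SDD layout versions never interoperate. For the protocol
    compiler version, only a change in the major number breaks compatibility.
    """
    if running_sdd != newer_sdd:
        return False
    return parse_version(running_pc)[0] == parse_version(newer_pc)[0]


print(can_interoperate("1.2", "1.3.7", 2, 2))
print(can_interoperate("1.3", "2.0", 2, 2))
```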

Supported -l Options

Language      -l Option Value   Description

C++           c++               The C++ generator gives protocol support for the
                                Linux console environment as well as MOOC-based
                                front-ends.

Erlang        erlang            Erlang-based front-ends, FIRUS servers, and the
                                Wilson Hall lighting system use this generator
                                for their protocols.

Java          java              The Java generator lets the Java data
                                acquisition engines and servlets understand and
                                use protocols. The output can be tailored to GWT
                                development by adding the --java-gwt option.

JavaScript    javascript        Generates a .js file which can be used in an
                                HTML document so a web app can communicate
                                using the protocol.

Kotlin        kotlin            Generates a .kt file for Android developers.
                                This is a work-in-progress which is tracked in
                                #20611.

Objective-C   objc              This generator doesn't emit general Objective-C
                                source code. Instead, it emits OS X and iOS
                                compatible code to allow Macs, iPhones and iPads
                                to use the protocol compiler.

OCaml         ocaml             This is a generator for the static and
                                strongly-typed functional language, OCaml.

Python        python            The Python DPM client module uses this generator
                                to retrieve accelerator data.

Rust          rust              Generates an .rs file. Although completed, we're
                                looking for feedback from Rust programmers.

The language modifier options are defined in the language-specific Wiki pages.

Version History

Protocol Compiler Version History

Version         Changes

1.1             Initial release.

1.2             The version keyword has been removed.
                Support added for the Rust programming language.
                The JavaScript generator now uses generators for the marshalers instead of a custom object.
                Optional fields, in the JavaScript generator, now use undefined instead of null to represent unassigned data.

1.3             The protocol compiler project is now managed in npm.
                Deprecated the --c++-17exp option. It has been changed to --c++-exp.

1.3.1 - 1.3.3   Minor bug fixes needed to make npm build and install the protocol compiler correctly.

1.3.4 - 1.3.6   Added the --js-dts option, which instructs the JavaScript generator to also emit a TypeScript declaration file.
                Minor corrections to the Python generator.

1.3.7           Added the --python-v3 option, which produces Python 3.x compatible code.

SDD Version History

An early form of the format was called Self-Describing Data, and the header of encoded messages still has the characters 'SDD'. The layout of the messages doesn't change often but, when it does, we need to make sure both participants of the protocol are using the same layout. The value after the 'SDD' header is the version of the layout.
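A receiver can use this header to reject messages built for a different layout before decoding anything else. The sketch below assumes the version is stored as a single byte immediately after the three 'SDD' characters; the page does not state how the version value is encoded, so treat that as an assumption, along with the function name.

```python
MAGIC = b"SDD"


def read_layout_version(message: bytes) -> int:
    """Return the layout version that follows the 'SDD' header.

    Assumes (for illustration only) a one-byte version right after the magic.
    """
    if len(message) < len(MAGIC) + 1:
        raise ValueError("message too short to contain an SDD header")
    if message[:len(MAGIC)] != MAGIC:
        raise ValueError("missing 'SDD' header")
    return message[len(MAGIC)]


def check_layout(message: bytes, expected_version: int) -> None:
    """Raise ValueError unless the message uses the expected layout version."""
    found = read_layout_version(message)
    if found != expected_version:
        raise ValueError(f"layout version {found}, expected {expected_version}")


check_layout(b"SDD\x02payload", 2)
print(read_layout_version(b"SDD\x02payload"))
```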

Version   Status         Changes

1         Obsolete       Initial, versioned release.

2         Stable         Integer values must be encoded in the smallest representation possible.
                         Floating point numbers are always encoded as 9 bytes (tag + 8 bytes).

3         Experimental   This version introduces "union types". See #11454 for encoding. Right now this version is only available to developers.
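The two rules listed for version 2 can be sketched as follows. The page does not give the tag values or the byte order, so FLOAT_TAG is a placeholder and big-endian ordering is an assumption; only the sizes (fewest bytes for integers, tag + 8 bytes for floating point) come from the table.

```python
import struct

FLOAT_TAG = 0x00  # placeholder; the real tag value is not given on this page


def minimal_int_bytes(value: int) -> bytes:
    """Two's-complement bytes using the fewest whole bytes that hold value.

    Big-endian order is an assumption made for this sketch.
    """
    length = 1
    while not -(1 << (8 * length - 1)) <= value < (1 << (8 * length - 1)):
        length += 1
    return value.to_bytes(length, "big", signed=True)


def encode_float(value: float) -> bytes:
    """Encode a float as one tag byte followed by an 8-byte IEEE 754 double."""
    return bytes([FLOAT_TAG]) + struct.pack(">d", value)


for n in (0, 127, 128, -128, -129, 70000):
    print(n, len(minimal_int_bytes(n)))
print(len(encode_float(1.0)))
```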

Protocol Source File Grammar

The following section describes, in Extended Backus-Naur Form, the grammar used in protocol source files.

protocol ::= struct-def*, enum-def*, request-message-def+, reply-message-def*, rule-def+

enum-def ::= "enum", name, enum-list

struct-def ::= "struct", name, field-list

request-message-def ::= "request", name, field-list

reply-message-def ::= "reply", name, field-list

enum-list ::= "{", name, (",", name)*, "}"

field-list ::= "{", field, (",", field)*, "}" 

field ::= "optional"?, type, name, "[]"?, ";" 

type ::= "bool" | "int16" | "int32" | "int64" | "double" | "string" | "binary" |
         ("struct", name)

rule-def ::= name, "->", reply-type-list, ";"

reply-type ::= ("single" | "multiple"), name

reply-type-list ::= (reply-type, (",", reply-type)*) | "nothing"

name ::= [_a-zA-Z], [_a-zA-Z0-9]*
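As a worked example of reading the grammar, the sketch below turns the field and name productions into a regular expression and parses single field declarations. It is a toy recognizer written for this page, not the protocol compiler's parser; the whitespace handling is an assumption, since the grammar doesn't specify it, and the sample field names are made up.

```python
import re

# name ::= [_a-zA-Z], [_a-zA-Z0-9]*
NAME = r"[_a-zA-Z][_a-zA-Z0-9]*"
PRIMITIVES = ("bool", "int16", "int32", "int64", "double", "string", "binary")

# field ::= "optional"?, type, name, "[]"?, ";"
FIELD_RE = re.compile(
    r"^\s*(?P<optional>optional\s+)?"
    r"(?P<type>" + "|".join(PRIMITIVES) + r"|struct\s+" + NAME + r")\s+"
    r"(?P<name>" + NAME + r")\s*"
    r"(?P<array>\[\])?\s*;\s*$"
)


def parse_field(text: str):
    """Parse one field declaration; return a dict, or None if it doesn't match."""
    m = FIELD_RE.match(text)
    if m is None:
        return None
    return {
        "optional": m.group("optional") is not None,
        "type": " ".join(m.group("type").split()),  # normalise inner spaces
        "name": m.group("name"),
        "array": m.group("array") is not None,
    }


print(parse_field("optional int32 count;"))
print(parse_field("struct Point pts[];"))
print(parse_field("int32 9bad;"))
```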