Built-in Version Device is brittle
In the past few months, quite a number of front-ends have required a reboot several times a day. In my case, MEBIT1 have been very unreliable. Attempts to fix drivers or use older libraries failed to improve their uptime. Dennis felt there was a correlation with when "Big Saves" occurred. This was confirmed[1] when Mike Kuplic discovered that putting a MOOC version device on a parameter page would instantly "crash" the front-end[2]. I put my three front-ends' version devices into "Document" state so that Big Save wouldn't read them, and all three front-ends ran all weekend without a reboot.
The built-in MOOC version device parses CVS-generated strings to determine which modules are loaded and at what version they are. Many projects have been moved to Redmine, which uses git, so the version strings have probably changed format. Looking at the source code for the version device shows no error checking, so it's easy to see how changing the input to this algorithm would break it.
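As a rough illustration of the kind of error checking that is missing, here is a minimal sketch of a defensive parser. It assumes the version string is a CVS `$Name$` keyword expansion such as `$Name: v4_8 $`; the format the version device actually parses isn't shown in this ticket, and `parseCvsName` is a hypothetical name, not a MOOC function. Since git does not expand CVS keywords, the string may arrive unexpanded or stale, and the sketch rejects anything that isn't in the expected form instead of indexing into it blindly.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of MOOC): extracts the tag from a CVS
// "$Name: <tag> $" keyword expansion. Returns false, leaving `tag`
// untouched, when the input is not in that form (for example an
// unexpanded "$Name$" or an empty "$Name:  $").
bool parseCvsName(const std::string& s, std::string& tag)
{
    static const std::string prefix = "$Name:";

    // The string must start with the expanded keyword prefix.
    if (s.compare(0, prefix.size(), prefix) != 0)
        return false;

    // There must be a closing '$' after the prefix.
    const std::string::size_type end = s.find('$', prefix.size());
    if (end == std::string::npos)
        return false;

    // Skip the spaces CVS puts before the tag; reject an empty tag.
    const std::string::size_type first =
        s.find_first_not_of(' ', prefix.size());
    if (first == std::string::npos || first >= end)
        return false;

    // Trim the spaces CVS puts after the tag.
    const std::string::size_type last = s.find_last_not_of(' ', end - 1);
    tag = s.substr(first, last - first + 1);
    return true;
}
```

A caller would treat a `false` return as "version unavailable" and report that, rather than continuing with whatever happened to be in the buffer.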
NOTE: The MOOC version device has existed since v4.0 (circa 2009) and hasn't changed, so all 4.x versions of MOOC will fail trying to get version information from projects managed in
NOTE: If this bug is affecting your front-end, a quick workaround is to obsolete the version device associated with it. Once MOOC is fixed, the device can be un-obsoleted.
There are several ways we could go about fixing this:
- Write a more robust parser which identifies CVS- and git-style strings and also reports an error if it's neither (instead of corrupting memory).
- Develop a simple framework for modules to register their version so there's no need for any parsers; the version device simply returns information registered by the modules.
- Rewrite the version driver to use loaded module information to get the version info (whatever moduleShow() uses to get its info). I haven't looked to see if this option is possible.
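To make the second option more concrete, here is a minimal sketch of what a registration framework could look like. Everything in it is an assumption for illustration: `registerModuleVersion`, `lookupModuleVersion`, and the fixed table size are hypothetical and not an existing MOOC API. Each module would call the register function once from its initialization routine, and the version device would only ever read the table, so no string parsing is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace {
    // One entry per registered module. The pointers are stored as-is,
    // so callers must pass strings with static lifetime (e.g. literals).
    struct ModuleVersion {
        const char* name;
        const char* version;
    };

    const std::size_t maxModules = 16;   // arbitrary limit for the sketch
    ModuleVersion moduleTable[maxModules];
    std::size_t moduleCount = 0;
}

// Hypothetical: records (or updates) the version of a module.
// Returns false on bad arguments or when the table is full.
// No locking is done; this assumes registration happens at startup.
bool registerModuleVersion(const char* name, const char* version)
{
    if (name == 0 || version == 0)
        return false;

    for (std::size_t i = 0; i < moduleCount; ++i) {
        if (std::strcmp(moduleTable[i].name, name) == 0) {
            moduleTable[i].version = version;   // re-registration updates
            return true;
        }
    }

    if (moduleCount >= maxModules)
        return false;

    moduleTable[moduleCount].name = name;
    moduleTable[moduleCount].version = version;
    ++moduleCount;
    return true;
}

// Hypothetical: returns the registered version string, or a null
// pointer if the module never registered.
const char* lookupModuleVersion(const char* name)
{
    if (name == 0)
        return 0;

    for (std::size_t i = 0; i < moduleCount; ++i) {
        if (std::strcmp(moduleTable[i].name, name) == 0)
            return moduleTable[i].version;
    }
    return 0;
}
```

With something along these lines, a module that never registers simply shows up as "not found", which the version device can report as an error without touching memory it doesn't own.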
Option 3 has the benefit that, if a front-end developer uses -latest symbolic links, we could report the version as unknown since, between reboots, the version may change. This would encourage developers to specify versions in their startup scripts. If we decide to do option 2, then the framework should be able to recognize when the front-end is running BETA code (i.e. using the -latest symbolic link) and report it as such. I don't think we should pursue option 1.
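The -latest handling could be sketched roughly as follows. This assumes that a module loaded through a -latest link has the text `-latest` in its file name and that the load path is available to the version device; neither is confirmed here, and `reportedVersion` and the paths in the usage note are hypothetical. The sketch follows the option 2 variant and reports BETA; reporting "unknown" instead, as suggested for option 3, would be a one-line change.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: decides what the version device should report
// for a module, given the path it was loaded from and the version it
// registered (empty if it registered none).
std::string reportedVersion(const std::string& loadPath,
                            const std::string& registered)
{
    // Only look at the file name, so a "-latest" directory component
    // higher up in the path does not trigger a false positive.
    const std::string::size_type slash = loadPath.find_last_of('/');
    const std::string file =
        (slash == std::string::npos) ? loadPath
                                     : loadPath.substr(slash + 1);

    // Assumption: "-latest" in the file name means the module was
    // loaded through a -latest symbolic link.
    if (file.find("-latest") != std::string::npos)
        return "BETA";

    return registered.empty() ? "unknown" : registered;
}
```

For example, under these assumptions `reportedVersion("/fe/mooc-latest.out", "4.8")` would yield BETA, while a module loaded from an explicitly versioned file would report whatever it registered.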
[1] No one uses the version device except as a diagnostic. Big Saves, however, always read them, and that's when the front-ends would go down.
[2] The "crash" is that the RETRPY task is suspended, so no RETDAT data will ever be returned. The task gets suspended because the heap is detected as corrupted.
#2 Updated by Richard Neswold over 2 years ago
- Description updated
Mention that all versions of MOOC are susceptible to this bug, if the front-end also loads modules managed by
Mention a temporary workaround.
The version device looks for 5 modules: MOOC, ACNET, VUCD, SLD, and VWSUPPORT. If any of these projects are moved to Redmine / git, they should restrict their builds to VxWorks 6.x since MOOC v4.8 is only targeted for 6.x; we shouldn't break front-ends still using older versions of MOOC (although, if they want to use a newer, git-based module with an older MOOC, they could obsolete their version device to work around this bug).
#3 Updated by Richard Neswold over 2 years ago
- Target version changed from MOOC v4.8 to MOOC v5.0
The only project that didn't have the version in the proper format was v3.3 of the UCD project. I released a new v3.3 beta for that project which should prevent this problem. MOOC's version device driver still needs to be fixed, but I can push it off to the next version.