Bug #19486

Updated by Richard Neswold over 2 years ago

In the past few months, quite a number of front-ends have required a reboot several times a day. In my case, @MSEPTA@, @LEBIT1@, and @MEBIT1@ have been very unreliable. Attempts to fix drivers or use older libraries failed to improve their uptime. *Dennis* felt there was a correlation with when "Big Saves" occurred. This was confirmed[1] when *Mike Kuplic* discovered that putting a MOOC version device on a parameter page would instantly "crash" the front-end[2]. I put my three front-ends' version devices into "Document" state so that Big Save wouldn't read them, and all three front-ends ran all weekend without a reboot.

The built-in MOOC version device parses CVS-generated strings to determine which modules are loaded and at what version. Many projects have been moved to Redmine, which uses *@git@*, so the version strings have probably changed format. The source code for the version device shows no error checking, so it's easy to see how changing the input to this algorithm would break it.

*NOTE:* The MOOC version device has existed since v4.0 (circa 2009) and hasn't changed, so all 4.x versions of MOOC will fail when trying to get version information from projects managed in *@git@*.

*NOTE:* If this bug is affecting your front-end, a quick workaround is to obsolete the version device associated with it. Once MOOC is fixed, the device can be un-obsoleted.

There are several ways we could go about fixing this:

# Write a more robust parser which identifies CVS- and git-style strings and reports an error if the input is neither (instead of corrupting memory).
# Develop a simple framework for modules to register their version so there's no need for any parsers; the version device simply returns information registered by the modules.
# Rewrite the version driver to use loaded module information to get the version info (whatever @moduleShow()@ uses to get its info). I haven't looked to see if this option is possible.
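
For reference, option 1 might look roughly like the sketch below. It is not taken from the MOOC source. It assumes the CVS strings are @$Name: <tag> $@ keyword expansions and that the git strings look like @git describe@ output (@<tag>-<count>-g<hash>@); the actual formats embedded in our modules would need to be checked. All names (@VersionInfo@, @parseVersion@, etc.) are hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical types for illustration; none of these names exist in MOOC.
enum class VersionKind { Cvs, Git, Unknown };

struct VersionInfo {
    VersionKind kind;
    std::string text;  // empty when kind is Unknown
};

// Strip leading and trailing spaces.
static std::string trim(const std::string& s) {
    const std::string::size_type b = s.find_first_not_of(' ');
    if (b == std::string::npos) return std::string();
    const std::string::size_type e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// ASSUMPTION: CVS strings are "$Name: <tag> $" keyword expansions.
static bool parseCvsName(const std::string& raw, std::string& tag) {
    const std::string prefix = "$Name:";
    if (raw.size() <= prefix.size() ||
        raw.compare(0, prefix.size(), prefix) != 0 ||
        raw[raw.size() - 1] != '$')
        return false;
    tag = trim(raw.substr(prefix.size(), raw.size() - prefix.size() - 1));
    return !tag.empty();  // an empty tag means an untagged checkout
}

// ASSUMPTION: git strings look like "<tag>-<count>-g<hex hash>".
static bool looksLikeGitDescribe(const std::string& raw) {
    const std::string::size_type g = raw.rfind("-g");
    if (g == std::string::npos || g == 0) return false;

    const std::string hash = raw.substr(g + 2);
    if (hash.size() < 4) return false;
    for (std::string::size_type i = 0; i < hash.size(); ++i)
        if (!std::isxdigit(static_cast<unsigned char>(hash[i]))) return false;

    // The commit count sits between the previous '-' and "-g".
    const std::string::size_type dash = raw.rfind('-', g - 1);
    if (dash == std::string::npos || dash == 0 || dash + 1 == g) return false;
    for (std::string::size_type i = dash + 1; i < g; ++i)
        if (!std::isdigit(static_cast<unsigned char>(raw[i]))) return false;
    return true;
}

// Every index is bounds-checked, so unrecognized input yields
// VersionKind::Unknown rather than a read or write past the buffer.
VersionInfo parseVersion(const std::string& raw) {
    VersionInfo info;
    info.kind = VersionKind::Unknown;
    std::string tag;
    if (parseCvsName(raw, tag)) {
        info.kind = VersionKind::Cvs;
        info.text = tag;
    } else if (looksLikeGitDescribe(raw)) {
        info.kind = VersionKind::Git;
        info.text = raw;
    }
    return info;
}
```

With this shape, the version device could map @VersionKind::Unknown@ to an error status instead of returning data.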

Option 3 has the benefit that, if a front-end developer uses @-latest@ symbolic links, we could report the version as unknown since, between reboots, the version may change. This would encourage developers to specify versions in their startup scripts. If we decide to do option 2, then the framework should be able to recognize when the front-end is running BETA code (i.e. using the @-latest@ symbolic link) and report it as such. I don't think we should pursue option 1.
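
To make option 2 more concrete, here is a minimal sketch of what a registration framework could look like. Nothing here is an existing MOOC API: @VersionRegistry@, @registerModule()@, and @report()@ are hypothetical names, and the sketch assumes a module can supply the path it was loaded from so that a @-latest@ link can be detected. How a module would actually learn that path is an open question.

```cpp
#include <map>
#include <string>

// Hypothetical registry for illustration; not part of MOOC.
class VersionRegistry {
    struct Entry {
        std::string version;
        bool beta;
    };
    std::map<std::string, Entry> entries_;

public:
    // Called once by each module when it is loaded.
    // ASSUMPTION: the module knows the path it was loaded from, and a
    // "-latest" component in that path means a symbolic link was used.
    void registerModule(const std::string& name,
                        const std::string& version,
                        const std::string& loadPath) {
        Entry e;
        e.version = version;
        e.beta = loadPath.find("-latest") != std::string::npos;
        entries_[name] = e;
    }

    // What the version device would return for a module. No parsing is
    // involved; the device only hands back what was registered.
    std::string report(const std::string& name) const {
        const std::map<std::string, Entry>::const_iterator it =
            entries_.find(name);
        if (it == entries_.end()) return "unknown";
        if (it->second.beta) return it->second.version + " (BETA)";
        return it->second.version;
    }
};
```

A module's init routine would call something like @registry.registerModule("mymodule", "1.4", path)@. The sketch ignores locking; a real version would need to guard the map if modules can register while the device is being read.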


fn1. No one uses the version device except as a diagnostic. Big Saves, however, always read them, and that's when the front-ends would go down.

fn2. The "crash" is that the @RETRPY@ is suspended so no @RETDAT@ data will ever be returned. The task gets suspended because the heap is detected as corrupted.