Bug #21672

ps_lxi_driver downloads problem at startup, alarms fail

Added by Dennis Nicklaus 8 months ago. Updated 2 days ago.

Status: Feedback
Priority: High
Category: Sorenson XG/SG Power Supply Driver
Target version:
Start date: 01/11/2019
Due date:
% Done: 0%
Estimated time:
Duration:

Description

When starting clx30e, there appear to be startup order/race conditions.
I upgraded it to Erlang 21.1 and daq 1.7. Daq 1.7 now prints a message to the log when an alarm scan dies.
At startup, a lot (all?) of the alarm scans die, apparently. The 'DOWN' message shows that they exit with status=normal.
That would generally happen when the driver returns ERR_DEVFAILED for a reading.

I attempted to make things work by using acl to run "download node=clx30e" and restart all the alarms. When I did that, I noticed a lot of setting messages ("Set current to ...") being printed that had not been printed at startup.

So it appears that the driver isn't really ready for downloads when the downloads happen.

I don't think going to 21.1 or daq 1.7 had anything to do with causing this failure. The new version simply prints the 'DOWN' messages when the alarm scans die, which makes the problem apparent.

History

#1 Updated by Dennis Nicklaus 8 months ago

Just to clarify -- generally, it is the framework that would return the DEVFAILED error when it cannot find a driver process, not the individual driver code itself that would return that error.

#2 Updated by Richard Neswold 8 months ago

  • Category changed from ACSys/FE Framework to Sorenson XG/SG Power Supply Driver

The driver doesn't accept settings until it's communicating with the power supply. If we don't have communications with the power supply when a setting comes in, the driver needs to remember and apply it when we regain communications.
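As a rough illustration of the "remember and apply" behavior described above, here is a minimal Python sketch. It is not the ps_lxi driver code (which is Erlang); the class name, method names, and the `send` callback are all hypothetical. It assumes only the most recent setting needs to be kept while the supply is unreachable.

```python
class PendingSettingDriver:
    """Hypothetical driver wrapper that defers settings while disconnected."""

    def __init__(self, send):
        self._send = send        # callable that writes a value to the supply
        self._connected = False
        self._pending = None     # most recent setting not yet applied

    def set_current(self, value):
        if self._connected:
            self._send(value)
        else:
            # Not communicating yet: remember only the latest setting.
            self._pending = value

    def on_connect(self):
        self._connected = True
        if self._pending is not None:
            value, self._pending = self._pending, None
            self._send(value)    # apply the remembered setting

    def on_disconnect(self):
        self._connected = False
```

With this shape, a download that arrives before communications are established is not lost; it is applied once on the next successful connection.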

#3 Updated by Richard Neswold 8 months ago

Here's a proposed fix:

Delete the setting record property in the database for each device so that settings are not sent down when the front-end is restarted.

  • For front-end restarts, it connects to each power supply and reads the current setting.
  • When the front-end and power supplies are off for a long time (a shutdown, for example) and then turned back on, we won't try to use a setting that's weeks (or months) old. The supply will come up off and the operators will decide what the setting should be.

Chip: comments?

#4 Updated by Richard Neswold 8 months ago

We should retry this with the latest version of the framework. Dennis and I found code that was still using sync:to_ms/1. We fixed it here: b35acc53.

#5 Updated by Richard Neswold 8 months ago

  • Status changed from New to Feedback

Two fixes:

commit 14f337f8 -- In this change, after the front-end software is up and running, we start a background task which sleeps for 5 seconds before asking for settings and alarms to be downloaded. This gives drivers a little time to initialize themselves before the settings start appearing.
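The delayed-download idea can be sketched roughly as follows. This is a Python illustration only, not the code in the commit; `schedule_download` and `request_download` are hypothetical names, and the 5-second default simply mirrors the delay mentioned above.

```python
import threading


def schedule_download(request_download, delay_s=5.0):
    """Start a background timer that calls request_download after delay_s seconds."""
    timer = threading.Timer(delay_s, request_download)
    timer.daemon = True  # do not keep the process alive just for this timer
    timer.start()
    return timer
```

The caller returns immediately, so startup continues while drivers initialize; the download request fires later from the timer thread.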

commit dev-ps_lxi|e6016c96 -- In this change, the driver no longer waits 15 seconds before attempting communications; it tries right away (if that fails, it uses the 15-second delay before trying again). This, with the previous change, removes the race condition.
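A minimal Python sketch of the "try first, delay only on failure" ordering is shown here. It is an illustration under assumptions, not the driver's Erlang code: `connect_with_retry`, `try_connect`, and the injectable `sleep` parameter are hypothetical.

```python
import time


def connect_with_retry(try_connect, retry_delay_s=15.0, sleep=time.sleep):
    """Attempt to connect right away; wait retry_delay_s only after a failure.

    try_connect is a callable returning True on success.
    Returns the number of attempts made.
    """
    attempts = 1
    while not try_connect():
        sleep(retry_delay_s)  # delay sits between attempts, never before the first
        attempts += 1
    return attempts
```

Passing `sleep` as a parameter keeps the retry delay easy to exercise without actually waiting 15 seconds.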

If this works, I'll close out the issue.

#6 Updated by Richard Neswold 2 days ago

  • Target version set to dev-ps_lsi v1.0

Did these changes help the situation? Are we still seeing boot problems?


