Older, Resolved Issues

  1. Shutter Controller Error (Hardware) (1/3/2014)
    Unfortunately, we are experiencing problems with the shutter hardware about once per night. When the shutter controller detects a problem with the shutter's mechanical elements, it sets an error bit and stops the current operation. SISPI detects this error bit, breaks the interlock, stops the exposure queue and issues an alarm with instructions to reset the Bonn Shutter controller from the telescope operator console. Once the shutter controller has been reset, reset the SISPI interlock by clicking RESET in the system control section of the Observer Console GUI. If that succeeds, restart the exposure loop. If RESET fails, try CONFIGURE, and if that fails as well you have to restart SISPI.
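    The recovery order above (RESET, then CONFIGURE, then a full restart) can be sketched as a small escalation function. This is only an illustration: reset and configure are hypothetical callables standing in for the Observer Console buttons, and nothing here is part of SISPI itself.

```python
from typing import Callable


def recover_interlock(reset: Callable[[], bool],
                      configure: Callable[[], bool]) -> str:
    """Return the step that restored the interlock, or 'restart_sispi'.

    Both arguments are hypothetical stand-ins for the RESET and CONFIGURE
    buttons; each returns True if the interlock came back.
    """
    # Step 1: try the cheap RESET first.
    if reset():
        return "reset"
    # Step 2: RESET failed, so fall back to CONFIGURE.
    if configure():
        return "configure"
    # Step 3: neither worked, so SISPI has to be restarted.
    return "restart_sispi"
```

    Note that CONFIGURE is only attempted when RESET has failed, which matches the order given in the text.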
  2. Architect Startup, especially with the Starfinder (8/26/2013)
    Occasionally, the Architect will fail to start a role specified in the ini file. The role STARFINDER seems to have this problem most often. This appears to be a problem with eups: if a role requires special setup and that setup fails, the role will not start. The Architect has been updated to notice this error condition and try again; since the eups problems appear to be a race condition, this often solves the problem. The Architect has also been updated to log more information about this situation (and about any PML command sent after the instance is up and running) for better debugging. Note that the Architect waits a few moments before retrying the eups setup. At OSU, I (Ann) have seen the starfinder take long enough to get started that I was able to log in to the Console, get a list of roles from the Architect and see that the starfinder was missing, all before the starfinder successfully started without my intervention. So if a role has not started, look at the Architect logs to see if it's still trying.
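    The retry behaviour described above can be pictured with the following sketch. It is not the Architect's actual code: the function name, the number of attempts and the delay are assumptions made for illustration (the text only says the Architect waits "a few moments" before retrying).

```python
import time
from typing import Callable


def start_role_with_retry(setup: Callable[[], bool],
                          attempts: int = 3,
                          delay_s: float = 5.0,
                          sleep: Callable[[float], None] = time.sleep) -> int:
    """Call setup() until it succeeds; return the number of attempts used.

    setup is a hypothetical stand-in for the eups setup of one role.
    attempts and delay_s are illustrative values, not the Architect's.
    """
    for attempt in range(1, attempts + 1):
        if setup():
            return attempt
        # Pause before retrying so a transient race has time to clear.
        if attempt < attempts:
            sleep(delay_s)
    raise RuntimeError(f"role setup failed after {attempts} attempts")
```

    Because the failure looks like a race condition, simply waiting and trying again is often enough, which is why a role can appear to be missing for a while and then start on its own.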
  3. Architect Option Change (2/28/2013)
    The optional flags to run the cleanup scripts and to speed up startup by not waiting on start_role calls have now been made the default. These features can now be turned off (instead of turned on) with command line flags: -k now means keep_processes instead of kill, and the old -n (no_wait) has become -r (return_after_start_role), in contrast to the default, which is to return immediately. The cleanup performed by the Architect now includes ramdisk cleanup as well as process cleanup.
  4. Restarting the GUIDERGUI (2/8/2013)
    The GUIDERGUI is different from all the other SISPI GUIs and requires special care if you need to restart it. The GUIDERGUI is not a web-based application. Instead it runs on readout2 in the vnc session using display 15. Typically a vncviewer running on observer1 is connected to this server and shows the Guider GUI. This particular configuration prevents the Architect from stopping the GUIDERGUI application when SISPI shuts down. To clean up, you need to use the cleanup_processes -k command or the -k option when you run the architect. Failure to do so will leave multiple copies of the GUIDERGUI running on readout2, which typically leads to considerable observer confusion.
    If you only want to restart the GUIDERGUI you can use the Architect Console, but it's a bit more complicated than just restarting other SISPI applications. Note that this procedure will break the interlock and that you will have to run configure to restore the system. In other words, don't do this in the middle of an observing sequence.
    1. On the Architect Console select the GUIDERGUI role and click stop
    2. You should see some messages about this and some child processes still running in the (terminal) windows where you are running the architect.
    3. In the readout2 vnc session on observer1 (start a new viewer and connect to the vnc server if necessary), close the Guider GUI window and close the terminal window.
    4. On the Architect Console select the GUIDERGUI role and click start
    5. When the GUI is back (readout2 vnc session) you need to run configure to reset the interlocks.
    6. When the Guider configures it will connect to the GUIDER GUI - no need to push the Connect Guider button.
  5. Frozen OCS (11/11/2012)
    Occasionally we observe that the OCS is not doing anything even though there are image requests in the queue and no interlocks are broken. You also notice that when you try to stop the queue (by hitting the GO/STOP button) the queue will not be disabled (and the Stop label changes to Go). In this case it's likely that one of the exposure loop threads is holding on to a lock (a Python process synchronization tool). We are working on preventing this, but it still happens. In this situation, use the Architect Console and select the OCS.
    In the command field enter get runstate
    If this returns RUNNING the OCS is indeed in the state just described. You can try to recover using the clearLocks command. In the Architect Console select the OCS and enter the command clearLocks.
    Note: The OCS can also look stalled when you are waiting for the TCS to complete a slew. So before you do anything, you might want to check the ICS/TCS GUI (is the Stop Slew button displayed?) and/or ask the telescope operator.
    Note 2 (11/12/2012): This occurred tonight on exposure 150599 when the shutter errored. There were errors in the shutter logs, but no alarms and no interlocks were broken. The runstate was "RUNNING", so we did a clearLocks, which stopped the queue. We were then able to press Go without a reconfigure or a reset, and the rest of the queue completed successfully. Perhaps this means that the shutter should break its interlock when this error happens.
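    To illustrate why a held lock freezes the exposure loop, here is a minimal, self-contained Python sketch. It does not use any OCS code; it only shows that a lock which is never released blocks everyone else, and that releasing it lets work continue. How clearLocks is actually implemented is not described here, so the release step below is just an assumption about its general effect.

```python
import threading


def demo_stuck_lock():
    """Show a stuck lock blocking progress until it is released."""
    lock = threading.Lock()

    # Something acquires the lock and never releases it
    # (for example, an error path that skips the release).
    lock.acquire()

    # Anyone else who needs the lock now stalls. A timeout makes
    # the stall visible instead of hanging forever.
    blocked = lock.acquire(timeout=0.1)

    # Forcing the lock open (the assumed effect of a clearLocks-style
    # command) lets the next acquire succeed.
    lock.release()
    recovered = lock.acquire(timeout=0.1)
    lock.release()

    return blocked, recovered
```

    Calling demo_stuck_lock() returns (False, True): the first attempt times out while the lock is held, and the second succeeds once it has been released.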
  6. Baseline F&A Configuration for SV (10/31/2012)
    The default configuration file (sv.ini) for science verification configures the system to use the telescope and filter look up tables, to accumulate tweaks from the AOS system and to initialize the trim z coordinate to z = 1400.0 - 110.0 * temperature where temperature is the average upper truss temperature. The temperature look up table is not used. You should not change any of these settings unless you really know what you are doing. The only thing for you to do is to push the Init Z trim button before you start to observe (once the dome has been opened and the telescope is at ambient temperature). This will clear any tweak value and set the z trim value for the current temperature. Note that the temperature is obtained from the TCS so the TCSInterface has to be running and configured at this time.
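    As a worked example of the initial z trim formula above, the following sketch averages a set of upper truss temperature readings and applies z = 1400.0 - 110.0 * temperature. The function name and the sample readings are made up for illustration; units are not stated in the text and are left unspecified here.

```python
def initial_z_trim(upper_truss_temps):
    """Apply z = 1400.0 - 110.0 * temperature.

    temperature is the average of the upper truss readings passed in.
    """
    temperature = sum(upper_truss_temps) / len(upper_truss_temps)
    return 1400.0 - 110.0 * temperature


# Two illustrative readings averaging 11.0 give 1400.0 - 1210.0 = 190.0.
z = initial_z_trim([10.0, 12.0])
```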
  7. Master Console is Activated (11/1/2012)
    The master console feature is activated in DES.ini. This means that you have to log in as DECamObserver (propid password) on one of the GUIs on observer1.
    It has to be the observer1 computer and it has to be the DECamObserver account, even if you have your own SISPI account. You can use your personal GUI account, but you will only have user privileges and cannot change this level unless approved by the observer sitting at the master console. The level of the master console session will be automatically raised to Expert. If you want to authorize some other session (e.g. a remote expert) to raise its level, you need to open the control field by clicking on the button with the person icon in the toolbar. This displays a list of active sessions and pending requests. Click on " " to approve a request.
  8. Architect Clean-Up (10/31/2012)
    It has been observed that the architect leaves some processes behind when the instance is shut down. We are working on a fix. Until then it is recommended that in the afternoon, when you prepare the system for the night, you log in to ics1 as sispi and check for old shutter processes. For example:
      ps aux | grep bin/Shutter
    If the instance is running there should be only one such process (none if the instance is down). If you find a shutter process when the instance is down, stop it with:
      kill -9 <pid>
    This is now addressed by the cleanup_processes script and the -k option of the architect.
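    For reference, the manual check described above can be sketched in Python. This is not the cleanup_processes script; it is a hypothetical helper that parses "ps aux" output, skips the grep line itself, and returns the PIDs of anything matching bin/Shutter.

```python
import subprocess


def find_shutter_pids(ps_output):
    """Return PIDs of lines in 'ps aux' output that mention bin/Shutter."""
    pids = []
    for line in ps_output.splitlines():
        # Ignore the grep process that a manual 'ps aux | grep' would show.
        if "bin/Shutter" in line and "grep" not in line:
            # In 'ps aux' output the PID is the second column.
            pids.append(int(line.split()[1]))
    return pids


def current_shutter_pids():
    """Run 'ps aux' on this machine and look for shutter processes."""
    out = subprocess.run(["ps", "aux"], capture_output=True,
                         text=True, check=True).stdout
    return find_shutter_pids(out)
```

    With the instance down, current_shutter_pids() should return an empty list; with it running, a single PID.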
  9. PML error: Server too busy (11/18/2012)
    This is the first time we have ever seen this error. The Hexapod was refusing PML connections with an error about "Server too busy". This error comes from Pyro. If it comes up again, we can increase Pyro's maximum number of allowed connections by setting PYRO_MAXCONNECTIONS. We can do this in the same manner as we set other Pyro settings (Pyro reads settings from a file or from environment variables). After joining an instance, you can check Pyro's current configuration with "python -m Pyro.configuration". Right now, the maximum is set to 200 connections.
  10. Shutter stuck open in GUIs (11/11/2012)
    We have identified a problem with the shutter code that prevents the shutter displays in the GUIs from closing. We verified using dome flats and comparing counts that the shutter is closed and that this is just a software/GUI issue. Until this is fixed please use this fix to correct the display. If this recipe seems weird to you - well, it is, but it works: On the Architect Console select the Shutter and enter the command configure.
    Submit and repeat this 5 times (or more) until you see the shutter image in the GUIs closing. No, you don't have to stand on one foot when doing this. After that you will have to RESET on the observer console.
  Failed Exposures are Lost from Queue (10/02/2012)
    This behaviour is as expected, but can be frustrating when you're running a script. If one exposure fails, for example because of the intermittent TCSInterface error, that exposure will not be retried and exposures will be missing from the intended set.
  11. Runaway Hexapod (10/09/2012)
    If, in the course of testing, the hexapod is given some ridiculous number and starts moving way too far, go to the Console app in the vnc and type "HEXAPOD stop", or go to System Control in the Architect Console GUI and choose component ICS, device HEXAPOD, command stop. You will get back a message that this failed because the hexapod is busy moving, but it should stop anyway. Then be sure to turn off the buggy component, reconfigure, and give the hexapod a reasonable value to go back to (for instance, by setting the focus to a known reasonable value for the next exposure). The OCS might time out as the hexapod makes a long move; try again once it has arrived.
  12. Don't Forget to Reset Things after a Configure (10/04/2012)
    If the instance needs a reset, the Hexapod will go back to the default settings from the ini file. If you had changed anything there (like which LUTs are being consulted), be sure to set them again after a configure.
  13. Observer1 startup/cleanup scripts (10/8/2012)
    The DECamObserver account has a few scripts in ~/bin to make managing all the SISPI windows easier. The start_sispi_windows script starts a bunch of browser windows, vnc for the GuiderGUI, and Skype. The organize_sispi_windows script spreads those windows out neatly across all 8 monitors. The observer_setup script runs the first script, sleeps a bit to let all the windows get their title bars, then runs the second script. Note that observer_setup follows the other scripts with "&"; there were issues with the first script not releasing the terminal and the second script never running. There is also an observer_cleanup script that kills all chrome, skype, and vncviewer processes. The observer_setup script has a shortcut on the desktop (which works now, unlike before). The observer_setup and observer_cleanup scripts are also available as drop-down icons from the menu bar; this is particularly useful for running the cleanup script when the desktop and all your xterms are buried under a pile of other windows.
  14. GUI Timeout and Freezes (9/28/2012)
    We have a known problem (but no solution yet) with the GUIs. Eventually they run out of resources on the observer1 machine (most often memory) and they crash or become sluggish. The most dangerous situation is a stale display: the GUIs look fine but they are not updating. It is recommended to refresh each GUI occasionally. Note that it is safe to restart all GUIs without interrupting the SISPI instance.
    Kevin points out that Chrome gets really slow when using 1.4G of RAM. By using top and sorting by memory use (press "F" to select the sorting column, and "n" to select memory), you can find the Chrome windows that are slowing everything down and kill them. The remaining Chrome windows become much more responsive after this cleanup.
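    As an alternative to sorting in top, the same information can be pulled from ps. The sketch below is a hypothetical helper, not an existing observer1 script: it parses the output of "ps -eo pid,rss,comm" (RSS is reported in KiB) and lists Chrome processes with the largest memory users first.

```python
def chrome_by_memory(ps_output):
    """Return (pid, rss_gib) for Chrome processes, largest first.

    ps_output is the text printed by 'ps -eo pid,rss,comm'.
    """
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header line and anything malformed.
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        pid, rss_kib, comm = parts
        if "chrome" in comm.lower():
            # Convert KiB to GiB for easier comparison with the 1.4G figure.
            rows.append((int(pid), int(rss_kib) / 1024 ** 2))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

    The processes at the top of the returned list are the first candidates to kill.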
  15. Starfinder Timeouts (9/28/2012)
    With the default catalog (nomad_catalog pipeline6), SISPI (GCS and Donut) times out in prepareGCS and prepareDonut and breaks an interlock.
    Solution: exclude Guider and Donut, either on the observer console or by setting the appropriate configuration variable (lookup_guidestar, for example). Using one of Kevin's reduced catalogs also works.
  16. GCS does not stop Guider (9/28/2012)
    The GCS/Guider complex is the least tested part of SISPI. We have noticed that once in a while the GCS fails to stop the Guider at the end of the exposure. The "sync_with_shutter" feature is designed to take care of this. A patch has been applied to the OCS to force the "stop_guiding" call. The effectiveness of this fix needs to be monitored.
  17. TCSInterface (9/28/2012)
    Although less frequently than in the past, we still observe that the TCSInterface breaks an interlock when it loses the connection to the TCS. In most cases we could trace this to issues or activities on the TCS side, but the effect is the same: you need to reset. Check the interlock viewer: if the TCSINTERFACE is back in the READY state, a simple RESET on the observer console GUI is sufficient. If not, you need to configure.
  18. TCS Slewing Issues (10/02/2012)
    During a long slew, the OCS timed out even though there were no errors from the TCS; it just had a long way to go. Also, there is a chance that some of the interlock issues are due to the dome moving slower than the telescope can slew; do we check for that? The OCS now waits for 10 minutes (i.e. basically forever); you can abort the wait by pressing the "Abort TCS Command" button in the TCS GUI in the ICS display. The TCS now waits for the dome to be in position before telling SISPI that it is ready.
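    The waiting behaviour described above (wait up to 10 minutes, but allow an operator abort) can be sketched as a simple polling loop. This is not the OCS implementation: the function name, the polling interval and the in_position callable are assumptions for illustration; only the 10 minute limit comes from the text.

```python
import threading
import time


def wait_for_slew(in_position, abort, timeout_s=600.0, poll_s=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Wait until in_position() is true, the abort event is set, or time runs out.

    in_position is a hypothetical callable reporting that telescope and
    dome are ready; abort is a threading.Event set by an operator action.
    Returns "ready", "aborted" or "timeout".
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        # An operator abort takes priority over everything else.
        if abort.is_set():
            return "aborted"
        if in_position():
            return "ready"
        sleep(poll_s)
    return "timeout"
```

    The clock and sleep parameters exist only so the loop can be exercised without actually waiting 10 minutes.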
  19. FCM Crashes (10/06/2012)
    When you get an error message that the FCM has failed (the interlock breaks as well) you need to restart this application from the architect console GUI.
    Select FCM (it runs on ics1) on the left view. Then click the restart button (best to double check that you really have FCM selected - you don't want to restart other components)
    Watch the log messages. When the FCM is back reconfigure SISPI.
  20. Microphone Input for Skype (10/06/2012)
    There is a microphone attached to observer1 (I think it's part of the webcam, which connects via USB) to use for making Skype calls. If this is not working, you may have the wrong audio input device selected. To check, go to the menu bar and right-click on the volume icon. Choose "Sound Preferences". The 3rd tab in this window is "Input". Choose the device "081d Analog Mono" (not the internal audio analog stereo). If you make some noise, you should then see the input level bars dance around. If not, check the "Input Volume" slider.