Datamon and Nearline Machine Maintenance

This page describes how to perform various maintenance tasks on the novadaq-{far,near}-datamon and novadaq-{far,near}-nearline-01 machines.

More up-to-date (and easier to follow) instructions for release installation are in docdb-24142.
You may still find it useful to refer to this wiki for some steps.

Here are the main steps to roll to a new release:

Done by the novasoft user:

  1. Update the development software
  2. Install a new release (update AND pull new externals)
  3. Edit setup_nova_whatever functions in the .bashrc if needed

Done by the novanearline or novadaq/novacr02 user:

  1. Edit setup_nova_whatever functions in the .bashrc if needed
  2. Make new test release if you need it (the right directory to put these in depends on the machine)
  3. Point the scripts on the crontab (nearline) or desktop (datamon) to the new release (Test this first! Don't code like man...)

After you are done, update this page with the information about the new release you just installed:
https://cdcvs.fnal.gov/redmine/projects/novadaqcontrolroomsw/wiki/Offline_Software_Releases

The rest of this page has more explicit details for some of these steps.

How to install a new offline release.

See docdb-24142 for the most up-to-date instructions for installing CR and Nearline releases.

  • Log into the machine as "novasoft"
  • Set up the offline environment with the new setup function:
    (Check the bashrc first to confirm the options are not already in there. If the function doesn't have -b maxopt -6 /online_monitor/offline -e /online_monitor/externals then specify them)

From the CR (datamon) machines

 setup_nova_datamon -r development

From the nearline machines

 setup_nova_nearline -r development -b maxopt -6 /nusoft/offline_svn/ -e /nusoft/externals/

Note that this might tell you some packages are missing. In the future, this should not happen. For now, it is OK if some packages that we don't use for nearline work (e.g. 'lembig') are missing.

  • cd into $SRT_PUBLIC_CONTEXT and then cd up one directory (so that you are in the directory that contains all of the offline releases.)
  • Check that you are not in danger of running out of disk space before installing a new release
    df -h
    
  • Update the software in development first using the update-release script. Do this for development first so it is up to date, then for the new release you are installing.
    nohup update-release -rel development > /online_monitor/install_logs/update_development_<date>.log &

    or
    nohup update-release -rel development > /nusoft/install_logs/update_development_<date>.log &
  • Install the new offline release
    nohup update-release -rel {release}  > /online_monitor/install_logs/update_<release>_<date>.log &
    

    or
    nohup update-release -rel {release}  > /nusoft/install_logs/update_<release>_<date>.log &
    
  • Install externals. These instructions might change, check the link at the top of the page.

If you have never done this before, check the general instructions here:

https://cdcvs.fnal.gov/redmine/projects/novaart/wiki/Installing_a_local_copy_of_NOvASoft_and_the_external_products
  • In principle you can just do the following:
    cd $SRT_PUBLIC_CONTEXT/SRT_NOVA/scripts/install_scripts
    
    < install script for your new release >
    
  • Check to see if you got all the new externals you needed. Log out and log back in, then setup_nova .... as before BUT in the new release, NOT development.

Note that this might tell you some packages are missing. In the future, this should not happen. For now, it is OK if some packages that we don't use for nearline work (e.g. 'lembig') are missing, but only a select few should appear here. (Alas, this is part word of mouth and part additional knowledge. Please edit this wiki with more info when you can!)

  • Once all of the appropriate externals are installed, log out and log back in (still as the "novasoft" user). Then set up the offline environment as before, in the new release.
  • cd into $SRT_PUBLIC_CONTEXT and build the release. First build everything clean:
    SRT_NOVA/scripts/novasoft_build -rel {new release} -c
    

    Then build, saving the output so you can check it for errors later. Note that, for some unknown reason, builds on slf6 machines do not proceed correctly when compiled with the parallel option (according to Michael; Fernanda has never confirmed this).
    nohup SRT_NOVA/scripts/novasoft_build -rel {new release} > /online_monitor/install_logs/build_<release>_<date>_maxopt.log &
    

    or
    nohup SRT_NOVA/scripts/novasoft_build -rel {new release} > /nusoft/install_logs/build_<release>_<date>_maxopt.log &
    
  • When the build is done, check the log file for errors (you can grep the file for "Error" and "error" and "ERROR" and "warning").
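The log check in the last step can be scripted. This is a minimal sketch, assuming the build log is plain text; the function name `scan_build_log` is illustrative and not part of the NOvA scripts. It matches case-insensitively, so it also catches spellings such as "Warning" that the four literal patterns above would miss.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SRT_NOVA): print every line of a build log
# that mentions an error or a warning, prefixed with its line number.
scan_build_log() {
    local log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read log: $log" >&2
        return 2
    fi
    # -n: show line numbers, -i: ignore case, -E: extended regular expression.
    # grep exits with 0 if something matched and 1 if the log is clean.
    grep -n -i -E 'error|warning' "$log"
}
```

For example, `scan_build_log /nusoft/install_logs/build_<release>_<date>_maxopt.log` prints the suspicious lines. An exit status of 1 means nothing matched.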

Pointing scripts to the new version of the software

To point all of the control room OnMon/EVD scripts to the correct release, edit the .bashrc file in the home area (for both the "novasoft" and "novadaq" users) so that the "setup_novasoft_onmon" function points to the new release.

This section is under development ... (funny how that sentence might mean tomorrow, in a year or never ever, right?)


Previous Problems and Hiccups during installations

CRITICAL Nothing works unless fixed
NON-OPTIMAL Do your best to fix it but don't get held up by it if you can't afford to
MINOR Be a good citizen and try to fix it if you can

Machine: far-nearline
Release installed: S14-12-02
Problem: Missing packages
Symptom: Errors during the build, such as "fatal error: Whatever.h doesn't exist" or something to that effect.
Solution: This was due to network problems while update-release was doing its thing. Re-running update-release pulled the package on the second attempt, and then it built happily.

Machine: far-nearline
Release installed: S14-12-02
Problem: Can't find old packages
Symptom: Errors in the update-release log:
    svn: URL 'whatever' doesn't exist
    svn checkout failed.
  but no missing-packages message from setup_nova and no problems building.
Solution: The install_externals script might reference something which no longer exists. It looks ugly and scary; tell the software people in case they are not aware of this.

Machine: nearline and datamon
Release installed: most releases before S14-12-02
Problem: Missing externals
Symptom: The setup_nova function tells you a package is missing:
    ERROR: Product 'art' (with qualifiers 'e6:nu:prof'),
    has no v1_12_05 version (or may not exist)
Solution: Have you installed the missing externals? There are instructions for this further up on this page. If you have run install_externals then continue reading.
  This can happen if the install_externals script is out of date or has errors.
  You can either contact the software manager and ask about it, or try to find the package in the gpvms and copy it over yourself, like a grown up!
  If you have chosen to do it yourself you can also:
    1. Look for the missing externals at http://scisoft.fnal.gov/scisoft/packages/
    2. cd into the $EXTERNALS/tars directory and wget them
    3. cd into $EXTERNALS and tar -xf tars/the_externals.tar.bz2

Machine: nearline and datamon
Release installed: most releases before S14-12-02
Problem: Missing externals you don't need
Symptom: The setup_nova function tells you a package is missing:
    ERROR: Product 'lembig' (with qualifiers 'e6:nu:prof'),
    has no v1_12_05 version (or may not exist)
Solution: Same as above but, in principle, if you KNOW you don't need the missing stuff (e.g. lembig or valgrind in the past) you could skip copying it over.
  Avoid doing so if you don't understand if/why you don't need the package.

Machine: far-nearline
Release installed: development (2015-01-15)
Problem: Development release too screwed up
Symptom: Too many weird errors in the update-release log, locked packages, etc.
Solution: Re-install the development release from scratch.
  Don't do this unless you have no other choice AND you know what you are doing.
  cd into the directory with the releases (/nusoft/offline_svn/releases on nearline machines), then:
    rm -rf <release, in this case development>
    source ../srt/srt.sh
    wget https://cdcvs.fnal.gov/redmine/projects/novaart/repository/raw/trunk/SRT_NOVA/scripts/update-release
    chmod +x update-release
    export CVSROOT='svn+ssh://p-novaart@cdcvs.fnal.gov/cvs/projects/novaart/pkgs.svn'
    nohup ./update-release -rel <in this case development> > /nusoft/install_logs/update-<in this case dev>-date.log &
  More info here

Machine: far-nearline
Release installed: development (2015-01-15)
Problem: Locked package
Symptom: Message in the update-release log:
    Updating package SRT_NOVA
    svn: Working copy '.' locked
    svn: run 'svn cleanup' to remove locks
    (type 'svn help cleanup' for details)
Solution: cd into the release location and into the package, then:
    svn cleanup
  Try update-release again.
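The two svn symptoms recorded above can be looked for mechanically in an update-release log. The sketch below is hypothetical: `check_update_log` is not an existing script, and it assumes the log contains the svn messages exactly as quoted on this page (the wording may differ between svn versions).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SRT_NOVA): scan an update-release log for
# the svn messages quoted on this page and print a hint for each one found.
# Returns 0 if neither message is present and 1 otherwise.
check_update_log() {
    local log="$1"
    local found=0
    # Message text assumed to match what is recorded on this page.
    if grep -q "run 'svn cleanup' to remove locks" "$log"; then
        echo "locked package: run 'svn cleanup' inside the package, then re-run update-release"
        found=1
    fi
    if grep -q "svn checkout failed" "$log"; then
        echo "checkout failed: re-run update-release; if it persists, tell the software people"
        found=1
    fi
    return "$found"
}
```

A typical call would be `check_update_log /nusoft/install_logs/update_development_<date>.log`. The hints only restate the solutions listed above; they do not replace reading the log.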

Switching scripts on the Nearline machine

(This should technically be in the nearline wiki.. move it when you can)

  • ssh to the machine as novanearline
  • cd into Nearline-test-releases and make a new test release:
    newrel -t <tag_name> <tag_name>
  • Get the HEAD version of OnlineMonitoring and Commissioning and build
  • Turn off the current processing (ProcessNew OnMon and Ana Files) from the crontab. Do not turn new processing on before doing this or you might overload the machine and she might cry
  • Turn on processing with the new tag or run it manually
  • Check the output files for known errors in /nearline-data/
  • Once you are satisfied that the files are correct, let it run a couple of times on its own and check again
  • Once you are sure you haven't broken anything you can repeat this process for the plot making scripts (in near-nearline-01)

Remember these look for files in bluearc so you might want to run the RSync script first

  • Make sure the following processes are switched and running on the correct crontabs:

Machine: far-nearline-01
User: novanearline
Processes: OnMon Processing (FarDet), Ana Processing (FarDet), CleanupScript

Machine: far-nearline-02, far-nearline-03, far-nearline-04
User: novanearline
Processes: OnMon Processing (FarDet), Ana Processing (FarDet)

Machine: far-nearline-01
User: novadaq
Processes: RSync to Bluearc

Machine: near-nearline-01
User: novanearline
Processes: OnMon Processing (NearDet), Ana Processing (NearDet), CleanupScript, RSync, OnMon Plot Making (Near and Far), Ana Plot Making (Near and Far), Hardware watch list (Near and Far)
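One way to confirm that a crontab has been fully switched is to list the active entries that do not mention the new tag. This is only a sketch: `stale_cron_entries` is a made-up name, and it assumes that each crontab command line contains the tag name somewhere (for example in the test release path), which may not hold for every script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read crontab text on stdin and print the active entries
# that do NOT contain the expected tag. Assumes the tag name appears literally
# in each command line that has been switched to the new release.
stale_cron_entries() {
    local tag="$1"
    # First grep drops comments and blank lines; second drops lines with the tag.
    # -F treats the tag as a fixed string, so dashes and dots are not special.
    grep -v -E '^[[:space:]]*(#|$)' | grep -v -F -- "$tag"
}
```

Run it as `crontab -l | stale_cron_entries <tag_name>` for each user and machine listed above. No output means every active line mentions the tag. Environment lines such as `MAILTO=...` will also be printed and can be ignored.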

Testing nearline processes in development before installing a new tag

Before requesting a new tag please test all the jobs in development if possible. Here are the steps:

  • login to near-nearline-01 (and far-nearline-{01-04}) as novanearline
  • confirm which tag is currently in use for each script in the crontab
  crontab -l
  • cd into the test release which is currently in use and verify that all changes have been committed
  svn diff -r HEAD
  • Commit any changes which have been left uncommitted. Note that different changes might have been made on the fardet and neardet machines, and you might have to reconcile them.
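Checking every package by hand is easy to get wrong, so the check for stray changes can be looped over the test release. The sketch below is an assumption-laden example: `list_uncommitted` is a hypothetical name, and it assumes each package in the test release is its own svn checkout in a subdirectory. It uses `svn status -q`, which reports local modifications only; `svn diff -r HEAD` additionally shows differences against the latest revision in the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the packages in a test release directory that have
# local, uncommitted modifications.
# Assumes each package is a separate svn working copy with its own .svn directory.
list_uncommitted() {
    local release_dir="$1"
    local pkg
    for pkg in "$release_dir"/*/; do
        # Skip directories that are not svn working copies.
        [ -d "${pkg}.svn" ] || continue
        # 'svn status -q' prints only locally modified, versioned items.
        if [ -n "$(svn status -q "$pkg")" ]; then
            echo "uncommitted changes in: $pkg"
        fi
    done
    return 0
}
```

Called as `list_uncommitted <path to the test release>`, it prints one line per package that still needs attention and nothing if the release is clean.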

Once you have convinced yourself that all changes have been reconciled you can move on to updating development and testing:

  • log out and log back in as novasoft
  • set up the nova software
  • update development on the machine
  • build development and check for errors
  • make a new development test release