NOvA Computing System Administration General Notes
Accessing IPMI Consoles on novadaq-far-* machines
To access the IPMI console, which lets you monitor a startup/shutdown operation or log in when ssh isn't working, run the program
# cons <hostname>
as root on a node inside the Fermilab network domain (VPN or on-site). There is also a command-line tool on the Far Detector cluster called "ipmiwrap", which exposes the ipmitool interface. For example, to power off novadaq-far-farm-02, run the following command as root on novadaq-far-master:
# ipmiwrap -I lanplus -UADMIN -H novadaq-far-farm-02.novaipmi.fnal.gov chassis power soft
Here, "soft" indicates that you want to initiate an ACPI soft shutdown, giving the OS a chance to shut down gracefully.
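If the same soft shutdown has to be issued for several nodes, the command line from the example above could be generated rather than typed by hand. The following is only a sketch: the function names ipmi_fqdn and soft_off_cmd are made up for illustration, and it assumes that every node's IPMI interface is named <hostname>.novaipmi.fnal.gov, which is generalized from the single example above and should be checked before use. It prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: ipmi_fqdn and soft_off_cmd are hypothetical helper names.
# Assumption: each node's IPMI interface is <short-hostname>.novaipmi.fnal.gov,
# generalized from the single novadaq-far-farm-02 example.

ipmi_fqdn() {
    # Map a short node name to its assumed IPMI hostname.
    printf '%s.novaipmi.fnal.gov\n' "$1"
}

soft_off_cmd() {
    # Print (do not execute) the ACPI soft-shutdown command for one node.
    printf 'ipmiwrap -I lanplus -UADMIN -H %s chassis power soft\n' "$(ipmi_fqdn "$1")"
}
```

Calling soft_off_cmd novadaq-far-farm-02 prints the command shown above, so it can be reviewed before being pasted into a root shell on novadaq-far-master.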
The ipmiwrap command is self-documenting; entering an incomplete command prints the available completions:
ipmitool chassis power
chassis power Commands: status, on, off, cycle, reset, diag, soft
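A wrapper script might check a requested operation against this list before passing it on. A minimal sketch, assuming the seven subcommands in the hint output above are the complete set (is_chassis_power_op is a hypothetical name, not part of ipmiwrap or ipmitool):

```shell
#!/bin/sh
# Sketch only: is_chassis_power_op is a hypothetical helper.
# The accepted values are copied from the "chassis power Commands" hint above.

is_chassis_power_op() {
    # Succeed (exit status 0) only for a known chassis power subcommand.
    case "$1" in
        status|on|off|cycle|reset|diag|soft) return 0 ;;
        *) return 1 ;;
    esac
}
```

A script could then refuse to continue unless is_chassis_power_op "$op" succeeds, which catches typos before anything is sent to the IPMI interface.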
There is also a simpler command that only controls the power state of the machines:
# pnode <hostname> <op>
where op is one of on, off, cycle, ...
Power management on novadaq-near-* machines
Connect to firstname.lastname@example.org via Kerberized ssh and run the power-management command under sudo, e.g.:
sudo /usr/local/sbin/pnode.novadaq novadaq-near-datadisk-02 status
Currently Andrew Norman and Pengfei Ding have access to this console. Contact them if you need to power-cycle any novadaq-near-* machine.
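For those who do have access, a small guard can keep a Far Detector hostname from being passed to the Near Detector tool by mistake. This is a sketch under assumptions: near_pnode_cmd is a hypothetical name, and the only thing it checks is the novadaq-near- prefix. It prints the sudo command rather than running it.

```shell
#!/bin/sh
# Sketch only: near_pnode_cmd is a hypothetical helper.
# It prints the pnode.novadaq command for a novadaq-near-* host
# and refuses any other hostname.

near_pnode_cmd() {
    case "$1" in
        novadaq-near-*)
            printf 'sudo /usr/local/sbin/pnode.novadaq %s %s\n' "$1" "$2"
            ;;
        *)
            echo "not a novadaq-near-* host: $1" >&2
            return 1
            ;;
    esac
}
```

For example, near_pnode_cmd novadaq-near-datadisk-02 status prints the status command shown above, while a novadaq-far-* name is rejected with an error.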
Opening Service Now tickets for system-level configuration changes
The Near and Far Detector computing clusters are managed by the ECF group at Fermilab. Most changes to system-level configuration files will be overwritten by the management software within 12 hours. If you need to change ANY file in /etc, or need packages installed or removed (yum install or yum remove; yum list and yum search are OK to run yourself), open a Service Now ticket that details your request and the hosts to which it should apply, and add "Please refer to ECF" at the end. ECF cannot provide support for these machines if they are in an undefined or ill-defined configuration, so it will save everybody heartache down the line if configuration changes are applied properly and recorded in ECF's configuration management system.
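The rule above can be written down as a pair of small helpers, shown here only as an illustration. Both function names are hypothetical, the "Request:" and "Hosts:" labels are an assumed layout rather than a format ECF requires, and only the four yum subcommands named above are classified.

```shell
#!/bin/sh
# Sketch only: yum_needs_ticket and ticket_body are hypothetical helpers.

yum_needs_ticket() {
    # Exit status 0: changes installed packages, so open a ticket.
    # Exit status 1: read-only, fine to run yourself.
    # Exit status 2: subcommand not covered by these notes.
    case "$1" in
        install|remove) return 0 ;;
        list|search)    return 1 ;;
        *)              return 2 ;;
    esac
}

ticket_body() {
    # $1 is the request text; all remaining arguments are hostnames.
    _tb_request=$1
    shift
    printf 'Request: %s\n' "$_tb_request"
    printf 'Hosts: %s\n' "$*"
    printf 'Please refer to ECF\n'
}
```

For instance, ticket_body "install package X" novadaq-far-farm-02 prints a three-line draft whose last line is the ECF referral, ready to paste into the ticket form.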
The NOvA DAQ clusters involve a complicated network. The status of this network can be viewed through a special page that network services set up for us under their "Solar Winds" software.
To use this, go to:
The user is "nova"; the password is one you already have if you are authorized to work on the cluster.