NOvA Computing System Administration General Notes

Accessing IPMI Consoles on novadaq-far-* machines

The IPMI console lets you monitor a startup/shutdown operation or log in when ssh isn't working. To access it, run

# cons <hostname>
as root on a node inside the Fermilab network domain (VPN or on-site).

The Far Detector cluster also has a command-line tool called "ipmiwrap", which exposes the ipmitool interface. For example, to turn off the power to novadaq-far-farm-02, run the following command as root on novadaq-far-master:

# ipmiwrap -I lanplus -UADMIN -H novadaq-far-farm-02.novaipmi.fnal.gov chassis power soft

Here, "soft" requests an ACPI soft shutdown, giving the OS a chance to shut down gracefully.

The ipmiwrap command is self-documenting: entering an incomplete command shows completion hints:
ipmitool chassis power
chassis power Commands: status, on, off, cycle, reset, diag, soft
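To avoid typing errors in the long ipmiwrap command line, the command can be assembled by a small helper first and reviewed before it is run. The following is a minimal sketch, not an existing tool on the cluster: the function names ipmi_host and ipmi_power_cmd are hypothetical, and it assumes that every node's IPMI interface follows the <node>.novaipmi.fnal.gov pattern seen in the single example above. It only prints the command; nothing is sent to any machine.

```shell
#!/bin/sh
# Sketch only: builds ipmiwrap command lines, does not execute them.
# ASSUMPTION: the IPMI address of <node> is <node>.novaipmi.fnal.gov,
# generalized from the one example in the text.

# Hypothetical helper: map a node name to its IPMI host name.
ipmi_host() {
    printf '%s.novaipmi.fnal.gov\n' "$1"
}

# Hypothetical helper: print the ipmiwrap command for a node and power op.
# The accepted ops are the ones listed in the completion hints above.
ipmi_power_cmd() {
    node=$1
    op=$2
    case "$op" in
        status|on|off|cycle|reset|diag|soft) ;;
        *) echo "unknown power op: $op" >&2; return 1 ;;
    esac
    printf 'ipmiwrap -I lanplus -UADMIN -H %s chassis power %s\n' \
        "$(ipmi_host "$node")" "$op"
}

# Check the current state first, then request a graceful shutdown.
ipmi_power_cmd novadaq-far-farm-02 status
ipmi_power_cmd novadaq-far-farm-02 soft
```

Once the printed lines look right, they can be pasted into a root shell on novadaq-far-master. Querying "status" before "soft" is a cautious habit rather than a requirement.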

There is also a simpler command that just controls the power state of the machines:

# pnode <hostname> <op>
where op is one of on, off, cycle, ...
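When several machines need the same operation, it may help to print the full list of pnode commands and review it before running anything. The sketch below assumes only the `pnode <hostname> <op>` form described above; the function name pnode_plan is hypothetical, and the script prints commands rather than executing them.

```shell
#!/bin/sh
# Sketch only: prints one pnode command per host, executes nothing.
# Usage: pnode_plan <op> <hostname> [<hostname> ...]

# Hypothetical helper, not part of the cluster tooling.
pnode_plan() {
    op=$1
    shift
    for host in "$@"; do
        printf 'pnode %s %s\n' "$host" "$op"
    done
}

# Review the printed plan, then run the lines as root if they are correct.
pnode_plan cycle novadaq-far-farm-02
```

Printing first and executing second is a deliberate choice here, since a power operation on the wrong host is hard to undo.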

Power management on novadaq-near-* machines

Connect via Kerberized ssh and run the sudo command, e.g.:

sudo /usr/local/sbin/pnode.novadaq novadaq-near-datadisk-02 status

Currently, Andrew Norman and Pengfei Ding have access to this console. Contact them if you need to power-cycle any novadaq-near-* machine.

Opening Service Now tickets for system-level configuration changes

The Near and Far Detector computing clusters are managed by the ECF group at Fermilab. Most changes to system-level configuration files will be overwritten by the management software within 12 hours. If you need to change ANY file in /etc, or need packages installed or removed (yum install or remove; yum list and search are OK to run yourself), open a Service Now ticket detailing your request and the hosts to which it should apply, and add "Please refer to ECF" at the end. ECF cannot provide support for these machines if they are in an undefined or ill-defined configuration, so it will save everybody heartache down the line if configuration changes are applied properly and recorded in their configuration management system.
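The three parts of such a ticket (the request, the affected hosts, and the closing "Please refer to ECF" line) can be assembled with a small helper so that none is forgotten. This is only a sketch: the function name ticket_body, the "Request:" and "Hosts:" labels, and the package name example-pkg are all hypothetical, and Service Now does not require any particular layout.

```shell
#!/bin/sh
# Sketch only: prints the text of a ticket request to paste into Service Now.
# Usage: ticket_body "<request text>" <hostname> [<hostname> ...]
# The "Request:" and "Hosts:" labels are an assumed layout, not a
# Service Now requirement.

ticket_body() {
    request=$1
    shift
    printf 'Request: %s\n' "$request"
    printf 'Hosts:\n'
    for host in "$@"; do
        printf '  %s\n' "$host"
    done
    # Closing line requested in the text above so the ticket reaches ECF.
    printf 'Please refer to ECF\n'
}

# example-pkg is a placeholder package name.
ticket_body "Install package example-pkg" novadaq-far-farm-02
```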

Cluster Networking

The NOvA DAQ clusters involve a complicated network. The status of this network can be viewed through a special page that network services set up for us under their "Solar Winds" software.

To use this, go to:
https://fnorion.fnal.gov

The username is "nova"; if you are authorized to work on the cluster, you already have the password.