Procedure for upgrading to / rolling back from OFED 3.5

This procedure has been tested on SLF6.3.

Note that the rollback procedure does not restore the old kernel or its associated RPMs. If this concerns you, omit step 1 of the upgrade process.

Upgrade.

  1. Update your kernel and associated RPMs to the latest errata, reboot, and recompile unrelated drivers (e.g. NVIDIA). This is probably not compulsory, but it is what was done on the FNAL machines to keep them up to date with publicly visible nodes in the same ensemble.
  2. Download the OFED stack. DO NOT REMOVE THE SOURCE FOR THE OLD ONE -- you might need it for rollback. This procedure was successful with 3.5-2-20130827-0545; earlier and later versions are untested.
  3. Uninstall the old OFED stack using the uninstall.sh in the old OFED distribution directory.
  4. Install the OFED stack using option 3; do NOT configure the IB interfaces. When installing on subsequent nodes, pass the "-c ofed.conf" option to the install.pl script.
  5. Make a copy of /etc/yum.conf in a safe place.
  6. Make sure /etc/yum.conf has the following excludes line:
    # This is to avoid problems with the IB drivers from OFED.org.
    exclude=compat-rdma,compat-rdma-devel,dapl,dapl-debuginfo,dapl-devel,dapl-devel-static,dapl-utils,ibacm,ibsim,ibsim-debuginfo,ibutils,infiniband-diags,infinipath-psm,infinipath-psm-devel,libcxgb3,libcxgb3-debuginfo,libcxgb3-devel,libcxgb4,libcxgb4-debuginfo,libcxgb4-devel,libibcm,libibcm-debuginfo,libibcm-devel,libibmad,libibmad-debuginfo,libibmad-devel,libibmad-static,libibumad,libibumad-debuginfo,libibumad-devel,libibumad-static,libibverbs,libibverbs-debuginfo,libibverbs-devel,libibverbs-devel-static,libibverbs-utils,libipathverbs,libipathverbs-debuginfo,libipathverbs-devel,libmlx4,libmlx4-debuginfo,libmlx4-devel,libmthca,libmthca-debuginfo,libmthca-devel-static,libnes,libnes-debuginfo,libnes-devel-static,librdmacm,librdmacm-debuginfo,librdmacm-devel,librdmacm-utils,mstflint,ofed-docs,ofed-scripts,opensm,opensm-debuginfo,opensm-devel,opensm-libs,opensm-static,perftest,qperf,qperf-debuginfo,rds-devel,rds-tools,srptools
    Note that this excludes line is shorter than the one required for older versions of OFED, and should supersede it.
  7. Reboot.
  8. Download MVAPICH2 1.9.
  9. Configure with:
    ./configure  --prefix=/usr/local/mvapich2-1.9 --with-rdma=gen2 --enable-shared --enable-g=dbg --enable-debuginfo RSH_CMD=/usr/bin/rsh
    setting the prefix as appropriate.
  10. Build and install MVAPICH2 1.9.
  11. As a user with permission to modify the products area:
    . <products-dir>/setup
    setup cetpkgsupport
    product-stub mvapich2 v1_9_0 /usr/local/mvapich2-1.9
    If you used a different prefix when compiling and installing MVAPICH2 1.9, use the same path in the product-stub invocation.
  12. Add the following entry to /etc/security/limits.conf:
    *               -       memlock         unlimited
  13. Don't forget to configure the subnet manager(s).
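
Steps 5, 6 and 12 above lend themselves to scripting when the upgrade is repeated across many nodes. The following is a minimal sketch only, not part of the tested procedure. The function names and the ".pre-ofed-3.5" backup suffix are hypothetical, and the exclude check only looks for the first package name (compat-rdma) rather than comparing the whole line.

```shell
#!/bin/sh
# Hypothetical helpers for steps 5, 6 and 12; all names here are illustrative.

# backup_file SRC DESTDIR
# Copy SRC into DESTDIR, preserving mode and timestamps (step 5).
backup_file() {
    cp -p "$1" "$2/$(basename "$1").pre-ofed-3.5"
}

# has_ofed_exclude FILE
# Succeed if FILE has an exclude= line that mentions compat-rdma (step 6).
# This is a quick presence check, not a full comparison of the line.
has_ofed_exclude() {
    grep -Eq '^exclude=.*compat-rdma' "$1"
}

# ensure_memlock FILE
# Append the memlock entry from step 12 unless an equivalent line exists.
ensure_memlock() {
    if ! grep -Eq '^\*[[:space:]]+-[[:space:]]+memlock[[:space:]]+unlimited' "$1"; then
        printf '*               -       memlock         unlimited\n' >> "$1"
    fi
}

# Example use, as root (the backup directory is an assumption):
#   backup_file /etc/yum.conf /root
#   has_ofed_exclude /etc/yum.conf || echo "excludes line missing" >&2
#   ensure_memlock /etc/security/limits.conf
```

Keeping the helpers parameterised by file path means they can be tried on scratch copies before being pointed at the real files under /etc.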

Rollback.

  1. rm -rf <products-dir>/mvapich2/v1_9_0*
  2. rm -rf /usr/local/mvapich2-1.9
    (or whichever prefix you installed to).
  3. Restore your backup copy of yum.conf to /etc.
  4. Remove OFED 3.5-2 with ./uninstall.sh and the appropriate option.
  5. Install the old version of OFED (should be 1.5.4.1): choose "All packages", then accept the defaults. For the old OFED stack to work on the new kernel, you may need a non-standard ofa_kernel RPM, which may be found at cluck:/usr/local/OFED-1.5.4.1/SRPMS/ofa_kernel-1.5.4.1-OFED.1.5.4.1.src.rpm.
  6. Reboot.
  7. Reconfigure the correct node(s) to run the subnet manager.
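
Rollback steps 1 to 3 can be wrapped in small functions so that an empty variable never reaches rm -rf. This is a sketch under assumptions: the function names are hypothetical, and the products directory, install prefix and backup location must be supplied by the caller to match what was used during the upgrade.

```shell
#!/bin/sh
# Hypothetical helpers for rollback steps 1-3; names are illustrative.

# rollback_mvapich PRODUCTS_DIR PREFIX
# Remove the mvapich2 v1_9_0 product stub and the MVAPICH2 install tree.
rollback_mvapich() {
    # Refuse empty arguments so rm never sees a bare "/mvapich2/..." or "".
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: rollback_mvapich PRODUCTS_DIR PREFIX" >&2
        return 1
    fi
    rm -rf "$1"/mvapich2/v1_9_0*
    rm -rf "$2"
}

# restore_yum_conf BACKUP TARGET
# Copy the saved yum.conf back into place, preserving mode and timestamps.
restore_yum_conf() {
    cp -p "$1" "$2"
}

# Example use (the paths shown are assumptions; substitute your own):
#   rollback_mvapich "$PRODUCTS_DIR" /usr/local/mvapich2-1.9
#   restore_yum_conf /root/yum.conf.backup /etc/yum.conf
```

The OFED uninstall and reinstall in steps 4 and 5 are interactive and are deliberately left out of the sketch.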