Procedure for upgrading to / rolling back from OFED 3.5
This procedure has been tested on SLF6.3.
Note that the rollback procedure does not include rolling back to the old kernels, etc. If you are nervous about this, omit the first step (the kernel update) from the upgrade process.
- Update your kernel and associated RPMs to the latest errata, reboot, and recompile any drivers unrelated to OFED (e.g. NVIDIA). This is probably not compulsory, but it is what was done on the FNAL machines to keep them up to date with publicly-visible nodes in the same ensemble.
- Download the OFED stack. DO NOT REMOVE THE SOURCE FOR THE OLD ONE -- you might need it for rollback. Note that I was successful with 3.5-2-20130827-0545. Anything earlier or later than that is not tested.
- Uninstall the old OFED stack using the uninstall.sh script in the old OFED distribution directory.
- Install the OFED stack using option 3; do NOT configure the IB interfaces. When installing on subsequent nodes, pass the "-c ofed.conf" option to the install.pl script.
- Make a copy of /etc/yum.conf in a safe place.
- Make sure /etc/yum.conf has the following excludes line:
  # This is to avoid problems with the IB drivers from OFED.org.
  exclude=compat-rdma,compat-rdma-devel,dapl,dapl-debuginfo,dapl-devel,dapl-devel-static,dapl-utils,ibacm,ibsim,ibsim-debuginfo,ibutils,infiniband-diags,infinipath-psm,infinipath-psm-devel,libcxgb3,libcxgb3-debuginfo,libcxgb3-devel,libcxgb4,libcxgb4-debuginfo,libcxgb4-devel,libibcm,libibcm-debuginfo,libibcm-devel,libibmad,libibmad-debuginfo,libibmad-devel,libibmad-static,libibumad,libibumad-debuginfo,libibumad-devel,libibumad-static,libibverbs,libibverbs-debuginfo,libibverbs-devel,libibverbs-devel-static,libibverbs-utils,libipathverbs,libipathverbs-debuginfo,libipathverbs-devel,libmlx4,libmlx4-debuginfo,libmlx4-devel,libmthca,libmthca-debuginfo,libmthca-devel-static,libnes,libnes-debuginfo,libnes-devel-static,librdmacm,librdmacm-debuginfo,librdmacm-devel,librdmacm-utils,mstflint,ofed-docs,ofed-scripts,opensm,opensm-debuginfo,opensm-devel,opensm-libs,opensm-static,perftest,qperf,qperf-debuginfo,rds-devel,rds-tools,srptools
  Note that this excludes line is shorter than that required for older versions of OFED, and should supersede it.
- Download MVAPICH2 1.9.
- Configure with:
  ./configure --prefix=/usr/local/mvapich2-1.9 --with-rdma=gen2 --enable-shared --enable-g=dbg --enable-debuginfo RSH_CMD=/usr/bin/rsh
  setting the prefix as appropriate.
- Build and install MVAPICH2 1.9.
- As the user appropriate for altering the products area:
  . <products-dir>/setup
  setup cetpkgsupport
  product-stub mvapich2 v1_9_0 /usr/local/mvapich2-1.9
  If you use a different prefix when compiling and installing MVAPICH2 1.9, you should mirror that in the product-stub invocation.
- Add the following entry to
* - memlock unlimited
- Don't forget to configure the subnet manager(s)
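The two /etc/yum.conf steps above (keep a backup, then confirm the excludes line) can be sketched as a pair of small shell helpers. This is only an illustration under stated assumptions: the function names backup_conf and has_exclude, the ".pre-ofed" suffix, and the backup directory in the usage comments are hypothetical and not part of the tested procedure.

```shell
#!/bin/sh
# Sketch only: helper names, the ".pre-ofed" suffix and the backup
# location shown in the usage comments are hypothetical.

# backup_conf FILE DIR
# Copy FILE into DIR (created if missing), preserving mode and timestamps.
backup_conf() {
    mkdir -p "$2" && cp -p "$1" "$2/$(basename "$1").pre-ofed"
}

# has_exclude FILE PKG
# Succeed only if an "exclude=" line in FILE lists PKG as an exact entry
# (so "libibverbs" does not match on "libibverbs-devel" alone).
has_exclude() {
    grep -E '^[[:space:]]*exclude[[:space:]]*=' "$1" |
        sed 's/^[^=]*=//' |
        tr ', ' '\n\n' |
        grep -Fxq "$2"
}

# Possible usage, as root:
#   backup_conf /etc/yum.conf /root/ofed-backup
#   has_exclude /etc/yum.conf libibverbs || echo "exclude line missing" >&2
```

Checking one or two package names this way is a quick sanity test, not a substitute for comparing the whole excludes line against the list given above.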
rm -rf <products-dir>/mvapich2/v1_9_0*
rm -rf /usr/local/mvapich2-1.9 (or appropriate area).
- Restore your backup copy of
- Remove OFED 3.5-2 with ./uninstall.sh and the appropriate option.
- Install the old version of OFED (should be 22.214.171.124). Choose "All packages", then accept the defaults. You may need a non-standard ofa_kernel SRPM in order to have the old OFED stack working on the new kernel; it may be found as
  cluck:/usr/local/OFED-126.96.36.199/SRPMS/ofa_kernel-188.8.131.52-OFED.184.108.40.206.src.rpm
- Reconfigure the correct node(s) to run the subnet manager
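Because the rollback relies on rm -rf with a site-specific products directory and install prefix, it may be worth wrapping those two removals so that an empty or mistyped variable cannot expand into something destructive. The following is a minimal sketch; remove_mvapich2 is a hypothetical helper name, and the paths in the usage comment are the defaults assumed earlier in this procedure.

```shell
#!/bin/sh
# Sketch only: remove_mvapich2 is a hypothetical helper, not part of
# OFED, MVAPICH2 or the products tooling.

# remove_mvapich2 PRODUCTS_DIR PREFIX
# Remove the v1_9_0 product stub entries and the MVAPICH2 install prefix.
# ${N:?...} aborts if an argument is unset or empty, so the commands can
# never degrade to "rm -rf /mvapich2/v1_9_0*" or "rm -rf ''".
remove_mvapich2() {
    products_dir=${1:?products directory required}
    prefix=${2:?install prefix required}
    rm -rf "$products_dir"/mvapich2/v1_9_0*
    rm -rf "$prefix"
}

# Possible usage, as the user appropriate for altering the products area:
#   remove_mvapich2 <products-dir> /usr/local/mvapich2-1.9
```

The glob is left outside the quotes deliberately, so that every entry beginning with v1_9_0 under the mvapich2 product directory is removed while other versions are left alone.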