Scheduled maintenance 2012 May 17

updated 2012-05-17 13:40 UTC

  • AFS upgrade 05:00-08:00 CDT - COMPLETE
    • Supports Fermilab kcron principals
    • May require client reboots
  • Fermi Web Server move from ACE to F5 load balancer 04:00-09:00 CDT - COMPLETE
  • GCC tape robot - AM
    • does not affect Minos users
  • minos25 kernel update for stability - deferred
SERVICE  DOWN   UP     Notes
AFS      03:00  06:09  need restart on minos25
WEB      05:00  07:50
GCC      08:00  11:54  does not affect Minos
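
The length of each outage can be worked out from the DOWN and UP columns. The sketch below is only an illustration: it assumes both times fall on the same day and in the same time zone, and the function names are made up for this example.

```python
from datetime import datetime

# Rows copied from the status table above: (service, down, up).
OUTAGES = [
    ("AFS", "03:00", "06:09"),
    ("WEB", "05:00", "07:50"),
    ("GCC", "08:00", "11:54"),
]


def outage_minutes(down, up):
    """Return the minutes between two same-day HH:MM clock times."""
    fmt = "%H:%M"
    delta = datetime.strptime(up, fmt) - datetime.strptime(down, fmt)
    return int(delta.total_seconds() // 60)


def format_duration(minutes):
    """Render a minute count as e.g. '3h09m'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h{mins:02d}m"


if __name__ == "__main__":
    for service, down, up in OUTAGES:
        print(service, format_duration(outage_minutes(down, up)))
```

With the times listed, this gives 3h09m for AFS, 2h50m for WEB and 3h54m for GCC.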

Issues after planned maintenance

  • Expected :
  • Fixed : Restarted kreymer monitoring scripts
      minos-mysql2  topdb_log
      minos25       bluwatch
                    minosq_log
  • Pending :
  • Risk summary :
    Date: Tue, 15 May 2012 10:42:04 -0500
    From: Lee Lueking <lueking@fnal.gov>
    Subject: Re: AFS Outage - can it be rescheduled?
    
    Hi Rob,
     We have reviewed the impact of the AFS outage on your processing 
     and believe it will be minimal.
    
    1. For the three hours of the down time (5-8AM CDT), 
       interactive logins will not work
    2. local batch jobs will be affected, 
       but in the last week we only see one or two local jobs.
    3. grid jobs already running will not be affected. 
       Obviously during this time users will not be able to submit new jobs.
    4. the grid monitoring plots will not be updated, 
       but the central web service will be down so no one can look at them anyway.
    
    We have taken action to avoid issues on critical servers when the afs
    service returns. This includes stopping the afs service on minosmysql2,
    minos25 and minos54 before the outage, and restarting it after the work is
    done. This will avoid having to reboot any of these nodes.
    
    So, with this we believe the afs upgrade should proceed as planned and hope
    you and your colleagues will concur.
    
    We have delayed the scheduled rebooting minos25 until later in June. This
    was planned to resolve another problem but we think it should be postponed
    for now so we do not disrupt your grid processing activities.
    
    Regards,
    
    Lee
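
The stop-before, restart-after procedure described in the message can be written down as a dry-run checklist. This is a rough sketch only: the message does not say which commands were used, so the command template below (ssh plus a service called "afs") is a guess made for illustration, and the script prints the steps without running anything.

```python
# Hosts named in the message above.
HOSTS = ["minosmysql2", "minos25", "minos54"]

# ASSUMPTION: the real service name and remote command are not given in the
# source. This template is hypothetical and is only printed, never executed.
COMMAND_TEMPLATE = "ssh {host} service afs {action}"


def afs_plan(hosts, phase):
    """Return the list of command strings for one phase of the outage.

    phase is "before" (stop the AFS client) or "after" (start it again).
    """
    if phase == "before":
        action = "stop"
    elif phase == "after":
        action = "start"
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    return [COMMAND_TEMPLATE.format(host=host, action=action) for host in hosts]


if __name__ == "__main__":
    for phase in ("before", "after"):
        print(f"# {phase} the outage")
        for command in afs_plan(HOSTS, phase):
            print(command)
```

Keeping the host list in one place means the same three nodes are handled in both phases, so none is left with AFS stopped after the work is done.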