Multiple TOP Tests

For the multiple TOP tests, at least four copies of the program "top" were started on each DCM with an update interval of 10 ms ("top -d 0.01"). Each copy was started in its own GNU screen window by a script.

The script was:

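#!/bin/bash
#
# Start four copies of top on each DCM of one diblock, each in its own
# GNU screen window. Run this from inside an existing screen session;
# outside one, each "screen" call starts a new session and blocks.
# Usage: <script> <diblock> <first_dcm> <last_dcm>
#
if [ $# -ne 3 ]; then
    echo "Usage: $0 <diblock> <first_dcm> <last_dcm>" >&2
    exit 1
fi
SCREENPROG=screen
SSHPROG=ssh
SSHOPT="-t -l root"
torture_prog="top"
torture_prog_args="-d 0.01"     # 10 ms update interval
#
# Do the torture by diblock (a single diblock, given as $1)
let diblock_start=$1
let diblock_end=$1
#
# Range of DCM positions on the diblock
let dcmstart=$2
let dcmend=$3

let DB_COUNTER=$diblock_start
let DCM_COUNTER=$dcmstart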
# Torture should be passed a DCM name, a command and an iteration count.
# Order should be: tortureDCM "$dcm" "$cmd" "$iter"
# (Not called below; the main loop runs the command four times inline.)
tortureDCM()
{
  echo "Preparing to torture $1"
  local myI=1
  while [ "$myI" -le "$3" ]; do
      echo "$2"
      eval "$2"
      let myI=$myI+1
  done
  echo "Torture started on $1"
}

# Loop over diblocks
while [ $DB_COUNTER -le $diblock_end ]; do
    db=`printf %02d $DB_COUNTER`
    # Loop over DCM positions on the diblock
    while [ $DCM_COUNTER -le $dcmend ]; do
        pos=`printf %02d $DCM_COUNTER`
        dcmhost=dcm-2-$db-$pos
        # ESC k <title> ESC \ sets the title of the screen window
        wintitle="echo -ne '\ek' $dcmhost '\e\\'"
        failstring="echo $dcmhost has failed at `date`"   # currently unused
        cmd="$SCREENPROG $SSHPROG $SSHOPT $dcmhost \"$wintitle; $torture_prog $torture_prog_args\""
        # Execute the burn-in sequence: four screen windows running top on this DCM
        echo "$cmd"
        eval "$cmd"
        eval "$cmd"
        eval "$cmd"
        eval "$cmd"
        # tortureDCM "$dcmhost" "$cmd" 4
        let DCM_COUNTER=$DCM_COUNTER+1
    done
    let DCM_COUNTER=$dcmstart
    let DB_COUNTER=$DB_COUNTER+1
done
echo "Torture tests started"


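The list of running processes can then be checked with the following command, which shows the ssh sessions to the DCMs sorted by host name (the fifth field of each ssh command line):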

ps -eo "%a" |grep ssh |grep dcm |sort -k 5

Rick K. swears this method will get a DCM to fail within minutes. What I have found is that it gets a DCM to fail within minutes only IF that DCM is already near death: hit over the head with a bat while being bitten by venomous snakes. Translation: it works if the machine is already very, very angry.