Feature #9805

Trap and forward AWS warning for spot pricing VM termination

Added by Marco Mambelli over 4 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Target version:
Start date: 08/10/2015
Due date:
% Done: 80%
Estimated time:
Duration:

Description

AWS terminates a VM when the spot price rises above the maximum allotted for the VM.
It sends a 2-minute warning before killing the machine. The exact mechanism is not clear and needs to be investigated, but AWS recently added a push notification in addition to polling a URL.

Tony asked/suggested adding a mechanism to forward the notification to the job.
It could be a signal, a callback script, or something similar.
Some jobs may take advantage of it to checkpoint or to report that they are being evicted.
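
As an illustration of how a job might use such a notification, here is a minimal sketch of a job that traps a signal and writes a checkpoint. The use of USR1, the checkpoint file, and the loop standing in for real work are all assumptions made for illustration. The eviction is simulated by the script signaling itself.

```shell
#!/bin/bash
# Hypothetical job: does "work" in steps and checkpoints when USR1 arrives.
checkpoint_file="$(mktemp)"   # assumed checkpoint location
progress=0
evicted=0

on_evict() {
    # Save enough state to resume later, then flag the main loop to stop.
    echo "progress=$progress" > "$checkpoint_file"
    evicted=1
}
trap on_evict USR1

for step in 1 2 3 4 5; do
    progress=$step
    # Simulate the forwarded termination warning arriving during step 3.
    if [ "$step" -eq 3 ]; then
        kill -USR1 $$
    fi
    if [ "$evicted" -eq 1 ]; then
        echo "evicted at step $step, checkpoint saved" 1>&2
        break
    fi
done
```

A real job would replace the loop with its own work and could also use the handler to report the eviction before exiting.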

History

#1 Updated by Parag Mhashilkar over 4 years ago

  • Assignee set to HyunWoo Kim
  • Target version set to v3_3

#2 Updated by Parag Mhashilkar over 4 years ago

  • Stakeholders updated (diff)

#3 Updated by HyunWoo Kim over 4 years ago

  • % Done changed from 0 to 40

I reviewed glidein_startup.sh and files in glideinwms-vm-core RPM.
My current idea is as follows

First, modify glidein_startup.sh as follows

- Create a new function, invoked by the trap command below, that relays the USR1 signal to the running user job:
function on_spot {
    echo "Received SPOT two minute termination signal... signaling the main child processes" 1>&2
    ON_DIE=1
    kill -USR1 %2
    #HK> as shown below, %2 must point to the actual user job, which is spawned as the second background job.
}

trap 'on_spot' USR1

check-spot-notice.sh &
"${gs_id_work_dir}/$last_script" glidein_config &
wait $!

Second, create a new child script (check-spot-notice.sh) and run it as the first background job in glidein_startup.sh.
This new script will do the following:
- check the metadata URL every 5 seconds (as recommended by AWS)
- if termination-time is set, send a USR1 signal to the parent (glidein_startup.sh) so that the parent can relay this signal to the actual user job.

while true ; do
    # -f makes curl print nothing on an HTTP error (e.g. 404 when no termination is scheduled)
    spot=$(curl -sf http://169.254.169.254/latest/meta-data/spot/termination-time)
    if [ -n "$spot" ]; then
        kill -USR1 $PPID
        exit
    fi
    sleep 5
done
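
Because the metadata URL can return something other than a timestamp when no termination is scheduled (for example an HTTP error page), the polling script may want to check that the response looks like a time value before signaling the parent. The following is a minimal sketch under that assumption. The function name is_termination_time is hypothetical, and the expected format (a UTC timestamp such as 2015-01-05T18:02:00Z) is assumed rather than taken from this ticket.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if its argument looks like a UTC
# timestamp of the form YYYY-MM-DDThh:mm:ssZ (assumed notice format).
is_termination_time() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Try the helper on sample responses instead of the real metadata URL.
for sample in "2015-01-05T18:02:00Z" "<html>404 - Not Found</html>" ""; do
    if is_termination_time "$sample"; then
        echo "termination notice: '$sample'"
    else
        echo "no notice: '$sample'"
    fi
done
```

In the polling loop, the non-empty test on $spot could be replaced by a call such as is_termination_time "$spot".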

#4 Updated by Parag Mhashilkar over 4 years ago

  • Priority changed from Normal to Low

#5 Updated by Parag Mhashilkar almost 4 years ago

  • Priority changed from Low to Normal

Hyunwoo, I had a brief discussion with Tony, and this feature may be required in the next couple of months. I think you should resume work on this ticket accordingly.

#6 Updated by Parag Mhashilkar almost 4 years ago

  • Project changed from GlideinWMS to gwms-cloud-vms
  • Target version deleted (v3_3)

#7 Updated by Parag Mhashilkar almost 4 years ago

  • Target version set to v1.0.10

#8 Updated by HyunWoo Kim almost 4 years ago

  • Status changed from New to Feedback
  • Assignee changed from HyunWoo Kim to Parag Mhashilkar
  • % Done changed from 40 to 80

Branch 9805 is created in ssh:///cvs/projects/gwms-cloud-vms

Summary:
- Added a new file check-preempt-wrap.sh
- Modified spec file to deploy this file in glideinwms-vm-ec2 package under /usr/sbin directory
- Modified pilot-launcher such that it first checks if glideinwms-vm-ec2 package exists and then runs /usr/sbin/check-preempt-wrap.sh

I am placing this ticket under feedback

#9 Updated by Parag Mhashilkar over 3 years ago

  • Assignee changed from Parag Mhashilkar to HyunWoo Kim

Begin forwarded message:

From: Parag Mhashilkar
Subject: feedback: 9805
Date: May 3, 2016 at 3:52:19 PM CDT
To: Hyunwoo Kim
1) check-preempt-wrap.sh
Code cleanup… remove if [ 1=1 ]

2) pilot-launcher & spec file

Why do you need to run rpm -q command?

Have you thought of using the script as PRE script and putting it in that dir as part of rpm installation?


#10 Updated by HyunWoo Kim over 3 years ago

I have updated this.

On 6/7/16, 3:50 PM, "Hyun Woo Kim" <hyunwoo@fnal.gov> wrote:

Hi Parag,

I have updated this work.
Important changes are:

1. the new script check-preempt-wrap.sh is now put under PRE/
   (i.e. in the source it is under pre-scripts/)
   It turns out check-preempt-wrap.sh was already using the nohup command inside,
   so it could be put under PRE/

2. I have added a couple of new “checks” in check-preempt-wrap.sh
   - checks if it is running inside AWS
   - checks if it is running in a Spot instance.

3. pilot-launcher is restored back to the original.

4. modified rpm_spec file accordingly..

I pushed these changes in 9805 branch of
  ssh://p-gwms-cloud-vms@cdcvs.fnal.gov/cvs/projects/gwms-cloud-vms

#11 Updated by HyunWoo Kim over 3 years ago

I updated this again this morning.
This new script will be found only in the EC2-specific RPM (glideinwms-vm-ec2).
This means that, of the 2 new features that I put in the script yesterday,
the first one, which checks if it is AWS, is unnecessary,
but the second one, which checks if it is a spot instance, is still necessary.
I believe this is now ready to be released, after another review if necessary.

#12 Updated by Parag Mhashilkar over 3 years ago

Changes look ok. I am assuming you tested them. I have merged them to master and am trying to build the RPM, but I am having issues. I will try to build the RPM and push it to the dev repo.

#13 Updated by Parag Mhashilkar over 3 years ago

  • Status changed from Feedback to Closed

