Author: Hunter Blanks
Last week, Amazon announced new support for using spot instances with Elastic MapReduce, which is pretty exciting when you consider that EC2 spot instances typically run for less than half the cost of on-demand instances.
Although you can now set up EMR spot instance jobflows using the AWS console or Amazon’s “elastic-mapreduce” Ruby client, boto 2.0 does not yet support them. So, here’s a two-commit fork that adds early support:
(Update: this fork has been pulled into boto’s mainline. Yay!)
Please note: so far, I’ve fixed add_instance_groups() and modify_instance_groups(), so you can add a TASK instance group to an existing jobflow. I have not yet updated run_jobflow() to let you start a job with MASTER or CORE spot instances. That will take a little more time and, in our case, will only make sense after we patch mrjob as well, since that’s what we typically use to start EMR jobflows.
But, in the meantime, here’s an example of how you can add spot instances to your first waiting or running jobflow:
import boto
from boto.emr.instance_group import InstanceGroup

c = boto.connect_emr()

def get_first_jobflow():
    return [jf for jf in c.describe_jobflows() if jf.state in ('WAITING', 'RUNNING')][0]

jf = get_first_jobflow()
# NB: you can't add CORE nodes to a jobflow...
ig = InstanceGroup(6, 'TASK', 'c1.medium', 'SPOT', 'spot-0.07', '0.07')
c.add_instance_groups(jf.jobflowid, [ig])
In this example, ‘spot-0.07’ is an arbitrary name, and ‘0.07’ is the actual spot price you’re willing to pay.
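If you plan to try several bid prices, it can help to derive the group name from the bid so the two never drift apart. Here is a minimal sketch of that idea. The helper spot_task_group_args() is hypothetical (it is not part of boto); it only builds the same six positional arguments used in the example above, and it assumes you want the name to follow the 'spot-<bid>' pattern.

```python
from decimal import Decimal

def spot_task_group_args(count, instance_type, bid):
    """Hypothetical helper: build the six positional arguments used above
    for a TASK spot instance group, naming the group after its bid."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Go through str() so a float such as 0.07 does not pick up binary noise.
    bid_decimal = Decimal(str(bid))
    if bid_decimal <= 0:
        raise ValueError("bid must be positive")
    # The example above passes the bid as a plain string such as '0.07'.
    bid_str = format(bid_decimal, 'f')
    name = 'spot-%s' % bid_str  # the name is arbitrary; this is one convention
    return (count, 'TASK', instance_type, 'SPOT', name, bid_str)

args = spot_task_group_args(6, 'c1.medium', '0.07')
```

With boto installed, InstanceGroup(*args) should then be equivalent to the InstanceGroup(...) call in the example.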
You can also resize this instance group later with:
ig = get_first_jobflow().instancegroups[-1]
c.modify_instance_groups([ig.instancegroupid], [3])
This particular command resizes the InstanceGroup you just added to 3 instances.
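Indexing with instancegroups[-1] relies on the group you just added being the last one in the list. If you would rather not depend on ordering, one alternative is to look the group up by the name you gave it. This is only a sketch: find_instance_group() is a hypothetical helper, the FakeGroup objects and their ids are stand-ins made up so the snippet runs without AWS credentials, and it assumes the objects boto returns expose name and instancegroupid attributes.

```python
from collections import namedtuple

def find_instance_group(groups, name):
    """Hypothetical helper: return the first group whose .name equals
    `name`, or None if no group matches."""
    for group in groups:
        if getattr(group, 'name', None) == name:
            return group
    return None

# Stand-in objects for illustration only. In real use the list would come
# from get_first_jobflow().instancegroups, and the ids would be real ones.
FakeGroup = namedtuple('FakeGroup', ['name', 'instancegroupid'])
groups = [
    FakeGroup('master', 'ig-EXAMPLE1'),
    FakeGroup('core', 'ig-EXAMPLE2'),
    FakeGroup('spot-0.07', 'ig-EXAMPLE3'),
]

match = find_instance_group(groups, 'spot-0.07')
```

The id of the matching group is what you would then hand to modify_instance_groups().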