Spot instances for the win!

Cloud computing is supposed to be cheap, right?

No longer did we need to fork out £5-10k for some silicon and tin, then pay for the space, the power, the cables, the install, and so on. Building in the cloud meant we could provision a host, leave it running for a few hours, and remove it when we were done. No PO/finance hoops to jump through, no approvals needed, just provision the host and do your work.

In some ways this was true: there was little or no upfront cost, and it’s easier to beg forgiveness than permission, right? But we’ve moved on from the times when AWS was a demo environment, or a test site, or something that the devs were just toying with. Now it’s common for AWS (or Azure, or GCE) to be your only compute environment, and the bills are much bigger as a result. AWS has become our biggest platform cost, so we’re always looking for ways to reduce our spend there.

At the same time that AWS and cloud have become mainstream for many of us, so too have microservices. While the development and testing benefits of microservices are well recognised, the little-recognised truth is that they also cost more to run. Why? Because even if I’m doing the same amount of ‘computing’ as I was in the monolith (though I suspect we were actually doing less there), each microservice now wants its own pool of memory. The PHP app that we ran happily on a single 2GB server with 1 CPU has now been split into 40 different components, each with its own baseline memory consumption of 100MB. That is roughly 4GB before any of them does real work, so I’ve already doubled my cost base just by using a more ‘efficient’ architecture.
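To make that arithmetic concrete, here is a quick back-of-the-envelope sketch. The figures are the ones from the paragraph above; the function and variable names are just made up for illustration.

```python
# Figures from the paragraph above; the names are illustrative only.
MONOLITH_MEMORY_MB = 2 * 1024    # the single 2GB server
SERVICE_COUNT = 40               # components after the split
BASELINE_PER_SERVICE_MB = 100    # baseline memory per component


def baseline_memory_mb(service_count, per_service_mb):
    """Memory consumed by the services before they do any real work."""
    return service_count * per_service_mb


microservices_mb = baseline_memory_mb(SERVICE_COUNT, BASELINE_PER_SERVICE_MB)
ratio = microservices_mb / MONOLITH_MEMORY_MB

print(f"{microservices_mb} MB baseline, {ratio:.2f}x the monolith")
# prints "4000 MB baseline, 1.95x the monolith"
```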

Of course, AWS offers many ways of reducing your compute costs with them. There are many flavours of machine available, each with a different balance of memory and CPU to suit your requirements. You can get 50%+ savings on the cost of compute power by committing to pay for the system for 3 years (but you wanted the flexible benefits of cloud computing, right?). Beware the no-upfront reservations though: you’ll lose most of the benefits of elastic computing for very little cost saving.

You could of course use an alternative provider. Google bends over backward to prove they have a better, cheaper IaaS, but the truth is we’re currently too invested and too busy to move provider (we’ve only just finished migrating away from Rackspace, so we’re in no hurry to start again!).

So, how can we win this game? Spot Instances. OK, so they may get turned off at any moment, but for the majority of common machine types you will pay around 20% of the on-demand price for a spot instance. Looking at the historical pricing of spot instances also gives you a pretty good idea of how likely it is that a spot instance will be abruptly terminated. If you bid the on-demand price for a machine (i.e. what you were GOING to pay anyway) but put it on a spot instance instead, you’ll end up paying ~20% of what you were going to, and your machine will almost certainly still be there in 3 months’ time. As long as your bid price remains above the spot price, your machine will stay on and you will pay the spot price, not your bid!

AWS Spot Price History
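The billing rule described above (you keep the machine while your bid covers the spot price, and you pay the spot price rather than your bid) can be sketched in a few lines. This is a toy simulation, not AWS’s actual billing logic: the prices, the hourly granularity and the function name are all assumptions made up for illustration.

```python
def simulate_spot(hourly_spot_prices, bid):
    """Return (hours_run, total_cost) for a single spot instance.

    Assumed model: the instance runs while the spot price is at or
    below the bid, is charged the spot price for each hour it runs,
    and is terminated the first time the spot price exceeds the bid.
    """
    hours_run = 0
    total_cost = 0.0
    for price in hourly_spot_prices:
        if price > bid:
            break  # outbid: the instance is reclaimed
        hours_run += 1
        total_cost += price  # pay the spot price, not the bid
    return hours_run, total_cost


ON_DEMAND = 0.10  # hypothetical on-demand price, $/hour

# Hypothetical history: mostly 20% of on-demand, with one brief spike.
history = [0.02] * 20 + [0.08] + [0.02] * 3

# Bidding the on-demand price rides out the spike.
hours, cost = simulate_spot(history, bid=ON_DEMAND)
on_demand_cost = ON_DEMAND * len(history)
print(f"ran {hours}h for ${cost:.2f} vs ${on_demand_cost:.2f} on demand")

# A stingier bid is cheaper per hour but loses the machine at the spike.
low_hours, low_cost = simulate_spot(history, bid=0.05)
print(f"low bid ran {low_hours}h for ${low_cost:.2f}")
```

With these made-up numbers the on-demand-priced bid survives the whole day at a fraction of the on-demand cost, while the lower bid gets terminated at the spike.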

What if this isn’t certain enough for you? If you really want to take advantage of spot instances, build your system to accommodate failure and then hedge your bids across multiple compute pools of different instance types. You can also reserve a baseline of machines, which you calculate to be the bare minimum needed to run your apps, and then use spots to supplement that baseline pool in order to give your systems more burst capacity.
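As a rough sketch of that baseline-plus-spot idea: reserve the minimum, then spread the burst capacity across several spot pools so that losing any one pool only costs you a slice of it. The instance types below are just examples, the even split is one possible strategy, and treating every instance as one interchangeable unit of capacity is a simplifying assumption.

```python
def plan_capacity(peak, baseline, spot_pools):
    """Split capacity between a reserved baseline and several spot pools.

    peak and baseline are instance counts. The burst above the baseline
    is spread as evenly as possible across the named spot pools.
    """
    if not spot_pools:
        raise ValueError("need at least one spot pool")
    burst = max(peak - baseline, 0)
    share, remainder = divmod(burst, len(spot_pools))
    plan = {"reserved": baseline}
    for i, pool in enumerate(spot_pools):
        # The first `remainder` pools take one extra instance each.
        plan[pool] = share + (1 if i < remainder else 0)
    return plan


def capacity_after_loss(plan, lost_pool):
    """Instances still running if one spot pool is reclaimed entirely."""
    return sum(count for pool, count in plan.items() if pool != lost_pool)


# Example pools only; pick types that suit your own workload.
pools = ["m4.large", "c4.large", "r3.large"]
plan = plan_capacity(peak=20, baseline=8, spot_pools=pools)

print(plan)
print(capacity_after_loss(plan, "c4.large"))
```

Here the reserved baseline of 8 never goes away, and losing one of the three spot pools still leaves 16 of the 20 instances running.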

How about moving your build pipeline onto spot instances, or that load test environment?

Sure, you can’t bet your house on them, but with the right approach to risk you can certainly save a ton of money on your compute costs.