Wealth Wizards sponsors Silicon Canal

We’re pleased to be ‘Terabyte’ sponsors of Silicon Canal, a not-for-profit organisation whose aim is to create a tech ecosystem in the Midlands. With our HQ in Leamington Spa, we want to encourage tech talent in the area, promote the Midlands as a tech hub and get together with like-minded people. We want to show that if you want a successful career in tech, you don’t have to move to London! We will be involved in and support Silicon Canal’s events throughout the year, including sponsoring the ‘Most Influential Female in Technology’ award at the Silicon Canal Tech Awards.

Spot instances for the win!

Cloud computing is supposed to be cheap, right?

No longer do we need to fork out £5-10k for some silicon and tin, then pay for the space, the power, the cables, the install, and so on. Building in the cloud meant we could provision a host, leave it running for a few hours and remove it when we were done. No PO or finance hoops to jump through, no approvals needed – just provision the host and do your work.

So, in some ways this was true: there was little or no upfront cost, and it’s easier to beg forgiveness than ask permission, right? But we’ve moved on from the days when AWS was a demo environment, a test site, or something the devs were just toying with. Now it’s common for AWS (or Azure, or GCE) to be your only compute environment, and the bills are much bigger. AWS has become our biggest platform cost, so we’re always looking for ways to reduce our cost commitments there.

At the same time that AWS and cloud have become mainstream for many of us, so too have microservices, and while the development and testing benefits of microservices are well recognised, the little-recognised truth is that they also cost more to run. Why? Because even if I’m doing the same amount of ‘computing’ as I was in the monolith (though I suspect we were actually doing less there), each microservice now wants its own pool of memory. The PHP app that ran happily on a single 2GB, 1-CPU server has been split out into 40 different components, each with its own baseline memory consumption of 100MB – that’s 4GB before any real work gets done – so I’ve already doubled my cost base just by using a more ‘efficient’ architecture.

Of course, AWS offers many ways of reducing your compute costs with them. There are many flavours of machine available, each with memory and CPU offerings tuned to your requirements. You can get savings of 50% or more on the cost of compute power by committing to pay for the system for 3 years (you want the flexible benefits of cloud computing, right?). Beware the no-upfront reservations though – you’ll lose most of the benefits of elastic computing in exchange for very little cost saving.

You could of course use an alternative provider – Google bends over backwards to prove they have a better, cheaper IaaS – but the truth is we’re currently too embedded, and too busy, to move provider (we’ve only just finished migrating away from Rackspace, so we’re in no hurry to start again!).

So, how can we win this game? Spot instances. Yes, they may get turned off at any moment, but for the majority of common machine types you will pay around 20% of the on-demand price for a spot instance. Looking at the historical pricing of spot instances also gives you a pretty good idea of how likely it is that one will be abruptly terminated. If you bid at the on-demand price for a machine – i.e. what you were GOING to pay anyway – but put it on a spot instance instead, you’ll end up paying roughly 20% of that, and your machine will almost certainly still be there in three months’ time. As long as your bid price remains above the spot price, your machine will stay on and you will pay the spot price, not your bid!

AWS Spot Price History
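
If you’d rather pull that history programmatically than read it off a chart, the Node.js aws-sdk can do it. The sketch below assumes the v2 aws-sdk client; the region and instance type are just examples.

const AWS = require('aws-sdk');
const ec2 = new AWS.EC2({ region: 'eu-west-1' });

// Fetch the last 24 hours of spot prices for a single instance type.
ec2.describeSpotPriceHistory({
  InstanceTypes: ['m4.large'],
  ProductDescriptions: ['Linux/UNIX'],
  StartTime: new Date(Date.now() - 24 * 60 * 60 * 1000),
}, (err, data) => {
  if (err) return console.error(err);
  data.SpotPriceHistory.forEach(price =>
    console.log(price.AvailabilityZone, price.InstanceType, price.SpotPrice, price.Timestamp)
  );
});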

What if this isn’t certain enough for you? If you really want to take advantage of spot instances, build your system to accommodate failure and then hedge your bids across multiple compute pools of different instance types. You can also reserve a baseline of machines, which you calculate to be the bare minimum needed to run your apps, and then use spots to supplement that baseline pool in order to give your systems more burst capacity.
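
Placing the bid itself is just another API call. Here’s a rough sketch along the same lines – the AMI, key name and bid price are placeholders, not values from our setup.

const AWS = require('aws-sdk');
const ec2 = new AWS.EC2({ region: 'eu-west-1' });

// Request a single spot instance, bidding at (roughly) the on-demand price.
ec2.requestSpotInstances({
  SpotPrice: '0.111',         // our bid – the on-demand price we were going to pay anyway
  InstanceCount: 1,
  Type: 'persistent',         // keep the request open so the instance is replaced after an interruption
  LaunchSpecification: {
    ImageId: 'ami-12345678',  // placeholder AMI
    InstanceType: 'm4.large',
    KeyName: 'my-key',        // placeholder key pair
  },
}, (err, data) => {
  if (err) return console.error(err);
  console.log('Spot request id:', data.SpotInstanceRequests[0].SpotInstanceRequestId);
});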

How about moving your build pipeline, or that load test environment, on to spot instances?

Sure, you can’t bet your house on them, but with the right approach to the risk you can certainly save a ton of money on your compute costs.

Making asynchronous code look synchronous in JavaScript

Why go asynchronous

Asynchronous programming is a great paradigm which offers a key benefit over its synchronous counterpart – non-blocking I/O within a single-threaded environment. This is achieved by allowing I/O operations such as network requests and reading files from disk to run outside of the normal flow of the program. This enables responsive user interfaces and highly performant code.
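
To make the non-blocking part concrete, here’s a tiny Node.js sketch (the file path is just an example):

const fs = require('fs');

// The read is handed off to the operating system; our program carries on immediately.
fs.readFile('/etc/hosts', 'utf8', (err, contents) => {
  if (err) throw err;
  console.log('file read finished'); // runs later, once the I/O has completed
});

console.log('this line runs first'); // not blocked by the file read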

The challenges faced

To people coming from a synchronous language like PHP, the concept of asynchronous programming can seem both foreign and confusing at first, which is understandable. One moment you were programming one line at a time in a nice sequential fashion, the next thing you know you’re skipping entire chunks of code, only to jump back up to those chunks at some time later. Goto anyone? Ok, it’s not *that* bad.
Then, you have the small matter of callback hell, a name given to the mess you can find yourself in when you have asynchronous callbacks nested within asynchronous callbacks several times deep – before you know it all hell has broken loose.
Promises came along to do away with callback hell, but for all the good they did, they still didn’t make the code read in a nice sequential fashion.
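
To illustrate the shape of the problem, here are some hypothetical, stubbed-out steps – first as nested callbacks, then as a promise chain:

// Hypothetical async steps, stubbed out for illustration (callback style).
const getUser = (id, cb) => cb(null, { id });
const getOrders = (user, cb) => cb(null, [{ user, orderId: 1 }]);
const getInvoice = (order, cb) => cb(null, { order, total: 100 });
const handleError = err => console.error(err);

// Callback hell: each step nests inside the previous one.
getUser(42, (err, user) => {
  if (err) return handleError(err);
  getOrders(user, (err, orders) => {
    if (err) return handleError(err);
    getInvoice(orders[0], (err, invoice) => {
      if (err) return handleError(err);
      console.log(invoice);
    });
  });
});

// Promise-returning versions flatten the nesting, but the flow still hops
// between .then() blocks rather than reading top to bottom.
const getUserP = id => Promise.resolve({ id });
const getOrdersP = user => Promise.resolve([{ user, orderId: 1 }]);
const getInvoiceP = order => Promise.resolve({ order, total: 100 });

getUserP(42)
  .then(user => getOrdersP(user))
  .then(orders => getInvoiceP(orders[0]))
  .then(invoice => console.log(invoice))
  .catch(handleError);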

Generators in ES6

With the advent of ES6, along came a seemingly unrelated paradigm – generators. Generators are a powerful construct, allowing a function to “yield” control along with an (optional) value back to the calling code, which can in turn resume the generator function, passing an (optional) value back in. This process can be repeated indefinitely.

Consider the following function, which is a generator function (note the special syntax), and look at how it’s called:

function *someGenerator() {
  console.log(5); // 5
  const someVal = yield 7.5;
  console.log(someVal); // 10
  const result = yield someVal * 2;
  console.log(result); // 30
}

const it = someGenerator();          // no generator code has run yet – we just get an iterator back
const firstResult = it.next();       // runs up to the first yield; logs 5
console.log(firstResult.value);      // 7.5
const secondResult = it.next(10);    // someVal becomes 10; logs 10; yields someVal * 2
console.log(secondResult.value);     // 20
it.next(30);                         // result becomes 30; logs 30 and the generator completes

Can you see what’s going on? The first thing to note is that when a generator is called, an iterator is returned. An iterator is an object that knows how to access items from a collection, one item at a time, keeping track of where it is in the collection. From there, we call next on the iterator, passing control over to the generator and running code up until the first yield statement. At this point, the yielded value is passed to the calling code, along with control. We then call next, passing in a value, and with it we pass control back to the generator function. This value is assigned to the variable someVal within the generator. This process of passing values in and out of the generator continues, with the console.log calls providing a clearer picture of what’s going on.

One thing to note is how we read value from the result of each call to next on the iterator. This is because next returns an object containing two properties, done and value: done indicates whether the generator has run to completion, and value contains the yielded value.
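
You can see that shape by logging the raw result of each next call (a throwaway generator, just for illustration):

function *tinyGenerator() {
  yield 7.5;
}

const iter = tinyGenerator();
console.log(iter.next()); // { value: 7.5, done: false }
console.log(iter.next()); // { value: undefined, done: true }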

Using generators with promises

This mechanism of passing control out of the generator, then at some time later resuming control, should sound familiar – it’s not so different from the way promises work. We call some code, then at some time later we resume control within a .then() block, with the promise result passed in.

It therefore only seems reasonable that we should be able to combine these two paradigms in some way, to provide a promise mechanism that reads synchronously, and we can!

Implementing a full library to do this is beyond the scope of this article, however the basic concepts are:

  • Write a library function that takes one argument (a generator function)
  • Within the provided generator function, each time a promise is encountered, it should be yielded (to the library function)
  • The library function manages the promise fulfillment and, depending on whether it was resolved or rejected, passes control and the result back into the generator function using either next or throw
  • Within the generator function, yielded promises should be wrapped in a try/catch so that rejections can be handled like ordinary exceptions
For a full working example, check out a bare bones library I wrote earlier in the year called awaiting-async, complete with unit tests providing example scenarios.
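
To make those bullet points concrete, here’s a minimal sketch of such a runner – not the awaiting-async implementation itself, just the general shape:

// Drives a generator function that yields promises.
function run(generatorFn) {
  const it = generatorFn();

  function step(result) {
    if (result.done) return Promise.resolve(result.value);
    return Promise.resolve(result.value).then(
      res => step(it.next(res)),  // fulfilled: feed the value back in and carry on
      err => step(it.throw(err))  // rejected: throw into the generator, where a try/catch can handle it
    );
  }

  return step(it.next());
}

// Usage, in the same spirit as the example further down:
run(function *() {
  const value = yield Promise.resolve('some value');
  console.log(value); // some value
});

A real library adds more edge-case handling and error reporting, but the core is little more than this.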

How this looks

Using a library such as this (there are plenty of them out there), we can go from this:

const somePromise = Promise.resolve('some value');

somePromise
  .then(res => {
    console.log(res); // some value
  })
  .catch(err => {
    // (Error handling code would go in here)
  });

To this:

const aa = require('awaiting-async');

aa(function *() {
  const somePromise = Promise.resolve('some value');
  try {
    const result = yield somePromise;
    console.log(result); // some value
  } catch (err) {
    // (Error handling code would go in here)
  }
});

And with it, we’ve made asynchronous code look synchronous in JavaScript!

tl;dr

Generator functions can be used in ES6 to make asynchronous code look synchronous.

What went wrong? Reverse-engineering disaster

Last week, we nearly pushed a bad configuration into production, which would have broken some things and made some code changes live that were not ready. Nearly, but not quite: while we were relieved that we’d caught it in time, it was still demoralising to find out how close we had come to trouble, and a few brave souls had to work into the evening to roll back the change and make it right.

Rather than shouting and pointing fingers, the team came together, cracked open the Post-Its and Sharpies and set to engineering. The problem to be solved: what one thing could we change to make this problem less likely, or less damaging?

What happened?

The first step was for the team to build a cohesive view of what happened. We did that by using Post-Its on the wall to construct a timeline: everybody knew what they individually had done and had seen, and now we could put all of that together to describe the sequence of events in context. Importantly, we described the events that occurred not the people or feelings: “the tests passed in staging” not “QA told me there wouldn’t be a problem”.

Yes, the tests passed, but was that before or after code changes were accepted? Did the database migration start after the tests had passed? What happened between a problem being introduced, and being discovered?

Why was that bad?

Now that we know the timeline, we can start to look for correlation and insight. So the tests passed in staging, is that because the system was OK in staging, because the tests missed a case, because the wrong version of the system ran in testing, or because of a false negative in the test run? Is it expected that this code change would have been incorporated into that migration?

The timeline showed us how events met our expectations (“we waited for a green test run before starting the deployment”) or didn’t (“the tests passed despite this component being broken”, “these two components were at incompatible versions”). Where expectations were not met, we had a problem, and used the Five Whys to ask what the most…problemiest…problem was that led to the observed effect.

What do we need to solve?

We came out of this process with nine different things that contributed to our deployment issue. Nine problems are a lot to think about, so which is the most important or urgent to solve? Which one problem, if left unaddressed, is most likely to go wrong again or will do most damage if it does?

More sticky things were deployed as we dot-voted on the issues we’d raised. Each member of the team was given three stickers to distribute across the one to three issues that seemed highest priority to solve: if one’s a stand-out catastrophe, you can put all three dots on that issue.

This focused us a great deal. After the dots were counted, one problem (gaps in our understanding of what changes went into the deployment) stood out above the rest. A couple of other problems had received a few votes, but weren’t as (un)popular: the remaining six issues had zero or one dot each.

I got one less problem without ya

Having identified the one issue we wanted to address, the remaining question was how? What shall we do about it? The team opted to create a light-weight release checklist that could be used in deployment to help build the consistent view we need of what is about to be deployed. We found that we already have the information we need, so bringing it all into one place when we push a change will not slow us down much while increasing our confidence that the deployment will go smoothly.

A++++ omnishambles; would calamity again

The team agreed that going through this process was a useful activity. It uncovered some process problems, and helped us to choose the important one to solve next. More importantly, it led us to focus on what we as a team did to get to that point and what we could do to get out of it, not on what any one person “did wrong” and on finding someone to blame.

Everyone agreed that we should be doing more of these root cause analyses. Which I suppose, weirdly, means that everybody’s looking forward to the next big problem.

Using Ansible with WordPress

WordPress is a great tool for creating websites, as it provides flexibility in managing content.

As you may be aware, one of the operational downsides of managing websites run on WordPress is how frequently new releases come out to patch vulnerabilities. This brings the overhead, pain and cost of having to upgrade your WordPress instance every other week.

As we are living in a world that thrives on automation, here at Wealth Wizards we thought it would be a good idea to automate the upgrade process using configuration management tools like Ansible combined with the power of the AWS APIs.

As our WordPress sites are deployed in AWS, we decided to use the AWS APIs to provision instances, manage snapshots, and configure and apply security groups. We then decided to use various Ansible modules to install packages, update configs, encrypt and decrypt files pushed to and retrieved from AWS S3, and change permissions on files and directories as part of the upgrade process.

Switching from the traditional method of manually moving files using plugins and bash commands to an automated approach has given us more control over our upgrades and reduced the time an upgrade takes from a day to a 2-hour process, with most of that being dedicated to AWS provisioning. Automating the process with Ansible has also given us the ability to upgrade multiple instances at once, rather than the traditional method of doing one instance at a time.

Microservices make hotfixes easier

Microservices can ease the pain of deploying hotfixes to live due to the small and bounded context of each service.

Setting the scene

For the sake of this post, imagine that your system at work is written and deployed as a monolith. Now, picture the following situation: stakeholder – “I need this fix in before X, Y, and Z”. It’s not an uncommon one.
But let’s say that X, Y, and Z are all already in the mainline branch and deployed to your systems integration environment. This presents a challenge. There are various ways you could go about approaching this – some of them messier than others.

The nitty gritty

One approach would be to individually revert the X, Y, and Z commits in Git, implement the hotfix straight onto the mainline, and deploy the latest build from there. Then, when ready (and your hotfix has been deployed to production), you would need to go back and individually revert the reverts. A second deployment would be needed to bring your systems integration environment back to where it was (now with the hotfix in there too), and life carries on. Maybe there are better ways to do this, but one way or another it’s not difficult to see how much of a headache this can potentially cause.

Microservices to the rescue!

But then you remember that you are actually using microservices and not a monolith after all. After checking, it turns out that X, Y and Z are all changes to microservices not affected by the hotfix. Great!
Simply fix the microservice in question, and deploy this change through your environments ahead of the microservices containing X, Y, and Z, and voila. To your stakeholders, it looks like a hotfix, but to you it just felt like every other release!

Conclusion

Of course, you could still end up in a situation where a change or two needs to be backed out of one or more of your microservice mainlines in order for a hotfix to go out. However, I’m betting that will not only happen less often, but also be less of a headache than with your old monolith.

 

Mars Attacks!!! Ack, Ack-Ack!

Last Tuesday we saw our first (recognised) DDoS attack. At 12:09 GMT we started to see an increase in XML-RPC GET requests against our marketing site, hosted on WordPress. We don’t serve XML-RPC, so we knew this was non-valid traffic for a start.

By 12:11 GMT traffic volumes were well above what the system could handle and the ELBs started to return 503 responses. By 12:20 GMT the request rate was over 250 times higher than usual. At this point, we were trying to establish what was causing the demand. We don’t currently have the highest coverage of monitoring over our marketing sites, so this took us a little while. Eventually, by 12:30, using the ELB logs, we had managed to establish that we were seeing requests from all over the world, all making GET requests to /xmlrpc.php. We don’t typically see requests from China, Serbia, Thailand and Russia, among others, so it was pretty obvious this was a straightforward DDoS attack.

Shortly after 12:30 GMT the request rate dropped off just as quickly as it started, and by 12:35 GMT it was over and the site had recovered. Either the botnet got bored, it had achieved its purpose (investigation into the consequences of the attack continues with our security partner), or AWS Shield did its free, little-known job and suppressed the attack…

Whatever led to the attack, it passed as quickly as it arrived, and from initial assessment had little purpose. At least we’ve had our first taste of an attack and will be able to better tackle the next one. In the meantime, we continue to analyse logs to determine if there was any more to the attack than a simple DDoS, or if there was something more malicious intended.