Network Policies

making use of kubernetes network policies to add security to your clusters

There’s a neat feature in kubernetes called Network Policies which we’ve been using a little while now. Network Policies (or NetPols) are much like AWS security groups, but within the kube cluster.

The really great thing about NetPols is that they “understand” the tags you may already be attributing to your deployements, ingress controllers, etc. This means you can quite easily group related components and wrap a NetPol around these and limit access to these resources.

A great example would be to define a NetPol which restricts any access to a namespace except for any traffic which originates from the name space. This gives you a great level of isolation at the namespace level. You can the grant ingress to that namespace exclusively through a designated ingress controller on a specific port.

Like wise, you can create a network policy to restrict traffic to a database to only come from pods which actualy need to access a database. Using kubetags in your NetPols provides a really powerful and flexible way of locking down your intra-cluster, inter-namespace traffic for very little cost.

While Network Policies are managed through the kube API, much like ingress resources, they’re not actually implemented by kube. Instead you need something which understands these kube NetPols to actually turn then into something which can be enforced.

In our cluster we make use of the Weave plugin. Weave, or something similar, e.g. Calico, Flannel, etc tend to come out of the box with kube deployments these days, so the chances are you already have something which is capable of implementing Network Policies.

Weave is an agent which is deployed onto each kube node. It manages things like inter-pod routing, but becuase it’s installed at the OS layer, it also has access to manipulate the IP Tables rules. This is how it implements the access restrictions defined by the network policies. Each policy is converted to a collection of IP tables rules, co-ordinated across each machine, which translates the kube tags into something that can be recognised on each machine.

One thing we’re hoping to achieve with Weave net in the future, though not tried yet is to include standalone hosts, outside the kube cluster, in the network policy by running the weave agent on those nodes. I’ll create a future post on how we get on with that!

As you may have picked-up; becuase network policies are implemeted using IP Tables, they operate at layer 3/4 of the OSI model (i.e. IP/TCP) so you can define which IPs/Ports you may or may not access, but they’re not able to operate on URLs, for example, which are at layer 7.

In order to do this you need to build rules into an ingress controller, or look at something more powerful like a service-mesh, e.g. Istio. By combining the 2 you can build-up a great degree of security, with 2 independant, complimentary systems managing access to resources.

So, what does a network policy look like?

The following policy will prevent ingress access into a namespace to everything except traffic originating in the namespace (note that while Network Policies do support controlling egress, we’ve not yet tried this, but it’s on our roadmap!)

kind: NetworkPolicy
  name: namespace-isolation-live
  namespace: live
  - from:
    - namespaceSelector:
          name: live
  podSelector: {}

The policy is pretty simple:

Allow ingress from resources where the namespace matches “live” and apply to all pods (the ’{}’ ; denotes ALL pods in the namespace).

In order to actually serve traffic from the namespace we have another policy:

kind: NetworkPolicy
  name: elb-isolation-live
  namespace: live
  - ports:
    - port: 443
      protocol: TCP
    - port: 80
      protocol: TCP
  - {}
      name: ingress-controller

likewise, this policy is alo pretty simple:

Allow ingress from all pods, where the ports are TCP:443 and TCP:80 and apply the policy to pods with have the label matching: ingress-controller (which our ingress controllers do)

Voila – “Security groups” within the kube cluster. what are you waiting for? Get securing!

Spot instances for the win!

Cloud computing is supposed to be cheap, right?

No longer do we need to fork out £5-10k for some silicon and tin, and pay for space and the power, the cables and the install, etc, etc. Building in the cloud meant we could go and provision a host and leave it running for a few hours and remove it when we were done. No PO/finance hoops to jump through, no approvals needed, just provision the host and do your work.

So, in some ways this was true, there was little or no upfront cost and it’s easier to beg forgiveness than permission, right? But the fact is we’ve moved on from the times when AWS was a demo environment, or a test site, or something that the devs were just toying with. Now it’s common for AWS (or Azure, or GCE) to be your only compute environment and the fact is the bills are much bigger now. AWS has become our biggest platform cost and so we’re always looking for ways to reduce our cost commitments there.

At the same time that AWS and cloud have become mainstream for many of us, so too have microservices, and while their development and testing benefits of microservices are well recognised, the little-recognised truth is that they also cost more to run. Why? Because as much as I may be doing the same amount of ‘computing’ as I was in the monolith (though I suspect we were actually doing less there) each microservice now wants its own pool of memory. The PHP app that we ran happily on a single 2GB server with 1CPU has now been split out into 40 different components, each with its own baseline memory consumption of 100MB, so I’ve already doubled my cost base just by using a more ‘efficient’ architecture.

Of course, AWS offers many ways of reducing your compute costs with them. There are many flavours of machine available, each with memory and CPU offerings tuned to your requirements. You can get 50%+ savings on the cost of compute power by committing to paying for the system for 3 years (you want the flexible benefits of cloud computing, right?). Beware the no-upfront reservations though – you’ll lose most of the benefits of elastic computing, with very little cost-saving benefits.

You could of course use an alternative provider, Google bends over backward to prove they have a better, cheaper, IaaS, but the truth is we’re currently too in-bed and busy to move provider (we’ve only just finished migrating away from Rackspace, so we’re in no hurry to start again!)

So, how can we win this game? Spot Instances. OK, so they may get turned off at any moment, but the fact is for the majority of common machine types you will pay 20% of the on-demand price for a spot instance. Looking at the historical pricing of spot instances also gives you a pretty good idea how likely it is that a spot instance will be abruptly terminated. The fact is, if you bid at the on-demand price for a machine – i.e. what you were GOING to pay, but put it on a spot instance instead, you’ll end up paying ~20% of what you were going to and your machine will almost certainly still be there in 3 months time. As long as your bid price remains above spot price, your machine will stay on and you will pay the spot price, not your bid!

AWS Spot Price History

What if this isn’t certain enough for you? If you really want to take advantage of spot instances, build your system to accommodate failure and then hedge your bids across multiple compute pools of different instance types. You can also reserve a baseline of machines, which you calculate to be the bare minimum needed to run your apps, and then use spots to supplement that baseline pool in order to give your systems more burst capacity.

How about moving your build pipeline on to spot instances or that load test environment?

Sure, you can’t bet your house on them, but given the right risk approach to them you can certainly save a ton of money of your compute costs.

Taming the Beast

Security is a journey, we’ve all heard it said but how many of us believe it and who knows where they’re trying to go?  I think we do, and that’s ‘Our next audit’ – we want to breeze through each audit like passing street lamps on a motorway.

At Wealth Wizards we deal with personal data. We provide financial guidance to customers. To do this we need customers’ personal data. their valuable personal data. Not just address and email (which are actually considered freely-available, or business card data) but information on savings, investments, tax details, health conditions, etc. We don’t store credit card or bank details but we do have all the really personal stuff. What this means is that over time we will build up a large dataset of things that con men, attackers, villains and others want to get hold of.  We know this is valuable, as do our customers and we want our customers to trust us. We want to instill confidence that when a customer tells us something, it’s private and remains so. This means we know security is important, and without it, it doesn’t matter how much effort we put into building up our business, because if the data is stolen and exposed then it could be the downfall of our business.

One of the best ways for us to show prospective clients and customers that we’re serious about security is to show our credentials and accreditation. Evidence that we have a rigorous process that stands up to rigorous audits. ISO 27001 is designed to do just this. Which is why we are working towards achieving ISO 27001 this year. However, anyone who knows ISO 27001 will know it’s a beast and not for the faint of heart and so the trick is learning how we can use ISO to advantage instead of against us.  We can use ISO as a framework to build up the policies and process we use as a business.  Instead of trying to fight it, we’re going to make it help us.

We don’t just want to add security on to what we’ve done, we want to build security into what we do.  We deal with a lot of big companies. when we’re selling our products and things are getting close to signing contracts, those big companies (we’re talking tens of thousands of employees) start asking us about our processes, about our data security and more importantly, in their eyes, their data security. In other words, they start auditing us. No one likes an audit, but if you can show an auditor that you do care about things and that you do have processes then they tend to avoid asking the really difficult questions. Even then, when you really do care about things, it doesn’t matter if they do ask the difficult questions because you have an answer for them.

I’m currently going through the ‘Technical Measures’ questions with our team here and it feels endless; How can I prove what we did X, how can I prove why we did Y, how can I show what something looked like on this date compared to that date. Those are difficult questions to answer at the best of times but more so when you’re running in an elastic environment where a server instance may only exist for a day or two. What’s becoming apparent though is that ISO is asking questions that I actually want to be answered myself, regardless of what certification we go for.  I, as a sys admin, want to have a record of what happened, when and why. I also want to know that something happened because we made it happen. If I know this then I can start to answer questions about why something doesn’t work at 3 am.  So already I’m starting to find that while ISO is a beast, it can actually be tamed to be a friendly beast. On our path to ISO, we will build the framework that will define the tasks we need to do to build security into our platform, that will show the auditors what they want to see, as well as the meat behind it to prove it’s not just paperwork.

By doing a true risk assessment of our business and technical environment, we start to build an accurate picture of our weaknesses, both in terms of security as well as our processes. Once we have identified these then we can start to build suitable responses. It looks overwhelming to begin with but before long it starts to become clear that the automation that we’re building to allow hands-off delivery of our applications is also the solution we need to be able to record what was deployed when and why. The automation scripts are the perfect mechanism to build these audit trails rather than having to rely on someone to manually ensure these actions are identified!

How do we ensure there is a separation of concerns? That no one is putting back-door code into production? Why peer review of the code (both application and infrastructure) allows us to enforce this programmatically! Suddenly ISO has become my friend. Sure, it’s still a beast, but it’s not blocking our delivery but helping to define what processes we need and therefore it’s starting to write our automation algorithms. How cool is that!? Ok, perhaps cool is a little strong.

So while we’re still very much en-route, I’m confident we’re on the right path and that the next audit will be us proving we’re secure, not hiding the things we don’t want to be seen. Don’t be afraid of the beast called ISO, embrace it and use it to your advantage.

Mars Attacks!!! Ack, Ack-Ack!

Last Tuesday we saw our first (recognised DDoS attack.  At 12:09 GMT we started to see an increase in XML-RPC GET requests against our marketing site, hosted on WordPress. We don’t serve XMLRPC so we knew this was non-valid traffic for a start.

By 12:11 GMT traffic volumes were well above what the system could handle and the ELBs started to return 503 responses. By 12:20 GMT the request rate was over 250 x higher than usual. At this point, we were trying to establish what was causing the demand. We don’t currently have the highest coverage of monitoring over our marketing sites so this took us a little while. Eventually, by 12:30, using the ELB logs, we had managed to establish we were seeing requests from all over the world, all making GET requests to /xmlrpc.php. We don’t typically see requests from China, Serbia, Thailand and Russia, among others so it was pretty obvious this was a straight forward DDoS attack.

Shortly after 12:30 GMT the request rate drops off just as quickly as it started and by 12:35 GMT it was over and the site recovered. Either the BotNet Attack got bored, they had achieved their purpose (investigation into the consequence of the attack continues with our security partner) or AWS Shield did its free, little-known job and suppressed the attack…

Whatever led to the attack, it passed as quickly as it arrived, and from initial assessment had little purpose. At least we’ve had our first taste of an attack and will be able to better tackle the next one. In the meantime, we continue to analyse logs to determine if there was any more to the attack than a simple DDoS, or if there was something more malicious intended.

learning to live with Kubernetes clusters.

In my previous post I wrote about how we’re provisioning kubernetes clusters using kube-up and some of the problems we’ve come across during the deployment process. I now want to cover some of the things we’ve found while running the clusters. We’re in an ‘early’ production phase at the moment. Our legacy apps are still running on our LAMP systems and our new micro services are not yet live. We only have a handful of micro services so far, so you can get an idea of what stage we’re at. We’re learning that some of the things we need to fix mean going back and rebuilding the clusters but we’re lucky that right now this isn’t breaking anything.

We’ve been trying to build our kubernetes cluster following a pretty conventional model; internet facing components (e.g. NAT gateways and ELBs) in a DMZ and the nodes sitting in a private subnet using NAT gateways to access the internet. Despite the fact that the kube-up scripts support the minion nodes being privately addressed, the ELBs also get created in the ‘private subnet’  thus preventing the ELBs from serving public traffic. There have been several comments online around this and the general consensus seems to suggest it’s not currently possible. We have since found though there are a number of annotations available for ELBs suggesting it may be possible by using appropriate tags on subnets (we’re yet to try this though. I’ll post an update if we have any success)

Getting ELBs to behave the way we want with SSL has also been a bit of a pain. Like many sites, we need the ELB to serve plain text on port 80 and TLS on port 443, with both listeners serving from a single backend port. As before, the docs aren’t clear on this. the Service documentation tells us about the https and cert annotations but doesn’t tell you the other bits that are necessary. Again, looking at the source code was a big help and we eventually got a config which worked for us.

kind: Service
apiVersion: v1
  name: app-name
  annotations: "arn:aws:acm:us-west-1:xxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx” "http" "443"
    app:  app-name
  - protocol: TCP
    name: secure
    port: 443
    targetPort: 80
  - protocol: TCP
    name: insecure
    port: 80
    targetPort: 80
  type: LoadBalancer


Kubernetes comes with a bunch of add-ons ‘out of the box’. By default when running kube-up, you’ll get heapster/influxDB/grafana installed. You’ll also get FluentD/Elasticsearch/Kibana along with a dashboard and DNS system. While the DNS system is pretty much necessary (in fact remember to ensure you want more than one replica, in one cluster iteration, the DNS system stopped and would’t start again rendering the cluster mostly useless.) The other add-ons are perhaps less valuable. We’ve found heapster consumes a lot of resource and gives limited (by which I mean, not what I want) info. InfluxDB is also very powerful but will get deployed into non-persistent storage. Instead we’ve found it preferable to deploy prometheus into the cluster and deploy our own updated grafana container. Prom actually gives far better cluster metrics than heapster and there are lots of pre-built dashboards for grafana meaning we can get richer stats faster.

Likewise Fluentd -> Elasticsearch gives an in-built log collection system, but the provisioned ES is non-persistent and by having fluent ship the logs straight to ES you loose many of the benefits of dropping in logstash and running grok filters to make sense of the logs. It’s trivial  (on the assumption you don’t already have fluentD deployed, see below!) to drop in filebeat to replace fluentD and this makes it easy to add logstash before sending the indexed logs to ES. In this instance we decided to use AWS provided Elasticsearch to save ourselves the trouble of building the cluster. When deploying things like log collectors, make you they’re deployed as a daemonSet. This will make sure you have an instance of the pod running on each node, regardless of how many nodes you have running, which is exactly what you want for this type of monitoring agent.

Unfortunately, once you’ve deployed a k8s cluster (through kube-up) with these add-ons enabled, it’s actually pretty difficult to remove them. It’s easy enough to removing a running instance of a container, but if a minion node gets deleted and a new one provisioned, the add-ons will turn up again on the minions. This is because kube-up makes use of salt to manage the initial minion install and salt kicks in again for the new machines. To date I’ve failed to remove the definition from salt and have found the easiest option is just to rebuild the k8s cluster without the add ons. To do this, export the following variables when running kube-up:


Of course, this means re-provisioning the cluster, but I did say we’re fortunate enough to be able to do this (at the moment at least!)

Our next task is to sufficiently harden the nodes, ideally running the system on CoreOS.

Building Kubernetes Clusters

We’re in the early stages of deploying a platform built with micro services on Kubernetes. While there are a growing number of alternatives to k8s (all the cool kids are calling it k8s, or kube), Mesos, Nomad and Swarm being some of the bigger names, we came to the decision that k8s has the right balance of features, maturity and out of the box ease of use to make it the right choice for us. For the time being at least.

You may know k8s was gifted to the world by Google. In many ways it’s a watered down version of the Omega and Borg engines they use to run their own apps worldwide across millions of servers, so it’s got good provenance. The cynical among you (myself included) may suggest that Google gave us k8s as a way of enticing us to the Google Compute Engine. K8s is clearly designed with being run on GCE first and foremost with a number of the features only working on GCE. That’s not to say it can’t be run in other places, it can and does get used elsewhere from laptops and bare metal to other clouds. For example we run it in AWS and while functionality on AWS lags behind GCE, k8s still provides pretty much everything we need at this point.

As you can imagine, setting up k8s could be pretty complicated due to the number of elements involved in something which you can trust to run your applications in production with little more than a definition in a YAML file, fortunately the k8s team provide a pretty thorough build script to set it up for you called ‘kube-up’.

Kube-up is a script* capable of installing a fully functional cluster on anything from your laptop (Vagrant) through Rackspace, Azure, AWS and VMware and of course onto Google compute and container engine (GCE & GKE). Configuration and customisation for your requirements is done by modifying values in the scripts, or preferably exporting the appropriate settings into your env vars before running the script.

For a couple of reasons, which seemed good at the time, we’re running in AWS. While support for AWS is pretty good, the main feature missing currently that we’ve noticed is the lack of the ingress resource, which provides advanced L7 control such as rate limiting,  it’s actually pretty difficult to find good information on what actually is supported, both in the Kube-up script and once k8s is running and in use. The best option is to read through the script, see what environment variables are mentioned and then have a play with them.

Along with a kube-up script, there is also a kube-down script (supplied in the tar file downloaded by kube-up). This can be very handy if you’re building and rebuilding clusters to better understand what you need but be warned, it also means it’s perfectly feasible to delete a cluster you didn’t want deleted.

So far I’ve found a few guideline which I think should be stuck to when using kube-up, these, with a reason why, are:

Create a stand-alone config file (a list of export Env=Vars) and source that script before running kube-up instead of modding the downloaded config files. 

Having gone through the build process a couple of times now, I’ve come to the conclusion the best route is to define all the EnvVar overrides into a stand-alone file and source the file before running the main kube-up script. By default, kube-up will re-download the tar and replace the script directory, blowing away any overrides you may have configured. Downloading a new version of the tar file means you benefit from any fixes and improvements, keeping your config outside this means you don’t have to keep re-defining it. I should add too that I have had to hack the contents of various scripts to get the script to run without errors, so using the latest version doe help minimise this.

Don’t use the default Kubernetes cluster name, create a logical name (something that makes sense to use and stands the test of having 3-4, other clusters running alongside and still making sense what this one is) 

Kube-up/down both rely on the information held in ~/.kube. This directory is created when you run kube-up and lets the kubectl script know where to connect and what credentials to use to mange the system through the API. If you have multiple clusters and have the details for the ‘wrong’ cluster stored in this file, kube-down will merrily also delete the wrong cluster.

In addition to this, in AWS, kube-up/down both rely heavily on AWS name tags. These tags are used during the whole lifecycle of the cluster so are important at all times. When kube-up provisions the cluster it will tag items to know which resources it’ll manage. The same tags are used by the master to control the cluster. For example; to add the appropriate instance specific routes to the AWS route tables. If the tags are missing, or duplicated (which can happen if you are building and tearing down clusters frequently and miss something in the tear-down) you can end up with a cluster which is reported as fully functional, but applications running in the cluster will fail to run.

One problem I found was that having laid out a nice VPC config, including subnet and route tables with Terraform and then having provisioned the system, when I came to deploying the k8s cluster the k8s script failed to bind it’s route table to the subnet which I ha told it to use. It failed because I had already defined one myself in Terraform. kube-up did report this as an error, but continued on and provisioned what looked like a fully functioning cluster. It wasn’t until the following day that we identified that there were important per-node routes missing. kube-up had provisioned and tagged a route table. Because that table was tagged, that’s the table the kube master was updating when minions were getting provisioned. The problem being that route table was not associated to my subnet. Once I had tagged by terraformed subnet with the appropriate k8s tag, the master would then update the correct table with new routes for minions. I had to manually copy across the routes from the other table for the existing minions.

Understand your network topology before building the cluster and define IP ranges for the cluster that don’t collide with your existing network and allow for more clusters to be provisioned alongside in the future. 

If, for example you choose to deploy 2 separate clusters using the kube-up scripts they will both end up with the same IP addressing, they will also only be accessible over the internet. While this isn’t the end of the world, it’s not ideal and being able to access them using their private IP/name space is a huge improvement. Of course, if the kube-up provisioned IP range is the same as one of your internal networks, or you have 2 VPCs with the same IP ranges it becomes impossible to do this. Having a well thought-out Network and IP ranges also makes routing and security far simpler. If you know all your production services sit over there you can easily configure your firewalls to restrict access to that whole range.

Although you can pre-build the VPC, networks, gateways, route tables, etc. if you do, make sure they’re kube-up friendly, adding the right tags (which match the custom name you defined above.)

When building with dealt configs, kube-up will provision a new VPN into AWS. While this is great when you want to just get something up and running, it’s pretty likely you’ll actually want to build a cluster in a pre-existing VPC. You may also already have a way of building and managing these. We like to provision things with Terraform and so we found a way to configure kube-up to use an existing VPC (and to change it’s networking accordingly) there are still a number of caveats.

K8s makes heavy use of some networking tricks to provide an easy to use interface, however this means that to really understand k8s  (you’re running your production apps on this, right? so you want a good idea how it’s running, right?) you should also have a good understanding of it’s networks. In essence, Kubernetes makes use of 2 largely distinct networks. The first is to provide IPs to the master and nodes and allows you to reach the surface of the cluster (allowing you to manage it, and deploy apps onto it and for those to be served to the world). It uses the second network to manage where the apps are within the cluster and to allow the scheduler to do what it needs to without you having to worry about what node an apps is deployed to and what port it’s on. If either of these network ranges collides with one of your existing networks you can get sub-optimal behaviour, even if this means you have to hop through hoops just to reach your cluster.

Update the security groups as soon as the system is built to restrict access to the nodes. We’ve built ours in a VPC with a VPN connection to our other systems, so we can restrict access to private ranges only. 

Also note that by default, although kube-up will provision a private network for you in AWS, all the nodes end up getting public addresses and a security group which allows access to these nodes from anywhere over SSH and HTTP/S for the master. This strikes me as a little scary.

  • Kube-up is in fact far more than just a single script, it downloads a whole tar file of scripts, but let’s keep it simple.