We’re in the early stages of deploying a platform built with microservices on Kubernetes. There is a growing number of alternatives to k8s (all the cool kids are calling it k8s, or kube), with Mesos, Nomad and Swarm among the bigger names, but we decided that k8s has the right balance of features, maturity and out-of-the-box ease of use to make it the right choice for us. For the time being at least.
You may know that k8s was gifted to the world by Google. In many ways it’s a watered-down version of the Borg and Omega systems Google uses to run its own apps worldwide across millions of servers, so it’s got good provenance. The cynical among you (myself included) may suggest that Google gave us k8s as a way of enticing us onto Google Compute Engine (GCE). K8s is clearly designed to run on GCE first and foremost, and a number of its features only work there. That’s not to say it can’t run elsewhere; it can, and it does get used everywhere from laptops and bare metal to other clouds. For example, we run it in AWS, and while functionality on AWS lags behind GCE, k8s still provides pretty much everything we need at this point.
As you can imagine, setting up k8s could be pretty complicated given the number of elements involved in something you can trust to run your applications in production from little more than a definition in a YAML file. Fortunately, the k8s team provides a pretty thorough build script called ‘kube-up’ to set it up for you.
Kube-up is a script* capable of installing a fully functional cluster on anything from your laptop (Vagrant), through Rackspace, Azure, AWS and VMware, to of course Google Compute Engine and Google Container Engine (GCE & GKE). You configure and customise it for your requirements by modifying values in the scripts or, preferably, by exporting the appropriate settings as environment variables before running the script.
For a couple of reasons, which seemed good at the time, we’re running in AWS. Support for AWS is pretty good; the main missing feature we’ve noticed so far is the Ingress resource, which provides advanced L7 control such as rate limiting. It’s also pretty difficult to find good information on what is actually supported, both in the kube-up script and once k8s is running and in use. The best option is to read through the scripts, note which environment variables they reference and then have a play with them.
Along with kube-up there is also a kube-down script (supplied in the tar file kube-up downloads). This can be very handy if you’re building and rebuilding clusters to better understand what you need, but be warned: it also makes it perfectly feasible to delete a cluster you didn’t want deleted.
So far I’ve found a few guidelines I think are worth sticking to when using kube-up. Here they are, each with the reason why:
Create a stand-alone config file (a list of export ENV_VAR=value lines) and source it before running kube-up, instead of modifying the downloaded config files.
Having gone through the build process a couple of times now, I’ve come to the conclusion that the best route is to define all the environment variable overrides in a stand-alone file and source that file before running the main kube-up script. By default, kube-up re-downloads the tar file and replaces the script directory, blowing away any overrides you may have configured there. Downloading a new version of the tar file means you benefit from any fixes and improvements, and keeping your config outside it means you don’t have to keep re-defining it. I should add that I have had to hack the contents of various scripts to get them to run without errors, so using the latest version does help minimise this.
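As a minimal sketch of that approach, the Python below renders the overrides as a sourceable file of export lines, quoting each value for the shell. The variable names (KUBERNETES_PROVIDER, KUBE_AWS_ZONE, NUM_NODES, KUBE_AWS_INSTANCE_PREFIX) and their values are assumptions modelled on what the AWS scripts read; check them against the AWS config files in the tar you actually download, since names have changed between releases.

```python
import shlex

# Hypothetical overrides. The names are modelled on variables the AWS kube-up
# scripts read, but confirm each one against the version you downloaded.
OVERRIDES = {
    "KUBERNETES_PROVIDER": "aws",
    "KUBE_AWS_ZONE": "eu-west-1a",
    "NUM_NODES": "3",
    "KUBE_AWS_INSTANCE_PREFIX": "prod-apps",
}


def render_env_file(overrides: dict[str, str]) -> str:
    """Render overrides as 'export NAME=value' lines, safely shell-quoted."""
    lines = ["# Source this file before running kube-up."]
    for name, value in sorted(overrides.items()):
        lines.append(f"export {name}={shlex.quote(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect to a file, e.g. python make_env.py > cluster.env
    print(render_env_file(OVERRIDES), end="")
```

Because the file lives outside the downloaded script directory, a fresh kube-up download can’t overwrite it; you just source it (. ./cluster.env, with a filename of your choosing) before each run.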
Don’t use the default Kubernetes cluster name. Create a logical name instead: something that makes sense to you and still makes it clear which cluster this is when 3-4 other clusters are running alongside it.
Kube-up/down both rely on the information held in ~/.kube. This directory is created when you run kube-up, and the config in it tells kubectl where to connect and which credentials to use to manage the system through the API. If you have multiple clusters and the details for the ‘wrong’ cluster are the ones stored there, kube-down will merrily delete the wrong cluster.
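One cheap safeguard is to check which context kubectl is pointing at before running kube-down. This is a sketch under assumptions: it shells out to kubectl config current-context (which reads $KUBECONFIG if set, otherwise ~/.kube/config) and refuses to continue unless the result exactly matches the context name you expect for the cluster you mean to delete. The helper names are hypothetical, and the exact match is deliberate: a substring test would happily accept a "preprod" context when you meant "prod".

```python
import subprocess


def current_context() -> str:
    """Return kubectl's current context name."""
    result = subprocess.run(
        ["kubectl", "config", "current-context"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def safe_to_tear_down(context: str, expected_context: str) -> bool:
    """True only if the active context is exactly the one we intend to delete."""
    return bool(expected_context) and context == expected_context


def check_before_kube_down(expected_context: str) -> None:
    """Hypothetical pre-flight check: exit unless kubectl points at the right cluster."""
    ctx = current_context()
    if not safe_to_tear_down(ctx, expected_context):
        raise SystemExit(
            f"Refusing to continue: current context is {ctx!r}, not {expected_context!r}"
        )
    print(f"Current context is {ctx!r}; OK to run kube-down")
```

Running kubectl config get-contexts shows the context names kube-up actually created, which is what you would pass as expected_context.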
In addition, in AWS, kube-up/down both rely heavily on AWS resource tags. These tags are used throughout the whole lifecycle of the cluster, so they matter at all times. When kube-up provisions the cluster it tags resources so it knows which ones it manages, and the master uses the same tags to control the cluster, for example to add the appropriate per-instance routes to the AWS route tables. If the tags are missing or duplicated (which can happen if you build and tear down clusters frequently and miss something in the tear-down), you can end up with a cluster that reports as fully functional but on which applications fail to run.
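Route tables are where missing or duplicated tags bit us, so here is a minimal sketch of a check for them. It assumes the tag key is KubernetesCluster (confirm it against resources your own kube-up run created) and that the input is a list of dicts shaped like the RouteTables entries boto3’s describe_route_tables() returns. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Optional

# Assumption: the tag key kube-up and the AWS cloud provider use to mark a
# cluster's resources. Confirm it against resources your own run created.
CLUSTER_TAG_KEY = "KubernetesCluster"


def tag_value(resource: dict, key: str) -> Optional[str]:
    """Return the value of tag `key` on an EC2 resource dict, or None."""
    for tag in resource.get("Tags", []):
        if tag["Key"] == key:
            return tag["Value"]
    return None


def route_tables_by_cluster(route_tables: list[dict]) -> dict[str, list[str]]:
    """Group route table IDs by cluster tag value; untagged tables are skipped."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for table in route_tables:
        cluster = tag_value(table, CLUSTER_TAG_KEY)
        if cluster is not None:
            grouped[cluster].append(table["RouteTableId"])
    return dict(grouped)


def tag_problems(route_tables: list[dict], cluster: str) -> list[str]:
    """Report a missing or duplicated cluster tag on route tables (empty = OK)."""
    ids = route_tables_by_cluster(route_tables).get(cluster, [])
    if not ids:
        return [f"no route table tagged {CLUSTER_TAG_KEY}={cluster}"]
    if len(ids) > 1:
        return [f"{len(ids)} route tables tagged {CLUSTER_TAG_KEY}={cluster}: {', '.join(ids)}"]
    return []
```

Exactly one tagged table per cluster is the healthy case; two usually means a leftover from an incomplete tear-down.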
One problem I hit: having laid out a nice VPC config with Terraform, including subnets and route tables, and provisioned the system, I came to deploy the k8s cluster and kube-up failed to associate its route table with the subnet I had told it to use. It failed because I had already associated a route table with that subnet in Terraform. kube-up did report this as an error, but carried on and provisioned what looked like a fully functioning cluster. It wasn’t until the following day that we identified that important per-node routes were missing. kube-up had provisioned and tagged its own route table, and because that table was tagged, it was the one the k8s master updated as minions were provisioned. The problem was that this route table was not associated with my subnet. Once I had tagged my Terraformed subnet with the appropriate k8s tag, the master updated the correct table with routes for new minions. I had to manually copy across the routes for the existing minions from the other table.
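That manual copy can at least be scripted. A minimal sketch, assuming route-table dicts shaped like the entries boto3’s describe_route_tables() returns: it lists the subnets a table is associated with, finds the per-instance routes present in kube-up’s table but missing from yours, and renders them as aws ec2 create-route commands to review before running. The helper names are hypothetical.

```python
def associated_subnets(route_table: dict) -> set[str]:
    """Subnet IDs explicitly associated with a route table."""
    return {
        assoc["SubnetId"]
        for assoc in route_table.get("Associations", [])
        if "SubnetId" in assoc
    }


def routes_to_copy(source: dict, target: dict) -> list[dict]:
    """Per-instance routes in `source` whose destination is absent from `target`."""
    existing = {r.get("DestinationCidrBlock") for r in target.get("Routes", [])}
    return [
        r
        for r in source.get("Routes", [])
        if "InstanceId" in r and r.get("DestinationCidrBlock") not in existing
    ]


def create_route_commands(routes: list[dict], target_table_id: str) -> list[str]:
    """Render each missing route as an AWS CLI command to review before running."""
    return [
        f"aws ec2 create-route --route-table-id {target_table_id} "
        f"--destination-cidr-block {r['DestinationCidrBlock']} "
        f"--instance-id {r['InstanceId']}"
        for r in routes
    ]
```

Calling associated_subnets() on both tables first is a quick way to confirm which one your subnet actually uses, and therefore which direction to copy in.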
Understand your network topology before building the cluster, and define IP ranges for the cluster that don’t collide with your existing networks and that leave room for more clusters to be provisioned alongside it in the future.
If, for example, you deploy 2 separate clusters using the default kube-up settings, they will both end up with the same IP addressing, and they will also only be accessible over the internet. While this isn’t the end of the world, it’s not ideal; being able to reach them on their private IPs and names is a huge improvement. Of course, if the IP range kube-up provisions is the same as one of your internal networks, or you have 2 VPCs with the same IP ranges, this becomes impossible. A well thought-out network layout and IP plan also makes routing and security far simpler: if you know all your production services sit in one range, you can easily configure your firewalls to restrict access to that whole range.
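Checking a plan for collisions before you build is easy with Python’s standard ipaddress module. A minimal sketch: every range name and CIDR below is hypothetical, standing in for your office network, each cluster’s VPC and its internal cluster range.

```python
import ipaddress
from itertools import combinations

# Hypothetical IP plan; substitute your own networks and planned cluster ranges.
PLAN = {
    "office": "10.0.0.0/16",
    "prod-vpc": "172.20.0.0/16",
    "prod-pods": "10.244.0.0/16",
    "staging-vpc": "172.20.0.0/16",  # copied from prod by mistake
}


def find_overlaps(ranges: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named CIDR ranges that overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    for a, b in find_overlaps(PLAN):
        print(f"{a} overlaps {b}")
```

For the plan above it flags only the prod and staging VPCs, exactly the "2 VPCs with the same IP ranges" case that makes private access impossible.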
You can pre-build the VPC, subnets, gateways, route tables, etc., but if you do, make sure they’re kube-up friendly by adding the right tags (matching the custom cluster name you defined above).
When building with the default configs, kube-up will provision a new VPC in AWS. While this is great when you just want to get something up and running, it’s pretty likely you’ll actually want to build the cluster in a pre-existing VPC, and you may already have a way of building and managing those. We like to provision things with Terraform, so we found a way to configure kube-up to use an existing VPC (and to change its networking accordingly), but there are still a number of caveats.
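One of those caveats is that the subnet range you hand kube-up must fit inside the existing VPC and not clash with subnets Terraform already created. The AWS scripts I’ve seen take this through variables such as VPC_ID and SUBNET_CIDR, but treat those names as an assumption and check the scripts you download. A minimal sketch of the pre-flight check, with hypothetical CIDRs in the usage below:

```python
import ipaddress


def check_subnet_plan(vpc_cidr: str, existing_subnets: list[str], new_subnet: str) -> list[str]:
    """Problems with placing `new_subnet` inside an existing VPC (empty list = OK)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    new = ipaddress.ip_network(new_subnet)
    problems = []
    if not new.subnet_of(vpc):
        problems.append(f"{new} is not inside the VPC range {vpc}")
    for cidr in existing_subnets:
        if new.overlaps(ipaddress.ip_network(cidr)):
            problems.append(f"{new} overlaps existing subnet {cidr}")
    return problems


if __name__ == "__main__":
    # Hypothetical: a Terraform-built VPC with one subnet already in use.
    print(check_subnet_plan("172.20.0.0/16", ["172.20.0.0/24"], "172.20.1.0/24"))
```

An empty list means the planned subnet is safe to hand to kube-up; anything else is worth fixing in Terraform first.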
K8s makes heavy use of some networking tricks to provide an easy-to-use interface. This means that to really understand k8s (you’re running your production apps on it, right? So you want a good idea of how it’s running, right?) you should also have a good understanding of its networks. In essence, Kubernetes makes use of 2 largely distinct networks. The first provides IPs to the master and nodes and lets you reach the surface of the cluster, so you can manage it, deploy apps onto it and serve those apps to the world. The second is used to track where apps are within the cluster, so the scheduler can do what it needs to without you having to worry about which node an app is deployed to and which port it’s on. If either of these ranges collides with one of your existing networks you can get sub-optimal behaviour, which may mean jumping through hoops just to reach your cluster.
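To make the two networks concrete, here is a sketch with hypothetical ranges: a node subnet for the master and nodes, and a separate cluster range carved into equal per-node blocks, each of which would be the destination of one of the per-node routes mentioned earlier. The /24-per-node size is an assumption that was common for kube-up on AWS; check what your version actually allocates.

```python
import ipaddress


def per_node_ranges(cluster_cidr: str, node_prefix: int = 24) -> list:
    """Split the cluster range into equal per-node blocks (one /24 each by default)."""
    return list(ipaddress.ip_network(cluster_cidr).subnets(new_prefix=node_prefix))


def classify(address: str, node_cidr: str, cluster_cidr: str) -> str:
    """Say which of the two networks an address belongs to."""
    ip = ipaddress.ip_address(address)
    if ip in ipaddress.ip_network(node_cidr):
        return "node network"
    if ip in ipaddress.ip_network(cluster_cidr):
        return "cluster network"
    return "outside both"


if __name__ == "__main__":
    # Hypothetical ranges: nodes in 172.20.0.0/24, apps in 10.244.0.0/16.
    blocks = per_node_ranges("10.244.0.0/16")
    print(len(blocks), "per-node blocks, first", blocks[0], "last", blocks[-1])
    print(classify("10.244.3.7", "172.20.0.0/24", "10.244.0.0/16"))
```

A /16 split into /24s gives room for 256 nodes, which is also why a cluster range that happens to overlap an office or VPN network causes trouble: those addresses become ambiguous from outside the cluster.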
Update the security groups as soon as the system is built to restrict access to the nodes. We’ve built ours in a VPC with a VPN connection to our other systems, so we can restrict access to private ranges only.
Also note that by default, although kube-up will provision a private network for you in AWS, all the nodes end up with public addresses and a security group that allows access to them from anywhere over SSH, plus HTTP/S access to the master. This strikes me as a little scary.
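A quick audit helps confirm the lock-down actually happened. A minimal sketch, assuming security-group dicts shaped like the SecurityGroups entries boto3’s describe_security_groups() returns: it lists IPv4 ingress rules whose source falls outside a set of "ours" ranges. The RFC 1918 list is a stand-in; swap in the ranges reachable over your own VPN. The group ID in the usage below is hypothetical.

```python
import ipaddress

# Ranges treated as "ours". RFC 1918 is a stand-in: use the ranges
# reachable over your own VPN.
PRIVATE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def open_ingress(security_group: dict) -> list[str]:
    """List IPv4 ingress rules that admit sources outside PRIVATE_RANGES."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        # FromPort/ToPort may be absent (e.g. all-traffic rules), hence the defaults.
        ports = f"{perm.get('FromPort', 'all')}-{perm.get('ToPort', 'all')}"
        for ip_range in perm.get("IpRanges", []):
            source = ipaddress.ip_network(ip_range["CidrIp"])
            if not any(source.subnet_of(private) for private in PRIVATE_RANGES):
                findings.append(
                    f"{security_group['GroupId']}: {perm.get('IpProtocol')} "
                    f"{ports} open to {source}"
                )
    return findings


if __name__ == "__main__":
    sg = {"GroupId": "sg-123", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]}
    print(open_ingress(sg))
```

Against a freshly built default cluster you would expect this to flag the SSH and HTTPS rules; after tightening the groups it should come back empty.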
* Kube-up is in fact far more than a single script: it downloads a whole tar file of scripts, but let’s keep it simple.