Migrating Kubernetes out of the Big Cloud Providers
“Move to Kubernetes to save costs,” they said in the early days of the k8s frenzy. The promise was that efficient bin-packing of pods onto nodes would save on node costs (there’s also autoscaling, but regular cloud already offers that).
The reality is that the overhead of running the control plane, plus auxiliary services on each node (DNS, metric and log collectors, etc.), plus the extra easy ways to make costly mistakes, turns most Kubernetes installations into a more expensive proposition than running the workloads without it.
For the record, the great thing about k8s, and the reason for its success (besides resume-driven technology choices), is the standardization it provides and its extensibility or modularity (this plug-in advantage is the reason mediocre software like WordPress succeeds, for example).
For a startup, managed k8s on the “big three” public cloud providers (Amazon Elastic Kubernetes Service (EKS) on AWS, Google Kubernetes Engine (GKE) on GCP and Azure Kubernetes Service (AKS) on Azure) is expensive.
On the other hand, I don’t want to manage k8s master nodes and essential services on “bare metal” (funny that nowadays that means a Virtual Machine, or VM), so I was looking for an intermediate solution between expensive fully managed k8s and the cheapest (in dollars, not in time) completely self-managed k8s.
GKE Autopilot
As somebody put it, “GKE is Kubernetes on easy mode”. GKE comes in two flavours: Standard and Autopilot. I have experience running production on GKE Standard at a startup, and managing node pools and upgrades was pretty painful. Autopilot takes care of those pain points, and now that it has matured, it is the default GKE mode. So I decided to go for it.
While GKE Standard charges per node (the underlying Compute Engine instances), Autopilot’s “pricing model simplifies billing forecasts and attribution.” There are two Autopilot billing modes: the pod-based billing model, which is the default, and the node-based billing model, used when you request specific hardware.
I set up a GKE Autopilot cluster with default options from Terraform and ran a small production load on it (two SadServers scenarios that can run as k8s pods).
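For reference, such a cluster needs only a few lines of Terraform using the `google_container_cluster` resource; this is a minimal sketch, where the name, project and region are placeholders rather than the actual values used:

```hcl
# Minimal GKE Autopilot cluster (hypothetical names/IDs).
resource "google_container_cluster" "autopilot" {
  name     = "sadservers-autopilot" # placeholder cluster name
  location = "us-central1"          # Autopilot clusters are regional
  project  = "my-gcp-project"       # placeholder project ID

  # This single flag selects Autopilot mode instead of Standard.
  enable_autopilot = true

  deletion_protection = false # allow "terraform destroy" during testing
}
```

With Autopilot, there are no node pool resources to declare; Google manages the nodes, which is exactly the appeal over Standard.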
Some of the lessons learned:
- The control plane used to be free, but it’s now USD $0.10 per cluster per hour ($73/month), same as AWS EKS and Azure AKS. What a coincidence; surely not a case of price fixing.
- Charges are based on k8s pod resource requests (what you provision, not what you actually use), which makes perfect sense; it’s the same as getting a VM.
- Requests default to an expensive 0.5 vCPU and 2 GiB RAM if not specified
- There’s a minimum for requests, and there’s also a minimum CPU-to-memory ratio
- Surprise Cloud Logging and Monitoring charges in the default cluster; GKE Dataplane V2 metrics are exposed in GKE Autopilot.
- GKE admission control quirkiness
- No direct access to control nodes (this is fine for the most part)
- Four AZs by default: this means four nodes using at least 1 CPU each. If a node doesn’t have any workload, a “balloon” pod sits there doing nothing other than using 1 CPU; it gets evicted when a workload pod comes in. This is like the joke of two guys walking into each other in the desert, one carrying an armoire and the other carrying an anvil. The conversation goes: “Why do you have an armoire?” “Well, if the desert bandits come, I can go hide inside. Why do you have an anvil?” “Well, if the desert bandits come, I drop the anvil and I can run faster.”
- The cheapest GKE Autopilot cluster has a minimum of 2 AZs
- The cheapest GKE cluster possible seems to be the one using GKE standard with one master node.
- Using spot instances for worker nodes to reduce costs is possible, but you have to tolerate a node going down and coming back up randomly. It’s easy to set up an alert for this if you want.
- GKE Autopilot quotas are not transparent: you’ll get a “can’t scale up nodes” message without hitting anything on the GCP quota page; the internal GKE quota is hidden
- Cluster warnings in the dashboard are delayed by about one or two days (you may have fixed an issue, but it doesn’t show as fixed for a while).
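Since Autopilot bills by pod requests, the single most effective cost lever is setting requests explicitly instead of taking the defaults. A minimal sketch of what that looks like in a manifest (names, image and values are illustrative, not the actual workload):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scenario-runner # hypothetical workload name
spec:
  containers:
    - name: app
      image: registry.example.com/scenario:latest # placeholder image
      resources:
        # Autopilot bills on these requests; keep them as small as the
        # workload allows. Note the per-pod minimums and CPU:memory
        # ratio constraints: very small values get rounded up.
        requests:
          cpu: 250m
          memory: 512Mi
```

Auditing every Deployment for unset or oversized requests is worth doing before looking at any other cost optimization.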
In terms of cost, the initial 4-zone cluster with very light load was close to $1,000 a month.
Then I migrated to a cluster with the minimum two AZs plus a spot worker node, using about 3 vCPUs and 3 GB RAM; with low traffic it cost $5–$8/day ($150 to $240 a month). This is expensive compared to a single “big three” VM of similar specs, and very expensive compared to a no-frills cloud provider VM.
Other Cloud Providers
The point of running on one of the big cloud providers is to take advantage of their managed services ecosystem; if all you need is compute (CPU, RAM, disk and bandwidth), then a provider like Hetzner or OVH is way cheaper.
Many other cloud providers offer a managed k8s service; DigitalOcean, for example, prices it similarly to the big three, just a bit cheaper.
OVHcloud and Civo offer the control plane for free, so either could be an option to explore in my case.
Software to manage K8s
We can use a barebones cheap cloud provider and, on top of that, software that takes care of creating and managing the control plane of a k8s cluster.
Some options for deploying and managing Kubernetes clusters are Cloudfleet, syself or Edka.
Independently, we can use Lens as a general k8s dashboard. For full telemetry, Coroot is an option.
Hetzner + Edka
In terms of price, reliability and features, it is hard to beat Hetzner. Hetzner offers servers, volumes, load balancers, public IPs, firewalls and object storage, and it has an API and a Terraform provider.
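As an illustration of that Terraform provider (Edka handled the actual provisioning in our case; the names and location here are placeholders):

```hcl
terraform {
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

# Token is read from the HCLOUD_TOKEN environment variable if not set here.
provider "hcloud" {}

# A shared-CPU worker node, the same CPX21 plan as in the price list below.
resource "hcloud_server" "worker" {
  name        = "k8s-worker-1" # hypothetical server name
  server_type = "cpx21"        # 3 vCPUs / 4 GB, shared CPU
  image       = "ubuntu-24.04"
  location    = "fsn1"         # Falkenstein; any Hetzner location works
}
```

So even without a management tool, the whole Hetzner footprint is scriptable the same way the GKE side is.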
The Edka k8s management tool specializes in Hetzner and is free for one cluster (and very affordable for more), so combining the two sounded like a good way to try a proof of concept.
To reproduce a similar GKE cluster in terms of capacity (not features) on Hetzner, we provisioned with Edka two servers plus the minimal infrastructure you’d be charged for:
- CCX13: 2 vCPUs, 8 GB RAM, for the master node with dedicated (not shared) CPUs. This is the cheapest dedicated-CPU offering at €12.49 / month (about $14.66)
- CPX21: 3 vCPUs, 4 GB RAM (shared CPU) at €7.55 / month (about $8.86)
- 10 GB volume at €0.44 / month
- Load balancer at €5.39 / month (about $6.33)
- 3 public IP addresses (one per server plus one for the cluster): about $2
So the total is a bit over $30 / month, at least five times cheaper than the GKE solution for similar workload capacity.
The setup was pretty painless and straightforward. Anecdotally, we’ve also found the Hetzner cluster faster and more responsive than its GKE counterpart.
Summary
In summary, depending on your Kubernetes cluster requirements (control plane high availability, extra features such as unattended upgrades and backups, and integration with your cloud provider’s services), migrating from a big cloud provider’s managed k8s to a combination of cheap VMs plus management software may make sense if reducing costs is an important objective.