
Kubernetes cluster on OpenStack - Part 1


Note/Update: This blog series never continued and is outdated.

For the last few weeks I've been working with Kubernetes and OpenStack. It's a steep learning curve to get a production-ready Kubernetes cluster running on OpenStack, especially because I didn't want to use the available ready-to-use tools. In the next few blog posts, I want to share my experience of running Kubernetes on an OpenStack platform.

In this first blog post, I will discuss the infrastructure and how I use the OpenStack platform to run a production-ready Kubernetes cluster.

Kubernetes

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
It groups containers that make up an application into logical units for easy management and discovery. Kubernetes builds upon 15 years of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community.

Kubernetes consists of master nodes and worker nodes. On those nodes, Docker containers can be scheduled by sending Kubernetes configurations, like deployments, to the Kubernetes API server. The Kubernetes scheduler and controller manager will use the configuration to bring the Kubernetes platform to the desired state.
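
As an illustration, such a configuration is nothing more than a piece of YAML sent to the API server. A minimal Deployment could look roughly like this (the name and image below are just placeholders):

```yaml
# minimal example Deployment; name and image are placeholders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-example
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-example
  template:
    metadata:
      labels:
        app: nginx-example
    spec:
      containers:
        - name: nginx
          image: nginx:1.15
          ports:
            - containerPort: 80
```

Apply it with kubectl apply -f deployment.yaml and the scheduler and controller manager take it from there. On older clusters the apiVersion may be extensions/v1beta1 instead of apps/v1.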

To get a production-ready environment, you need at least three master nodes and two worker nodes. Kubernetes will ensure high availability and the desired state using its built-in components (controllers). Besides the compute nodes, you also need networking and storage. Kubernetes uses Load Balancers and Ingress endpoints to route network traffic to pods and leverages the OpenStack API to achieve this. Because Docker containers are ephemeral by design, you probably also want persistent storage. A ton of storage providers are built into Kubernetes, including the OpenStack Cinder provider.
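
For example, with the OpenStack cloud provider configured, exposing pods behind a Neutron load balancer roughly comes down to creating a Service of type LoadBalancer (the selector and ports below match the placeholder Deployment above):

```yaml
# example Service; for type LoadBalancer the OpenStack cloud provider
# creates and maintains a Neutron load balancer pointing at the pods
apiVersion: v1
kind: Service
metadata:
  name: nginx-example
spec:
  type: LoadBalancer
  selector:
    app: nginx-example
  ports:
    - port: 80
      targetPort: 80
```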

OpenStack

OpenStack software controls large pools of compute, storage, and networking resources throughout a datacenter, managed through a dashboard or via the OpenStack API. OpenStack works with popular enterprise and open source technologies making it ideal for heterogeneous infrastructure.

I use an already existing OpenStack platform to deploy the Kubernetes platform. Because I live in The Netherlands, I've chosen CloudVPS as my Infrastructure as a Service (IaaS) provider. They support everything we need: nova compute nodes, neutron networking (private networking, routers and load balancing) and cinder storage.

To use the OpenStack API, you need to download the OpenStack RC file from the OpenStack Horizon web UI (or fill in all the details yourself).
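
The RC file is essentially a shell script that exports a set of OS_* environment variables. Roughly like this (all values are placeholders for your own project):

```shell
# an RC file roughly contains a set of exports like these (placeholder values)
export OS_AUTH_URL=https://identity.example.com:5000/v3
export OS_PROJECT_NAME=my-project
export OS_USERNAME=my-user
export OS_PASSWORD=secret
export OS_REGION_NAME=RegionOne

# source the downloaded file and check that the CLI can reach the APIs
source my-project-openrc.sh
openstack server list
```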

Infrastructure

When you don't use any other tooling, you have to figure everything out yourself, including the infrastructure. In my opinion, you have a few options:

  • Single Node Cluster: One master, which is also a worker node
  • Non-HA Cluster: One master and one or more worker nodes
  • HA Cluster: Minimal three master nodes and two or more worker nodes

All Kubernetes components are stateless, but this doesn't mean that you don't need any storage mechanism. Kubernetes stores everything in a key-value store called Etcd. You can decide to create dedicated Etcd servers, or to run Etcd on the master nodes.
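
Whichever option you pick, the three Etcd members need to know about each other when the cluster is bootstrapped. A minimal example for the first member (names and IP addresses are made up for the example):

```shell
# example bootstrap flags for one of three etcd members (names/IPs are placeholders)
etcd --name master-1 \
  --listen-peer-urls http://10.0.0.11:2380 \
  --listen-client-urls http://10.0.0.11:2379,http://127.0.0.1:2379 \
  --initial-advertise-peer-urls http://10.0.0.11:2380 \
  --advertise-client-urls http://10.0.0.11:2379 \
  --initial-cluster master-1=http://10.0.0.11:2380,master-2=http://10.0.0.12:2380,master-3=http://10.0.0.13:2380 \
  --initial-cluster-state new
```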

Your pods need storage too. Kubernetes integrates with the OpenStack volume API (cinder) to automatically provision volumes. Besides storage, Kubernetes will also leverage the OpenStack network API (neutron) to automatically configure load balancers. In the future, the Kubernetes autoscaler will also leverage the OpenStack compute API (nova) to automatically scale your worker nodes. This is still a work in progress, but it can already be used when you're using the Kubernetes OpenStack Heat templates (which have been deprecated since Kubernetes 1.8).
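
The cinder integration mostly comes down to a StorageClass that uses the in-tree Cinder provisioner, plus a PersistentVolumeClaim for each pod that needs a volume. A rough sketch (the class name and size are arbitrary):

```yaml
# StorageClass backed by the in-tree OpenStack Cinder provisioner
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cinder-standard
provisioner: kubernetes.io/cinder
---
# a claim against that class; Kubernetes asks Cinder for a 10Gi volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-example
spec:
  storageClassName: cinder-standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```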

Add-ons

To get a fully functional and production-ready Kubernetes cluster, you need some add-ons:

  • Kube proxy
  • Kube DNS
  • Network component (Flannel)
  • Ingress controller (Nginx)

These add-ons (except the ingress controller) are pods that run in the kube-system namespace on all (master and worker) Kubernetes nodes. The ingress controller will run on the worker nodes, or maybe better, on infrastructure nodes? We haven't discussed infrastructure nodes yet, because Kubernetes itself doesn't have enough infrastructure services to justify dedicated infrastructure nodes. In my case, I run the ingress controller on the worker nodes, but this only works when you're using a load balancer service to automatically create and maintain the load balancer in OpenStack. Manually creating and maintaining the load balancer in OpenStack can be hard. If you can't use the load balancer integration, I recommend using a nodeSelector (or node affinity) to schedule the ingress controllers (nginx pods) on specific worker nodes, and configuring the load balancer manually using the OpenStack API (or the Horizon web UI if the load balancer plugin is enabled).
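
A sketch of that pinning approach: label the chosen worker nodes and give the nginx ingress controller pods a matching nodeSelector, then point the manually configured load balancer at those nodes. The label and values here are hypothetical:

```yaml
# hypothetical fragment of the nginx ingress controller pod template;
# first label the chosen workers, e.g.: kubectl label node worker-1 role=ingress
spec:
  template:
    spec:
      nodeSelector:
        role: ingress
      hostNetwork: true   # expose nginx on the node IP so the external load balancer can reach it
```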

Technical details

We configured the following OpenStack resources:

  • Private network: 10.0.0.0/16
  • Private router
    • Connected to the private network
    • Configured the network containing floating IP addresses as gateway (called "floating" in our case)
  • Compute nodes
    • Master nodes (1 to 3)
      • 3 CPU cores
      • 4 GB memory
      • 40 GB disk
      • 25 GB docker volume (mounted at /var/lib/docker)
    • Worker nodes (1 to 3, or more)
      • 6 CPU cores
      • 16 GB memory
      • 40 GB disk
      • 50 GB docker volume (mounted at /var/lib/docker)
    • Bastion node (used to connect to the private network)
      • Connected to the external IPv4 network
      • Connected to the private network
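
These resources can be created through Horizon or scripted against the OpenStack API. Roughly, with the openstack CLI (the network, flavor and image names are placeholders for whatever your provider offers):

```shell
# private network, subnet and router (placeholder names)
openstack network create k8s-net
openstack subnet create k8s-subnet --network k8s-net --subnet-range 10.0.0.0/16
openstack router create k8s-router
openstack router set k8s-router --external-gateway floating
openstack router add subnet k8s-router k8s-subnet

# one of the master nodes (flavor and image depend on the provider)
openstack server create master-1 --flavor master-flavor --image ubuntu-16.04 --network k8s-net

# separate Docker volume, attached and later mounted at /var/lib/docker
openstack volume create master-1-docker --size 25
openstack server add volume master-1 master-1-docker
```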

On those compute nodes we installed Kubernetes and dependencies:

  • Master nodes
    • CNI
    • Etcd
    • Docker
    • Kubelet (manifests)
      • API server
      • Controller manager
      • Scheduler
      • Proxy
  • Worker nodes
    • CNI
    • Docker
    • Kubelet (manifests)
      • Proxy
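
The "Kubelet (manifests)" entries refer to static pod manifests: the kubelet watches a local directory and runs every pod definition it finds there, which is how the control plane components can come up without a working API server. A rough sketch of that setup (the path and flags are illustrative, not a complete kubelet invocation):

```shell
# point the kubelet at a static manifests directory
kubelet --pod-manifest-path=/etc/kubernetes/manifests

# on a master node that directory then contains manifests such as:
#   /etc/kubernetes/manifests/kube-apiserver.yaml
#   /etc/kubernetes/manifests/kube-controller-manager.yaml
#   /etc/kubernetes/manifests/kube-scheduler.yaml
#   /etc/kubernetes/manifests/kube-proxy.yaml
```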

And we deployed some pods:

  • Flannel network
  • Nginx ingress controller

More to come ...

In this blog post I only wanted to discuss the infrastructure decisions. In the next blog post I will install the Etcd nodes and Kubernetes master nodes.