Notes on Kubernetes

Kubernetes

  • Kubernetes is a system for running many different types of containers across multiple machines.
  • K8s is a set of processes that run on multiple machines to provide a master/worker architecture for deploying workloads.
  • The master controls what each node does. It takes a config describing how many workloads of a particular image are needed, and creates them.
  • A node can be a VM or a physical machine. The master creates workload containers on the nodes.
  • Each node runs a container runtime (like docker), which it uses to create containers.
  • Master + Nodes is called a cluster.
  • SUMMARY:
    • You send a request to the cluster via the Kubernetes API, which is exposed by the Master.
    • The Master has information about the worker machines, or Nodes, in the cluster, and fulfills your request by communicating with the Nodes via the Kubelet.
    • Node Kubelets communicate information back and forth with the Master to make sure your request is fulfilled and maintained across the cluster.

Types of processes:

  • kube-apiserver: Takes the YAML configs and creates the objects.
    • Uses etcd as the distributed key-value database for configs.
    • etcd stores both the actual state of the system and the desired state of the system.
    • Data presented in kubectl get <xyz> comes from etcd.
    • Changes to config are saved in etcd.
    • When a pod goes down, that value is updated in etcd.
  • kube-scheduler: Subscribes to changes using a “watch” (a pub-sub model) on the API server, which is backed by etcd.
    • When a new pod has to be deployed, the scheduler decides which node the pod will be placed on (load-balancing pods across nodes).
  • kube-controller-manager: Brain of the operation.
    • Control process for actual workers like namespace-controller, deployment-controller, replicaset-controller.
    • Keeps track of the current state and moves it towards the desired state.
  • kubelet: Lives on every single node and connects to the API server on the master node.
    • Talks to the container runtime (e.g. docker) and does the ACTUAL work of creating a container in a pod.
    • Performs liveness probes.
  • kube-proxy: Talks to the API server and implements the networking rules for Service objects on each node.
  • Notice how the control-plane pods below are suffixed with the node name (docker-desktop). All the k8s processes are extensible, pluggable etc.
[rritesh-a02:rritesh:~] $ kubectl get pods --namespace=kube-system
	NAME                                     READY   STATUS    RESTARTS   AGE
	coredns-f9fd979d6-ktrdv                  1/1     Running   1          40h
	coredns-f9fd979d6-mvdnl                  1/1     Running   1          40h
	etcd-docker-desktop                      1/1     Running   1          40h
	kube-apiserver-docker-desktop            1/1     Running   1          40h
	kube-controller-manager-docker-desktop   1/1     Running   1          40h
	kube-proxy-4p74z                         1/1     Running   1          40h
	kube-scheduler-docker-desktop            1/1     Running   1          40h
	storage-provisioner                      1/1     Running   2          40h
	vpnkit-controller                        1/1     Running   1          40h

$  kubectl cluster-info
	Kubernetes master is running at https://kubernetes.docker.internal:6443
	KubeDNS is running at https://kubernetes.docker.internal:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

$ kubectl get nodes
	NAME             STATUS   ROLES    AGE   VERSION
	docker-desktop   Ready    master   41h   v1.19.3
  • A load balancer is used to balance and route requests from external parties to the nodes.

           +--> Node [Containers] <--+
           |                         |
Master ----+--> Node [Containers] <--+--- LoadBalancer <---- Requests
           |                         |
           +--> Node [Containers] <--+

Minikube

  • CLI used to set up a k8s cluster locally. minikube is used to manage the node VMs.
  • The minikube driver can be set to docker or virtualbox.
$ minikube config set driver docker : Use the local machine's docker runtime.
	
[rritesh@rritesh-a02:~] $ kubectl get nodes
		NAME       STATUS   ROLES                  AGE   VERSION
		minikube   Ready    control-plane,master   56s   v1.20.2

[rritesh@rritesh-a02:~] $ kubectl cluster-info
		Kubernetes master is running at https://127.0.0.1:55000
		KubeDNS is running at https://127.0.0.1:55000/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

		
$ minikube config set driver virtualbox : Use VirtualBox to create a minikube VM that acts as the node.
	
[rritesh@rritesh-a02:~] $ kubectl get nodes
		NAME       STATUS   ROLES                  AGE   VERSION
		minikube   Ready    control-plane,master   15m   v1.20.2

[rritesh@rritesh-a02:~] $ kubectl cluster-info
		Kubernetes master is running at https://192.168.99.100:8443
		KubeDNS is running at https://192.168.99.100:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

kubectl and kubernetes objects

  • kubectl is used for managing the objects (and thus containers) in the cluster.
  • Local Development:
    • Install kubectl: CLI for interacting with the master.
    • Install a VM driver like VirtualBox: creates VMs that will act as nodes.
    • Install minikube: creates a node on the VMs.
  • Docker compose:
    • Each entry can optionally create an image.
    • Each entry points to a container we want to create.
    • Each entry defines the networking requirements.
  • Kubernetes:
    • Expects all images to be prebuilt. (Doesn’t have a build process.)
    • One config file per object we want to create.
    • Manually set up all networking.
  • kubectl uses YAML config files to create objects in a k8s cluster (two files in the example below).
  • Different objects serve different purposes, eg running a container, monitoring a container, setting up networking etc.
  • Object types can be:
    • StatefulSet
    • ReplicationController
    • Pod
    • Service
    • Namespace
    • Event
    • EndPoints
    • ConfigMap
    • ComponentStatus
    • ControllerRevision … etc.
  • These objects are running on the nodes.
  • The object on a node that runs one or more containers is called a Pod.
  • Unlike docker-compose, we don’t create containers with kubectl, we create objects.
  • The smallest thing kubectl can deploy is a Pod (i.e. an object with one or more containers in it).
  • Multi-container Pods are used to group together and deploy containers that need to be deployed together to work correctly (very tightly coupled and tightly integrated containers).
  • Example: A pod has 3 containers: postgres, logger, backup-manager. All 3 need to deploy together.
  • Service type object sets up networking in a k8s cluster.
  • A Service is an abstraction which defines a logical set of Pods and a policy by which to access them (this pattern is sometimes called a micro-service).
  • Subtypes:
    • ClusterIP -> Exposes the service on a cluster-internal IP. Choosing this value makes the service only reachable from within the cluster.
    • NodePort -> Exposes the service to the outside world.
    • LoadBalancer -> Exposes the service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
  • Ingress -> Ingress is not a Service type, but it acts as the entry point for your cluster. It lets you consolidate your routing rules into a single resource as it can expose multiple services under the same IP address.
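  • Ingress rules are declared as their own resource; a minimal sketch (the service name client-cluster-ip-service and the single catch-all rule are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-service
  annotations:
    kubernetes.io/ingress.class: nginx        # handled by the nginx ingress controller
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: client-cluster-ip-service   # hypothetical ClusterIP Service
                port:
                  number: 3000
```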

Node

kube-proxy ----> Service NodePort ----> Pod[container]

  • Every node has a program called kube-proxy. This program is the single window to the outside world for the containers on the node.
  • A request comes from the outside world to kube-proxy, which decides which Service to route the request to.
  • The Service object, if it is of NodePort type, forwards the request to the correct port of the container in the Pod object.
[rritesh-a02:rritesh:~/Study/Docker/simplek8s] $ cat client-node-port.yaml
	apiVersion: v1
	kind: Service
	metadata:
	  name: client-node-port
	spec:
	  type: NodePort
	  ports:
	    - port: 3050
	      targetPort: 3000
	      nodePort: 31515
	  selector:
	    component: web

[rritesh-a02:rritesh:~/Study/Docker/simplek8s] $ cat client-pod.yaml
	apiVersion: v1
	kind: Pod
	metadata:
	  name: client-pod
	  labels:
	    component: web
	spec:
	  containers:
	    - name: client
	      image: stephengrider/multi-client
	      ports:
	        - containerPort: 3000
  • K8s uses a label-selector system to decide which Pod the request goes to. The Pod’s name is client-pod, with a label component: web. The Pod has a container called client inside it, with port 3000 open to the outside world (the image provided runs an nginx service on port 3000).
  • The Service is of NodePort type. When the Service comes up, it sees it has to port-forward to target port 3000 of any other object that has the key-value pair component: web in its metadata labels (e.g. client-pod).
  • Forwarding happens because Service::spec::selector matches the Pod::metadata::labels, and the targetPort matches the containerPort.
  • Other Pods in the cluster can also connect to the Service using its port (3050). A concrete check is sketched below.
  • nodePort (31515) is the port the outside world uses to connect to the container’s port 3000.
  • nodePort must be in the range 30000–32767.
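  • A concrete check (a sketch, assuming the local docker-desktop node from these notes; the debug Pod is a throwaway busybox container):

```sh
# From outside the cluster: hit the node on the nodePort
curl http://localhost:31515

# From another Pod inside the cluster: hit the Service name on its port
kubectl run -it --rm debug --image=busybox --restart=Never -- wget -qO- http://client-node-port:3050
```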

Deploying

kubectl apply -f client-pod.yaml
kubectl apply -f client-node-port.yaml
  • One container created:
[rritesh-a02:rritesh:~/Study/Docker/simplek8s] $ docker ps
CONTAINER ID   IMAGE                        COMMAND                  CREATED         STATUS         PORTS     NAMES
96f11964e269   stephengrider/multi-client   "nginx -g 'daemon of…"   6 minutes ago   Up 6 minutes             k8s_client_client-pod_default_e7699f38-2a14-4785-a296-77fa0a9ba8ba_0
  • Check status with kubectl get:
[rritesh-a02:rritesh:~/Study/Docker/simplek8s] $ kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
client-pod   1/1     Running   0          7m35s

[rritesh-a02:rritesh:~/Study/Docker/simplek8s] $ kubectl get services
NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
client-node-port   NodePort    10.108.38.185   <none>        3050:31515/TCP   82s
kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP          78m
  • The kubernetes ClusterIP service is internal to the k8s cluster.
  • If the node has been deployed using docker-for-desktop, localhost:31515 should reach the container running in the Pod.
  • If the container crashes or is killed, k8s will restart that container.
[rritesh-a02:rritesh:~] $ kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
client-pod   1/1     Running   1          8h

[rritesh-a02:rritesh:~] $ docker ps
CONTAINER ID   IMAGE                          COMMAND                  CREATED         STATUS                          PORTS     NAMES
04b461789a79   stephengrider/multi-client     "nginx -g 'daemon of…"   6 minutes ago   Up 6 minutes                              k8s_client_client-pod_default_e7699f38-2a14-4785-a296-77fa0a9ba8ba_1

[rritesh-a02:rritesh:~] $ docker kill 04b461789a79
04b461789a79

[rritesh-a02:rritesh:~] $ kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
client-pod   1/1     Running   2          8h

[rritesh-a02:rritesh:~] $ docker ps
CONTAINER ID   IMAGE                          COMMAND                  CREATED          STATUS                          PORTS     NAMES
aee80d2ee170   stephengrider/multi-client     "nginx -g 'daemon of…"   58 seconds ago   Up 57 seconds                             k8s_client_client-pod_default_e7699f38-2a14-4785-a296-77fa0a9ba8ba_2
  • Kubernetes has a built-in DNS resolver.
    [rritesh-a02:rritesh:~] $ kubectl get pods --namespace=kube-system
      NAME                                     READY   STATUS    RESTARTS   AGE
      coredns-f9fd979d6-ktrdv                  1/1     Running   1          40h
      coredns-f9fd979d6-mvdnl                  1/1     Running   1          40h
      etcd-docker-desktop                      1/1     Running   1          40h
      kube-apiserver-docker-desktop            1/1     Running   1          40h
      kube-controller-manager-docker-desktop   1/1     Running   1          40h
      kube-proxy-4p74z                         1/1     Running   1          40h
      kube-scheduler-docker-desktop            1/1     Running   1          40h
      storage-provisioner                      1/1     Running   2          40h
      vpnkit-controller                        1/1     Running   1          40h
    
  • This DNS resolver helps services talk to each other using just service names.
  • The service FQDNs look like <service-name>.<namespace>.svc.cluster.local
  • Services in the same namespace can connect to each other via just <service-name>
  • Services across namespaces can have the same name, so <service-name>.<namespace> is needed to distinguish, say, nginx.dev.svc.cluster.local from nginx.prod.svc.cluster.local
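  • This can be checked from inside the cluster (a sketch; the dns-test Pod name and busybox image are arbitrary choices):

```sh
# Resolve the Service created earlier via the cluster DNS
kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup client-node-port.default.svc.cluster.local
```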
  • kubectl describe gets detailed information about an object.
$ kubectl describe pods client-pod
Name:         client-pod
Namespace:    default
Priority:     0
Node:         docker-desktop/192.168.65.3
Start Time:   Wed, 03 Feb 2021 01:45:13 +0530
Labels:       component=web
Annotations:  <none>
Status:       Running
IP:           10.1.0.21
IPs:
  IP:  10.1.0.21
Containers:
  client:
    Container ID:   docker://1855cdd1bd23645620accacadbc298715a540ba93fd469175e4be8de10be594c
    Image:          stephengrider/multi-client
    Image ID:       docker-pullable://stephengrider/multi-client@sha256:855452509d6d9f13dbe1cd34fa3a21d7f6e7d1f0fafb38d1e715dda8e3d17f46
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 22 Feb 2021 10:04:06 +0530
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Sun, 21 Feb 2021 13:05:14 +0530
      Finished:     Mon, 22 Feb 2021 10:03:47 +0530
    Ready:          True
    Restart Count:  4
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9s569 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-9s569:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9s569
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
  • Deleting objects: kubectl delete using the config file. This is an imperative approach, not declarative.
$ kubectl delete -f client-pod.yaml
	pod "client-pod" deleted
  • kubectl apply can update only some fields of a running Pod, e.g. you cannot change the names of containers, or the ports exposed.
  • Deployment objects are used to ensure the correct number of Pods (with the correct config) exists. Pods are generally not used directly in production; production setups generally use Deployments.
  • A Deployment has a Pod template configuration that it uses to deploy a set of IDENTICAL Pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: client-deployment
spec:
  replicas: 1            # <---- How many Pods do we need
  selector:              # <---- How the Deployment looks up the Pods it manages after it creates them
    matchLabels:
      component: web     # <---- All Pods matching this label are managed by this Deployment
  template:              # <---- template section, exactly like the Pod yaml file
    metadata:
      labels:
        component: web
    spec:
      containers:
        - name: client
          image: stephengrider/multi-client
          ports:
            - containerPort: 3000
  • Changing the Deployment may delete the existing Pod and start a new one (if immutable fields of a Pod are changed):
$ kubectl apply -f client-deployment.yaml
deployment.apps/client-deployment configured

$ kubectl get pods
NAME                                 READY   STATUS        RESTARTS   AGE
client-deployment-7cb6c958f7-wg5p5   1/1     Terminating   0          64s
client-deployment-8b5864968-pc2f8    1/1     Running       0          5s

$ kubectl get deployments
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
client-deployment   1/1     1            1           11s

$ kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP          NODE             NOMINATED NODE   READINESS GATES
client-deployment-8b5864968-pc2f8   1/1     Running   0          12m   10.1.0.23   docker-desktop   <none>           <none>

Datastore-etcd

  • etcd is the core state store for Kubernetes. While there are important in-memory caches throughout the system, etcd is considered the system of record.
  • The highly consistent nature of etcd provides for strict ordering of writes and allows clients to do atomic updates of a set of values.
  • The idea of a watch in etcd is critical for how Kubernetes works. One component can write to etcd and other components can immediately react to that change.
  • The common pattern is for clients to mirror a subset of the database in memory and then react to changes of that database.
  • Watches are used as an efficient mechanism to keep that cache up to date.
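  • The watch mechanism can be observed directly with etcdctl (a sketch; assumes the etcd v3 API and direct access to the etcd member, which in a real cluster also requires the etcd client certificates):

```sh
# Stream every change to Pod objects in the default namespace
# (/registry/... is the prefix Kubernetes uses inside etcd)
ETCDCTL_API=3 etcdctl watch --prefix /registry/pods/default/
```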

Policy Layer: API Server

  • This is the only component in the system that talks to etcd.
  • The API Server is a policy component that provides filtered access to etcd.
  • The API Server allows various components to create, read, write, update and watch for changes of resources.
  • The API Server also supports watches: a component can write to a resource (via the REST API), and other components watching that resource see the update immediately.
  • Responsibilities:
    • Authentication and Authorization: Kubernetes has a pluggable auth system. There are built-in mechanisms both for authenticating users and for authorizing those users to access resources.
    • Admission controllers: Reject/Modify requests to make sure only valid data is allowed in the system.
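  • The same watch semantics surface through the API Server's REST API; for example, kubectl can subscribe to changes instead of polling:

```sh
# Print the current Pods, then keep streaming updates as their state changes
kubectl get pods --watch
```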

Scheduler:

  • Looks for pods that aren’t assigned a node, examines the state of the cluster, finds a node with free space and binds the pod to that node.

Controller Manager:

  • Code that brings the current state of the system to the desired state.
  • Implements the behavior of ReplicaSet. (A ReplicaSet ensures that a set number of replicas of a Pod template are running at any one time.)
  • The controller watches both the ReplicaSet resource and the set of Pods matching the selector in that resource.
  • It then takes action to create/destroy Pods in order to maintain a stable set of Pods as described in the ReplicaSet.
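  • A minimal ReplicaSet this controller would reconcile might look like the following (a sketch; the name, label and image are illustrative):

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: web-replicaset
spec:
  replicas: 3                # desired state: keep 3 Pods running at all times
  selector:
    matchLabels:
      component: web         # the controller watches Pods carrying this label
  template:                  # Pod template used whenever a new Pod must be created
    metadata:
      labels:
        component: web
    spec:
      containers:
        - name: client
          image: stephengrider/multi-client
```

  • Deleting one of the three Pods makes the observed state diverge from spec.replicas, and the controller reacts by creating a replacement.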

Kubelet:

  • Agent that sits on the node.
  • This also authenticates to the API Server like any other component.
  • It is responsible for watching the set of Pods that are bound to its node and making sure those Pods are running.
  • It then reports back status as things change with respect to those Pods.
  • The basic flow:
    • The user creates a Pod via the API Server and the API server writes it to etcd.
    • The scheduler notices an “unbound” Pod and decides which node to run that Pod on. It writes that binding back to the API Server.
    • The Kubelet notices a change in the set of Pods that are bound to its node. It, in turn, runs the container via the container runtime (e.g. Docker).
    • The Kubelet monitors the status of the Pod via the container runtime. As things change, the Kubelet will reflect the current status back to the API Server.
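  • The liveness probes mentioned earlier are declared per container in the Pod spec and executed by the kubelet; a minimal sketch, assuming the container serves HTTP on port 3000:

```yaml
containers:
  - name: client
    image: stephengrider/multi-client
    livenessProbe:
      httpGet:
        path: /               # the kubelet GETs this path periodically
        port: 3000
      initialDelaySeconds: 5  # grace period after the container starts
      periodSeconds: 10       # probe interval; repeated failures trigger a container restart
```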

Volumes:

  • In containers generally, a volume is the mechanism that allows a container to access a filesystem outside itself.
  • In Kubernetes, a Volume is an object that allows containers to store data at the Pod level. A Kubernetes volume can be accessed by any container in the Pod.
  • The volume is tied to the Pod, so if the Pod ever dies the volume is lost as well. When a Deployment recreates the Pod, the volume is also created anew.
  • A PersistentVolume (PV) is like a Volume but has a lifecycle independent of any individual Pod that uses the PV.
  • A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod: Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and memory); claims can request a specific size and access modes (e.g. they can be mounted ReadWriteOnce, ReadOnlyMany or ReadWriteMany). The typical flow (sketched in YAML after this list):
    • You, as cluster administrator, create a PersistentVolume backed by physical storage. You do not associate the volume with any Pod.
    • You, now taking the role of a developer / cluster user, create a PersistentVolumeClaim that is automatically bound to a suitable PersistentVolume.
    • You create a Pod that uses the above PersistentVolumeClaim for storage.
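  • Those three steps in YAML form (a minimal sketch; the names, the 1Gi size and the hostPath backing are illustrative only):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 1Gi
  accessModes: [ReadWriteOnce]
  hostPath:
    path: /mnt/data            # the physical storage backing the PV
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi             # a suitable PV (pv-demo) is bound to this claim
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-pvc
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - mountPath: /usr/share/nginx/html
          name: storage
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: pvc-demo    # the Pod consumes storage through the claim
```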

StorageClass:

  • A StorageClass provides a way for administrators to describe the “classes” of storage they offer.
  • Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators.
  • Kubernetes itself is unopinionated about what classes represent. This concept is sometimes called “profiles” in other storage systems.
  • The available StorageClasses can be listed with this command:
$ kubectl get storageclass
	NAME                 PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
	hostpath (default)   docker.io/hostpath   Delete          Immediate           false                  16d
  • On different cloud providers, the default StorageClass will be their own implementation. For example, VMware has vsphereVolume, which can provision from vSAN datastores.
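  • A StorageClass definition itself is small; a sketch modeled on the docker-desktop default shown above:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # marks this class as the default
provisioner: docker.io/hostpath
reclaimPolicy: Delete
volumeBindingMode: Immediate
```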

  • Secret: kubectl create secret generic pgpassword --from-literal PGPASSWORD=postgres
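  • A container can then read that Secret as an environment variable; a sketch (the container name and image are placeholders):

```yaml
containers:
  - name: server
    image: stephengrider/multi-server
    env:
      - name: PGPASSWORD
        valueFrom:
          secretKeyRef:
            name: pgpassword   # the Secret created above
            key: PGPASSWORD    # the key inside that Secret
```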

  • Nginx ingress controller installation

    $ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.30.0/deploy/static/mandatory.yaml
    $ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.30.0/deploy/static/provider/cloud-generic.yaml
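  • To confirm the controller came up (assuming the manifests above created the ingress-nginx namespace):

```sh
kubectl get pods --namespace=ingress-nginx
```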
    

Deploy a Kubernetes cluster on Ubuntu VMs in VirtualBox

  • Ubuntu: set up networking
/etc/netplan/01-host-only.yaml
	network:
	  version: 2
	  renderer: networkd
	  ethernets:
	    enp0s8: # this is your interface name for the host-only network
	      dhcp4: no
	      addresses: [192.168.99.20/24]
	      gateway4: 192.168.99.1
	      nameservers:
	        addresses: [192.168.99.1, 8.8.8.8]

$ sudo netplan generate
$ sudo netplan apply
  • Install Kubernetes:

```sh
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt-get install kubeadm kubelet kubectl
```

  • Pre-pull the control-plane images:

root@kubemaster:~# kubeadm config images pull
	[config/images] Pulled k8s.gcr.io/kube-apiserver:v1.20.4
	[config/images] Pulled k8s.gcr.io/kube-controller-manager:v1.20.4
	[config/images] Pulled k8s.gcr.io/kube-scheduler:v1.20.4
	[config/images] Pulled k8s.gcr.io/kube-proxy:v1.20.4
	[config/images] Pulled k8s.gcr.io/pause:3.2
	[config/images] Pulled k8s.gcr.io/etcd:3.4.13-0
	[config/images] Pulled k8s.gcr.io/coredns:1.7.0

  • To start using your cluster (these instructions are printed at the end of kubeadm init), run the following as a regular user:

```sh
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
  • Alternatively, if you are the root user, you can run export KUBECONFIG=/etc/kubernetes/admin.conf
  • After copying the config:
	root@kubemaster:~# kubectl cluster-info
	Kubernetes control plane is running at https://192.168.99.20:6443
	KubeDNS is running at https://192.168.99.20:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Written on July 24, 2021