- Master nodes
- Worker/Slave nodes
The master node is responsible for the management of the Kubernetes cluster.
The master node has various components like the Controller Manager, Scheduler, etcd and API Server.
The API server has a RESTful interface, which means that many different tools and libraries can readily communicate with it.
It is the entry point through which those tools and libraries communicate with the cluster.
The Controller Manager regulates the state of the cluster.
A replication controller ensures that the number of replicas (identical copies) defined for a pod matches the number currently deployed on the cluster.
This can involve scaling an application up or down.
The Scheduler is the process that actually assigns workloads to nodes in the cluster.
The Scheduler tracks the resource capacity of each node while doing so.
etcd is a distributed key-value store. It's mainly used for shared configuration and service discovery.
The worker node has various components like Kubelet, Kube-proxy, and Pods.
The kubelet gets the configuration of a Pod from the API server and ensures that the described containers are up and running.
Kube-proxy acts as a network proxy and load balancer. It runs on each worker node, listens to the API server for Service endpoint creation and deletion, and sets up the routes needed to reach each Service.
A Pod is one or more containers that logically run together on a node.
Pro Tip: For a dev environment you can install Minikube. For production, there are managed solutions like Amazon Elastic Kubernetes Service, Azure Kubernetes Service and Google Kubernetes Engine, all of which are managed services provided by the respective cloud vendors.
What is a pod?
A Pod is a grouping of one or more containers that operate together.
Namespaces are used to isolate sets of resources within the Kubernetes cluster.
Don’t worry if you don’t understand everything yet; we will see it all in detail as you read on.
Consists of a set of configurations that can be used by Kubernetes objects.
Why do we need Services in Kubernetes?
In the scenario below, you can see that there is a Service of type NodePort and "app" containers running inside the Pods.
When a worker node dies, the Pods running on it are also lost. And what if just a Pod dies? The connection to it is lost, isn't it?
Therefore you need a Service to automatically discover a newly created Pod, which comes up with a new IP address.
A Service also load balances traffic across the Pods and provides service discovery for them.
The ways to expose Pods are ClusterIP, NodePort, LoadBalancer and Ingress (strictly speaking, Ingress is a separate object rather than a Service type).
LoadBalancer is the older way of getting external traffic into the cluster.
ClusterIP allows any object inside our Kubernetes cluster to access the object the ClusterIP points at.
It exposes its own set of Pods to other objects inside the Kubernetes cluster.
You cannot access a ClusterIP from the external world, e.g. from a browser.
NodePort is the one that exposes the container to the outside world.
It exposes a set of Services outside the cluster.
You will see examples of each of these as you read below.
Deployment vs Pod vs Service
We will see the difference through the scenario below.
In the Deployment config file, you specify the app you are hosting and the containers with their specifications.
Specifically, the Deployment keeps the Pods alive and running.
The Service object gives a virtual IP (the cluster IP) to the Pods with a matching label that were deployed with the Deployment object.
You need to have the service object because the pods from the deployment object can be killed, scaled up and down, and you can't rely on their IP addresses because they will not be persistent.
So you need an object like a service, that gives those pods a stable IP.
Pod and Service Config File
The config files that we write below are used to create objects.
There are many object types like Pod, Service, ReplicationController, Deployment, ReplicaSet, etc.
These objects serve purposes like running a container, setting up networking, replicating containers, and so on.
Now, with the yaml file below, if I issue the command
kubectl apply -f <name-of-the-file>
It will create a pod and service object.
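The yaml file itself is not reproduced here; as a sketch consistent with the breakdown that follows, it might look like this (the object names and the nodePort value are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod       # illustrative name
  labels:
    name: nginx         # the label the Service selector matches
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service   # illustrative name
spec:
  type: NodePort        # exposes the Pod outside the cluster
  ports:
    - port: 8080        # port other Pods use to reach nginx
      targetPort: 80    # container port traffic is forwarded to
      nodePort: 31515   # illustrative; assigned randomly if omitted
  selector:
    name: nginx         # must match the Pod label above
```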
Config File Breakdown
In the above file, I have defined both a Pod and a Service in a single yaml file.
In line 1, apiVersion selects the set of object types we can use. There is also another apiVersion, “apps/v1”, which exposes a different predefined set of objects.
In line 2, the kind of object we will create is Pod. When we load the config file with kubectl, it creates a Pod object on a Node. A Pod is nothing but a grouping of containers.
In line 15, the Service sets up networking in Kubernetes. Here we have set the Service object's spec type to NodePort.
This NodePort exposes the container to the outside world, so you can access the running container in your web browser. There are other types such as ClusterIP, LoadBalancer and Ingress.
In line 25, the selector property is defined with the key-value pair “name: nginx”. This selector inside the Service searches for Pods labeled “name: nginx”; once found, it redirects traffic to them. That is why we defined the label “name: nginx” in line 6.
In line 21, the port property means that if another Pod needs to connect to our nginx, it can do so through port 8080.
In line 22, targetPort means incoming requests are redirected to port 80 of the container in the Pod.
In line 23, nodePort is what exposes the container to the outside world. In a browser you can open http://<hostname>:<nodePort> to access the running container. If you don't assign a nodePort, one is assigned randomly.
After that, feed the config file to kubectl with the command below:
kubectl apply -f <file-name>
ReplicaSet vs ReplicationController vs Deployment
As you can see, Deployment is more advanced than the other two, so we are going to concentrate only on Deployments.
A Deployment is a Kubernetes object that runs a set of identical Pods, i.e. one or more Pods.
It can be used in production. A Deployment is the best way to handle High Availability (HA) when compared with ReplicaSets and ReplicationControllers.
Below is an example of a Deployment yaml file.
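The file is not reproduced here; a minimal sketch consistent with the breakdown that follows might look like this (the deployment name, image and replica count are illustrative assumptions):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment   # illustrative name
spec:
  replicas: 3              # number of identical Pods to keep running
  selector:
    matchLabels:
      app: nginx
  template:                # Pod template used for every Pod created
    metadata:
      labels:
        app: nginx         # every Pod gets this label
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
```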
Breakdown of Deployment config
In line 2, the apiVersion is "apps/v1".
In line 3, the object type is Deployment.
In line 10, replicas is the number of Pods that this Deployment is supposed to create.
In line 11, the template property contains the configuration that will be used for every single Pod created by the Deployment object. This template section is used for creating Pods.
In lines 12, 13 and 14, every Pod created by the Deployment gets the label “app: nginx”. To sum up, the template section is nothing but a Pod template.
kubectl get pods - To get the list of running Pods
kubectl get pods -o wide - To get the IP address of each Pod. For every Pod created, an IP address is assigned internally.
kubectl get services - To get the list of running Services
kubectl describe <object-type> <object-name> - To get detailed information about an object
kubectl delete -f <config-file> - Deletes the running objects created from the given config file
In Kubernetes it is suggested to opt for Persistent Volumes rather than plain Volumes.
The reason is that a Persistent Volume is not tied to any specific Pod and does not get deleted when the Pod dies.
Persistent Volumes are separate from the Pod lifecycle.
In order to use these PVs, a user needs to create a PersistentVolumeClaim, which is nothing but a request for a Persistent Volume.
A claim must specify the access mode, storage capacity, etc.
Once a claim is created, a PV is automatically assigned (bound) to the claim.
There are two ways Persistent Volumes may be provisioned: statically or dynamically. You will see both below.
Difference between Persistent Volume Claims and Persistent Volumes
In the above diagram, StorageClasses use provisioners that are specific to the storage platform or cloud provider to give Kubernetes access to the physical media (hard disk) being used.
A PersistentVolumeClaim (PVC) is attached to a Pod config; Kubernetes sees that PVC and checks whether a statically provisioned Persistent Volume is available, or one can be dynamically provisioned, to fulfill the requirements of the claim.
If you have a default Storage Class or you specify which storage class to use when creating a PVC, PV creation is automatic.
There are two ways Persistent Volumes may be provisioned: statically or dynamically.
When you're running minikube on your local machine, there is a default StorageClass set up.
This will dynamically provision new storage by allocating a portion of your hard drive.
On cloud providers, there are many more options for storage - that is where we can optionally define new storage classes.
When none of the static PVs the administrator created matches a user’s PersistentVolumeClaim, the cluster may try to dynamically provision a volume specially for the PVC. This provisioning is based on StorageClasses.
Below is the sample flow of dynamic provisioning
Sample flow for dynamic provisioning of file storage with the predefined standard storage class
In the 1st flow, you can see that claimName is “pvc”, and this claim refers to the PVC in the 2nd flow.
In the 2nd flow there is accessModes. The access mode defines how a Pod consumes this volume:
- ReadWriteOnce – mount the volume as read-write by a single node
- ReadOnlyMany – mount the volume as read-only by many nodes
- ReadWriteMany – mount the volume as read-write by many nodes
In the 2nd flow, the storageClassName is "standard".
A StorageClass allows for dynamic provisioning of PersistentVolumes.
If you do not specify this parameter, the default StorageClass of the Kubernetes cluster is used.
This default StorageClass is then used to dynamically provision storage for PersistentVolumeClaims that do not require any specific storage class.
You can list the StorageClasses in your cluster with the command below:
kubectl get storageclass
In the 3rd flow, you can see that the StorageClass contains fields like provisioner (aka volume plugin), parameters and reclaimPolicy, which are used when a PersistentVolume belonging to the class has to be dynamically provisioned.
In the 3rd flow, there is a parameter type: gp2, which refers to the General Purpose SSD (gp2) volume type in AWS.
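The three flows above can be sketched as the following manifests (names, capacity and the AWS provisioner are illustrative assumptions; on minikube the default "standard" class already exists):

```yaml
# 1st flow: the Pod mounts the volume through the claim named "pvc"
apiVersion: v1
kind: Pod
metadata:
  name: app                # illustrative name
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: storage
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: pvc     # refers to the PVC below
---
# 2nd flow: the claim asks for storage from the "standard" class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc
spec:
  accessModes:
    - ReadWriteOnce        # read-write by a single node
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi         # illustrative capacity
---
# 3rd flow: the StorageClass dynamically provisions a matching PV
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs  # the volume plugin
parameters:
  type: gp2                # AWS General Purpose SSD
reclaimPolicy: Delete
```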
A cluster admin creates a number of PVs.
These are ready-made partitions of real storage that are available for use by cluster users.
Sample flow for static provisioning of file storage with the predefined standard storage class
In the static way, we create a PV manually and attach a PVC to it, skipping StorageClasses.
As you can see, first the PVs are created and no storage class is defined in the PVC. Now you can use that PVC in a Pod config or Deployment.
Once the Deployment is up, all the contents of the container's directory are stored on the volume behind the PersistentVolumeClaim.
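A minimal sketch of static provisioning might look like this (names, capacity and the hostPath backing are illustrative assumptions):

```yaml
# The admin creates the PV manually (hostPath used here for illustration)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-pv          # illustrative name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data
---
# The claim sets no storage class, so it binds to a matching static PV
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-pvc         # illustrative name
spec:
  storageClassName: ""     # skip dynamic provisioning
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```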
Secrets in Kubernetes
Secrets are used for storing sensitive data like SSH keys, certificates, database passwords, etc.
The command to create a secret is:
kubectl create secret <type-of-secret> <secret-name> --from-literal=<key>=<value>
kubectl create secret generic my-secret --from-literal=PASSWORD=foobar123
Note that secret names must be lowercase.
There are different types of secrets, like “tls”, “docker-registry” and “generic”.
- docker-registry - Create a secret for use with a Docker registry
- generic - Create a secret from a local file, directory or literal value
- tls - Create a TLS secret for https
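Once created, a secret can be consumed by a Pod, for example as an environment variable. A minimal sketch, assuming a generic secret named my-secret with a key PASSWORD was created as shown above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                    # illustrative name
spec:
  containers:
    - name: app
      image: nginx
      env:
        - name: DB_PASSWORD    # variable visible inside the container
          valueFrom:
            secretKeyRef:
              name: my-secret  # assumes a secret created with this name
              key: PASSWORD
```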
We need a role-based access system in order to limit permissions by defining who can access which objects in the cluster. This is done with RBAC.
To administer a cluster, a user can be granted a set of permissions.
Permissions can also be given to a set of Pods (via a service account) so that they can talk to other objects in the Kubernetes cluster.
An account or resource can be granted a set of permissions within a single namespace (a Role).
An account or cluster-scoped resource (such as a node) can be granted a set of permissions across the entire cluster (a ClusterRole).
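As a sketch, a namespaced Role granting read access to Pods, and a RoleBinding attaching it to a service account, might look like this (the Role, binding and account names are illustrative assumptions):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader         # illustrative name
rules:
  - apiGroups: [""]        # "" means the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: read-pods          # illustrative name
subjects:
  - kind: ServiceAccount
    name: foobar           # an existing service account
    namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```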
In the command below, we create a new service account named “foobar” in the default namespace.
kubectl create serviceaccount --namespace default foobar
In the command below, we create a new ClusterRoleBinding named “foobar-cluster-role” with the ClusterRole “cluster-admin”, and the service account we created above, given as <namespace>:<serviceaccount-name>, i.e. “default:foobar”.
kubectl create clusterrolebinding foobar-cluster-role --clusterrole=cluster-admin --serviceaccount=default:foobar
Ingress in Depth
Ingress is an object that allows access to your Kubernetes Services from outside the Kubernetes cluster. There are other options that expose Services to the external world, namely NodePort and LoadBalancer, but Ingress is often far better than the other two.
As seen in the diagram below, when the Ingress config file with its routing rules is created and fed to kubectl, it invokes the Ingress controller behind the scenes to create a routing mechanism that accepts incoming traffic and routes it to the Services. This is the overall picture of Ingress.
Kubernetes Ingress vs LoadBalancer vs NodePort
All three of these expose Kubernetes Services to the external world.
The drawback of NodePort is that you need to allocate a port per Service and deal with port management, which is not a robust solution.
The drawback of LoadBalancer is that setting a Service to type LoadBalancer creates a network load balancer with an IP address that you can use to access your Service. Every time you want to expose another Service, a new load balancer with its own IP address has to be created.
These are the reasons why one should opt for Ingress instead.
Ingress involves creating an Ingress controller and defining routing rules for it.
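A minimal routing rule might look like the sketch below; it assumes an Ingress controller (e.g. nginx-ingress) is installed, and the host and Service name are illustrative. Older clusters use the extensions/v1beta1 apiVersion with a slightly different backend format.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress        # illustrative name
spec:
  rules:
    - host: example.com    # illustrative host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx-service  # an existing Service to route to
                port:
                  number: 8080
```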
What is Helm?
Helm is a program to administer third-party software inside the Kubernetes cluster.
When you install Helm you get two parts, “helm” and “tiller”. Helm is the client and Tiller is the server.
You can simply think of it this way: Helm makes it easy to install applications and resources into Kubernetes clusters.
Tiller is the part that actually makes those changes in the Kubernetes cluster. In general, you can think of Helm as a package manager for Kubernetes.
Taints and Tolerations
Taints and tolerations allow a node to control which Pods should (or should not) be scheduled on it.
You can imagine a taint as a label applied to a Node, and a toleration as a label applied to a Pod.
If the two match, the Node allows the Pod to be scheduled; if they don't, it refuses to schedule that Pod on that particular node.
For example, say I have 3 nodes named A, B and C, and one Pod that should be scheduled only on node A. Then I need to apply a taint to node A through the NodeSpec and a matching toleration to the Pod through the PodSpec. (Note that a toleration allows, but does not force, scheduling on the tainted node; node affinity, covered below, is what attracts a Pod to specific nodes.)
Taints and tolerations consist of a key, value, and effect.
kubectl taint nodes <node-name> <key>=<value>:<effect>
If you want to find the taints on a node, use the command below.
kubectl describe nodes <your-node-name> | grep Taint
The command below applies a taint to a node.
kubectl taint nodes <your-node-name> node-role.kubernetes.io/master:NoSchedule
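A Pod that should tolerate the taint applied above would carry a matching toleration in its PodSpec; a minimal sketch (Pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                # illustrative name
spec:
  tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists     # matches the taint regardless of its value
      effect: NoSchedule
  containers:
    - name: app
      image: nginx
```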
To get Pods scheduled onto specific nodes, Kubernetes provides nodeAffinity.
With node affinity we can instruct Kubernetes as to which nodes a Pod should be scheduled on, using the labels on each node.
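As a sketch, the following Pod is only scheduled on nodes carrying a given label (the label key disktype and value ssd are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                       # illustrative name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype     # illustrative node label
                operator: In
                values:
                  - ssd
  containers:
    - name: app
      image: nginx
```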
Docker Swarm stack file vs Kubernetes Configuration yaml file
Below is the Docker Swarm stack file created for the jhipster-elasticsearch service, with an external NFS volume, a constraint to deploy on worker nodes only, and the user property set to “81226” so that the NFS server is accessed with this UID.
Below is the example Kubernetes yaml file, with a Deployment object and a Service object created.
Config Breakdown for Kube yaml file
In line 15, tolerations are added just so the Pods deploy on worker nodes only (just as an example). By default, the master node is not allowed to schedule any Pod, because a taint is predefined on the master node when Kubernetes is set up.
In line 19, securityContext is added so the container accesses the NFS share as this particular UID.
In lines 29 and 30, command and args are added to check whether the container can access the NFS server, by writing the date every 5s to a particular folder.
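The yaml file itself is not reproduced here; a sketch consistent with the breakdown above might look like the following (the image, taint key/value, and NFS server address are illustrative assumptions):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jhipster-elasticsearch   # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jhipster-elasticsearch
  template:
    metadata:
      labels:
        app: jhipster-elasticsearch
    spec:
      tolerations:               # tolerate a taint on the target nodes
        - key: node-type         # illustrative taint key/value
          operator: Equal
          value: worker
          effect: NoSchedule
      securityContext:
        runAsUser: 81226         # access the NFS share as this UID
      containers:
        - name: app
          image: busybox         # illustrative image
          command: ["/bin/sh", "-c"]
          args:                  # write the date every 5s to the NFS mount
            - while true; do date >> /data/out.txt; sleep 5; done
          volumeMounts:
            - mountPath: /data
              name: nfs-volume
      volumes:
        - name: nfs-volume
          nfs:
            server: 10.0.0.5     # illustrative NFS server address
            path: /exports
```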