Kubernetes (K8s)

1. Why Kubernetes? The Problem Docker Solves… and Doesn’t

Problem Docker Kubernetes
Run app in container Yes Yes
Run 100 containers Manual Automated
Auto-restart failed container No Yes
Scale to 1000 containers No Yes
Rolling updates No Yes
Self-healing No Yes
Multi-host deployment No Yes

Docker = "Run one container"
Kubernetes = "Orchestrate 10,000 containers across 100 machines"

2. What Kubernetes Offers on Top of Docker

Feature What It Does
Orchestration Manages 1000s of containers across nodes
Self-healing Auto-restart, reschedule failed pods
Auto-scaling Scale up/down based on CPU/load
Rolling Updates Zero-downtime deployments
Service Discovery api.service → auto DNS
Load Balancing Spread traffic across pods
Secret/Config Management Inject env vars, files securely
Multi-cloud Run same app on AWS, GCP, Azure, on-prem

3. Kubernetes Architecture – Master vs Worker Nodes

+------------------+     gRPC/HTTP     +------------------+
|   MASTER NODE    | ◄───────────────► |   WORKER NODE    |
| (Control Plane)  |                   | (Runs Pods)      |
+------------------+                   +------------------+

MASTER NODE (Control Plane) – The Brain of K8s

Runs on 1 or 3+ nodes (HA)
Never runs user workloads
All components talk via kube-apiserver

+------------------+
|   MASTER NODE    |
|                  |
|  ┌─────────────┐ |
|  │ API Server  │ ← All communication
|  └─────▲───────┘ |
|        │         |
|  ┌─────▼───────┐ |
|  │ etcd        │ ← Single source of truth
|  └─────▲───────┘ |
|        │         |
|  ┌─────▼───────┐ |
|  │ Scheduler   │ ← "Where to run?"
|  └─────▲───────┘ |
|        │         |
|  ┌─────▼───────┐ |
|  │ Controller  │ ← "Make it match desired state"
|  │ Manager     │
|  └─────────────┘ |
+------------------+

1. kube-apiserver – The Front Door

Role Details
Central API All kubectl, controllers, kubelet → talk to this
REST API GET /api/v1/pods, POST /api/v1/namespaces
Authentication JWT, certificates, OIDC, webhook
Authorization RBAC, ABAC, Node, Webhook
Validation Rejects invalid YAML
Scaling Horizontal (multiple replicas behind LB)
# You talk to this
kubectl get pods --server=https://master:6443

2. etcd – The Database (Single Source of Truth)

Role Details
Key-value store Only stores cluster state (pods, services, secrets)
Consistent & HA Uses Raft consensus
Watched by all Controllers react to changes
Backup critical etcdctl snapshot save
# See raw data
kubectl exec -n kube-system etcd-master -- etcdctl get /registry/pods/default/myapp

If etcd dies → cluster is brain-dead
Always 3-node etcd cluster in production
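
A minimal backup sketch, assuming a kubeadm-style control plane where the etcd client certificates live under /etc/kubernetes/pki/etcd (adjust paths for your cluster):

ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-$(date +%F).db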

3. kube-scheduler – The Matchmaker

Role Details
Watches Unscheduled pods (nodeName: null)
Scores nodes CPU, memory, taints, affinity, topology
Assigns Sets pod.spec.nodeName

Scoring Example

# Pod wants SSD
nodeSelector:
  disktype: ssd

→ Scheduler picks node with label disktype=ssd
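
For that selector to be satisfiable, some node needs the matching label — for example (node name is illustrative):

kubectl label nodes worker-1 disktype=ssd
kubectl get nodes -l disktype=ssd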

4. kube-controller-manager – The Robot Army

Runs multiple controllers in one process:

Controller Job
ReplicaSet Ensure 3 pods → if 2, create 1
Deployment Manage rollouts, rollback
StatefulSet Ordered pods (db-0, db-1)
DaemonSet Run on every node (logging, monitoring)
Job/CronJob Run to completion
Node Mark node NotReady if kubelet stops
Endpoint Update Service → Pod IP mapping
# See controllers in action
kubectl get rs,deployments,statefulsets -A

5. cloud-controller-manager

Role Cloud Integration
Node Sync cloud node metadata
LoadBalancer Create AWS ELB, GCP LB
Route Cloud network routes
Service Manage cloud-specific services

Only runs in cloud environments

WORKER NODE – The Muscle

Runs user workloads (pods)
Multiple per cluster

+------------------+
|   WORKER NODE    |
|                  |
|  ┌─────────────┐ |
|  │ kubelet     │ ← Talks to API server
|  └─────▲───────┘ |
|        │         |
|  ┌─────▼───────┐ |
|  │ kube-proxy  │ ← Load balances
|  └─────▲───────┘ |
|        │         |
|  ┌─────▼───────┐ |
|  │ containerd  │ ← Runs containers
|  └─────▲───────┘ |
|        │         |
|  ┌─────▼───────┐ |
|  │ Pods        │ ← Your apps
|  └─────────────┘ |
+------------------+

1. kubelet – The Node Agent

Role Details
Watches API server Gets assigned pods
Talks to container runtime Starts/stops containers
Reports status CPU, memory, pod phase
Exec, logs, port-forward kubectl exec, logs
cAdvisor Built-in metrics
# See what kubelet sees
journalctl -u kubelet

2. kube-proxy – The Network Cop

Role Details
Watches Services & Endpoints When pod IP changes
Programs iptables / IPVS Routes traffic
Load balances Round-robin across pods

Service Types Handled

type: ClusterIP  → 10.96.0.1 → iptables DNAT
type: NodePort   → 30080 → iptables
type: LoadBalancer → cloud LB
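
To see what kube-proxy actually programmed, a quick sketch assuming a kubeadm cluster (where kube-proxy reads its config from a kube-proxy ConfigMap) running in iptables mode:

# Proxy mode (iptables or ipvs)
kubectl -n kube-system get configmap kube-proxy -o yaml | grep mode

# On a node: the NAT rules generated for Services (iptables mode)
sudo iptables -t nat -L KUBE-SERVICES | head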

3. Container Runtime – The Engine

Runtime Status
containerd Default since K8s 1.24
CRI-O Red Hat, lightweight
Docker Deprecated (shim removed)

Docker-built images still run unchanged on containerd (both are OCI-compliant); using Docker Engine itself as the runtime now requires the external cri-dockerd adapter, since dockershim was removed in 1.24.

# Check runtime
kubectl get nodes -o wide
# → container-runtime: containerd://1.7.0

Real-World Flow

graph TD
    A[User: kubectl apply] --> B[API Server]
    B --> C[etcd: store desired state]
    C --> D[Scheduler: pick node]
    D --> E[kubelet on node]
    E --> F[containerd: pull image]
    F --> G[Start containers]
    G --> H[kube-proxy: update iptables]
    H --> I[Service ready]
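
You can watch this flow happen yourself — a sketch assuming a manifest file named deployment.yaml:

kubectl apply -f deployment.yaml
kubectl get events --sort-by=.metadata.creationTimestamp | tail   # Scheduled, Pulling, Started
kubectl get pods -o wide -w                                       # node assignment appears once scheduled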

High Availability (HA) Setup

Component HA Strategy
API Server 3+ replicas → LB (keepalived, cloud LB)
etcd 3-node cluster (Raft)
Scheduler / Controller Run on all masters (leader election)
Worker Nodes 3+ for redundancy

Summary Table

Node Component Job
Master kube-apiserver API gateway
etcd Cluster database
scheduler Assign pods to nodes
controller-manager Run control loops
Worker kubelet Run pods on node
kube-proxy Network proxy
containerd Run containers

Golden Rule:

Master = Think, Store, Schedule
Worker = Run, Report, Route

Now you understand how Kubernetes turns 100 machines into one logical supercomputer.
Try:

kubectl get componentstatuses
kubectl -n kube-system get pods

And see the control plane in action!

Kubernetes Resources

Pods, Deployments, Services, DaemonSets, Secrets, ConfigMaps, StatefulSets & More

Kubernetes (K8s) resources are the declarative building blocks of your cluster. You define desired state in YAML/JSON, and K8s makes it reality through controllers and reconciliation loops.

Key Principle:

Imperative (kubectl run) → temporary
Declarative (YAML) → persistent, version-controlled

1. Pod – The Atomic Unit

Definition

A Pod is the smallest deployable unit in Kubernetes: one or more containers that share a network namespace (one IP), storage volumes, and a lifecycle, and are always scheduled together on the same node.

Key Features

Kubernetes Probes

Kubernetes probes are health checks that determine pod behavior. They use HTTP, TCP, or command-based tests with configurable thresholds (e.g., initial delay, period, timeout, success/failure counts).

  1. Liveness Probe

    • Purpose: Detects if the pod is alive and healthy. If it fails, Kubernetes restarts the pod (self-healing).
    • When Used: For apps that can deadlock or crash (e.g., memory leaks).
    • Behavior: Failure → pod restarts; doesn't affect traffic routing.
    • YAML Example:

    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3  # Restart after 3 consecutive failures

  2. Readiness Probe

    • Purpose: Detects if the pod is ready to serve traffic. If it fails, Kubernetes removes it from Service endpoints (no traffic sent) but doesn't restart.
    • When Used: For apps that need warmup time or become temporarily unhealthy (e.g., during DB connection).
    • Behavior: Failure → pod excluded from load balancing; restarts only if liveness fails.
    • YAML Example:

    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 3

Key Differences

| Aspect | Liveness Probe | Readiness Probe |
|---|---|---|
| Failure Action | Restart pod | Exclude from traffic |
| Impact on Service | No (traffic continues to healthy pods) | Yes (removes from endpoints) |
| Default | None | None |
| Use Case | Crash recovery | Traffic routing |

YAML Example

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: web
spec:
  containers:
  - name: nginx
    image: nginx:1.25
    ports:
    - containerPort: 80
    resources:
      limits:
        cpu: "100m"
        memory: "128Mi"
  - name: sidecar-logger
    image: fluentd:v1.14
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  volumes:
  - name: logs
    emptyDir: {}
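
To run and inspect this Pod (the manifest filename nginx-pod.yaml is assumed):

kubectl apply -f nginx-pod.yaml
kubectl get pod nginx-pod                    # 2/2 Running (both containers)
kubectl logs nginx-pod -c sidecar-logger     # logs from a specific container
kubectl exec -it nginx-pod -c nginx -- sh    # shell into the nginx container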

Use Cases

Pros/Cons

2. Deployment – The Workhorse for Stateless Apps

Definition

A Deployment is a controller for stateless workloads: it manages ReplicaSets to keep the desired number of identical Pods running and to roll out (or roll back) changes safely.

Key Features

YAML Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%     # Extra pods during update
      maxUnavailable: 25%  # Allowed downtime
  template:  # Pod template
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10

Use Cases

Pros/Cons

Rolling Updates

Rolling update is a zero-downtime deployment strategy in Kubernetes that gradually replaces old pods with new ones in a Deployment or StatefulSet. It ensures service availability by maintaining the desired number of replicas during updates (e.g., image version change), avoiding full outages.

Strategies

Kubernetes supports 2 strategies in Deployment/StatefulSet .spec.strategy.type:

| Strategy | Description | Use Case |
|---|---|---|
| RollingUpdate (default) | Gradually scales down old pods while scaling up new ones, maintaining availability. | Production apps needing zero downtime. |
| Recreate | Kills all old pods first, then creates new ones. | Simple apps where downtime is acceptable (e.g., batch jobs). |

RollingUpdate Parameters

Monitor: kubectl rollout status deployment/web
Rollback: kubectl rollout undo deployment/web
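
A typical update cycle, matching the deployment/web name used in the commands above (image tag is illustrative):

kubectl set image deployment/web nginx=nginx:1.26   # triggers a rolling update
kubectl rollout status deployment/web               # watch progress
kubectl rollout history deployment/web              # list revisions
kubectl rollout undo deployment/web                 # roll back to the previous revision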

3. Service – Load Balancer & Service Discovery

Definition

A Service gives a set of Pods (selected by labels) a stable virtual IP and DNS name, and load-balances traffic across them even as Pods come and go.

Types

| Type | Description | Use Case |
|---|---|---|
| ClusterIP (default) | Internal IP (10.96.x.x) | Internal services |
| NodePort | Exposes on node port (30000-32767) | Basic external access |
| LoadBalancer | Cloud LB (AWS ELB) | Production external |
| ExternalName | CNAME to external service | Integrate with legacy |

YAML Example

apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web  # Matches deployment labels
  ports:
    - protocol: TCP
      port: 80      # Service port
      targetPort: 80  # Pod port
  type: LoadBalancer

Use Cases

Pros/Cons

How Services Identify Pods/Deployments

Services discover and route traffic to Pods using label selectors in the Service spec. They don't directly reference Deployments but match Pods created by Deployments/StatefulSets via shared labels.


# Service (matches via selector)
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web  # Matches Pod labels
  ports:
  - port: 80

Discovery: Pods reach it at web-service:80 (DNS: web-service.default.svc.cluster.local).

containerPort vs targetPort

| Field | Location | Purpose | Example |
|---|---|---|---|
| containerPort | Pod template (Deployment) | Container's listening port (informational only) | containerPort: 8080 (app binds to 8080) |
| targetPort | Service spec | Pod port for incoming traffic | targetPort: 8080 (Service sends to Pod:8080) |

Other Critical Fields for Service Discovery & Health Checks

Ensure seamless discovery (stable endpoints) and health (traffic routing) with these fields:

| Field | Location | Purpose | Best Practice |
|---|---|---|---|
| selector | Service spec | Matches Pod labels for discovery | Use unique labels (e.g., app: web, tier: frontend) |
| labels | Pod template (Deployment) | Enables selector matching | Consistent across Deployment/Service (e.g., app: web) |
| readinessProbe | Pod template | Checks if Pod is ready for traffic; failure removes it from endpoints | HTTP/TCP/exec probe; e.g., initialDelaySeconds: 30 for warmup |
| livenessProbe | Pod template | Checks if Pod is alive; failure restarts Pod (affects discovery indirectly) | Less frequent than readiness; e.g., periodSeconds: 60 |
| port | Service spec | Service's listening port (e.g., for DNS) | Match app needs; use name: http for multiple ports |
| protocol | Service/port spec | Traffic protocol (TCP/UDP/SCTP) | TCP default; UDP for streaming |
| sessionAffinity | Service spec | Sticky sessions (client-IP based) | ClientIP for stateful apps; timeout configurable |
| publishNotReadyAddresses | Service spec | Include unready Pods in endpoints | true for pre-warmup traffic (rare) |
| annotations | Service metadata | Metadata (e.g., for Ingress controllers) | e.g., nginx.ingress.kubernetes.io/rewrite-target: / |

YAML Snippet (Full Example)

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: app
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          periodSeconds: 30

apiVersion: v1
kind: Service
spec:
  selector:
    app: myapp
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080
  sessionAffinity: ClientIP
  publishNotReadyAddresses: false

Summary: Selectors enable discovery; probes ensure health; tune ports/probes for reliability. Misconfigured selectors cause "no endpoints" errors.
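
A quick way to debug the "no endpoints" case:

kubectl get endpoints web-service           # empty ENDPOINTS ⇒ selector matches no ready Pod
kubectl describe service web-service        # shows the selector in use
kubectl get pods -l app=web --show-labels   # do any Pods actually carry that label?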

4. DaemonSet – Run on Every Node

Definition

A DaemonSet ensures that one copy of a Pod runs on every node (or on a selected subset of nodes); new nodes automatically get the Pod, removed nodes lose it.

Key Features

YAML Example

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.14
        volumeMounts:
        - name: varlog
          mountPath: /var/log
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      tolerations:  # Run on tainted nodes
      - operator: Exists
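
After applying, there should be exactly one fluentd Pod per node:

kubectl get daemonset fluentd-logging        # DESIRED should equal the node count
kubectl get pods -l name=fluentd -o wide     # one Pod per node (see the NODE column)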

Use Cases

Pros/Cons

5. Secrets – Secure Data Management

Definition

A Secret stores small amounts of sensitive data (passwords, tokens, keys) separately from Pod specs and images, and injects it into Pods as environment variables or mounted files.

Key Features

Security Aspects of Secrets

Kubernetes Secrets store sensitive data (e.g., passwords, API keys, tokens) as base64-encoded strings in etcd (the cluster's key-value store). Key security features:

- Base64 is encoding, not encryption — anyone allowed to read the Secret can decode it.
- Encryption at rest in etcd can be enabled via an API-server EncryptionConfiguration.
- Access is controlled with RBAC; scope get/list on secrets tightly.
- A Secret is only distributed to nodes that run a Pod referencing it.

Mounts

Mounting injects data into Pods as environment variables (env) or volumes (volumeMounts). Volumes are preferred for files; env vars for simple values. Defined in Deployment's Pod template (.spec.template.spec).

NOTE: When Secrets and ConfigMaps are mounted as volumes (volumeMounts in the Pod spec), updates propagate to the mounted files automatically without a Pod restart — the kubelet periodically syncs them (typically within a minute or so) and swaps the files atomically via symlinks. Values consumed as environment variables, and volumes mounted with subPath, do not update until the Pod is recreated.

YAML Example

apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  username: YXBwdXNlcg==  # base64: "appuser"
  password: U3VwZXJTZWNyZXQxMjM=  # base64: "SuperSecret123"

apiVersion: v1
kind: Pod
metadata:
  name: secret-pod
spec:
  containers:
  - name: app
    image: myapp
    env:
    - name: DB_USER
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: username
    volumeMounts:
    - name: secret-volume
      mountPath: /etc/secrets
  volumes:
  - name: secret-volume
    secret:
      secretName: db-secret
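
The equivalent Secret created from the CLI, plus how to read a value back (base64 is trivially reversible — treat read access to Secrets as access to the plaintext):

kubectl create secret generic db-secret \
  --from-literal=username=appuser \
  --from-literal=password=SuperSecret123

kubectl get secret db-secret -o jsonpath='{.data.password}' | base64 -d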

Use Cases

Pros/Cons

6. ConfigMap – Non-Sensitive Configuration

Definition

A ConfigMap holds non-sensitive configuration as key-value pairs or files, decoupling configuration from images so the same image can run with different settings.

Key Features

Mounts

YAML Example

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database_url: "postgres://localhost:5432/myapp"
  log_level: "INFO"
  app_name: "MyApp v1.0"

apiVersion: v1
kind: Pod
metadata:
  name: config-pod
spec:
  containers:
  - name: app
    image: myapp
    env:
    - name: DB_URL
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: database_url
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: app-config
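
The same ConfigMap from the CLI, and a quick check that the values landed inside the Pod:

kubectl create configmap app-config \
  --from-literal=database_url="postgres://localhost:5432/myapp" \
  --from-literal=log_level=INFO

kubectl exec config-pod -- env | grep DB_URL
kubectl exec config-pod -- ls /etc/config     # one file per key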

Use Cases

Pros/Cons

7. StatefulSet – For Stateful Apps

Definition

A StatefulSet manages stateful workloads: each Pod gets a stable ordinal identity (name, DNS entry) and its own PersistentVolumeClaim, with ordered creation, scaling, and updates.

Stateful Workload

A stateful workload is an application or service that maintains persistent state (data, configuration, or identity) across restarts, updates, or failures. It requires stable, ordered, and persistent storage to function correctly, unlike stateless workloads where instances are interchangeable.

Key Characteristics
- Persistent Data: Relies on durable storage (e.g., databases with user records).
- Stable Identity: Needs consistent naming/ordering (e.g., db-0, db-1).
- Ordered Operations: Scaling/updates must follow a sequence (e.g., primary replica before secondary).

Examples
- Stateful: Databases (MySQL, MongoDB), message queues (Kafka), clustered apps (ZooKeeper).
- Stateless: Web servers (Nginx), APIs (FastAPI), simple microservices (no local data).

Why It Matters in Kubernetes

Summary: Stateful = "remembers who it is and what it knows" (e.g., your bank account balance). Use for data-heavy apps; stateless = "doesn't care" (e.g., a calculator).

Key Features

YAML Example

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql-headless"
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "password"
        volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  clusterIP: None  # Headless
  selector:
    app: mysql
  ports:
  - port: 3306
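
Each replica gets a stable DNS name through the headless Service — a quick check from a one-off busybox Pod (image tag assumed):

kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup mysql-0.mysql-headless

# Pod names and their PVCs are stable and ordinal:
kubectl get pods -l app=mysql     # mysql-0, mysql-1, mysql-2
kubectl get pvc                   # mysql-storage-mysql-0, mysql-storage-mysql-1, ...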

Use Cases

Pros/Cons

8. Other Key Resources

ReplicaSet

Job & CronJob
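
A Job runs Pods until a specified number complete successfully; a CronJob creates Jobs on a schedule. A minimal sketch (name, image, and schedule are illustrative):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 2 * * *"           # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: backup-tool:1.0    # hypothetical image
            args: ["--target", "s3://backups"]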

Resource Relationships

User (YAML) → API Server → etcd
                    ↓
            Controller Loop
                    ↓
Deployment → ReplicaSet → Pod → Container
                    ↓
                  Service → Load Balance

Summary Table

Resource Use Case Key Feature
Pod Basic unit 1+ containers
Deployment Stateless apps Rolling updates
Service Exposure Load balancing
DaemonSet Node agents Per-node pods
Secrets Sensitive data Encrypted env/files
ConfigMap Config Dynamic injection
StatefulSet Databases Ordered, stable

Golden Rule:

Declarative YAML + Controllers = Self-healing cluster
Define desired state → K8s makes it real.

Now deploy a Deployment + Service and watch K8s orchestrate!

Kubernetes Autoscalers: HPA vs VPA

Kubernetes autoscalers dynamically adjust resources based on workload demands. HPA scales horizontally (more/fewer pods), while VPA scales vertically (CPU/memory allocation). Neither attaches to Services (Services route traffic to existing pods); they target Deployments, StatefulSets, or ReplicaSets (for HPA) or Pods (for VPA).

Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of pods in a target resource (e.g., Deployment) based on observed metrics like CPU utilization, memory, or custom metrics (via Metrics Server or Prometheus Adapter).

Attachment

YAML Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web  # Attaches to Deployment "web"
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # Scale at 50% CPU
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi  # Scale at 500Mi memory

Commands
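
The usual workflow (requires Metrics Server; names match the example above):

kubectl autoscale deployment web --cpu-percent=50 --min=2 --max=10   # imperative alternative
kubectl get hpa web-hpa                                              # current vs target utilization
kubectl describe hpa web-hpa                                         # scaling events and conditions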

When to Use

Important Details

How HPA Works with StatefulSets

  1. Scaling Up:

    • HPA increases replicas (e.g., from 3 to 5).
    • The StatefulSet controller creates new Pods in order (e.g., web-3, then web-4).
    • Each new Pod gets a stable hostname (web-3.<statefulset-name>.<namespace>.svc.cluster.local) and attaches to its corresponding PersistentVolumeClaim (PVC) (e.g., data-web-3).
    • Pods join the cluster (e.g., as replicas in a database like etcd).

  2. Scaling Down:

    • HPA decreases replicas (e.g., from 5 to 3).
    • The StatefulSet controller deletes the highest-indexed Pods first (e.g., web-4, then web-3).
    • Deleted Pods are terminated gracefully (termination grace period, default 30s).
    • Data Persistence on Downscale:
      • Yes, data persists: StatefulSets bind PVCs to Pod identities (ordinal index).
      • When scaling down, only the Pod is deleted; the PVC (and its bound PersistentVolume/PV) remains.
      • Example: Scaling from 3 to 2 deletes web-2; the data-web-2 PVC persists.
      • Re-attach on scale-up: If scaled back to 3, web-2 is re-created and re-mounts the data-web-2 PVC, preserving data.
      • No data loss: Unlike Deployments (ephemeral storage), StatefulSets ensure ordered persistence.

  3. Metrics & Triggers:

    • Same as Deployments: monitors CPU/memory/custom metrics.
    • HPA calculates: desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)].
    • Cooldown: 5 min default between scales.

Nuances & Considerations for StatefulSets

Best Practice: Combine with PDBs (kubectl create pdb web-pdb --selector=app=web --min-available=2) to prevent too many simultaneous downscales.

Summary: HPA scales StatefulSets like Deployments but preserves data via PVCs and ordered identities. Use for elastic stateful apps (e.g., Kafka replicas); monitor for state consistency.

Vertical Pod Autoscaler (VPA)

VPA automatically adjusts Pod resource requests/limits (CPU/memory) based on historical usage, recommending or enforcing changes. It performs vertical scaling (resizing existing pods).

Attachment

YAML Example

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web  # Targets Deployment "web"
  updatePolicy:
    updateMode: "Auto"  # "Off" for recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 1
        memory: 500Mi

Commands
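
VPA is an add-on (CRD + controllers), so these only work once it is installed in the cluster:

kubectl get vpa web-vpa          # target and update mode
kubectl describe vpa web-vpa     # recommendations: target / lower bound / upper bound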

When to Use

Important Details

Key Differences & Best Practices

| Aspect | HPA | VPA |
|---|---|---|
| Scaling Type | Horizontal (number of pods) | Vertical (CPU/memory) |
| Target | Deployment/StatefulSet | Deployment/Pod |
| Downtime | Minimal (rolling) | Potential (eviction/recreate) |
| Metrics | CPU/memory/custom | Historical usage |

Kubernetes Pod Scheduling

Pod scheduling in Kubernetes involves the Scheduler deciding which node runs a Pod based on resource availability, constraints, and preferences. Key mechanisms ensure Pods land on suitable nodes while avoiding unsuitable ones. Below is a concise explanation of the core concepts.

1. Node Taints

2. Tolerations
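
A minimal taint/toleration sketch (node name, key/value, and image are illustrative): taint a node so that only Pods tolerating the taint may be scheduled there.

kubectl taint nodes worker-1 dedicated=gpu:NoSchedule

apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  tolerations:
  - key: dedicated
    operator: Equal
    value: gpu
    effect: NoSchedule
  containers:
  - name: app
    image: cuda-app:1.0   # hypothetical image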

3. Pod Affinity

4. Pod Anti-Affinity

5. Pod Disruption Budget (PDB)
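
A PDB limits voluntary disruptions (node drains, rolling node upgrades) — a minimal sketch for the web Pods used elsewhere in these notes:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web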

6. Node Selectors

7. Topology Spread Constraints

8. Priority and Preemption

9. Scheduler Plugins

10. Node Affinity

Node Affinity is a scheduling constraint that allows Pods to prefer or require specific nodes based on node labels (key-value pairs). It's part of the broader Affinity mechanism (.spec.affinity.nodeAffinity in Pod spec) and extends simple Node Selectors with more flexible expressions (e.g., OR logic, operators).

Definition & Purpose

Types

| Type | Description | Enforcement |
|---|---|---|
| RequiredDuringSchedulingIgnoredDuringExecution | Hard rule: fail scheduling if no node matches. | Must satisfy for scheduling. |
| PreferredDuringSchedulingIgnoredDuringExecution | Soft rule: score nodes (0-100); schedule anywhere if no match. | Weighted preference. |

Key Fields

YAML Example

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:  # Hard: Must have GPU
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu-type
            operator: In
            values: ["nvidia-a100"]
      preferredDuringSchedulingIgnoredDuringExecution:  # Soft: Prefer zone
      - weight: 80  # Higher = stronger preference
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-west-2a"]
  containers:
  - name: ml-app
    image: ml-app:1.0

Nuances

When to Use

Summary: Node Affinity refines node selection with flexible matching—hard for requirements, soft for preferences. Tune with labels for targeted scheduling.

Overall Scheduling Flow

  1. Filtering: Apply selectors, taints/tolerations, resources, affinity (hard rules).
  2. Scoring: Rank survivors (affinity weights, spread, plugins).
  3. Binding: Assign Pod to best node.
  4. Preemption: If no fit, evict lower-priority Pods (respects PDBs).

Summary: Taints repel, tolerations allow, affinity attracts/repels, PDB protects availability. Tune for HA, performance, and cost. Selectors filter basically, topology spreads evenly, priority preempts, plugins customize. Use for balanced, resilient clusters.

Kubernetes Storage

Kubernetes storage enables Pods to access persistent data across restarts, nodes, and clusters. Unlike ephemeral container storage, it uses ephemeral volumes (temporary) and persistent storage (durable).

1. Volumes

2. VolumeMounts

3. PersistentVolume (PV)

4. PersistentVolumeClaim (PVC)

5. StorageClasses

Other Fundamental Storage Concepts

Flow

  1. Create StorageClass (template).
  2. Create PVC (request) → binds to PV (storage).
  3. Pod/Deployment references PVC via volumes/volumeMounts.
  4. Data persists across Pod restarts/nodes (if RWX).
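
A sketch of that flow, assuming the cluster has a default StorageClass named standard (check with kubectl get storageclass and adjust):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  storageClassName: standard     # assumption: replace with your StorageClass
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi

apiVersion: v1
kind: Pod
metadata:
  name: data-pod
spec:
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc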

Summary: PV = storage supply, PVC = demand, StorageClass = provisioning rules. Use for stateful apps; ephemeral volumes for temp data.

Custom Resources (CRs) in Kubernetes

Custom Resources (CRs) are user-defined extensions to the Kubernetes API that allow you to create your own objects (like Pod, Deployment) with custom behavior. They are the foundation of Kubernetes Operators and extensibility. Without CRs, you'd be limited to generic resources — forcing complex logic into ConfigMaps, annotations, or external systems.

What is a Custom Resource?

A Custom Resource is a user-defined object stored in Kubernetes etcd that extends the Kubernetes API.

apiVersion: mycompany.com/v1
kind: Database
metadata:
  name: prod-db
spec:
  size: 100Gi
  engine: postgres

Core Concepts of Custom Resources

Concept Explanation
1. CRD (Custom Resource Definition) The schema that defines your new object type (like a database table schema).
2. Custom Resource (CR) An instance of the CRD (like a row in the table).
3. API Group & Version CRs live in custom API groups (e.g., stable.example.com/v1, databases.mycompany.com/v1alpha1).
4. Controller A reconciler (usually in an Operator) that watches CRs and makes the world match the desired state.
5. Validation OpenAPI v3 schema in CRD to enforce structure (e.g., size > 0).
6. Storage CRs are stored in etcd just like built-in objects.
7. Namespacing Can be namespaced or cluster-scoped.

1. CRD – The Blueprint

CRD YAML Structure

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.mycompany.com  # <plural>.<group>
spec:
  group: mycompany.com
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: integer
                  minimum: 1
                engine:
                  type: string
                  enum: [postgres, mysql]
  scope: Namespaced  # or Cluster
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: [db]

Key Fields

Field Purpose
group Your domain (reverse DNS)
versions Supports multiple (like v1, v1beta1)
storage: true Only one version stores data
scope Namespaced or Cluster
names.kind The object type in YAML (kind: Database)
shortNames CLI shortcuts (kubectl get db)

2. Custom Resource (CR) – The Instance

apiVersion: mycompany.com/v1
kind: Database
metadata:
  name: prod-postgres
  namespace: production
spec:
  size: 100
  engine: postgres
  backupPolicy: daily

Apply:

kubectl apply -f database.yaml

View:

kubectl get databases
kubectl get db prod-postgres -o yaml

3. Controller – The Brain (Reconciliation Loop)

A controller watches CRs and makes the actual state match the desired state.

Reconciliation Loop

1. Watch CR events (create/update/delete)
2. Read current state (from cluster)
3. Read desired state (from CR spec)
4. Compare
5. Take action (create PVC, deploy StatefulSet, etc.)
6. Update status
7. Repeat
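
A deliberately naive sketch of that loop in shell, just to make the idea concrete: it watches Database objects and ensures a PVC exists for each. Real controllers use client-go/controller-runtime, handle deletes and errors, and write .status.

kubectl get databases -o name --watch | while read -r cr; do
  name=$(basename "$cr")                          # e.g. prod-postgres
  if ! kubectl get pvc "data-$name" >/dev/null 2>&1; then
    size=$(kubectl get "$cr" -o jsonpath='{.spec.size}')
    kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-$name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: ${size}Gi
EOF
  fi
done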

4. Status Subresource

CRs have two parts:
- .spec → desired state (input)
- .status → observed state (output)

status:
  phase: Running
  replicas: 3
  conditions:
  - type: Ready
    status: "True"
    lastUpdate: "2025-04-05T10:00:00Z"

Controller owns .status, user owns .spec.

5. Validation & Defaulting

OpenAPI v3 Schema in CRD

schema:
  openAPIV3Schema:
    type: object
    required: [spec]
    properties:
      spec:
        type: object
        required: [size, engine]
        properties:
          size:
            type: integer
            minimum: 1
            maximum: 1000
          engine:
            type: string
            enum: [postgres, mysql]

Default Values (via Webhook) Use mutating webhook to set defaults:

spec: {}   # size omitted in the submitted CR → mutating webhook injects a default (e.g., size: 50)

Real-World Examples

Project CR Purpose
Cert-Manager Certificate Auto TLS
ArgoCD Application GitOps sync
Prometheus Operator ServiceMonitor Auto scraping
Istio VirtualService Traffic routing
Crossplane PostgreSQLInstance Cloud DB provisioning

Kubernetes Ingress & Ingress Controllers

Ingress is not a built-in resource — it's an API object that defines HTTP(S) routing rules.
An Ingress Controller is the actual software (NGINX, Traefik, HAProxy, etc.) that reads Ingress objects and configures a reverse proxy.

Ingress = "Take HTTP/S traffic from outside the cluster and route it to the correct Service (and Pod) based on URL, host, path, and other rules — all declaratively."

# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$1
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls-secret

2. What is an Ingress Controller?

Component Role
Ingress Resource Declarative rules (YAML)
Ingress Controller Reconciles rules → configures reverse proxy

Without a controller, Ingress does nothing.

3. Popular Ingress Controllers (2025)

Controller Type Key Features
NGINX Ingress L7 High perf, rewrite, auth
Traefik L7 Auto service discovery, middleware
HAProxy L7/L4 TCP/UDP, enterprise
Istio Gateway L7 mTLS, traffic splitting
Contour (Envoy) L7 gRPC, observability
Gloo L7 Function-level routing

4. How It Works – Step by Step

graph TD
    A[User: app.example.com/api] --> B[Load Balancer]
    B --> C[Ingress Controller Pod]
    C --> D[Reads Ingress YAML]
    D --> E[Configures NGINX/Traefik]
    E --> F[Routes to Service]
    F --> G[Pod]

  1. User → app.example.com
  2. Cloud LB → forwards to Ingress Controller
  3. Controller watches Ingress objects
  4. Generates config → reloads proxy
  5. Routes to correct Service → Pod
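
To verify routing once the controller is installed (the load-balancer address is a placeholder, not a real IP):

kubectl get ingress web-ingress                          # shows HOSTS and ADDRESS
kubectl describe ingress web-ingress                     # backend Services and events
curl -H "Host: app.example.com" http://<LB-ADDRESS>/api  # path-based routing test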

5. Key Ingress Fields

Field Purpose
spec.rules[].host Virtual host (e.g., api.example.com)
spec.rules[].http.paths[].path URL path (/api)
pathType Prefix, Exact, ImplementationSpecific
backend.service Target Service + port
spec.tls[] TLS termination (secret with cert/key)
metadata.annotations Controller-specific config

6. Path Types (Critical!)

Type Behavior
Prefix /api → /api, /api/users
Exact /api only
ImplementationSpecific Controller decides (NGINX: regex, Traefik: regex)

7. Real-World Example (NGINX)

# 1. Services
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
  - port: 80

--- 

# 2. Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt"
spec:
  ingressClassName: nginx  # Points to controller
  tls:
  - hosts: [app.example.com]
    secretName: app-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80

8. IngressClass – Avoid Conflicts

  1. An Ingress Class in Kubernetes is a resource that defines a specific Ingress controller to handle Ingress resources, allowing administrators to route traffic based on different controller capabilities and configurations.

  2. It enables the use of multiple Ingress controllers—such as NGINX, Traefik, or HAProxy—within the same cluster by associating specific Ingress resources with a designated controller through the ingressClassName field in the Ingress manifest

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
spec:
  controller: k8s.io/ingress-nginx

# In the Ingress that should use this class
spec:
  ingressClassName: nginx

Multiple controllers? Use ingressClassName to route.

9. TLS Termination

  1. Create Secret:

    apiVersion: v1
    kind: Secret
    metadata:
      name: app-tls
    type: kubernetes.io/tls
    data:
      tls.crt: base64(cert)
      tls.key: base64(key)

  2. Auto-TLS with cert-manager:

    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt"
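
The TLS Secret can also be created straight from certificate files (file names are illustrative):

kubectl create secret tls app-tls --cert=tls.crt --key=tls.key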

10. Advanced Features (Controller-Specific)

Feature Annotation Controller
Rate limiting nginx.ingress.kubernetes.io/limit-rps: "10" NGINX
Auth nginx.ingress.kubernetes.io/auth-url: ... NGINX
Canary nginx.ingress.kubernetes.io/canary-weight: "20" NGINX
Middleware traefik.ingress.kubernetes.io/router.middlewares: ... Traefik

11. Architecture Diagram


Summary Table

Component Role
Ingress YAML rules
Ingress Controller Proxy (NGINX/Traefik)
IngressClass Route to correct controller
Service Backend target
Secret TLS certs

Golden Rule:

Ingress = Rules
Ingress Controller = Engine
No controller = No routing

Kubernetes RBAC

Role-Based Access Control (RBAC) is Kubernetes’ default authorization system that controls who (user/service) can do what (verbs) on which resources in which namespace.

RBAC = "Who → Can do → What → Where"

2. Core RBAC Resources

Resource Purpose
Role / ClusterRole Define permissions (verbs on resources)
RoleBinding / ClusterRoleBinding Bind permissions to users/groups/service accounts
Subject Who gets access: User, Group, ServiceAccount

3. Role vs ClusterRole

| | Role | ClusterRole |
|---|---|---|
| Scope | Namespaced | Cluster-wide |
| Use | One namespace only (the namespace it lives in) | All namespaces + cluster-scoped resources |
| Example | Edit Pods in dev | View Nodes cluster-wide |

4. RoleBinding vs ClusterRoleBinding

| | RoleBinding | ClusterRoleBinding |
|---|---|---|
| Binds | Role → subject in one namespace | ClusterRole → subject cluster-wide |
| Can bind | Role, or ClusterRole (granted only within the binding's namespace) | ClusterRole only |

5. Verbs (Actions)

Verb Meaning
get Read one resource
list Read many
watch Stream changes
create Make new
update / patch Modify
delete Remove
deletecollection Bulk delete

6. Resources & API Groups

Resource API Group
pods, services "" (core)
deployments, ingresses apps, networking.k8s.io
nodes, persistentvolumes "" (core) — cluster-scoped resources
* All resources

7. Full Example: Dev Can Edit Pods in dev Namespace

# 1. Role: What can be done
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-editor
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list"]
# 2. RoleBinding: Who gets the role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-pod-access
  namespace: dev
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: deployer-sa
  namespace: dev
roleRef:
  kind: Role
  name: pod-editor
  apiGroup: rbac.authorization.k8s.io

8. Cluster-Wide: View All Nodes

# ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-viewer
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
# ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-viewer-global
subjects:
- kind: User
  name: monitoring-bot
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: node-viewer
  apiGroup: rbac.authorization.k8s.io

9. Built-in ClusterRoles (Use These!)

ClusterRole Permissions
cluster-admin Everything
admin Most in a namespace
edit Create/update most resources
view Read-only
# Give admin in namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: namespace-admin
  namespace: staging
subjects:
- kind: User
  name: bob
roleRef:
  kind: ClusterRole
  name: admin
  apiGroup: rbac.authorization.k8s.io

10. Service Accounts & RBAC

# SA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backup-sa
  namespace: tools
# Bind to ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backup-access
subjects:
- kind: ServiceAccount
  name: backup-sa
  namespace: tools
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io

Use in Pod:

spec:
  serviceAccountName: backup-sa

11. Testing RBAC

# Impersonate user
kubectl auth can-i create pods --as=alice -n dev
# → yes

kubectl auth can-i delete nodes --as=alice
# → no
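
You can also dump everything a subject is allowed to do, including for ServiceAccounts (useful when auditing):

kubectl auth can-i --list --as=alice -n dev
kubectl auth can-i get pods \
  --as=system:serviceaccount:tools:backup-sa -n tools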

12. Common Patterns

Goal Use
Dev team edits in namespace Role + RoleBinding
CI/CD deploys ServiceAccount + RoleBinding
Monitoring reads all ClusterRole(view) + ClusterRoleBinding
Admin per namespace ClusterRole(admin) + RoleBinding

13. Best Practices

Practice Why
Least privilege Only needed verbs/resources
Use groups system:developers
Avoid cluster-admin Except for admins
Use ServiceAccounts For apps, not users
Audit regularly kubectl get rolebindings -A

Golden Rule:

Never give cluster-admin unless absolutely needed.
Always bind ClusterRole with RoleBinding for namespace isolation.

Kubernetes Monitoring

1. Why Monitor Kubernetes?

Need What You Track
Reliability Pod restarts, OOM kills
Performance CPU, memory, latency
Capacity Node saturation
Security Anomalies, failed logins
SLOs 99.9% uptime

2. Core Monitoring Stack (2025 Standard)

Kubernetes
   ↓
cAdvisor (built-in) → Metrics Server → kube-state-metrics → Prometheus
   ↓
Grafana (dashboards) + Alertmanager + Kiali (Istio)

3. In-Built Components

Component Role Built-in?
cAdvisor Collects container metrics (CPU, memory, disk, network) Yes (in kubelet)
Metrics Server Aggregates cAdvisor metrics → powers kubectl top Yes (installable)
kube-state-metrics Exposes cluster state (Pods, Deployments, Nodes) No (install)

4. Metrics Server – kubectl top

What It Does
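
With Metrics Server installed, kubectl top reads live CPU/memory from the kubelets:

kubectl top nodes
kubectl top pods -A --sort-by=memory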

Limits

5. Prometheus – The Gold Standard

Feature Details
Pull-based Scrapes /metrics endpoints from various sources
Time-series DB Stores years of data
PromQL Powerful query language
Service Discovery Auto-discovers scrape targets (pods, nodes, services) via the Kubernetes API

Key Targets

Target Endpoint Metrics
kubelet /metrics, /metrics/cadvisor Container CPU/memory
API server /metrics Request latency
Nodes 10250 System stats
kube-state-metrics /metrics Pod count, phase
Your app /metrics (expose via client lib) HTTP requests, errors

6. Grafana – Visualization

# Example Panel
sum(rate(container_cpu_usage_seconds_total{namespace="prod"}[5m])) by (pod)

7. Kiali – Service Mesh Observability (Istio)

Feature Use
Service Graph Visual traffic flow
Metrics Golden signals per service
Traces Distributed tracing
Config Validation Istio config errors

Only with Istio

8. Expose Application Metrics

Go Example

import "github.com/prometheus/client_golang/prometheus/promhttp"
http.Handle("/metrics", promhttp.Handler())

Python

from prometheus_client import start_http_server
start_http_server(8000)

Annotation (Auto-scrape)

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"

9. Alertmanager – Handle Alerts

# alert.rules
groups:
- name: node-alerts
  rules:
  - alert: NodeDown
    expr: up{job="node"} == 0
    for: 5m
    labels:
      severity: critical

Routes to Slack, PagerDuty, email.

10. Full Stack Overview


11. Summary Table

Tool Type Must-Have?
cAdvisor Container metrics Yes (built-in)
Metrics Server kubectl top Yes
Prometheus Storage + query Yes
Grafana Dashboards Yes
Kiali Service mesh Yes (with Istio)
Alertmanager Alerts Yes

Golden Rule:

"If it’s not in Prometheus, it doesn’t exist."
Instrument everything. Alert on SLOs. Visualize trends.

Other Concepts

Annotations

Annotations are arbitrary key-value metadata attached to any Kubernetes object (Pod, Service, Deployment, etc.) — but they are NOT used for selecting or filtering.

metadata:
  annotations:
    app.kubernetes.io/version: "v1.2.3"
    prometheus.io/scrape: "true"
    backup.velero.io/backup-at: "2025-04-05T02:00:00Z"

| Feature | Labels | Annotations |
|---|---|---|
| Purpose | Identify & select objects | Attach non-identifying metadata |
| Used by | Selectors (kubectl get pod -l app=web) | Not used in selectors |
| Size | Small, indexed | Up to 256KB total |
| Example | app: web, env: prod | description, contact, backup-policy |

Why Use Annotations?

Use Case Example
Tooling Integration prometheus.io/scrape: "true" → Prometheus auto-scrapes
Operators & Controllers helm.sh/hook: pre-install → Helm runs job
Backup & Restore velero.io/exclude-from-backup: "true"
Ingress Rules nginx.ingress.kubernetes.io/rewrite-target: /$1
CI/CD Metadata build-id: 12345, git-commit: abc123
Documentation owner: team-data@company.com
Custom Automation reloader.stakater.com/auto: "true" → ConfigMap reload

Real-World Examples

# 1. Prometheus
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"

# 2. Helm
annotations:
  meta.helm.sh/release-name: my-app
  meta.helm.sh/release-namespace: prod

# 3. Cert-Manager
annotations:
  cert-manager.io/cluster-issuer: "letsencrypt"

# 4. Custom Operator
annotations:
  database.mycompany.com/backup-policy: daily

Best Practices

Do Don’t
Use structured prefixes (prometheus.io/, app.example.com/) Use random keys
Store non-identifying data Use for selectors
Keep under 256KB Store large logs
Use for automation hooks Hardcode in code

How Tools Use Annotations

Tool Reads Annotations For
Prometheus Scraping config
Helm Release tracking
ArgoCD Sync waves
Kubelet Pod behavior
Custom Controllers Triggers, policies

Summary:
- Labels = "Who is this?"
- Annotations = extra info about this; metadata for tools and automation
- Not for filtering
- Perfect for integration, hooks, and context

Istio

Istio = Service Mesh → Adds traffic control, security, observability to apps without code changes.

Core Architecture

Your App Pods
   ↓
Envoy Sidecar (auto-injected into every Pod)
   ↓
Istiod (Control Plane)
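
Sidecar injection is usually switched on per namespace via a label; after that, new Pods come up with two containers (your app + istio-proxy):

kubectl label namespace prod istio-injection=enabled
kubectl get pods -n prod     # READY shows 2/2 once the sidecar is injected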

1. Traffic Management

Feature How
Path-based routing GET /api → api-v1, POST /api → api-v2
Ratio-based (Canary) 90% → v1, 10% → v2
Header-based x-user-type: beta → canary
Fault Injection Delay 2s, abort 5%
Timeouts/Retries Auto retry on 5xx

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  hosts: [api.example.com]
  http:
  - match:
    - uri: {prefix: /api}
      headers:
        x-user: {exact: beta}
    route:
    - destination: {host: api-v2, subset: v2}
      weight: 100
  - route:
    - destination: {host: api-v1, subset: v1}
      weight: 90
    - destination: {host: api-v2, subset: v2}
      weight: 10

2. mTLS Encryption (Mutual TLS)

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
spec:
  mtls:
    mode: STRICT  # Enforce mTLS

3. Access Control (Authorization)

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
spec:
  action: ALLOW
  rules:
  - from:
      - source: {principals: ["cluster.local/ns/prod/sa/api"]}
    to:
      - operation: {methods: ["GET"], paths: ["/public/*"]}

4. Observability (Golden Signals)

Tool What
Kiali Service graph, health
Prometheus Metrics (istio_requests_total)
Jaeger/Zipkin Traces
Grafana Dashboards

5. Key Resources

Resource Purpose
VirtualService Routing rules
DestinationRule Subsets, load balancing, circuit breaker
Gateway Ingress (L7 LB)
ServiceEntry External services (e.g., api.google.com)
PeerAuthentication mTLS mode
AuthorizationPolicy RBAC for traffic

6. Example: Canary + mTLS + Auth

# 1. Subsets
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
spec:
  host: reviews
  subsets:
  - name: v1
    labels: {version: v1}
  - name: v2
    labels: {version: v2}

# 2. 90/10 routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  hosts: [reviews]
  http:
  - route:
    - {destination: {host: reviews, subset: v1}, weight: 90}
    - {destination: {host: reviews, subset: v2}, weight: 10}

# 3. Enforce mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
spec:
  mtls: {mode: STRICT}

7. Important Concepts

Concept Meaning
Sidecar Envoy injected into every Pod
Subset Group of Pods by labels (e.g., version: v2)
Gateway Ingress controller (replaces NGINX Ingress)
mTLS End-to-end encryption
Circuit Breaker Stop cascading failures
Fault Injection Test resilience

Golden Rule:

Istio = Envoy + Istiod → Traffic, Security, Observability without app changes.

Use Istio when: - Microservices - Canary/Blue-Green - Zero-trust security - Multi-cluster

Skip if: - Simple apps - Monolith

Now route, secure, and observe your traffic like a pro!
Try: istioctl dashboard kiali