Kubernetes (K8s)
1. Why Kubernetes? The Problem Docker Solves… and Doesn’t
| Problem | Docker | Kubernetes |
|---|---|---|
| Run app in container | Yes | Yes |
| Run 100 containers | Manual | Automated |
| Auto-restart failed container | No | Yes |
| Scale to 1000 containers | No | Yes |
| Rolling updates | No | Yes |
| Self-healing | No | Yes |
| Multi-host deployment | No | Yes |
Docker = "Run one container"
Kubernetes = "Orchestrate 10,000 containers across 100 machines"
2. What Kubernetes Offers on Top of Docker
| Feature | What It Does |
|---|---|
| Orchestration | Manages 1000s of containers across nodes |
| Self-healing | Auto-restart, reschedule failed pods |
| Auto-scaling | Scale up/down based on CPU/load |
| Rolling Updates | Zero-downtime deployments |
| Service Discovery | api.service → auto DNS |
| Load Balancing | Spread traffic across pods |
| Secret/Config Management | Inject env vars, files securely |
| Multi-cloud | Run same app on AWS, GCP, Azure, on-prem |
3. Kubernetes Architecture – Master vs Worker Nodes
+------------------+ gRPC/HTTP +------------------+
| MASTER NODE | ◄───────────────► | WORKER NODE |
| (Control Plane) | | (Runs Pods) |
+------------------+ +------------------+
MASTER NODE (Control Plane) – The Brain of K8s
Runs on 1 or 3+ nodes (HA)
Never runs user workloads
All components talk via kube-apiserver
+------------------+
| MASTER NODE |
| |
| ┌─────────────┐ |
| │ API Server │ ← All communication
| └─────▲───────┘ |
| │ |
| ┌─────▼───────┐ |
| │ etcd │ ← Single source of truth
| └─────▲───────┘ |
| │ |
| ┌─────▼───────┐ |
| │ Scheduler │ ← "Where to run?"
| └─────▲───────┘ |
| │ |
| ┌─────▼───────┐ |
| │ Controller │ ← "Make it match desired state"
| │ Manager │
| └─────────────┘ |
+------------------+
1. kube-apiserver – The Front Door
| Role | Details |
|---|---|
| Central API | All kubectl, controllers, kubelet → talk to this |
| REST API | GET /api/v1/pods, POST /api/v1/namespaces |
| Authentication | JWT, certificates, OIDC, webhook |
| Authorization | RBAC, ABAC, Node, Webhook |
| Validation | Rejects invalid YAML |
| Scaling | Horizontal (multiple replicas behind LB) |
# You talk to this
kubectl get pods --server=https://master:6443
2. etcd – The Database (Single Source of Truth)
| Role | Details |
|---|---|
| Key-value store | Only stores cluster state (pods, services, secrets) |
| Consistent & HA | Uses Raft consensus |
| Watched by all | Controllers react to changes |
| Backup critical | etcdctl snapshot save |
# See raw data
kubectl exec -n kube-system etcd-master -- etcdctl get /registry/pods/default/myapp
If etcd dies → cluster is brain-dead
Always 3-node etcd cluster in production
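A minimal sketch of that backup, assuming a kubeadm-style cluster where etcd runs as a static pod and its certificates live under /etc/kubernetes/pki/etcd (paths differ per setup):

```bash
# Take a snapshot of etcd (run on a control-plane node; paths are typical kubeadm defaults)
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-snapshot.db --write-out=table
```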
3. kube-scheduler – The Matchmaker
| Role | Details |
|---|---|
| Watches | Unscheduled pods (nodeName: null) |
| Scores nodes | CPU, memory, taints, affinity, topology |
| Assigns | Sets pod.spec.nodeName |
Scoring Example
# Pod wants SSD
nodeSelector:
disktype: ssd
→ Scheduler picks node with label disktype=ssd
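For that match to happen, the node must carry the label; a quick sketch (the node name `worker-1` is just an assumption):

```bash
# Label a node so the scheduler can satisfy nodeSelector: disktype: ssd
kubectl label nodes worker-1 disktype=ssd

# Confirm which nodes carry the label
kubectl get nodes -l disktype=ssd --show-labels
```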
4. kube-controller-manager – The Robot Army
Runs multiple controllers in one process:
| Controller | Job |
|---|---|
| ReplicaSet | Ensure 3 pods → if 2, create 1 |
| Deployment | Manage rollouts, rollback |
| StatefulSet | Ordered pods (db-0, db-1) |
| DaemonSet | Run on every node (logging, monitoring) |
| Job/CronJob | Run to completion |
| Node | Mark node NotReady if kubelet stops |
| Endpoint | Update Service → Pod IP mapping |
# See controllers in action
kubectl get rs,deployments,statefulsets -A
5. cloud-controller-manager
| Role | Cloud Integration |
|---|---|
| Node | Sync cloud node metadata |
| LoadBalancer | Create AWS ELB, GCP LB |
| Route | Cloud network routes |
| Service | Manage cloud-specific services |
Only runs in cloud environments
WORKER NODE – The Muscle
Runs user workloads (pods)
Multiple per cluster
+------------------+
| WORKER NODE |
| |
| ┌─────────────┐ |
| │ kubelet │ ← Talks to API server
| └─────▲───────┘ |
| │ |
| ┌─────▼───────┐ |
| │ kube-proxy │ ← Load balances
| └─────▲───────┘ |
| │ |
| ┌─────▼───────┐ |
| │ containerd │ ← Runs containers
| └─────▲───────┘ |
| │ |
| ┌─────▼───────┐ |
| │ Pods │ ← Your apps
| └─────────────┘ |
+------------------+
1. kubelet – The Node Agent
| Role | Details |
|---|---|
| Watches API server | Gets assigned pods |
| Talks to container runtime | Starts/stops containers |
| Reports status | CPU, memory, pod phase |
| Exec, logs, port-forward | kubectl exec, logs |
| cAdvisor | Built-in metrics |
# See what kubelet sees
journalctl -u kubelet
2. kube-proxy – The Network Cop
| Role | Details |
|---|---|
| Watches Services & Endpoints | When pod IP changes |
| Programs iptables / IPVS | Routes traffic |
| Load balances | Round-robin across pods |
Service Types Handled
type: ClusterIP → 10.96.0.1 → iptables DNAT
type: NodePort → 30080 → iptables
type: LoadBalancer → cloud LB
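To see kube-proxy's work on a node, you can peek at the NAT chains it programs; a rough sketch (requires shell access to the node, output differs in IPVS mode, and port 10249 is the default metrics bind):

```bash
# On a worker node: list the Service DNAT rules kube-proxy maintains (iptables mode)
sudo iptables -t nat -L KUBE-SERVICES -n | head

# Ask kube-proxy which proxy mode it is running in
curl -s http://localhost:10249/proxyMode
```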
3. Container Runtime – The Engine
| Runtime | Status |
|---|---|
| containerd | Default since K8s 1.24 |
| CRI-O | Red Hat, lightweight |
| Docker | Deprecated (shim removed) |
Docker-built images still run fine: containerd consumes the same OCI images. Using Docker Engine itself as the runtime now requires the external cri-dockerd adapter, since the dockershim was removed in v1.24.
# Check runtime
kubectl get nodes -o wide
# → container-runtime: containerd://1.7.0
Real-World Flow
graph TD
A[User: kubectl apply] --> B[API Server]
B --> C[etcd: store desired state]
C --> D[Scheduler: pick node]
D --> E[kubelet on node]
E --> F[containerd: pull image]
F --> G[Start containers]
G --> H[kube-proxy: update iptables]
H --> I[Service ready]
High Availability (HA) Setup
| Component | HA Strategy |
|---|---|
| API Server | 3+ replicas → LB (keepalived, cloud LB) |
| etcd | 3-node cluster (Raft) |
| Scheduler / Controller | Run on all masters (leader election) |
| Worker Nodes | 3+ for redundancy |
Summary Table
| Node | Component | Job |
|---|---|---|
| Master | kube-apiserver | API gateway |
| Master | etcd | Cluster database |
| Master | scheduler | Assign pods to nodes |
| Master | controller-manager | Run control loops |
| Worker | kubelet | Run pods on node |
| Worker | kube-proxy | Network proxy |
| Worker | containerd | Run containers |
Golden Rule:
Master = Think, Store, Schedule
Worker = Run, Report, Route
Now you understand how Kubernetes turns 100 machines into one logical supercomputer.
Try:
kubectl get componentstatuses
kubectl -n kube-system get pods
And see the control plane in action!
Kubernetes Resources
Pods, Deployments, Services, DaemonSets, Secrets, ConfigMaps, StatefulSets & More
Kubernetes (K8s) resources are the declarative building blocks of your cluster. You define desired state in YAML/JSON, and K8s makes it reality through controllers and reconciliation loops.
Key Principle:
Imperative (`kubectl run`) → temporary
Declarative (YAML) → persistent, version-controlled
1. Pod – The Atomic Unit
Definition
- Smallest deployable unit in K8s.
- Runs 1+ containers that share network, storage, and lifecycle.
- Ephemeral – pods die; don't manage directly (use Deployment).
Key Features
- Shared Resources: Containers in one pod communicate via `localhost`.
- Lifecycle: Scheduled to nodes, runs until completion/failure.
- Probes: Readiness/liveness to control traffic/health.
Kubernetes Probes
Kubernetes probes are health checks that determine pod behavior. They use HTTP, TCP, or command-based tests with configurable thresholds (e.g., initial delay, period, timeout, success/failure counts).
- Liveness Probe
  - Purpose: Detects if the pod is alive and healthy. If it fails, Kubernetes restarts the pod (self-healing).
  - When Used: For apps that can deadlock or crash (e.g., memory leaks).
  - Behavior: Failure → pod restarts; doesn't affect traffic routing.
  - YAML Example:

    ```yaml
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3   # Restart after 3 failures
    ```

- Readiness Probe
  - Purpose: Detects if the pod is ready to serve traffic. If it fails, Kubernetes removes it from Service endpoints (no traffic sent) but doesn't restart it.
  - When Used: For apps that need warmup time or become temporarily unhealthy (e.g., during DB connection).
  - Behavior: Failure → pod excluded from load balancing; restarts only if liveness fails.
  - YAML Example:

    ```yaml
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 3
    ```
Key Differences
| Aspect | Liveness Probe | Readiness Probe |
|---|---|---|
| Failure Action | Restart pod | Exclude from traffic |
| Impact on Service | No (traffic continues to healthy pods) | Yes (removes from endpoints) |
| Default | None | None |
| Use Case | Crash recovery | Traffic routing |
YAML Example
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
labels:
app: web
spec:
containers:
- name: nginx
image: nginx:1.25
ports:
- containerPort: 80
resources:
limits:
cpu: "100m"
memory: "128Mi"
- name: sidecar-logger
image: fluentd:v1.14
volumeMounts:
- name: logs
mountPath: /var/log/nginx
volumes:
- name: logs
emptyDir: {}
Use Cases
- Simple apps (single container).
- Sidecar pattern (app + logger/monitor).
Pros/Cons
- Pros: Fine-grained control.
- Cons: No auto-restart; use with Deployment.
2. Deployment – The Workhorse for Stateless Apps
Definition
- Manages ReplicaSets to ensure desired pod replicas.
- Handles rolling updates, rollbacks, and scaling.
- Stateless – assumes pods are interchangeable.
Key Features
- Strategy: RollingUpdate (default) or Recreate.
- Selectors: Matches pods via labels.
- Revision History: Tracks changes for rollback.
YAML Example
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-deployment
spec:
replicas: 3
selector:
matchLabels:
app: web
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 25% # Extra pods during update
maxUnavailable: 25% # Allowed downtime
template: # Pod template
metadata:
labels:
app: web
spec:
containers:
- name: nginx
image: nginx:1.25
ports:
- containerPort: 80
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 30
periodSeconds: 10
Use Cases
- Web servers, APIs, microservices.
- Scaling: `kubectl scale deployment web --replicas=5`.
Pros/Cons
- Pros: Zero-downtime updates, self-healing.
- Cons: Not for stateful apps (use StatefulSet).
Rolling Updates
Rolling update is a zero-downtime deployment strategy in Kubernetes that gradually replaces old pods with new ones in a Deployment or StatefulSet. It ensures service availability by maintaining the desired number of replicas during updates (e.g., image version change), avoiding full outages.
- Why Used?: Minimizes disruption, supports canary/blue-green-like deployments, and auto-rollbacks on failures.
- When Applied?: Triggered by changes in the Deployment spec (e.g., `image: v1.0 → v1.1`).
Strategies
Kubernetes supports 2 strategies in Deployment/StatefulSet .spec.strategy.type:
| Strategy | Description | Use Case |
|---|---|---|
| RollingUpdate (default) | Gradually scales down old pods while scaling up new ones, maintaining availability. | Production apps needing zero-downtime. |
| Recreate | Kills all old pods first, then creates new ones. | Simple apps where downtime is acceptable (e.g., batch jobs). |
RollingUpdate Parameters
- `.maxSurge`: Max extra pods allowed during update (e.g., `25%` or `1` → temporary surge).
- `.maxUnavailable`: Max pods that can be unavailable (e.g., `25%` or `1` → controlled downtime).
- Default: `maxSurge: 25%`, `maxUnavailable: 25%`.
Monitor: `kubectl rollout status deployment/web`
Rollback: `kubectl rollout undo deployment/web`
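A small sketch of triggering and observing a rolling update on the `web-deployment` manifest above (the `nginx:1.26` tag is just an example):

```bash
# Change the image → the Deployment creates a new ReplicaSet and rolls pods gradually
kubectl set image deployment/web-deployment nginx=nginx:1.26

# Watch the rollout respect maxSurge / maxUnavailable
kubectl rollout status deployment/web-deployment

# Inspect revision history, then roll back if needed
kubectl rollout history deployment/web-deployment
kubectl rollout undo deployment/web-deployment --to-revision=1
```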
3. Service – Load Balancer & Service Discovery
Definition
- Stable endpoint for pods (abstracts pod IPs).
- Load balances traffic across matching pods.
- DNS-based discovery (e.g., `web.default.svc.cluster.local`).
- To reach the Deployment, create a Service and attach it to the Deployment's Pods via labels and selectors. The internal URL is then `{serviceName}.{namespace}.svc.cluster.local:{servicePort}`.
Types
| Type | Description | Use Case |
|---|---|---|
| ClusterIP (default) | Internal IP (10.96.x.x) | Internal services |
| NodePort | Exposes on node port (30000-32767) | Basic external access |
| LoadBalancer | Cloud LB (AWS ELB) | Production external |
| ExternalName | CNAME to external service | Integrate with legacy |
YAML Example
apiVersion: v1
kind: Service
metadata:
name: web-service
spec:
selector:
app: web # Matches deployment labels
ports:
- protocol: TCP
port: 80 # Service port
targetPort: 80 # Pod port
type: LoadBalancer
Use Cases
- Expose Deployment: `kubectl get svc` → external IP.
- Internal: Pods call `web-service:80`.
Pros/Cons
- Pros: Automatic load balancing, health checks.
- Cons: ClusterIP not external-facing.
How Services Identify Pods/Deployments
Services discover and route traffic to Pods using label selectors in the Service spec. They don't directly reference Deployments but match Pods created by Deployments/StatefulSets via shared labels.
- Mechanism:
  - Pods (from the Deployment) get labels (e.g., `app: web`).
  - Service `.spec.selector` matches these labels.
  - Kubernetes watches Pods and updates Service endpoints (Pod IPs) dynamically.
- YAML Example:

```yaml
# Deployment (creates Pods with labels)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web   # Pod label
    spec:
      containers:
      - name: nginx
        image: nginx
---
# Service (matches via selector)
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web   # Matches Pod labels
  ports:
  - port: 80
```

- Discovery: Pods are reachable at `web-service:80` (DNS: `web-service.default.svc.cluster.local`).
containerPort vs targetPort
- containerPort (Pod spec): The port the container listens on (documentation only; doesn't publish traffic).
- targetPort (Service spec): The port on the Pod that receives Service traffic (maps to containerPort; defaults to Service port if omitted).
| Field | Location | Purpose | Example |
|---|---|---|---|
| containerPort | Pod template (Deployment) | Container's listening port (info only) | containerPort: 8080 (app binds to 8080) |
| targetPort | Service spec | Pod port for incoming traffic | targetPort: 8080 (Service sends to Pod:8080) |
- YAML Example:

```yaml
# In the Deployment Pod spec
containers:
- name: app
  ports:
  - containerPort: 8080   # App listens here
```

```yaml
# In the Service
spec:
  ports:
  - port: 80         # Service port (e.g., DNS:80)
    targetPort: 8080 # Routes to Pod:8080
```
- Flow: Client → Service:80 → Pod:8080 (containerPort).
Other Critical Fields for Service Discovery & Health Checks
Ensure seamless discovery (stable endpoints) and health (traffic routing) with these fields:
| Field | Location | Purpose | Best Practice |
|---|---|---|---|
| selector | Service spec | Matches Pod labels for discovery | Use unique labels (e.g., app: web, tier: frontend). |
| labels | Pod template (Deployment) | Enables selector matching | Consistent across Deployment/Service (e.g., app: web). |
| readinessProbe | Pod template | Checks if Pod is ready for traffic; failure removes from endpoints | HTTP/TCP/exec probe; e.g., initialDelaySeconds: 30 for warmup. |
| livenessProbe | Pod template | Checks if Pod is alive; failure restarts Pod (affects discovery indirectly) | Less frequent than readiness; e.g., periodSeconds: 60. |
| port | Service spec | Service's listening port (e.g., for DNS) | Match app needs; use name: http for multiple ports. |
| protocol | Service/Port spec | Traffic protocol (TCP/UDP/SCTP) | TCP default; UDP for streaming. |
| sessionAffinity | Service spec | Sticky sessions (client IP-based) | ClientIP for stateful apps; timeout configurable. |
| publishNotReadyAddresses | Service spec | Include unready Pods in endpoints | true for pre-warmup traffic (rare). |
| annotations | Service metadata | Metadata (e.g., for Ingress controllers) | e.g., nginx.ingress.kubernetes.io/rewrite-target: /. |
YAML Snippet (Full Example)
apiVersion: apps/v1
kind: Deployment
spec:
template:
spec:
containers:
- name: app
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 10
livenessProbe:
httpGet:
path: /health
port: 8080
periodSeconds: 30
apiVersion: v1
kind: Service
spec:
selector:
app: myapp
ports:
- name: http
protocol: TCP
port: 80
targetPort: 8080
sessionAffinity: ClientIP
publishNotReadyAddresses: false
Summary: Selectors enable discovery; probes ensure health; tune ports/probes for reliability. Misconfigured selectors cause "no endpoints" errors.
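A quick debugging sketch for the "no endpoints" case mentioned above (names follow the earlier `web-service` example):

```bash
# If the selector matches no Pods, ENDPOINTS shows <none>
kubectl get endpoints web-service

# Compare the Service selector with the actual Pod labels
kubectl describe svc web-service | grep -i selector
kubectl get pods -l app=web --show-labels
```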
4. DaemonSet – Run on Every Node
Definition
- Ensures one pod per node (or selected nodes).
- Node-specific – ideal for agents.
Key Features
- Scheduling: Runs on all (or tainted) nodes.
- Rolling Updates: Similar to Deployment.
YAML Example
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd-logging
spec:
selector:
matchLabels:
name: fluentd
template:
metadata:
labels:
name: fluentd
spec:
containers:
- name: fluentd
image: fluent/fluentd:v1.14
volumeMounts:
- name: varlog
mountPath: /var/log
volumes:
- name: varlog
hostPath:
path: /var/log
tolerations: # Run on tainted nodes
- operator: Exists
Use Cases
- Logging (Fluentd), monitoring (Prometheus Node Exporter), storage (CSI drivers).
Pros/Cons
- Pros: Automatic per-node deployment.
- Cons: Scales with nodes; resource-heavy.
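To confirm the DaemonSet really landed one pod per node, a quick check (labels and name follow the fluentd example above):

```bash
# One fluentd pod per node, plus the node each one landed on
kubectl get pods -l name=fluentd -o wide

# DESIRED/CURRENT/READY should equal the number of eligible nodes
kubectl get daemonset fluentd-logging
```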
5. Secrets – Secure Data Management
Definition
- Stores sensitive data (passwords, tokens, keys) as base64-encoded strings.
- Mounts as volumes or env vars (can be encrypted at rest in etcd, but not by default).
Key Features
- Base64 Encoding: Not encryption (use external tools for strong secrets).
- Access Control: RBAC for reading.
Security Aspects of Secrets
Kubernetes Secrets store sensitive data (e.g., passwords, API keys, tokens) as base64-encoded strings in etcd (the cluster's key-value store). Key security features:
- Access Control: Protected by RBAC (Role-Based Access Control) policies; only authorized pods/services can read them.
- Encryption at Rest: etcd can be configured for encryption at rest (via an encryption provider config); Secrets are not encrypted by default but can be, or managed with external tools like Vault.
- Transmission: Data is transmitted over TLS (via the API server); not logged in plaintext.
Mounts
Mounting injects data into Pods as environment variables (env) or volumes (volumeMounts). Volumes are preferred for files; env vars for simple values. Defined in Deployment's Pod template (.spec.template.spec).
- As Environment Variables: Injects keys as vars (e.g., DB_PASSWORD).
- As Volumes: Mounts keys as files in a directory (e.g., /etc/secrets/).
NOTE: When Secrets and ConfigMaps are mounted as volumes (volumeMounts in the Pod spec), updates propagate automatically without a Pod restart: the kubelet periodically re-syncs the mounted files (via atomic symlink swaps), so changes appear in place after a short delay. Env-var injection and subPath mounts do NOT pick up updates automatically.
YAML Example
apiVersion: v1
kind: Secret
metadata:
name: db-secret
type: Opaque
data:
username: YXBwdXNlcg== # base64: "appuser"
password: U3VwZXJTZWNyZXQxMjM= # base64: "SuperSecret123"
apiVersion: v1
kind: Pod
metadata:
name: secret-pod
spec:
containers:
- name: app
image: myapp
env:
- name: DB_USER
valueFrom:
secretKeyRef:
name: db-secret
key: username
volumeMounts:
- name: secret-volume
mountPath: /etc/secrets
volumes:
- name: secret-volume
secret:
secretName: db-secret
Use Cases
- DB credentials, API keys, TLS certs.
Pros/Cons
- Pros: Avoids hardcoding secrets.
- Cons: Base64 is reversible; use Vault for advanced.
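Instead of hand-encoding base64 values as in the manifest above, the same Secret can be created imperatively; a sketch using the example values:

```bash
# kubectl base64-encodes the values for you
kubectl create secret generic db-secret \
  --from-literal=username=appuser \
  --from-literal=password='SuperSecret123'

# Inspect (values remain base64-encoded in the output)
kubectl get secret db-secret -o yaml
```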
6. ConfigMap – Non-Sensitive Configuration
Definition
- Stores config data (env vars, files) as key-value pairs.
- Mounts dynamically without rebuilding images.
Key Features
- Data Sources: Key-value, files, or literals.
- Updates: Reload pods without restart (for some apps).
Mounts
- Same as Secrets: inject as env vars or mount as volume files.
YAML Example
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
database_url: "postgres://localhost:5432/myapp"
log_level: "INFO"
app_name: "MyApp v1.0"
apiVersion: v1
kind: Pod
metadata:
name: config-pod
spec:
containers:
- name: app
image: myapp
env:
- name: DB_URL
valueFrom:
configMapKeyRef:
name: app-config
key: database_url
volumeMounts:
- name: config-volume
mountPath: /etc/config
volumes:
- name: config-volume
configMap:
name: app-config
Use Cases
- App configs, feature flags, env-specific settings.
Pros/Cons
- Pros: Decouples config from code.
- Cons: Not encrypted (use Secrets for sensitive).
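As with Secrets, a ConfigMap can be created imperatively from literals or a file; a small sketch mirroring the `app-config` example (the `app.properties` filename is hypothetical):

```bash
# From literals
kubectl create configmap app-config \
  --from-literal=log_level=INFO \
  --from-literal=database_url='postgres://localhost:5432/myapp'

# Or from an existing config file (key = filename, value = file contents)
kubectl create configmap app-config-file --from-file=app.properties
```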
7. StatefulSet – For Stateful Apps
Definition
- Manages stateful workloads (e.g., databases) with stable identities.
- Ordered deployment/scaling, persistent storage.
Stateful Workload
A stateful workload is an application or service that maintains persistent state (data, configuration, or identity) across restarts, updates, or failures. It requires stable, ordered, and persistent storage to function correctly, unlike stateless workloads where instances are interchangeable.
Key Characteristics
- Persistent Data: Relies on durable storage (e.g., databases with user records).
- Stable Identity: Needs consistent naming/ordering (e.g., db-0, db-1).
- Ordered Operations: Scaling/updates must follow sequence (e.g., primary replica before secondary).
Examples
- Stateful: Databases (MySQL, MongoDB), message queues (Kafka), clustered apps (ZooKeeper).
- Stateless: Web servers (Nginx), APIs (FastAPI), simple microservices (no local data).
Why It Matters in Kubernetes
- Deployment: Use StatefulSet for stable Pods, headless Services, and PersistentVolumes (PVs).
- Challenges: Scaling requires coordination; failures need data migration.
- vs. Stateless: Deployments handle stateless apps easily (interchangeable replicas).
Summary: Stateful = "remembers who it is and what it knows" (e.g., your bank account balance). Use for data-heavy apps; stateless = "doesn't care" (e.g., a calculator).
Key Features
- Stable Names: Pods named `db-0`, `db-1` (not random).
- Headless Service: Direct pod access via DNS.
- A Headless Service is a Kubernetes Service with clusterIP: None.
- It does NOT get a single virtual IP — instead, it returns direct DNS A records for each Pod.
- Persistent Volumes: Binds storage to pod identity.
YAML Example
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
spec:
serviceName: "mysql-headless"
replicas: 3
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- name: mysql
image: mysql:8.0
env:
- name: MYSQL_ROOT_PASSWORD
value: "password"
volumeMounts:
- name: mysql-storage
mountPath: /var/lib/mysql
volumeClaimTemplates:
- metadata:
name: mysql-storage
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
apiVersion: v1
kind: Service
metadata:
name: mysql-headless
spec:
clusterIP: None # Headless
selector:
app: mysql
ports:
- port: 3306
Use Cases
- Databases (MySQL, MongoDB), message queues (Kafka), clustered apps.
Pros/Cons
- Pros: Ordered scaling, stable storage.
- Cons: Slower scaling than Deployment.
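To see the stable per-Pod DNS records the headless Service provides, a quick sketch (busybox is just a convenient image with `nslookup`; any DNS-capable image works):

```bash
# The headless Service returns one A record per pod instead of a single ClusterIP
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never \
  -- nslookup mysql-headless.default.svc.cluster.local

# Individual pods are addressable as <pod>.<service>, e.g.:
#   mysql-0.mysql-headless.default.svc.cluster.local
```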
8. Other Key Resources
ReplicaSet
- Ensures exact replica count (used by Deployment).
- YAML: Similar to Deployment but no strategy.
Job & CronJob
- Job: Run to completion (e.g., batch processing).
- CronJob: Scheduled jobs (e.g., daily backups).
- Example:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: backup-job
spec:
  template:
    spec:
      containers:
      - name: backup
        image: backup-tool
      restartPolicy: Never
```
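CronJob is mentioned above but not shown; a minimal sketch of a daily backup schedule (the `backup-tool` image is reused from the Job example, and the schedule/history limits are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"          # Every day at 02:00
  successfulJobsHistoryLimit: 3  # Keep the last 3 completed Jobs
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool
          restartPolicy: Never
```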
Resource Relationships
User (YAML) → API Server → etcd
↓
Controller Loop
↓
Deployment → ReplicaSet → Pod → Container
↓
Service → Load Balance
Summary Table
| Resource | Use Case | Key Feature |
|---|---|---|
| Pod | Basic unit | 1+ containers |
| Deployment | Stateless apps | Rolling updates |
| Service | Exposure | Load balancing |
| DaemonSet | Node agents | Per-node pods |
| Secrets | Sensitive data | Base64-encoded env/files (encrypt etcd at rest) |
| ConfigMap | Config | Dynamic injection |
| StatefulSet | Databases | Ordered, stable |
Golden Rule:
Declarative YAML + Controllers = Self-healing cluster
Define desired state → K8s makes it real.
Now deploy a Deployment + Service and watch K8s orchestrate!
Kubernetes Autoscalers: HPA vs VPA
Kubernetes autoscalers dynamically adjust resources based on workload demands. HPA scales horizontally (more/fewer pods), while VPA scales vertically (CPU/memory allocation). Neither attaches to Services (Services route traffic to existing pods); they target Deployments, StatefulSets, or ReplicaSets (for HPA) or Pods (for VPA).
Horizontal Pod Autoscaler (HPA)
HPA automatically scales the number of pods in a target resource (e.g., Deployment) based on observed metrics like CPU utilization, memory, or custom metrics (via Metrics Server or Prometheus Adapter).
- Monitors metrics (default: 80% CPU threshold).
- Scales up/down to maintain target (e.g., replicas = current load / target utilization).
- Min/max replicas configurable.
Attachment
- Targets: Deployment, StatefulSet, ReplicaSet.
- YAML: Reference via `.spec.scaleTargetRef` (e.g., `kind: Deployment`, `name: web`).
YAML Example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web # Attaches to Deployment "web"
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50 # Scale at 50% CPU
- type: Resource
resource:
name: memory
target:
type: AverageValue
averageValue: 500Mi # Scale at 500Mi memory
Commands
- Apply: `kubectl apply -f hpa.yaml`
- Monitor: `kubectl get hpa`, `kubectl describe hpa web-hpa`
- Delete: `kubectl delete hpa web-hpa`
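The same HPA can also be created imperatively; a rough equivalent of the CPU metric above:

```bash
# Creates an HPA targeting Deployment "web" at 50% average CPU utilization
kubectl autoscale deployment web --cpu-percent=50 --min=2 --max=10

# Watch current vs target utilization and replica count
kubectl get hpa web --watch
```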
When to Use
- High-traffic apps (e.g., web servers) with variable load.
- Cost optimization: Scale down during low traffic.
- Not for: Stateful apps (use StatefulSet HPA cautiously) or fixed-size workloads.
Important Details
- Requires Metrics Server (so `kubectl top pods` works).
- Cooldown: 5 min default between scales.
- Pros: Simple, reactive scaling.
- Cons: Doesn't predict spikes; may overprovision.
How HPA Works with StatefulSets
- Scaling Up:
  - HPA increases replicas (e.g., from 3 to 5).
  - StatefulSet controller creates new Pods in order (e.g., `web-3`, then `web-4`).
  - Each new Pod gets a stable hostname (`web-3.<statefulset-name>.<namespace>.svc.cluster.local`) and attaches to its corresponding PersistentVolumeClaim (PVC) (e.g., `data-web-3`).
  - Pods join the cluster (e.g., as replicas in a database like etcd).
- Scaling Down:
  - HPA decreases replicas (e.g., from 5 to 3).
  - StatefulSet controller deletes the highest-indexed Pods first (e.g., `web-4`, then `web-3`).
  - Deleted Pods are terminated gracefully (termination grace period, default 30s).
- Data Persistence on Downscale:
  - Yes, data persists: StatefulSets bind PVCs to Pod identities (ordinal index).
  - When scaling down, only the Pod is deleted; the PVC (and its bound PersistentVolume/PV) remains.
  - Example: Scaling from 3 to 2 deletes `web-2`; the `data-web-2` PVC persists.
  - Re-attach on Scale-Up: If scaled back to 3, `web-2` is re-created and re-mounts the `data-web-2` PVC, preserving data.
  - No Data Loss: Unlike Deployments (ephemeral storage), StatefulSets ensure ordered persistence.
- Metrics & Triggers:
  - Same as Deployments: Monitors CPU/memory/custom metrics.
  - HPA calculates: `desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)]` (see the worked example below).
  - Cooldown: 5 min default between scales.
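A quick worked instance of that formula, under assumed numbers:

```
currentReplicas = 3, currentMetricValue = 90% CPU, desiredMetricValue = 50% CPU
desiredReplicas = ceil[3 × (90 / 50)] = ceil[5.4] = 6   (then clamped to minReplicas/maxReplicas)
```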
Nuances & Considerations for StatefulSets
- Ordered Scaling: Unlike Deployments (random Pod deletion), StatefulSets scale down from the end (highest ordinal first). Use `.spec.updateStrategy.rollingUpdate.partition` for canary-like control.
- Headless Service: Required for StatefulSet discovery (DNS: `web-2.web-headless.default.svc.cluster.local`); HPA doesn't affect it.
- Storage Coordination: Ensure PVs are zone-aware (topology keys) for multi-zone clusters to avoid data locality issues.
- Metrics Challenges: Stateful Pods may have uneven load (e.g., primary replica); use custom metrics (e.g., via Prometheus Adapter) for accurate scaling.
- Downtime Risk: Downscaling may disrupt state (e.g., lose quorum in a 3-node etcd); set `minReplicas` high and use PodDisruptionBudgets (PDBs) to limit evictions.
- Not for All: HPA works but test thoroughly; for databases, prefer vertical scaling (VPA) or manual control.
- Limits: Max replicas capped by cluster capacity; HPA ignores PVC provisioning.
Best Practice: Combine with PDBs (kubectl create pdb web-pdb --min-available=2) to prevent too many simultaneous downscales.
Summary: HPA scales StatefulSets like Deployments but preserves data via PVCs and ordered identities. Use for elastic stateful apps (e.g., Kafka replicas); monitor for state consistency.
Vertical Pod Autoscaler (VPA)
VPA automatically adjusts Pod resource requests/limits (CPU/memory) based on historical usage, recommending or enforcing changes. It performs vertical scaling (resizing existing pods).
- Analyzes metrics (via Metrics Server/Prometheus).
- Recommends (`updateMode: "Off"`) or applies (`updateMode: "Auto"`) resource changes.
- Evicts/recreates pods to apply changes (downtime risk).
Attachment
- Targets: Pods, Deployments, StatefulSets (via Pod template).
- YAML: No direct "attachment"; VPA watches the target via `.spec.targetRef` (e.g., Deployment name).
YAML Example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: web-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: web # Targets Deployment "web"
updatePolicy:
updateMode: "Auto" # "Off" for recommendations only
resourcePolicy:
containerPolicies:
- containerName: "*"
minAllowed:
cpu: 100m
memory: 50Mi
maxAllowed:
cpu: 1
memory: 500Mi
Commands
- Apply: `kubectl apply -f vpa.yaml`
- View Recommendations: `kubectl get vpa web-vpa -o yaml` (under `.status.recommendation`).
- Apply Updates: in `Auto` mode the VPA updater evicts Pods itself; to force an immediate resize, delete the Pod (`kubectl delete pod <pod-name>`) so it is recreated with the new requests.
When to Use
- Resource-inefficient apps (e.g., over/under-provisioned pods).
- Cost savings: Right-size based on actual usage.
- Not for: Apps with bursty loads (use HPA) or strict limits (manual tuning better).
Important Details
- Requires VPA admission controller.
- Modes: `Off` (recommendations only), `Initial` (set on create), `Auto` (enforce, with eviction).
- Pros: Optimizes resources; learns from usage.
- Cons: Causes restarts; not for all apps (e.g., databases).
Key Differences & Best Practices
| Aspect | HPA | VPA |
|---|---|---|
| Scaling Type | Horizontal (# pods) | Vertical (CPU/memory) |
| Target | Deployment/StatefulSet | Deployment/Pod |
| Downtime | Minimal (rolling) | Potential (eviction/recreate) |
| Metrics | CPU/memory/custom | Historical usage |
- Combine: Use HPA for traffic spikes, VPA for baseline optimization.
- Monitor: `kubectl top nodes` / `kubectl top pods` for metrics.
- When: HPA for dynamic load; VPA for static apps.
- Caution: VPA in Auto mode can disrupt; start with Off.
Kubernetes Pod Scheduling
Pod scheduling in Kubernetes involves the Scheduler deciding which node runs a Pod based on resource availability, constraints, and preferences. Key mechanisms ensure Pods land on suitable nodes while avoiding unsuitable ones. Below is a concise explanation of the core concepts.
1. Node Taints
- Definition: Taints are repellent markers applied to nodes (via `kubectl taint nodes`) that prevent Pods from scheduling unless they tolerate the taint. They act as "do not disturb" signals.
- Purpose: Reserve nodes for specific workloads (e.g., dedicated DB nodes) or mark unhealthy nodes.
- Types:
- NoSchedule: Prevents new Pods from scheduling.
- PreferNoSchedule: Soft repellent (scheduler prefers avoidance).
- NoExecute: Evicts existing Pods + prevents new ones.
- Example (Apply Taint):

```bash
kubectl taint nodes worker-1 key=value:NoSchedule
```

- Effect: Untolerated Pods are rejected; e.g., taint `dedicated=db:NoSchedule` reserves the node for DB Pods only.
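A couple of follow-up commands for working with taints (node name reused from the example above):

```bash
# Show the taints currently on a node
kubectl describe node worker-1 | grep -A3 Taints

# Remove the taint (trailing "-" deletes it)
kubectl taint nodes worker-1 key=value:NoSchedule-
```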
2. Tolerations
- Definition: Tolerations are Pod-level settings (in `.spec.tolerations`) that allow Pods to ignore specific taints and schedule on tainted nodes.
- Purpose: Enables Pods to run on reserved/tainted nodes (e.g., high-CPU nodes).
- Matching: A toleration must match the taint's key, value, and effect (operator: `Exists` for any value, `Equal` for exact).
- YAML Example (in Pod/Deployment spec):

```yaml
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  - key: "dedicated"
    operator: "Exists"
    effect: "NoExecute"   # Tolerates any value
```
- Nuance: Tolerations don't prefer tainted nodes; they just allow scheduling.
3. Pod Affinity
- Definition: Affinity rules in the Pod spec (`.spec.affinity`) prefer or require Pods to schedule on nodes matching certain conditions (e.g., labels).
- Purpose: Co-locate Pods for performance (e.g., app near its DB).
- Types:
  - RequiredDuringSchedulingIgnoredDuringExecution: Hard requirement (fail if no match).
  - PreferredDuringSchedulingIgnoredDuringExecution: Soft preference (score-based).
- Pod Affinity: Co-locate with other Pods (e.g., `topologyKey: kubernetes.io/hostname` for same node).

```yaml
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values: ["cache"]
    topologyKey: kubernetes.io/hostname
```
4. Pod Anti-Affinity
- Definition: Opposite of affinity; avoids scheduling Pods on nodes with matching conditions.
- Purpose: Spread Pods for high availability (e.g., replicas on different nodes/zones).
- Types: Required (hard) or Preferred (soft).
- YAML Example:

```yaml
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100              # Higher = stronger preference
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web
          topologyKey: kubernetes.io/hostname   # Avoid same node
```

- Nuance: Use `topologyKey: topology.kubernetes.io/zone` for zone spreading (the older `failure-domain.beta.kubernetes.io/zone` label is deprecated).
5. Pod Disruption Budget (PDB)
- Definition: PDBs (via `kubectl create pdb` or YAML) limit voluntary disruptions (e.g., node drains, scaling) to ensure a minimum number of available Pods.
- Purpose: Prevents too many Pods from going down simultaneously (e.g., during upgrades).
- Fields:
  - `minAvailable`: Min Pods that must stay available (e.g., `2` or `50%`).
  - `maxUnavailable`: Max Pods that can be unavailable (e.g., `1` or `25%`).
- YAML Example:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2   # At least 2 Pods up
  selector:
    matchLabels:
      app: web
```

- Nuance: Applies to Deployments/StatefulSets; ignored during involuntary disruptions (e.g., node failure).
6. Node Selectors
- Definition: Simple, declarative way to constrain Pod scheduling to nodes matching specific labels (key-value pairs on nodes). It's a hard filter—Pods only schedule on matching nodes.
- Purpose: Basic node affinity without complex expressions (e.g., target high-CPU nodes).
- How It Works: Defined in the Pod spec (`.spec.nodeSelector`); the Scheduler filters to nodes where all key-value pairs match.
- YAML Example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: high-cpu-pod
spec:
  nodeSelector:
    cpu-type: high-performance   # Matches nodes labeled 'cpu-type=high-performance'
  containers:
  - name: app
    image: myapp:1.0
```

- Apply Label to Node: `kubectl label nodes worker-1 cpu-type=high-performance`
- Nuances:
- Ignores taints (combine with tolerations).
- Simple but limited (no OR logic; use nodeAffinity for advanced).
- When to Use: Simple zoning (e.g., dev/prod nodes); not for dynamic rules.
7. Topology Spread Constraints
- Definition: Ensures Pods are evenly distributed across topology domains (e.g., zones, nodes, regions) to improve availability and resource utilization.
- Purpose: Prevents all Pods from landing on one node/zone (e.g., for HA).
- How It Works: Scheduler scores based on `whenUnsatisfiable` (`ScheduleAnyway`/`DoNotSchedule`) and `maxSkew` (max imbalance). Uses `topologyKey` (e.g., `topology.kubernetes.io/zone`).
- YAML Example:

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                                # Max 1 Pod difference per zone
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule          # Hard constraint
        labelSelector:
          matchLabels:
            app: web
```

- Nuances:
- Applies to Pods matching the selector.
- Combines with affinity (e.g., spread replicas across AZs).
- When to Use: Multi-zone clusters for fault tolerance; avoids single points of failure.
8. Priority and Preemption
- Definition: Assigns priority levels to Pods via PriorityClasses, enabling preemption (eviction of lower-priority Pods when resources are scarce).
- Purpose: Ensures critical workloads (e.g., system Pods) run first by evicting non-critical ones.
- How It Works:
- PriorityClass: Global resource defining priority (e.g., 1000 for high, -1 for low).
- Preemption: Scheduler evicts lower-priority Pods if a higher one can't schedule.
- YAML: Referenced in the Pod spec (`.spec.priorityClassName`).
- YAML Example:

```yaml
# PriorityClass (cluster-wide)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000               # Higher = more important
globalDefault: false
description: "Critical workloads"
---
# Pod using it
apiVersion: v1
kind: Pod
spec:
  priorityClassName: high-priority
  containers:
  - name: critical-app
    image: critical:1.0
```
- Nuances:
- Eviction uses PDBs to limit impact.
- System Pods (e.g., kube-system) have high defaults.
- When to Use: Resource-constrained clusters; prioritize monitoring over dev workloads.
9. Scheduler Plugins
- Definition: Extensible components in the kube-scheduler that perform filtering (eliminate unfit nodes) and scoring (rank remaining nodes).
- Purpose: Customizes scheduling logic (e.g., for GPU affinity or cost optimization).
- How It Works:
- Filter Plugins: Hard checks (e.g., NodeAffinity, TaintToleration).
- Score Plugins: Weighted scoring (e.g., ImageLocality for faster pulls).
- Configured via the scheduler configuration (e.g., `kube-scheduler.yaml`).
- YAML Example (Custom Config Snippet):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: my-scheduler
  plugins:
    filter:
      enabled:
      - name: NodeAffinity
      - name: TaintToleration
    score:
      enabled:
      - name: ImageLocality   # Prefer nodes with cached images
        weight: 10
```
- Nuances:
- Default scheduler has ~20 plugins; extend via custom scheduler (e.g., Volcano for batch).
- Order matters (early filters prune faster).
- When to Use: Advanced needs (e.g., gang scheduling for ML jobs); default suffices for most.
10. Node Affinity
Node Affinity is a scheduling constraint that allows Pods to prefer or require specific nodes based on node labels (key-value pairs). It's part of the broader Affinity mechanism (.spec.affinity.nodeAffinity in Pod spec) and extends simple Node Selectors with more flexible expressions (e.g., OR logic, operators).
Definition & Purpose
- Hard Requirement: Ensures Pods only schedule on matching nodes (e.g., nodes with SSDs).
- Soft Preference: Scores nodes for better placement (e.g., prefer low-latency zones).
- Use Case: Resource optimization (e.g., GPU nodes for ML), zoning (dev/prod separation), or performance (local storage nodes).
Types
| Type | Description | Enforcement |
|---|---|---|
| RequiredDuringSchedulingIgnoredDuringExecution | Hard rule: Fail if no match. | Must satisfy for scheduling. |
| PreferredDuringSchedulingIgnoredDuringExecution | Soft rule: Score nodes (0-100); schedule anywhere if no match. | Weighted preference. |
Key Fields
- nodeSelectorTerms: Array of terms (OR logic across terms; AND within expressions).
- matchExpressions: Operators like `In`, `NotIn`, `Exists`, `DoesNotExist`, `Gt`, `Lt`.
- matchFields: Matches node fields (e.g., `spec.unschedulable`); less common.
YAML Example
apiVersion: v1
kind: Pod
metadata:
name: gpu-pod
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution: # Hard: Must have GPU
nodeSelectorTerms:
- matchExpressions:
- key: gpu-type
operator: In
values: ["nvidia-a100"]
preferredDuringSchedulingIgnoredDuringExecution: # Soft: Prefer zone
- weight: 80 # Higher = stronger preference
preference:
matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values: ["us-west-2a"]
containers:
- name: ml-app
image: ml-app:1.0
Nuances
- Labels: Apply to nodes via `kubectl label nodes worker-1 gpu-type=nvidia-a100`.
- vs Node Selectors: Affinity is more expressive (multiple terms, operators); Selectors are simple equality.
- Dynamic: Labels can change post-scheduling (no re-evaluation).
- Performance: Soft rules add scoring overhead; use sparingly.
When to Use
- Hard: Critical hardware (e.g., GPUs).
- Soft: Optimization (e.g., zone preference for latency).
- Avoid: Overly restrictive rules causing scheduling failures.
Summary: Node Affinity refines node selection with flexible matching—hard for requirements, soft for preferences. Tune with labels for targeted scheduling.
Overall Scheduling Flow
- Filtering: Apply selectors, taints/tolerations, resources, affinity (hard rules).
- Scoring: Rank survivors (affinity weights, spread, plugins).
- Binding: Assign Pod to best node.
- Preemption: If no fit, evict lower-priority Pods (respects PDBs).
Summary: Taints repel, tolerations allow, affinity attracts/repels, PDB protects availability. Tune for HA, performance, and cost. Selectors filter basically, topology spreads evenly, priority preempts, plugins customize. Use for balanced, resilient clusters.
Kubernetes Storage
Kubernetes storage enables Pods to access persistent data across restarts, nodes, and clusters. Unlike ephemeral container storage, it uses ephemeral volumes (temporary) and persistent storage (durable).
1. Volumes
- Definition: A directory accessible to Pods, providing storage inside containers. Volumes outlive container lifecycle but tie to Pod lifecycle (deleted when Pod dies).
- Purpose: Share data between containers in a Pod or persist temporary data.
- Types (Ephemeral):
| Type | Description | Use Case |
|------|-------------|---------|
| emptyDir | Temporary, node-local (deleted on Pod eviction). | Scratch space, logs. |
| hostPath | Mounts host directory (e.g., /var/log). | Access host files (insecure). |
| configMap/Secret | Mounts ConfigMap/Secret as files. | Config injection. |
- YAML Example (in Pod spec):

```yaml
spec:
  volumes:
  - name: temp-storage
    emptyDir: {}
  - name: host-logs
    hostPath:
      path: /var/log
      type: DirectoryOrCreate
```
- Nuances: Ephemeral; for persistence, use PV/PVC.
2. VolumeMounts
- Definition: Specifies how a Volume is mounted into a container (path and read-only flag).
- Purpose: Injects storage into specific containers within a Pod.
- YAML Example:

```yaml
spec:
  containers:
  - name: app
    volumeMounts:
    - name: temp-storage     # References volume
      mountPath: /app/tmp    # Inside container
      readOnly: false
```

- Nuances: Multiple mounts per volume; `subPath` for selective files (e.g., `subPath: config.yaml`).
3. PersistentVolume (PV)
- Definition: A cluster-wide storage resource representing physical storage (e.g., AWS EBS volume, NFS share). It's a "piece of storage in the cluster."
- Purpose: Abstracts backend storage; provisioned manually or dynamically.
- Key Fields:
- Capacity: Size (e.g., `storage: 10Gi`).
- AccessModes: How it's mounted (`ReadWriteOnce` (RWO): single node; `ReadWriteMany` (RWX): multi-node; `ReadOnlyMany` (ROX)).
- Reclaim Policy: What happens on PVC deletion (Retain: keep PV; Delete: destroy; Recycle: scrub).
- YAML Example (Static PV):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  hostPath:
    path: /data
```
- Nuances: Static (manual) vs dynamic (StorageClass provisions); bound to one PVC at a time.
4. PersistentVolumeClaim (PVC)
- Definition: A Pod's request for storage, like a "storage ticket." It binds to a matching PV and is used in Pod specs.
- Purpose: Decouples Pods from storage details; Pods request "10Gi RWO" without knowing the backend.
- Key Fields:
- Requests: Desired capacity/access modes.
- StorageClassName: Matches PV's class for dynamic provisioning.
- YAML Example:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
```

- Usage in Pod/Deployment:

```yaml
spec:
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: my-pvc   # References PVC
  containers:
  - name: app
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
```
- Nuances: Namespace-scoped; unbound PVCs wait for PV; dynamic provisioning creates PV if no match.
5. StorageClasses
- Definition: Defines storage "classes" (e.g., fast SSD vs cheap HDD) for dynamic provisioning. Acts as a template for PV creation.
- Purpose: Abstracts storage backends (e.g., AWS EBS, GCE PD); enables policy-based provisioning.
- Key Fields:
- Provisioner: Backend driver (e.g., `ebs.csi.aws.com`).
- Parameters: Options (e.g., volume type: `gp3`).
- AllowVolumeExpansion: Resize PVCs.
- Default: Marked for auto-use if unspecified.
- YAML Example:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # Delay binding until Pod schedules
```

- Nuances: CSI (Container Storage Interface) drivers for modern backends; multiple classes for tiered storage.
Other Fundamental Storage Concepts
- Access Modes:

| Mode | Description | Example |
|------|-------------|---------|
| RWO | Read/Write by single node | EBS volumes |
| RWX | Read/Write by multiple nodes | NFS, CephFS |
| ROX | Read-only by multiple nodes | CD-ROM images |

- Reclaim Policies (PV spec):

| Policy | Effect on PVC Delete |
|--------|----------------------|
| Retain | PV persists; manual cleanup needed |
| Delete | PV and storage destroyed |
| Recycle | PV scrubbed and reused (deprecated) |

- Dynamic Provisioning: StorageClass + provisioner auto-creates PVs when PVC requests match (e.g., an unbound PVC triggers EBS volume creation).
- Volume Expansion: Resize PVCs online (if the StorageClass allows); e.g., `kubectl edit pvc my-pvc` → increase `requests.storage: 20Gi`.
- CSI Drivers: Modern standard for storage plugins (e.g., AWS EBS CSI); replaces in-tree drivers.
- Storage Ephemerality: Without PV/PVC, data lost on Pod restart; use for caches (emptyDir) vs databases (PV).
Flow
- Create StorageClass (template).
- Create PVC (request) → binds to PV (storage).
- Pod/Deployment references PVC via volumes/volumeMounts.
- Data persists across Pod restarts/nodes (if RWX).
Summary: PV = storage supply, PVC = demand, StorageClass = provisioning rules. Use for stateful apps; ephemeral volumes for temp data.
Custom Resources (CRs) in Kubernetes
Custom Resources (CRs) are user-defined extensions to the Kubernetes API that allow you to create your own objects (like Pod, Deployment) with custom behavior. They are the foundation of Kubernetes Operators and extensibility. Without CRs, you'd be limited to generic resources — forcing complex logic into ConfigMaps, annotations, or external systems.
What is a Custom Resource?
A Custom Resource is a user-defined object stored in Kubernetes etcd that extends the Kubernetes API.
- Example: Instead of only managing `Pod`, you can define `Database`, `Backup`, `GameServer`, etc.
- Analogy:
  Built-in resources = `int`, `string`
  Custom Resources = `class Database { ... }`
apiVersion: mycompany.com/v1
kind: Database
metadata:
name: prod-db
spec:
size: 100Gi
engine: postgres
Core Concepts of Custom Resources
| Concept | Explanation |
|---|---|
| 1. CRD (Custom Resource Definition) | The schema that defines your new object type (like a database table schema). |
| 2. Custom Resource (CR) | An instance of the CRD (like a row in the table). |
| 3. API Group & Version | CRs live in custom API groups (e.g., stable.example.com/v1, databases.mycompany.com/v1alpha1). |
| 4. Controller | A reconciler (usually in an Operator) that watches CRs and makes the world match the desired state. |
| 5. Validation | OpenAPI v3 schema in CRD to enforce structure (e.g., size > 0). |
| 6. Storage | CRs are stored in etcd just like built-in objects. |
| 7. Namespacing | Can be namespaced or cluster-scoped. |
1. CRD – The Blueprint
CRD YAML Structure
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: databases.mycompany.com # <plural>.<group>
spec:
group: mycompany.com
versions:
- name: v1
served: true
storage: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
properties:
size:
type: integer
minimum: 1
engine:
type: string
enum: [postgres, mysql]
scope: Namespaced # or Cluster
names:
plural: databases
singular: database
kind: Database
shortNames: [db]
Key Fields
| Field | Purpose |
|---|---|
| group | Your domain (reverse DNS) |
| versions | Supports multiple (like v1, v1beta1) |
| storage: true | Only one version stores data |
| scope | Namespaced or Cluster |
| names.kind | The object type in YAML (kind: Database) |
| shortNames | CLI shortcuts (kubectl get db) |
2. Custom Resource (CR) – The Instance
apiVersion: mycompany.com/v1
kind: Database
metadata:
name: prod-postgres
namespace: production
spec:
size: 100
engine: postgres
backupPolicy: daily
Apply:
kubectl apply -f database.yaml
View:
kubectl get databases
kubectl get db prod-postgres -o yaml
3. Controller – The Brain (Reconciliation Loop)
A controller watches CRs and makes the actual state match the desired state.
Reconciliation Loop
1. Watch CR events (create/update/delete)
2. Read current state (from cluster)
3. Read desired state (from CR spec)
4. Compare
5. Take action (create PVC, deploy StatefulSet, etc.)
6. Update status
7. Repeat
4. Status Subresource
CRs have two parts:
- .spec → desired state (input)
- .status → observed state (output)
status:
phase: Running
replicas: 3
conditions:
- type: Ready
status: "True"
lastUpdate: "2025-04-05T10:00:00Z"
Controller owns `.status`, the user owns `.spec`.
5. Validation & Defaulting
OpenAPI v3 Schema in CRD
schema:
openAPIV3Schema:
type: object
required: [spec]
properties:
spec:
type: object
required: [size, engine]
properties:
size:
type: integer
minimum: 1
maximum: 1000
engine:
type: string
enum: [postgres, mysql]
Default Values (via Webhook): Use a mutating admission webhook to set defaults, e.g., if `spec.size` is omitted, the webhook sets it to `50`.
Real-World Examples
| Project | CR | Purpose |
|---|---|---|
| Cert-Manager | Certificate | Auto TLS |
| ArgoCD | Application | GitOps sync |
| Prometheus Operator | ServiceMonitor | Auto scraping |
| Istio | VirtualService | Traffic routing |
| Crossplane | PostgreSQLInstance | Cloud DB provisioning |
Kubernetes Ingress & Ingress Controllers
Ingress is not a running component; it's an API object that defines HTTP(S) routing rules.
An Ingress Controller is the actual software (NGINX, Traefik, HAProxy, etc.) that reads Ingress objects and configures a reverse proxy.
Ingress = "Take HTTP/S traffic from outside the cluster and route it to the correct Service (and Pod) based on URL, host, path, and other rules — all declaratively."
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: web-ingress
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /$1
spec:
rules:
- host: app.example.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: api-service
port:
number: 80
- path: /
pathType: Prefix
backend:
service:
name: web-service
port:
number: 80
tls:
- hosts:
- app.example.com
secretName: app-tls-secret
2. What is an Ingress Controller?
| Component | Role |
|---|---|
| Ingress Resource | Declarative rules (YAML) |
| Ingress Controller | Reconciles rules → configures reverse proxy |
Without a controller, Ingress does nothing.
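One common way to get a controller running is the community NGINX Ingress chart; a sketch assuming Helm is installed (namespace and defaults are the usual chart conventions, adjust for your cluster):

```bash
# Install the NGINX Ingress Controller via Helm
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace

# Verify the controller pod and its Service (LoadBalancer/NodePort)
kubectl -n ingress-nginx get pods,svc
```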
3. Popular Ingress Controllers (2025)
| Controller | Type | Key Features |
|---|---|---|
| NGINX Ingress | L7 | High perf, rewrite, auth |
| Traefik | L7 | Auto service discovery, middleware |
| HAProxy | L7/L4 | TCP/UDP, enterprise |
| Istio Gateway | L7 | mTLS, traffic splitting |
| Contour (Envoy) | L7 | gRPC, observability |
| Gloo | L7 | Function-level routing |
4. How It Works – Step by Step
graph TD
A[User: app.example.com/api] --> B[Load Balancer]
B --> C[Ingress Controller Pod]
C --> D[Reads Ingress YAML]
D --> E[Configures NGINX/Traefik]
E --> F[Routes to Service]
F --> G[Pod]
- User →
app.example.com - Cloud LB → forwards to Ingress Controller
- Controller watches
Ingressobjects - Generates config → reloads proxy
- Routes to correct
Service→Pod
5. Key Ingress Fields
| Field | Purpose |
|---|---|
| spec.rules[].host | Virtual host (e.g., api.example.com) |
| spec.rules[].http.paths[].path | URL path (/api) |
| pathType | Prefix, Exact, ImplementationSpecific |
| backend.service | Target Service + port |
| spec.tls[] | TLS termination (secret with cert/key) |
| metadata.annotations | Controller-specific config |
6. Path Types (Critical!)
| Type | Behavior |
|---|---|
| Prefix | /api → /api, /api/users |
| Exact | /api only |
| ImplementationSpecific | Controller decides (NGINX: regex, Traefik: regex) |
7. Real-World Example (NGINX)
# 1. Services
apiVersion: v1
kind: Service
metadata:
name: web
spec:
selector:
app: web
ports:
- port: 80
---
# 2. Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: main-ingress
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: "true"
cert-manager.io/cluster-issuer: "letsencrypt"
spec:
ingressClassName: nginx # Points to controller
tls:
- hosts: [app.example.com]
secretName: app-tls
rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: web
port:
number: 80
8. IngressClass – Avoid Conflicts
- An IngressClass in Kubernetes is a resource that defines a specific Ingress controller to handle Ingress resources, allowing administrators to route traffic based on different controller capabilities and configurations.
- It enables the use of multiple Ingress controllers (such as NGINX, Traefik, or HAProxy) within the same cluster by associating specific Ingress resources with a designated controller through the `ingressClassName` field in the Ingress manifest.
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
name: nginx
spec:
  controller: k8s.io/ingress-nginx

# In the Ingress
spec:
  ingressClassName: nginx
Multiple controllers? Use
ingressClassNameto route.
9. TLS Termination
- Create Secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-tls
type: kubernetes.io/tls
data:
  tls.crt: base64(cert)
  tls.key: base64(key)
```

- Auto-TLS with cert-manager:

```yaml
annotations:
  cert-manager.io/cluster-issuer: "letsencrypt"
```
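Instead of hand-crafting the base64 fields, `kubectl` can build the TLS Secret from certificate files; a sketch assuming `tls.crt` / `tls.key` already exist on disk:

```bash
# Creates a Secret of type kubernetes.io/tls, referenced by spec.tls[].secretName
kubectl create secret tls app-tls --cert=tls.crt --key=tls.key

kubectl get secret app-tls -o yaml
```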
10. Advanced Features (Controller-Specific)
| Feature | Annotation | Controller |
|---|---|---|
| Rate limiting | nginx.ingress.kubernetes.io/limit-rps: "10" | NGINX |
| Auth | nginx.ingress.kubernetes.io/auth-url: ... | NGINX |
| Canary | nginx.ingress.kubernetes.io/canary-weight: "20" | NGINX |
| Middleware | traefik.ingress.kubernetes.io/router.middlewares: ... | Traefik |
Summary Table
| Component | Role |
|---|---|
| Ingress | YAML rules |
| Ingress Controller | Proxy (NGINX/Traefik) |
| IngressClass | Route to correct controller |
| Service | Backend target |
| Secret | TLS certs |
Golden Rule:
Ingress = Rules
Ingress Controller = Engine
No controller = No routing
Kubernetes RBAC
Role-Based Access Control (RBAC) is Kubernetes’ default authorization system that controls who (user/service) can do what (verbs) on which resources in which namespace.
RBAC = "Who → Can do → What → Where"
2. Core RBAC Resources
| Resource | Purpose |
|---|---|
| Role / ClusterRole | Define permissions (verbs on resources) |
| RoleBinding / ClusterRoleBinding | Bind permissions to users/groups/service accounts |
| Subject | Who gets access: User, Group, ServiceAccount |
3. Role vs ClusterRole
| | Role | ClusterRole |
|---|---|---|
| Scope | Namespaced | Cluster-wide |
| Use | One namespace only | All namespaces + cluster resources |
| Example | Edit Pods in dev | View Nodes cluster-wide |
4. RoleBinding vs ClusterRoleBinding
| | RoleBinding | ClusterRoleBinding |
|---|---|---|
| Binds | Subject → role within one namespace | Subject → role cluster-wide |
| Can bind | Role or ClusterRole (applied in that namespace) | Only ClusterRole |
5. Verbs (Actions)
| Verb | Meaning |
|---|---|
| get | Read one resource |
| list | Read many |
| watch | Stream changes |
| create | Make new |
| update / patch | Modify |
| delete | Remove |
| deletecollection | Bulk delete |
6. Resources & API Groups
| Resource | API Group |
|---|---|
| pods, services | "" (core) |
| deployments, ingresses | apps, networking.k8s.io |
| nodes, persistentvolumes | cluster-level |
| * | All resources |
7. Full Example: Dev Can Edit Pods in dev Namespace
# 1. Role: What can be done
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: dev
name: pod-editor
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
resources: ["deployments"]
verbs: ["get", "list"]
# 2. RoleBinding: Who gets the role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: dev-pod-access
namespace: dev
subjects:
- kind: User
name: alice
apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
name: deployer-sa
namespace: dev
roleRef:
kind: Role
name: pod-editor
apiGroup: rbac.authorization.k8s.io
8. Cluster-Wide: View All Nodes
# ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-viewer
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
---
# ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-viewer-global
subjects:
- kind: User
  name: monitoring-bot
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: node-viewer
  apiGroup: rbac.authorization.k8s.io
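Quick check that the cluster-wide binding behaves as intended:
kubectl auth can-i list nodes --as=monitoring-bot     # yes
kubectl auth can-i delete nodes --as=monitoring-bot   # no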
9. Built-in ClusterRoles (Use These!)
| ClusterRole | Permissions |
|---|---|
| cluster-admin | Everything |
| admin | Most permissions within a namespace |
| edit | Create/update most resources |
| view | Read-only |
# Give admin in a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: namespace-admin
  namespace: staging
subjects:
- kind: User
  name: bob
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: admin
  apiGroup: rbac.authorization.k8s.io
10. Service Accounts & RBAC
- A Service Account (SA) is a Kubernetes identity for non-human clients (applications, pods, processes) to authenticate and be authorized in the cluster.
- A Service Account is scoped to a single namespace.
# SA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backup-sa
  namespace: tools
---
# Bind to ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backup-access
subjects:
- kind: ServiceAccount
  name: backup-sa
  namespace: tools
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io
Use in Pod:
spec:
  serviceAccountName: backup-sa
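To confirm what the ServiceAccount is allowed to do, impersonate it using the system:serviceaccount:<namespace>:<name> format:
kubectl auth can-i list pods --as=system:serviceaccount:tools:backup-sa     # yes (view)
kubectl auth can-i create pods --as=system:serviceaccount:tools:backup-sa   # no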
11. Testing RBAC
# Impersonate user
kubectl auth can-i create pods --as=alice -n dev
# → yes
kubectl auth can-i delete nodes --as=alice
# → no
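To dump everything a subject is allowed to do in a namespace at once:
kubectl auth can-i --list --as=alice -n dev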
12. Common Patterns
| Goal | Use |
|---|---|
| Dev team edits in namespace | Role + RoleBinding |
| CI/CD deploys | ServiceAccount + RoleBinding |
| Monitoring reads all | ClusterRole(view) + ClusterRoleBinding |
| Admin per namespace | ClusterRole(admin) + RoleBinding |
13. Best Practices
| Practice | Why |
|---|---|
| Least privilege | Only needed verbs/resources |
| Use groups | e.g. system:developers |
| Avoid cluster-admin | Except for cluster admins |
| Use ServiceAccounts | For apps, not users |
| Audit regularly | kubectl get rolebindings -A |
Golden Rule:
Never give cluster-admin unless absolutely needed.
Always bind a ClusterRole with a RoleBinding when you want namespace isolation.
Kubernetes Monitoring
1. Why Monitor Kubernetes?
| Need | What You Track |
|---|---|
| Reliability | Pod restarts, OOM kills |
| Performance | CPU, memory, latency |
| Capacity | Node saturation |
| Security | Anomalies, failed logins |
| SLOs | 99.9% uptime |
2. Core Monitoring Stack (2025 Standard)
Kubernetes
   ↓
cAdvisor (built into kubelet) → Metrics Server (kubectl top)
   ↓
cAdvisor + kube-state-metrics → Prometheus
   ↓
Grafana (dashboards) + Alertmanager (alerts) + Kiali (Istio)
3. In-Built Components
| Component | Role | Built-in? |
|---|---|---|
| cAdvisor | Collects container metrics (CPU, memory, disk, network) | Yes (in kubelet) |
| Metrics Server | Aggregates cAdvisor metrics → kubectl top | No (lightweight add-on install) |
| kube-state-metrics | Exposes cluster object state (Pods, Deployments, Nodes) | No (install) |
4. Metrics Server – kubectl top
What It Does
- Lightweight, in-memory aggregator of metrics from all cAdvisors
- Enables:
kubectl top nodes
kubectl top pods -n prod
Limits
- No long-term storage
- No alerting
- No custom metrics
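Despite these limits, Metrics Server is what feeds the Horizontal Pod Autoscaler. A minimal HPA sketch (the target Deployment name web is assumed):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # assumed Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU > 70%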
5. Prometheus – The Gold Standard
| Feature | Details |
|---|---|
| Pull-based | Scrapes /metrics endpoints from various sources |
| Time-series DB | Stores years of data |
| PromQL | Powerful query language |
| Service discovery | Finds scrape targets automatically via the Kubernetes API |
Key Targets
| Target | Endpoint | Metrics |
|---|---|---|
| kubelet | /metrics, /metrics/cadvisor | Container CPU/memory |
| API server | /metrics | Request latency |
| Nodes (node-exporter) | :9100/metrics | System stats |
| kube-state-metrics | /metrics | Pod count, phase |
| Your app | /metrics (exposed via a client library) | HTTP requests, errors |
6. Grafana – Visualization
- Dashboards for Prometheus
- Pre-built: Node Exporter, Kubernetes Cluster, Apps
- Alerting via Prometheus rules
# Example Panel
sum(rate(container_cpu_usage_seconds_total{namespace="prod"}[5m])) by (pod)
7. Kiali – Service Mesh Observability (Istio)
| Feature | Use |
|---|---|
| Service Graph | Visual traffic flow |
| Metrics | Golden signals per service |
| Traces | Distributed tracing |
| Config Validation | Istio config errors |
Only with Istio
8. Expose Application Metrics
Go Example
import "github.com/prometheus/client_golang/prometheus/promhttp"
http.Handle("/metrics", promhttp.Handler())
Python
from prometheus_client import start_http_server
start_http_server(8000)
Annotation (Auto-scrape)
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
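These annotations only take effect if the Prometheus scrape config honors them; the usual (abbreviated) pod-discovery job looks roughly like this:
scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # keep only pods annotated prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # scrape the port given in prometheus.io/port
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__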
9. Alertmanager – Handle Alerts
# alert.rules
groups:
- name: node-alerts
  rules:
  - alert: NodeDown
    expr: up{job="node"} == 0
    for: 5m
    labels:
      severity: critical
Routes to Slack, PagerDuty, email.
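The routing itself lives in the Alertmanager config; a minimal sketch that sends critical alerts to Slack (webhook URL and channel are placeholders):
# alertmanager.yml
route:
  receiver: default
  routes:
  - match:
      severity: critical
    receiver: slack-critical
receivers:
- name: default
- name: slack-critical
  slack_configs:
  - api_url: https://hooks.slack.com/services/PLACEHOLDER   # placeholder webhook
    channel: "#alerts"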
10. Full Stack Overview
11. Summary Table
| Tool | Type | Must-Have? |
|---|---|---|
| cAdvisor | Container metrics | Yes (built-in) |
| Metrics Server | kubectl top | Yes |
| Prometheus | Storage + query | Yes |
| Grafana | Dashboards | Yes |
| Kiali | Service mesh | Yes (with Istio) |
| Alertmanager | Alerts | Yes |
Golden Rule:
"If it’s not in Prometheus, it doesn’t exist."
Instrument everything. Alert on SLOs. Visualize trends.
Other Concepts
Annotations
Annotations are arbitrary key-value metadata attached to any Kubernetes object (Pod, Service, Deployment, etc.) — but they are NOT used for selecting or filtering.
metadata:
annotations:
app.kubernetes.io/version: "v1.2.3"
prometheus.io/scrape: "true"
backup.velero.io/backup-at: "2025-04-05T02:00:00Z"
| Feature | Labels | Annotations |
|---|---|---|
| Purpose | Identify & select objects | Attach non-identifying metadata |
| Used by | kubectl get pod -l app=web | Not used in selectors |
| Size | Small, indexed | Up to 256KB total per object |
| Example | app: web, env: prod | description, contact, backup-policy |
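The difference shows up directly in kubectl (the pod name myapp is illustrative):
# Select by label
kubectl get pods -l app=web

# Read annotations (no selector support)
kubectl get pod myapp -o jsonpath='{.metadata.annotations}'

# Add or update an annotation
kubectl annotate pod myapp owner="team-data@company.com" --overwrite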
Why Use Annotations?
| Use Case | Example |
|---|---|
| Tooling Integration | prometheus.io/scrape: "true" → Prometheus auto-scrapes |
| Operators & Controllers | helm.sh/hook: pre-install → Helm runs job |
| Backup & Restore | velero.io/exclude-from-backup: "true" |
| Ingress Rules | nginx.ingress.kubernetes.io/rewrite-target: /$1 |
| CI/CD Metadata | build-id: 12345, git-commit: abc123 |
| Documentation | owner: team-data@company.com |
| Custom Automation | reloader.stakater.com/auto: "true" → ConfigMap reload |
Real-World Examples
# 1. Prometheus
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"

# 2. Helm
annotations:
  meta.helm.sh/release-name: my-app
  meta.helm.sh/release-namespace: prod

# 3. Cert-Manager
annotations:
  cert-manager.io/cluster-issuer: "letsencrypt"

# 4. Custom Operator
annotations:
  database.mycompany.com/backup-policy: daily
Best Practices
| Do | Don’t |
|---|---|
| Use structured prefixes (prometheus.io/, app.example.com/) | Use random keys |
| Store non-identifying data | Use for selectors |
| Keep under 256KB | Store large logs |
| Use for automation hooks | Hardcode in code |
How Tools Use Annotations
| Tool | Reads Annotations For |
|---|---|
| Prometheus | Scraping config |
| Helm | Release tracking |
| ArgoCD | Sync waves |
| Kubelet | Pod behavior |
| Custom Controllers | Triggers, policies |
Summary:
- Labels = Who is this?
- Annotations = Extra info about this; metadata for tools and automation.
- Not for filtering
- Perfect for integration, hooks, and context
Istio
Istio = Service Mesh → Adds traffic control, security, observability to apps without code changes.
Core Architecture
Your App Pods
↓
Envoy Sidecar (auto-injected into every Pod)
↓
Istiod (Control Plane)
- Envoy:
- The Envoy proxy is deployed alongside each service instance as a sidecar container, intercepting all inbound and outbound traffic for that service.
- This sidecar model allows Istio to enforce policies, collect telemetry, and manage traffic without requiring changes to the application code itself.
- Istiod: Configures Envoy, certs, policies
1. Traffic Management
| Feature | How |
|---|---|
| Path-based routing | GET /api → api-v1, POST /api → api-v2 |
| Ratio-based (Canary) | 90% → v1, 10% → v2 |
| Header-based | x-user-type: beta → canary |
| Fault Injection | Delay 2s, abort 5% |
| Timeouts/Retries | Auto retry on 5xx |
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  hosts: [api.example.com]
  http:
  - match:
    - uri: {prefix: /api}
      headers:
        x-user: {exact: beta}
    route:
    - destination: {host: api-v2, subset: v2}
      weight: 100
  - route:
    - destination: {host: api-v1, subset: v1}
      weight: 90
    - destination: {host: api-v2, subset: v2}
      weight: 10
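The fault-injection row from the table maps to a fault block in the same VirtualService; a sketch that delays every request by 2s and aborts 5% of them with HTTP 500, reusing the api-v1 destination from above:
http:
- fault:
    delay:
      percentage: {value: 100}
      fixedDelay: 2s
    abort:
      percentage: {value: 5}
      httpStatus: 500
  route:
  - destination: {host: api-v1, subset: v1}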
2. mTLS Encryption (Mutual TLS)
- Automatic between all services
- Zero-trust: Every call encrypted + authenticated
- Istiod issues short-lived certs (SPIFFE)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
spec:
  mtls:
    mode: STRICT # Enforce mTLS
3. Access Control (Authorization)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
spec:
  action: ALLOW
  rules:
  - from:
    - source: {principals: ["cluster.local/ns/prod/sa/api"]}
    to:
    - operation: {methods: ["GET"], paths: ["/public/*"]}
4. Observability (Golden Signals)
| Tool | What |
|---|---|
| Kiali | Service graph, health |
| Prometheus | Metrics (istio_requests_total) |
| Jaeger/Zipkin | Traces |
| Grafana | Dashboards |
5. Key Resources
| Resource | Purpose |
|---|---|
| VirtualService | Routing rules |
| DestinationRule | Subsets, load balancing, circuit breaker |
| Gateway | Ingress (L7 LB) |
| ServiceEntry | External services (e.g., api.google.com) |
| PeerAuthentication | mTLS mode |
| AuthorizationPolicy | RBAC for traffic |
6. Example: Canary + mTLS + Auth
# 1. Subsets
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
spec:
  host: reviews
  subsets:
  - name: v1
    labels: {version: v1}
  - name: v2
    labels: {version: v2}

# 2. 90/10 routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  hosts: [reviews]
  http:
  - route:
    - {destination: {host: reviews, subset: v1}, weight: 90}
    - {destination: {host: reviews, subset: v2}, weight: 10}

# 3. Enforce mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
spec:
  mtls: {mode: STRICT}
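Once applied, istioctl can validate the configuration and confirm the sidecars picked it up:
istioctl analyze        # lint Istio config for problems
istioctl proxy-status   # check Envoy sidecars are in sync with Istiod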
7. Important Concepts
| Concept | Meaning |
|---|---|
| Sidecar | Envoy injected into every Pod |
| Subset | Group of Pods by labels (e.g., version: v2) |
| Gateway | Ingress controller (replaces NGINX Ingress) |
| mTLS | End-to-end encryption |
| Circuit Breaker | Stop cascading failures |
| Fault Injection | Test resilience |
Golden Rule:
Istio = Envoy + Istiod → Traffic, Security, Observability without app changes.
Use Istio when:
- Microservices
- Canary/Blue-Green deployments
- Zero-trust security
- Multi-cluster
Skip it for:
- Simple apps
- Monoliths
Now route, secure, and observe your traffic like a pro!
Try: istioctl dashboard kiali