Where the Problem Started

A PodDisruptionBudget can sound like a hard guarantee. But Kubernetes can still evict lower-priority pods to make room for more important ones, and drain behavior can produce results that surprise operators. This post examines the false sense of safety that appears when PriorityClass and PDB are used together.

In Kubernetes, it is common to configure PriorityClass and PodDisruptionBudget (PDB) together. PriorityClass assigns a priority that says, “this pod is important,” and a PDB is a constraint that says, “at least this many must stay alive.”

At first, it is easy to think this way.

“Since I configured a PDB, the minimum count should be guaranteed, right?”

But in real operations, pods can disappear even when a PDB exists. In scheduler preemption and node drain operations in particular, a PDB is not an absolute guarantee. In this post, I will summarize why this happens, how the scheduler handles it in its architecture and code, and how to verify it with actual examples.

The Roles of PriorityClass and PDB

PriorityClass

Pods with a high PriorityClass value are treated as more important by the scheduler. If cluster resources are insufficient, the scheduler forcibly evicts lower PriorityClass pods in order to run higher PriorityClass pods. This process is called Preemption.

PDB (PodDisruptionBudget)

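A PDB prevents too many pods from going down at the same time during voluntary disruptions that go through the Eviction API, such as a node drain during a cluster upgrade. (A Deployment's own rolling update is governed by its rollout strategy, not by the PDB.) For example, if minAvailable: 2 is set, evictions are refused once they would leave fewer than two healthy pods. However, a PDB offers no protection against involuntary disruptions such as node failure, and the scheduler honors it only on a best-effort basis during preemption.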
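
The budget arithmetic behind minAvailable can be sketched in a few lines of Go. This is a simplified model, not the controller's code: the function name disruptionsAllowed is hypothetical, and it assumes minAvailable is an absolute number rather than a percentage.

```go
package main

import "fmt"

// disruptionsAllowed models the budget of a PDB that uses minAvailable:
// the number of healthy pods above the required minimum, never negative.
// (Hypothetical helper for illustration only.)
func disruptionsAllowed(currentHealthy, minAvailable int) int {
	if currentHealthy <= minAvailable {
		return 0
	}
	return currentHealthy - minAvailable
}

func main() {
	// Three healthy replicas with minAvailable: 2 leave room for one eviction.
	fmt.Println(disruptionsAllowed(3, 2))
	// After that eviction the budget is spent until a replacement is healthy.
	fmt.Println(disruptionsAllowed(2, 2))
	// A node failure can push the count below the minimum; the budget stays at 0.
	fmt.Println(disruptionsAllowed(1, 2))
}
```

The last case is the important one: the budget can only refuse further evictions, it cannot bring the missing pod back.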

How PDB Handling Looks Inside the Scheduler Code

The scheduler does not actually ignore PDB completely. If you look at pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go, there is code that checks PDBs when selecting victims:

// SelectVictimsOnNode finds minimum set of pods on the given node that should be preempted in order to make enough room
// for "pod" to be scheduled.
func (pl *DefaultPreemption) SelectVictimsOnNode(
	ctx context.Context,
	state fwk.CycleState,
	pod *v1.Pod,
	nodeInfo fwk.NodeInfo,
	pdbs []*policy.PodDisruptionBudget) ([]*v1.Pod, int, *fwk.Status) {
	logger := klog.FromContext(ctx)
	var potentialVictims []fwk.PodInfo
	removePod := func(rpi fwk.PodInfo) error {
		if err := nodeInfo.RemovePod(logger, rpi.GetPod()); err != nil {
			return err
		}
		status := pl.fh.RunPreFilterExtensionRemovePod(ctx, state, pod, rpi, nodeInfo)
		if !status.IsSuccess() {
			return status.AsError()
		}
		return nil
	}
	addPod := func(api fwk.PodInfo) error {
		nodeInfo.AddPodInfo(api)
		status := pl.fh.RunPreFilterExtensionAddPod(ctx, state, pod, api, nodeInfo)
		if !status.IsSuccess() {
			return status.AsError()
		}
		return nil
	}
	// As the first step, remove all pods eligible for preemption from the node and
	// check if the given pod can be scheduled without them present.
	for _, pi := range nodeInfo.GetPods() {
		if pl.isPreemptionAllowed(nodeInfo, pi, pod) {
			potentialVictims = append(potentialVictims, pi)
			if err := removePod(pi); err != nil {
				return nil, 0, fwk.AsStatus(err)
			}
		}
	}

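This excerpt shows only the first step, in which every pod eligible for preemption is treated as a potential victim. In the rest of the function, not shown here, the scheduler separates the candidates whose eviction would violate a PDB from those that would not and tries to spare the PDB-violating ones first. In other words, the scheduler respects PDBs when it can. But if every viable path requires breaking a PDB, Kubernetes can still choose preemption and evict the lower-priority pod.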
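
To make the best-effort behavior concrete, here is a minimal Go model of victim selection on a single node. It is a sketch under stated assumptions, not the scheduler's implementation: the pod type, the selectVictims function, and the CPU-only fit check are all hypothetical simplifications (the real plugin runs the filter plugins instead of comparing one number).

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical, simplified stand-in for a scheduled pod.
type pod struct {
	name        string
	priority    int
	cpuMilli    int
	violatesPDB bool // true if evicting this pod would break its PDB
}

// selectVictims returns the pods to evict so that a preemptor requesting
// needMilli CPU fits, plus how many of those victims violate a PDB.
// ok is false when the preemptor does not fit even with every candidate removed.
func selectVictims(pods []pod, freeMilli, needMilli, preemptorPriority int) (victims []pod, violations int, ok bool) {
	var candidates []pod
	for _, p := range pods {
		if p.priority < preemptorPriority {
			candidates = append(candidates, p)
			freeMilli += p.cpuMilli // pretend the pod is removed
		}
	}
	if freeMilli < needMilli {
		return nil, 0, false
	}
	// Higher-priority candidates get the first chance to stay.
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].priority > candidates[j].priority
	})
	// reprieve puts a pod back if the preemptor still fits; otherwise it is a victim.
	reprieve := func(p pod) {
		if freeMilli-p.cpuMilli >= needMilli {
			freeMilli -= p.cpuMilli
			return
		}
		victims = append(victims, p)
		if p.violatesPDB {
			violations++
		}
	}
	// PDB-protected pods are reprieved first, so they are evicted only
	// when the remaining pods cannot free enough room.
	for _, p := range candidates {
		if p.violatesPDB {
			reprieve(p)
		}
	}
	for _, p := range candidates {
		if !p.violatesPDB {
			reprieve(p)
		}
	}
	return victims, violations, true
}

func main() {
	pods := []pod{
		{"demo-a", 1000, 300, true},
		{"demo-b", 1000, 300, true},
		{"batch-c", 1000, 300, false},
	}
	// The preemptor needs 500m on a full node: one PDB-protected pod must go.
	victims, violations, _ := selectVictims(pods, 0, 500, 100000)
	fmt.Println(len(victims), violations)
}
```

With a 300m request, only batch-c would be evicted and no budget would be touched. With 500m, freeing batch-c alone is not enough, so a PDB-protected pod becomes a victim as well.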

The official documentation also says this:

PodDisruptionBudget is supported, but not guaranteed. The scheduler tries not to violate PDB when possible, but if it cannot find an alternative victim, it removes lower-priority pods even if that breaks PDB.

Kubernetes Docs

Example Where PDB Is Broken During Preemption

  1. Create PriorityClass
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1000
globalDefault: false
description: "Low-priority pods"

---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 100000
globalDefault: false
description: "High-priority pods"
  2. Create Deployment + PDB
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deploy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      priorityClassName: low-priority
      containers:
      - name: nginx
        image: nginx

---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: demo-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: demo
  3. Add a high PriorityClass pod
apiVersion: v1
kind: Pod
metadata:
  name: important-pod
spec:
  priorityClassName: high-priority
  containers:
  - name: busy
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"

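If cluster resources are insufficient, the scheduler sacrifices demo-deploy pods to run important-pod. For this to trigger in this example, the demo-deploy pods need resource requests of their own (the manifest above sets none), because preemption only helps when evicting a victim frees requested capacity. If two of the three replicas have to go, the PDB (minAvailable: 2) condition is broken.

If you run kubectl describe pdb demo-pdb at that point, Current Healthy may have dropped to 1, below the required 2.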

Example Where PDB Is Broken During Drain

Now test the same assumption during drain.

  1. drain-demo Deployment and PDB
apiVersion: apps/v1
kind: Deployment
metadata:
  name: drain-demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: drain-demo
  template:
    metadata:
      labels:
        app: drain-demo
    spec:
      priorityClassName: low-priority
      containers:
      - name: nginx
        image: nginx

---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: drain-demo-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: drain-demo
  2. Run node drain
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
  3. Check the result

If you run kubectl get pods, you may see that only one pod is left.

Result of kubectl describe pdb drain-demo-pdb:

Status:
  Current Healthy:   1
  Desired Healthy:   2
  Disruptions Allowed: 0

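In other words, the PDB condition ended up broken during the drain and pods disappeared. Note that a plain kubectl drain goes through the Eviction API, which refuses evictions that would violate the budget and retries them. A result like this therefore typically means something bypassed that check: for example, drain was run with --disable-eviction (which deletes pods directly), or pods became unhealthy for reasons other than eviction while the node was being emptied.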
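
The difference between an eviction-based drain and one that deletes pods directly can be modeled with a small Go simulation. This is only a sketch: the budget type and the drain function are hypothetical, no replacement pods become healthy in this model, and a real drain would keep retrying a refused eviction until its timeout instead of stopping.

```go
package main

import "fmt"

// budget is a hypothetical in-memory model of one PDB with minAvailable.
type budget struct {
	healthy      int
	minAvailable int
}

// evict models an Eviction API request: refused when the budget is spent.
func (b *budget) evict() bool {
	if b.healthy-b.minAvailable <= 0 {
		return false // the real API answers with HTTP 429 in this case
	}
	b.healthy--
	return true
}

// deletePod models a direct delete, which is not checked against the budget.
func (b *budget) deletePod() {
	if b.healthy > 0 {
		b.healthy--
	}
}

// drain tries to remove podsOnNode pods and returns how many were removed.
func drain(b *budget, podsOnNode int, useEviction bool) int {
	removed := 0
	for i := 0; i < podsOnNode; i++ {
		if useEviction {
			if !b.evict() {
				break // simplified: stop instead of retrying
			}
		} else {
			b.deletePod()
		}
		removed++
	}
	return removed
}

func main() {
	// Two of the three replicas live on the node being drained.
	withEviction := &budget{healthy: 3, minAvailable: 2}
	fmt.Println(drain(withEviction, 2, true), withEviction.healthy)

	withDelete := &budget{healthy: 3, minAvailable: 2}
	fmt.Println(drain(withDelete, 2, false), withDelete.healthy)
}
```

In the eviction case the second pod stays and the drain stalls with two healthy pods. In the delete case both pods go and the healthy count falls to 1, which matches the status shown above.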

Why Does This Happen?

PDB is a drain/update protection mechanism, but not an absolute guarantee

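Drain itself is a voluntary disruption, and evictions issued by drain are checked against the PDB. But the PDB only limits evictions; it cannot keep pods healthy. If pods are deleted directly instead of evicted, or replacements cannot become Ready elsewhere, the healthy count can still fall below the minimum while a node is being emptied.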

Asynchronous controller structure

Because the PDB controller and scheduler run separately, if the timing of status updates is off, the scheduler can make decisions based on stale information.

A philosophical choice

Kubernetes follows the philosophy of “while preserving service availability as much as possible, running the more important pod comes first.” That is why PDB violations are allowed.

Things To Watch In Operations

  • PDB is not an absolute shield. For truly important workloads, you should raise PriorityClass and use PDB as a supporting mechanism.

  • A drain strategy is necessary. In production environments, design the deployment/update strategy on the assumption that kubectl drain may either stall on a PDB or, when eviction is bypassed (for example with --disable-eviction), ignore it.

  • Monitoring is mandatory. You should periodically check disruptionsAllowed and currentHealthy in the PDB status, for example with kubectl get pdb -o yaml or kubectl describe pdb.

  • Practice beforehand. Intentionally trigger preemption and drain in a controlled environment before relying on the policy in production.
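
As one possible way to automate the monitoring point, the Go sketch below reads the JSON that kubectl get pdb -o json prints and flags budgets that are violated or exhausted. The pdbList type and the findings function are hypothetical names; the status fields currentHealthy, desiredHealthy, and disruptionsAllowed are the ones the policy/v1 API reports. The sample input is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pdbList mirrors only the fields this check needs from the kubectl output.
type pdbList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			CurrentHealthy     int `json:"currentHealthy"`
			DesiredHealthy     int `json:"desiredHealthy"`
			DisruptionsAllowed int `json:"disruptionsAllowed"`
		} `json:"status"`
	} `json:"items"`
}

// findings returns one line per PDB that is violated or has no budget left.
func findings(raw []byte) ([]string, error) {
	var list pdbList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, err
	}
	var out []string
	for _, it := range list.Items {
		s := it.Status
		switch {
		case s.CurrentHealthy < s.DesiredHealthy:
			out = append(out, fmt.Sprintf("%s: VIOLATED (%d/%d healthy)",
				it.Metadata.Name, s.CurrentHealthy, s.DesiredHealthy))
		case s.DisruptionsAllowed == 0:
			out = append(out, fmt.Sprintf("%s: no disruptions allowed", it.Metadata.Name))
		}
	}
	return out, nil
}

func main() {
	// Made-up sample in the shape of `kubectl get pdb -o json`.
	sample := []byte(`{"items":[
	  {"metadata":{"name":"demo-pdb"},
	   "status":{"currentHealthy":3,"desiredHealthy":2,"disruptionsAllowed":1}},
	  {"metadata":{"name":"drain-demo-pdb"},
	   "status":{"currentHealthy":1,"desiredHealthy":2,"disruptionsAllowed":0}}]}`)
	lines, err := findings(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

In practice the input would come from piping the kubectl output into the program, and the findings would feed whatever alerting you already use.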

Summary

  • PriorityClass determines pod priority and removes lower PriorityClass pods when needed.
  • PDB limits voluntary disruptions, but it is not guaranteed in situations such as preemption or drain.
  • Kubernetes’ basic philosophy is “run important pods first,” and breaking PDB is also allowed in that process.
  • For pods you really want to protect, design them with a PriorityClass + PDB combination and think through the drain strategy as well.

Availability Takeaway

PDB is an important safety mechanism, but not an absolute shield. Scheduling, preemption, drain, and controller update timing all affect the real availability story. For important workloads, PDB needs to be designed together with priority, replica placement, rollout strategy, and monitoring.