Optimistic Concurrency and Other controller-runtime Gotchas

March 18, 2026

The Kubernetes API uses optimistic concurrency control. If you don’t understand resourceVersion, your controller will fight itself.

How optimistic concurrency works

Every Kubernetes resource has a .metadata.resourceVersion field — an opaque string that changes on every write. When you update a resource, the API server checks that the resourceVersion you’re sending matches the current one. If it doesn’t, the update is rejected with a 409 Conflict.

Controller A: Get resource (resourceVersion: "100")
Controller B: Get resource (resourceVersion: "100")
Controller A: Update resource (resourceVersion: "100") → succeeds, now "101"
Controller B: Update resource (resourceVersion: "100") → CONFLICT (stale)

This is “optimistic” because it assumes conflicts are rare. Instead of locking the resource before reading, it lets everyone read freely and rejects stale writes.
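The exchange above can be reproduced with a toy in-memory store. This is only a sketch of the version check, not the API server's implementation: `Store`, `Resource`, and `ErrConflict` are hypothetical names, and the integer version is a simplification, since real resourceVersions are opaque strings that should only be compared for equality.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict stands in for the API server's 409 Conflict response.
var ErrConflict = errors.New("conflict: resourceVersion is stale")

// Resource is a toy object. Real resourceVersions are opaque strings.
type Resource struct {
	ResourceVersion int
	RetentionDays   int
}

// Store is a hypothetical single-object API server.
type Store struct{ current Resource }

// Get returns a copy of the stored object.
func (s *Store) Get() Resource { return s.current }

// Update accepts the write only if the caller read the latest version.
func (s *Store) Update(r Resource) error {
	if r.ResourceVersion != s.current.ResourceVersion {
		return ErrConflict
	}
	r.ResourceVersion++
	s.current = r
	return nil
}

func main() {
	s := &Store{current: Resource{ResourceVersion: 100, RetentionDays: 7}}
	a, b := s.Get(), s.Get() // both controllers read version 100

	a.RetentionDays = 14
	fmt.Println("A:", s.Update(a)) // A: <nil>

	b.RetentionDays = 30
	fmt.Println("B:", s.Update(b)) // B: conflict: resourceVersion is stale
}
```

No lock is ever taken: both readers proceed, and the store rejects only the write that was based on an outdated read.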

The conflict error

In controller-runtime, a conflict looks like this:

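// errors here is k8s.io/apimachinery/pkg/api/errors (often imported as apierrors).
if err := r.Update(ctx, &backup); err != nil {
    if errors.IsConflict(err) {
        // Someone else modified the resource: requeue so the next pass re-reads it
        return ctrl.Result{Requeue: true}, nil
    }
    return ctrl.Result{}, err
}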

Conflicts are normal and expected. They’re not errors — they’re the system working correctly. The right response is to requeue and re-read the resource.
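Requeueing amounts to "read again, reapply the change, write again". The sketch below shows that cycle as an inline loop against a toy store, which is roughly the shape of what client-go's `retry.RetryOnConflict` helper does. The `store`, `object`, and `updateWithRetry` names are hypothetical, and the competing write is simulated inside the mutate function.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict")

// object and store are hypothetical stand-ins for a resource and the API server.
type object struct {
	version int
	labels  map[string]string
}

type store struct{ obj object }

// get returns a copy so callers never share the stored map.
func (s *store) get() object {
	labels := make(map[string]string, len(s.obj.labels))
	for k, v := range s.obj.labels {
		labels[k] = v
	}
	return object{version: s.obj.version, labels: labels}
}

// update rejects writes that were based on a stale version.
func (s *store) update(o object) error {
	if o.version != s.obj.version {
		return errConflict
	}
	o.version++
	s.obj = o
	return nil
}

// updateWithRetry re-reads the object and reapplies mutate after every conflict.
func updateWithRetry(s *store, attempts int, mutate func(*object)) error {
	for i := 0; i < attempts; i++ {
		o := s.get()
		mutate(&o)
		if err := s.update(o); !errors.Is(err, errConflict) {
			return err // success, or an error that retrying will not fix
		}
	}
	return errConflict
}

func main() {
	s := &store{obj: object{version: 100, labels: map[string]string{}}}
	raced := false
	err := updateWithRetry(s, 3, func(o *object) {
		if !raced { // simulate another controller writing between our read and write
			raced = true
			other := s.get()
			other.labels["owner"] = "controller-b"
			_ = s.update(other)
		}
		o.labels["backup"] = "enabled"
	})
	fmt.Println(err, s.obj.version, len(s.obj.labels)) // <nil> 102 2
}
```

The important detail is that the mutation is reapplied to a fresh read each time, so the other controller's label survives the retry.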

Why conflicts happen

The most common causes:

Your controller updates status, which triggers another reconcile that tries to update spec. If that second reconcile reads the resource from the cache before the status write is reflected there, its resourceVersion is stale. Fix: use the status subresource (r.Status().Update) so status writes stay separate from spec writes.

Multiple controllers manage the same resource. Two controllers both try to add finalizers or update labels. One wins, the other conflicts. Fix: requeue on conflict.

The cache is behind the API server. You read from the cache (old resourceVersion), then update. Between your read and update, someone else modified the resource. Fix: requeue on conflict — the cache will catch up.

Patch vs Update

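Update sends the entire resource and requires a matching resourceVersion. Patch sends only the fields you specify, so it is less prone to conflicts. controller-runtime supports two patch styles.

Server-side Apply

// Send only the fields this controller owns; apiVersion and kind must be set.
// v1alpha1.GroupVersion assumes the kubebuilder-scaffolded variable.
desired := &v1alpha1.Backup{
    TypeMeta:   metav1.TypeMeta{APIVersion: v1alpha1.GroupVersion.String(), Kind: "Backup"},
    ObjectMeta: metav1.ObjectMeta{Name: backup.Name, Namespace: backup.Namespace},
}
desired.Spec.RetentionDays = 14

if err := r.Patch(ctx, desired, client.Apply, client.ForceOwnership, client.FieldOwner("backup-controller")); err != nil {
    return ctrl.Result{}, err
}

Server-side apply tracks field ownership per field manager. Two controllers can manage different fields of the same resource without conflicting, as long as each one applies only the fields it owns. Applying the full object returned by Get is a mistake: it carries managedFields, which an apply request must not include, and with ForceOwnership it would claim every field it contains.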
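To make field ownership concrete, here is a deliberately simplified model of the bookkeeping. It is a sketch under stated assumptions, not the API server's algorithm: `fieldOwners` and `apply` are hypothetical names, `policy-controller` is an invented second manager, and the model flags any overlap as a conflict, whereas the real server also compares values and lets managers share a field when they agree on it.

```go
package main

import "fmt"

// fieldOwners maps a field path to the manager that last applied it.
type fieldOwners map[string]string

// apply records manager as the owner of fields. Without force it returns the
// fields already owned by another manager and writes nothing.
func (f fieldOwners) apply(manager string, fields []string, force bool) []string {
	var conflicts []string
	for _, p := range fields {
		if owner, ok := f[p]; ok && owner != manager && !force {
			conflicts = append(conflicts, p)
		}
	}
	if len(conflicts) > 0 {
		return conflicts // a rejected apply changes no ownership
	}
	for _, p := range fields {
		f[p] = manager
	}
	return nil
}

func main() {
	owners := fieldOwners{}
	fmt.Println(owners.apply("backup-controller", []string{"spec.retentionDays"}, false))   // []
	fmt.Println(owners.apply("policy-controller", []string{"metadata.labels.tier"}, false)) // []
	fmt.Println(owners.apply("policy-controller", []string{"spec.retentionDays"}, false))   // [spec.retentionDays]
}
```

The first two applies touch disjoint fields and both succeed. Only the third, which reaches into a field another manager owns, is reported, and passing force would hand that field over instead.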

Merge Patch

original := backup.DeepCopy()
backup.Spec.RetentionDays = 14

if err := r.Patch(ctx, &backup, client.MergeFrom(original)); err != nil {
    return ctrl.Result{}, err
}

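Sends only the changed fields. By default the patch does not carry a resourceVersion, so it will not fail with a conflict, but it can silently overwrite a concurrent change to the same field. If you want optimistic locking on a merge patch, build it with client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{}).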
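To see what "only the changed fields" means on the wire, the sketch below computes a JSON merge patch (RFC 7386 shape) from two versions of an object using only the standard library. `mergeDiff` is a hypothetical helper written for illustration, not the code controller-runtime runs, and the `schedule` field is invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// mergeDiff returns the fields of modified that differ from original, shaped
// like a JSON merge patch. Keys removed from modified are emitted as null.
func mergeDiff(original, modified map[string]any) map[string]any {
	patch := map[string]any{}
	for k, mv := range modified {
		ov, ok := original[k]
		om, oIsMap := ov.(map[string]any)
		mm, mIsMap := mv.(map[string]any)
		switch {
		case ok && oIsMap && mIsMap:
			// Recurse into nested objects and keep only non-empty differences.
			if sub := mergeDiff(om, mm); len(sub) > 0 {
				patch[k] = sub
			}
		case !ok || !reflect.DeepEqual(ov, mv):
			patch[k] = mv
		}
	}
	for k := range original {
		if _, ok := modified[k]; !ok {
			patch[k] = nil
		}
	}
	return patch
}

func main() {
	var original, modified map[string]any
	_ = json.Unmarshal([]byte(`{"spec":{"retentionDays":7,"schedule":"@daily"}}`), &original)
	_ = json.Unmarshal([]byte(`{"spec":{"retentionDays":14,"schedule":"@daily"}}`), &modified)

	out, _ := json.Marshal(mergeDiff(original, modified))
	fmt.Println(string(out)) // {"spec":{"retentionDays":14}}
}
```

Because the unchanged `schedule` field never appears in the patch, a concurrent edit to it by someone else is left alone.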

When to use which

  • Patch (MergeFrom) — when you’re modifying specific fields and want to reduce conflicts
  • Server-side Apply — when multiple controllers manage different fields of the same resource
  • Update — when you need to replace the entire spec (rare)

Common gotchas

Modifying the object from Get and reusing it

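// RISKY: safe with the default client, a bug if the cache hands out shared pointers
var backup v1alpha1.Backup
if err := r.Get(ctx, req.NamespacedName, &backup); err != nil {
    return ctrl.Result{}, client.IgnoreNotFound(err)
}
backup.Spec.RetentionDays = 14 // modifies the cached copy if deep copies are disabled
if err := r.Update(ctx, &backup); err != nil {
    return ctrl.Result{}, err
}

With the default controller-runtime client, Get deep-copies the cached object into the struct you pass, so the code above does not touch the cache. It becomes a bug when deep copies are turned off (the UnsafeDisableDeepCopy cache option) or when you read from client-go informers or listers directly, because those return pointers into the shared cache and modifying them corrupts it. In those cases, always DeepCopy before modifying:

// GOOD: work on a copy
var backup v1alpha1.Backup
if err := r.Get(ctx, req.NamespacedName, &backup); err != nil {
    return ctrl.Result{}, client.IgnoreNotFound(err)
}
modified := backup.DeepCopy()
modified.Spec.RetentionDays = 14
if err := r.Update(ctx, modified); err != nil {
    return ctrl.Result{}, err
}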
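The hazard is plain pointer aliasing, which a small stand-alone sketch can show. It assumes a cache that hands out shared pointers, as client-go listers and informer stores do. The `backup` type and its `deepCopy` method are hypothetical stand-ins for a generated API type and its DeepCopy.

```go
package main

import "fmt"

// backup is a hypothetical resource with one scalar and one map field.
type backup struct {
	RetentionDays int
	Labels        map[string]string
}

// deepCopy clones the struct and its map so the copy shares no memory.
func (b *backup) deepCopy() *backup {
	out := &backup{RetentionDays: b.RetentionDays, Labels: make(map[string]string, len(b.Labels))}
	for k, v := range b.Labels {
		out.Labels[k] = v
	}
	return out
}

func main() {
	// The cache stores pointers, so every reader sees the same object.
	cache := map[string]*backup{"default/nightly": {RetentionDays: 7, Labels: map[string]string{}}}

	safe := cache["default/nightly"].deepCopy()
	safe.RetentionDays = 14
	fmt.Println(cache["default/nightly"].RetentionDays) // 7: the cache is untouched

	shared := cache["default/nightly"]
	shared.RetentionDays = 14
	fmt.Println(cache["default/nightly"].RetentionDays) // 14: the cache changed without any API write
}
```

After the second mutation, every other reader of the cache sees a retention of 14 even though the API server still holds 7.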

Infinite reconcile loops

The most common controller bug. Symptoms: your controller’s CPU is high, and logs show constant reconciliation with no changes. Causes:

  1. Updating status without the status subresource: without it, every status write bumps metadata.generation, so it looks like a spec change and triggers another reconcile
  2. Updating the resource in every reconcile, even when nothing changed. Add a check:
if backup.Spec.RetentionDays != desiredRetention {
    backup.Spec.RetentionDays = desiredRetention
    if err := r.Update(ctx, &backup); err != nil {
        return ctrl.Result{}, err
    }
}
  3. Missing GenerationChangedPredicate: use it to ignore status-only updates
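The idea behind GenerationChangedPredicate fits in a few lines. The sketch below is a stand-alone approximation rather than the library's code: `meta` and `generationChanged` are hypothetical names, and the example assumes the status subresource is enabled so that status writes leave metadata.generation alone.

```go
package main

import "fmt"

// meta holds the two metadata fields this sketch cares about.
type meta struct {
	Generation      int64
	ResourceVersion string
}

// generationChanged lets an update event through only when
// metadata.generation moved between the old and new object.
func generationChanged(oldObj, newObj meta) bool {
	return oldObj.Generation != newObj.Generation
}

func main() {
	before := meta{Generation: 3, ResourceVersion: "100"}
	statusOnly := meta{Generation: 3, ResourceVersion: "101"} // status write via the subresource
	specEdit := meta{Generation: 4, ResourceVersion: "102"}   // spec changed

	fmt.Println(generationChanged(before, statusOnly)) // false
	fmt.Println(generationChanged(before, specEdit))   // true
}
```

In controller-runtime the real predicate is typically attached when building the controller, for example with WithEventFilter(predicate.GenerationChangedPredicate{}). Keep in mind that it also filters label and annotation changes, since those do not bump generation either.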

Not handling NotFound

When a resource is deleted, your reconciler runs one more time. The Get call returns NotFound. If you don’t handle it, you log an error on every delete:

// Always handle NotFound
if err := r.Get(ctx, req.NamespacedName, &backup); err != nil {
    return ctrl.Result{}, client.IgnoreNotFound(err)
}
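What client.IgnoreNotFound does can be mimicked with the standard library alone. The sketch below is an approximation for illustration: `errNotFound`, `ignoreNotFound`, `get`, and `reconcile` are hypothetical names, and the map stands in for the cache.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the API server's 404 error.
var errNotFound = errors.New("not found")

// ignoreNotFound swallows NotFound and passes every other error through.
func ignoreNotFound(err error) error {
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

// get looks a key up in a toy store.
func get(store map[string]int, key string) (int, error) {
	v, ok := store[key]
	if !ok {
		return 0, fmt.Errorf("get %q: %w", key, errNotFound)
	}
	return v, nil
}

// reconcile treats a missing object as "nothing left to do".
func reconcile(store map[string]int, key string) error {
	if _, err := get(store, key); err != nil {
		return ignoreNotFound(err)
	}
	// Normal reconcile logic would run here.
	return nil
}

func main() {
	store := map[string]int{"default/nightly": 7}

	_, err := get(store, "default/deleted")
	fmt.Println(err)                                 // get "default/deleted": not found
	fmt.Println(reconcile(store, "default/deleted")) // <nil>
}
```

Returning nil matters beyond log noise: a returned error is retried, so an unhandled NotFound keeps retrying an object that no longer exists.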

Assuming create vs update

Don’t branch on “is this a create or an update?” The reconciler doesn’t receive event types. Instead, check the state:

// BAD: trying to determine event type
// (you can't — the reconciler doesn't know)

// GOOD: check the state and act accordingly
var deployment appsv1.Deployment
err := r.Get(ctx, deploymentKey, &deployment)
if errors.IsNotFound(err) {
    // Deployment doesn't exist — create it
    return r.createDeployment(ctx, &backup)
}
if err != nil {
    return ctrl.Result{}, err
}
// Deployment exists — update if needed
return r.updateDeployment(ctx, &backup, &deployment)
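The same state-driven shape can be exercised without a cluster. In this sketch the `deployment` type, the map-backed store, and `reconcileDeployment` are all hypothetical. The point is that the function gives the right answer no matter which event woke it up, or how many times it runs.

```go
package main

import "fmt"

// deployment is a hypothetical, stripped-down child object.
type deployment struct{ Replicas int }

// reconcileDeployment converges the store toward desired without knowing
// whether the triggering event was a create, an update, or a resync.
func reconcileDeployment(store map[string]deployment, key string, desired deployment) string {
	current, exists := store[key]
	switch {
	case !exists:
		store[key] = desired
		return "created"
	case current != desired:
		store[key] = desired
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	store := map[string]deployment{}
	fmt.Println(reconcileDeployment(store, "default/backup", deployment{Replicas: 1})) // created
	fmt.Println(reconcileDeployment(store, "default/backup", deployment{Replicas: 1})) // unchanged
	fmt.Println(reconcileDeployment(store, "default/backup", deployment{Replicas: 2})) // updated
}
```

controller-runtime ships a helper built on the same idea, controllerutil.CreateOrUpdate, which may be worth a look before hand-rolling the branch.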

Requeuing too aggressively

// BAD: always requeue
return ctrl.Result{RequeueAfter: 5 * time.Second}, nil

If your controller always requeues, it reconciles constantly even when nothing changes. Only requeue when you’re waiting for something:

// GOOD: requeue only when work is pending
if backup.Status.Phase == "Running" {
    return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}
return ctrl.Result{}, nil

The mindset

Conflicts, cache staleness, and requeues are normal. A well-written controller handles them gracefully. The reconciler is always catching up to reality — it doesn’t need to be right on the first try, it just needs to converge.