Cache Invalidation Patterns
This guide sits under Advanced Caching Strategies & CDN Architecture and covers the hardest problem in caching: deciding what to evict, when, and how — without serving stale bytes or stampeding your origin.
Cache invalidation is a two-sided optimization. Push too little and users see stale content: a deployed bugfix that never reaches the edge, a price that changed an hour ago, an index.html that still references a deleted JS bundle. Push too much and you trigger thundering-herd revalidation: a single broad purge drops thousands of edge objects simultaneously, every subsequent request misses, and the origin absorbs a synchronized traffic spike that blows past its TTFB budget. The actionable boundary is concrete: cached static assets should hold a >85% edge hit ratio, cached TTFB should stay ≤50ms, and a deploy purge should propagate across PoPs in <2s while keeping the post-purge origin request rate inside the headroom your origin can serve under its TTFB ≤ 200ms target. This guide moves from baseline capture to root-cause isolation to a targeted purge strategy and CI validation.
Prerequisites: Versions, APIs, and Tagging Capability
Before tuning invalidation, confirm your stack exposes the primitives this workflow depends on:
- A CDN with tag-based purge: Fastly (Surrogate-Key header + /service/<id>/purge API), Cloudflare Enterprise (Cache-Tag header + /purge_cache with tags), or CloudFront (path-pattern invalidations only; no native tagging, plan accordingly).
- An API token scoped to purge only, stored as a CI secret. Never reuse a full-access token in a deploy job.
- A build that emits content-hashed filenames for JS/CSS (Vite assetFileNames/chunkFileNames/entryFileNames, or Webpack [contenthash]), so static assets are invalidated by omission rather than by purge.
- curl 7.x and jq for header inspection in CI.
- For the client tier: a Service Worker (Workbox 7+ or hand-rolled) whose precache manifest is regenerated on every build.
With those in place, the remaining work is choosing the right purge primitive per asset class and wiring it into the deploy.
1. Environment Setup: Tag Responses with Surrogate Keys
Invalidation granularity is decided at response time, not purge time. The origin must stamp each cacheable response with the keys that identify the content inside it. A product page response might carry product-42, category-shoes, and layout-v3 — purging any one of those keys later evicts this object.
# Origin response for /products/42: emit surrogate keys at write time
location ~ ^/products/(\d+)$ {
    add_header Surrogate-Key "product-$1 category-shoes layout-v3";
    add_header Cache-Control "public, max-age=0, s-maxage=86400";
    # trade-off: max-age=0 keeps the browser revalidating while the edge
    # holds it for a day. Do NOT use this for assets you cannot purge by tag;
    # without tag purge you'd be stuck serving stale content for the full day.
    proxy_pass http://backend;
}
The header name is CDN-specific: Fastly reads Surrogate-Key, Cloudflare reads Cache-Tag. Most edges strip the header before it reaches the browser, so it costs nothing on the wire. The discipline that matters: tag by content identity and dependency, not by URL. A URL is one address; a tag can span hundreds of URLs that share a dependency, which is exactly what makes selective purge possible. Header placement and edge cache-key alignment are covered in depth in CDN Edge Caching Configuration.
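To make "tag by content identity and dependency" concrete, here is a minimal sketch of how an origin might derive the header value from what a response contains. The function name `surrogateKeysFor` and the shape of the `page` object are hypothetical, chosen only for illustration; the key naming follows the product-42 / category-shoes / layout-v3 example above.

```javascript
// Hypothetical helper: derive surrogate keys from the content a response
// embeds, not from the URL it was requested at.
function surrogateKeysFor(page) {
  const keys = new Set(); // a Set de-duplicates shared dependencies
  for (const product of page.products) {
    keys.add(`product-${product.id}`);
    keys.add(`category-${product.category}`);
  }
  keys.add(`layout-${page.layoutVersion}`);
  // Space-separated, matching the Surrogate-Key value in the nginx example.
  return [...keys].join(' ');
}

// Assumed page shapes, for illustration only.
const detailPage = {
  products: [{ id: 42, category: 'shoes' }],
  layoutVersion: 'v3',
};
const listingPage = {
  products: [
    { id: 42, category: 'shoes' },
    { id: 7, category: 'shoes' },
  ],
  layoutVersion: 'v3',
};

console.log(surrogateKeysFor(detailPage));  // product-42 category-shoes layout-v3
console.log(surrogateKeysFor(listingPage)); // product-42 category-shoes product-7 layout-v3
```

Both pages carry product-42, so a single purge of that key reaches the detail page and the listing without either URL being named. Cloudflare's Cache-Tag header expects a comma-separated list, so the join separator would need to change there.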
2. Capture Baseline: Measure Hit Ratio and Post-Purge Origin Load
You cannot tell whether a purge strategy is over- or under-firing without numbers. Capture three baselines before changing anything.
# Baseline edge behaviour for a representative asset
curl -sI https://your-domain.com/products/42 \
  | grep -Ei '^(cache-control|age|x-cache|cf-cache-status|surrogate-key):'
# The pattern is anchored to whole header names so that "age" does not also
# match unrelated headers such as content-language.
# trade-off: a one-shot curl shows steady-state freshness but NOT the
# origin spike a broad purge causes. For that, watch origin RPS during a
# staging purge instead of trusting a single header read.
Record: edge hit ratio (CDN analytics, target >85%), Age header progression on repeat requests (confirms the edge is actually caching), and — critically — origin requests-per-second in the 60 seconds after a test purge. That last number is your thundering-herd indicator. If a deploy purge drives origin RPS past what it serves inside TTFB ≤ 200ms, your purge is too broad or lacks a stale-serving cushion.
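The recorded numbers can be turned into a pass/fail signal with very little code. The sketch below assumes you can export hit/miss counters from CDN analytics and one origin RPS sample per second for the 60 seconds after the test purge; the function names and the `originHeadroomRps` input are hypothetical, and the headroom figure is something you measure separately, not something this code derives.

```javascript
// Hypothetical helpers for the baselines above; names are illustrative.

// Edge hit ratio from raw CDN analytics counters.
function hitRatio(hits, misses) {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}

// rpsSamples: origin requests-per-second, one sample per second, covering
// the window after a test purge.
// originHeadroomRps: the highest RPS the origin has been measured to serve
// inside its TTFB target (an input you supply).
function assessPostPurgeLoad(rpsSamples, originHeadroomRps) {
  if (rpsSamples.length === 0) {
    throw new Error('no origin RPS samples captured');
  }
  const peakRps = Math.max(...rpsSamples);
  const secondsOverHeadroom = rpsSamples.filter(
    (rps) => rps > originHeadroomRps
  ).length;
  return {
    peakRps,
    secondsOverHeadroom,
    withinHeadroom: secondsOverHeadroom === 0,
  };
}
```

For example, `assessPostPurgeLoad([120, 900, 400, 150], 500)` reports a peak of 900 RPS with one second over headroom, which is the signature of a purge that is too broad or lacks a stale-serving cushion.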
3. Isolate the Bottleneck: Choose the Purge Primitive
Every invalidation maps to one of four primitives. Picking the wrong one is the root cause of nearly every stale-content or origin-spike incident.
| Primitive | Blast radius | When to use | Failure mode |
|---|---|---|---|
| Purge by URL | One object | A single known page/asset changed | Misses variants (query strings, Vary permutations) → under-purge |
| Purge by tag / surrogate key | All objects carrying the key | Content with shared dependencies (a product across listings, search, sitemap) | Over-broad tags evict more than intended → origin spike |
| Purge everything | Entire cache | Cache-key bug, security incident, last resort | Guaranteed thundering herd; never in a routine deploy |
| Invalidation by omission | Nothing purged | Hashed immutable assets — new hash = new URL | Stale HTML still referencing old hashes → broken page |
The decision rule: hashed static assets use omission, content uses tags, single ad-hoc fixes use URL, and purge-everything is a break-glass. Aligning hashed-asset headers so omission works correctly is detailed in setting up immutable cache headers for hashed assets.
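The decision rule is small enough to encode directly, which helps keep a deploy script from reaching for purge-everything by default. This is a sketch under assumptions: the `change` descriptor and its field names (`breakGlass`, `hashedAsset`, `tags`) are hypothetical, not part of any CDN API.

```javascript
// Hypothetical encoding of the decision rule above.
// change.breakGlass  - true only for a cache-key bug or security incident
// change.hashedAsset - true when the asset ships under a content-hashed URL
// change.tags        - surrogate keys the changed content is known to carry
function choosePurgePrimitive(change) {
  if (change.breakGlass) return 'purge-everything';
  if (change.hashedAsset) return 'omission';
  if (Array.isArray(change.tags) && change.tags.length > 0) return 'purge-by-tag';
  return 'purge-by-url';
}

console.log(choosePurgePrimitive({ hashedAsset: true }));     // omission
console.log(choosePurgePrimitive({ tags: ['product-42'] }));  // purge-by-tag
console.log(choosePurgePrimitive({ url: '/about' }));         // purge-by-url
```

Putting the break-glass check first and requiring an explicit flag for it means a routine deploy can never fall through to a full purge by accident.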
4. Apply the Fix: Soft Purge with Stale-While-Revalidate
A hard purge deletes the object immediately — the next request is a guaranteed miss. A soft purge marks the object stale but keeps it on disk, letting the edge serve the stale copy once while it revalidates against origin in the background. Combined with stale-while-revalidate, soft purge converts a synchronized miss storm into a smooth, lazy refresh.
# Fastly soft purge by surrogate key: mark product-42 stale across all URLs
curl -X POST "https://api.fastly.com/service/$FASTLY_SERVICE_ID/purge/product-42" \
  -H "Fastly-Key: $FASTLY_PURGE_TOKEN" \
  -H "Fastly-Soft-Purge: 1"
# trade-off: soft purge serves ONE stale response per object during
# revalidation. Do NOT soft-purge a security-sensitive change (leaked
# token, wrong price) where even a single stale hit is unacceptable;
# use a hard purge there and eat the brief origin load.
For this to be safe, the cached response must declare a stale window:
Cache-Control: public, s-maxage=86400, stale-while-revalidate=600, stale-if-error=86400
# trade-off: the 600s SWR window absorbs the post-purge refetch wave, but
# means a soft-purged object can serve stale for up to 10 minutes under
# load. Shrink the window for fast-moving content; widen it for catalogs.
The stale-while-revalidate mechanics — how the edge counts the window and how it interacts with s-maxage — are covered in Stale-While-Revalidate Implementation.
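As a rough illustration of how the stale window gates what the edge may do, the sketch below parses the Cache-Control value shown above and classifies an object by how long it has been stale. Both function names are hypothetical, the parser is deliberately simplified (it does not handle quoted directive values), and `secondsSinceStale` is taken as an input because how a given CDN starts that clock after a soft purge is provider-specific.

```javascript
// Simplified Cache-Control parser: "name" -> true, "name=123" -> 123.
// Assumption: no quoted values, which is true of the header shown above.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(',')) {
    const [name, value] = part.trim().split('=');
    if (!name) continue;
    directives[name.toLowerCase()] = value === undefined ? true : Number(value);
  }
  return directives;
}

// secondsSinceStale < 0 means the object is still fresh.
function edgeDecision(directives, secondsSinceStale) {
  const swrWindow = directives['stale-while-revalidate'] || 0;
  if (secondsSinceStale < 0) return 'serve-fresh';
  if (secondsSinceStale < swrWindow) return 'serve-stale-and-revalidate';
  return 'fetch-from-origin';
}

const cc = parseCacheControl(
  'public, s-maxage=86400, stale-while-revalidate=600, stale-if-error=86400'
);
console.log(edgeDecision(cc, 30));  // serve-stale-and-revalidate
console.log(edgeDecision(cc, 900)); // fetch-from-origin
```

The second call shows why the window size matters: once an object has been stale longer than the 600s window, requests go back to blocking on the origin.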
Deconstructing Invalidation Latency into Phases
"Did my purge work?" decomposes into measurable phases, each with its own budget and its own failure mode. Treat them like timing phases on a Core Web Vital: find the dominant one and fix it first.
- API acceptance (≤200ms): time for the CDN to acknowledge the purge call. A slow or rate-limited API here means your deploy job blocks or times out. Batch keys to stay under per-request limits.
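- Edge propagation (<2s): time for the eviction to reach all PoPs. Fastly advertises global purge in roughly 150ms; other providers differ, and CloudFront invalidations in particular usually take noticeably longer to complete, so measure your own CDN against this budget rather than assuming it. Propagation is also not atomic: a header read from a near PoP may already be fresh while a far PoP still serves stale.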
- Origin refetch (bounded by SWR window): the lazy refill. This is where thundering herd lives. Without soft purge + SWR, this phase collapses into a synchronized spike; with them, it spreads across the window.
- Client convergence (variable): browser HTTP cache and Service Worker caches do not see edge purges at all. They hold their own copies until max-age lapses or the SW updates its manifest.
The last phase is the one teams forget: an edge purge never reaches the client. Coordinating the SW tier is the subject of the advanced diagnostics below.
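Finding the dominant phase can be sketched as a comparison of measured timings against the two fixed budgets listed above. The phase keys, the function name, and the idea of ranking by measured-to-budget ratio are assumptions made for illustration; origin refetch and client convergence are left out because their budgets depend on your SWR window and max-age settings.

```javascript
// Budgets taken from the phase list above, in milliseconds.
// Propagation is "<2s" in the text; this sketch flags values above 2000.
const PHASE_BUDGETS_MS = { apiAcceptance: 200, edgePropagation: 2000 };

// Returns the phases that exceed their budget, worst offender first.
function phasesOverBudget(measuredMs, budgetsMs = PHASE_BUDGETS_MS) {
  return Object.entries(budgetsMs)
    .filter(([phase, budget]) =>
      measuredMs[phase] !== undefined && measuredMs[phase] > budget
    )
    .map(([phase, budget]) => ({
      phase,
      measuredMs: measuredMs[phase],
      budgetMs: budget,
      ratio: measuredMs[phase] / budget,
    }))
    .sort((a, b) => b.ratio - a.ratio);
}

const offenders = phasesOverBudget({ apiAcceptance: 350, edgePropagation: 5000 });
console.log(offenders.map((o) => o.phase)); // [ 'edgePropagation', 'apiAcceptance' ]
```

Ranking by ratio rather than absolute milliseconds keeps a 150ms overshoot on a 200ms budget from being hidden behind a phase whose budget is ten times larger.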
Advanced Diagnostics: Coordinating Origin, Edge, and Service Worker Caches
A purge that fixes the edge but leaves the Service Worker serving a stale precached index.html produces the most confusing class of bug — it reproduces only for returning users and is invisible to curl. There are three caches in the chain, and a purge primitive only touches one of them.
// On SW activation, drop precaches that don't match the new build's revision.
// BUILD_REVISION is assumed to be injected at build time (for example a git SHA).
self.addEventListener('activate', (event) => {
  const currentCaches = [`precache-${BUILD_REVISION}`];
  event.waitUntil(
    caches.keys()
      .then((keys) =>
        Promise.all(
          keys
            // Only touch precaches, so runtime caches survive the upgrade.
            .filter((key) => key.startsWith('precache-') && !currentCaches.includes(key))
            .map((key) => caches.delete(key))
        )
      )
      .then(() => self.clients.claim())
  );
  // trade-off: clients.claim() forces the new SW to control open tabs
  // immediately, but mid-session asset swaps can mix old HTML with new
  // chunks. Skip claim() if your app cannot tolerate a live version flip.
});
The coordination rule across tiers:
- Origin is invalidated by deploying new content (and tagging it).
- Edge is invalidated by tag/URL/soft purge as chosen in step 3.
- Service Worker is invalidated by regenerating the precache manifest with a new revision per build, plus the cleanup above. The HTML it serves must point at the new hashes the moment the edge does.
When the SW intercepts hashed assets, ensure its fetch handler does not shadow the edge's immutable responses — the bypass pattern is detailed in Service Worker Caching Strategies. The most common production incident here is over- or under-purging on deploy, which has its own runbook: purging CDN cache by tag on deploy.
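One way to catch the "HTML points at hashes that no longer exist" failure before it ships is to compare the asset URLs in the built HTML against the precache manifest. The sketch below is a hypothetical build-time check: the function name is invented, the manifest is assumed to be a flat list of URLs, and the regular expression only recognises double-quoted src/href attributes ending in .js or .css.

```javascript
// Hypothetical build-time check: list JS/CSS references in the HTML that the
// new precache manifest does not contain.
function findUnknownAssetRefs(html, manifestUrls) {
  const known = new Set(manifestUrls);
  const refs = [...html.matchAll(/(?:src|href)="([^"]+\.(?:js|css))"/g)]
    .map((match) => match[1]);
  return refs.filter((url) => !known.has(url));
}

// Illustrative inputs; the hashes are made up.
const html =
  '<link rel="stylesheet" href="/assets/app-9f2c1a.css">' +
  '<script src="/assets/main-a1b2c3.js"></script>';
const manifest = ['/assets/app-9f2c1a.css', '/assets/main-d4e5f6.js'];

console.log(findUnknownAssetRefs(html, manifest)); // [ '/assets/main-a1b2c3.js' ]
```

A non-empty result means the HTML and the manifest came from different builds, which is the mismatch that produces a broken page for returning users.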
Validation & Budgeting: Assert Purge Behavior in CI
Invalidation correctness should fail the pipeline, not the user. Two assertions belong in every deploy: that the freshly deployed page is actually served fresh from the edge, and that the deploy did not trip a purge-everything fallback.
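#!/usr/bin/env bash
# ci/verify-purge.sh: run after the deploy's purge step
set -euo pipefail
URL="https://your-domain.com/products/42"

# Read only the X-Cache header. The anchored pattern avoids also matching
# X-Cache-Hits (whose name contains "hit"), and `|| true` stops pipefail from
# aborting without a message when the header is absent.
# On Cloudflare, match cf-cache-status instead.
cache_status() { curl -sI "$URL" | tr -d '\r' | grep -i '^x-cache:' || true; }

# 1. First request after purge should MISS (proves the purge landed).
# After a soft purge the edge may answer from the stale copy and not report
# a miss; in that case assert on the response content instead.
status=$(cache_status)
echo "post-purge: $status"
echo "$status" | grep -qiE 'miss|expired' || { echo "FAIL: object still cached, purge did not land"; exit 1; }
# 2. Second request should HIT within a second (proves edge re-caches, no per-request origin load)
sleep 1
cache_status | grep -qi 'hit' || { echo "FAIL: edge not re-caching after purge"; exit 1; }
# trade-off: this asserts correctness on ONE representative URL. Do NOT
# assert it on hundreds in CI, because that itself becomes a load test. Sample
# one URL per tag class and trust tag semantics for the rest.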
Set explicit budgets and alert on regressions: edge hit ratio >85% sustained, cached TTFB ≤50ms, post-deploy origin RPS within origin headroom, and purge-to-fresh propagation <2s. Wire the script above into the same CI stage that runs your header audits so a purge that silently fails to land blocks the release rather than reaching production.
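If the header check grows beyond a grep, the classification can live in a small function that CI tooling calls on the raw `curl -sI` output. This sketch is hedged on several assumptions: the function name is hypothetical, only the x-cache and cf-cache-status headers are recognised, and for comma-separated values (as seen with shielded setups) the last entry is assumed to be the edge node's verdict. Verify the header shapes your CDN actually emits before relying on it.

```javascript
// Hypothetical classifier for raw response headers (as printed by curl -sI).
// Returns 'hit', 'miss', 'other', or 'unknown' when no cache header exists.
function cacheStatusFromHeaders(rawHeaders) {
  for (const line of rawHeaders.split(/\r?\n/)) {
    // Anchored on the full header name so x-cache-hits is not mistaken
    // for x-cache.
    const match = /^(x-cache|cf-cache-status):\s*(.+)$/i.exec(line.trim());
    if (!match) continue;
    // Assumption: with several comma-separated verdicts, the last one
    // belongs to the node closest to the client.
    const verdict = match[2].split(',').pop().trim().toLowerCase();
    if (verdict.includes('hit')) return 'hit';
    if (verdict.includes('miss') || verdict.includes('expired')) return 'miss';
    return 'other';
  }
  return 'unknown';
}

console.log(cacheStatusFromHeaders('HTTP/2 200\r\nx-cache-hits: 0\r\nx-cache: MISS\r\n')); // miss
console.log(cacheStatusFromHeaders('x-cache: MISS, HIT'));                                 // hit
```

Returning 'unknown' separately from 'miss' lets the pipeline distinguish "the purge did not land" from "this response never passed through the cache layer being tested".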
Related
- CDN Edge Caching Configuration — set cache keys and TTLs so tags and surrogate keys resolve to the objects you expect.
- HTTP Cache-Control Headers Explained: the directive precedence (s-maxage, immutable, stale-while-revalidate) that purge strategies build on.
- Stale-While-Revalidate Implementation: the stale-serving window that turns a soft purge into a smooth refill.
- Service Worker Caching Strategies — invalidating the client tier that edge purges can never reach.
- Purging CDN cache by tag on deploy — the deploy-time runbook for over- and under-purging.