Advanced Caching Strategies & CDN Architecture: Engineering Sub-200ms TTFB
Caching has more leverage than any other technique over the two numbers that decide whether a frontend feels fast: Time to First Byte (TTFB) and repeat-visit Largest Contentful Paint (LCP). When the document and its critical sub-resources are served from a cache that sits physically and logically close to the user, TTFB collapses toward the network round-trip floor (the actionable target is TTFB ≤ 200ms at p75) and a returning visitor can paint their LCP element with zero blocking network fetches. Miss that, and every other optimization (bundle splitting, image compression, hydration scheduling) fights uphill against a slow first byte and cold-cache fetches.
This is a layered problem, not a single header. A request can be satisfied by the HTTP disk cache in the browser, intercepted by a Service Worker, served from a CDN edge node, or fall all the way through to origin. Each layer has different freshness semantics, different invalidation costs, and different failure modes. The job of a caching architecture is to make the fast path the common path while keeping a deterministic, low-blast-radius way to ship new content. This page frames the diagnostic numbers, walks the three architectural layers — HTTP cache-control headers and edge caching, Service Worker and client caches, and freshness with stale-while-revalidate — then covers monitoring, reference configs, pitfalls, and FAQs.
Diagnostic Overview: Reading HIT/MISS Ratios Before You Touch a Header
Caching changes should never start with a config edit. They start with measurement, because the lever you pull depends entirely on where the misses are concentrated. The first signal is the cache HIT/MISS ratio, and it has to be read per layer and per asset class — a 95% aggregate hit ratio can hide a 40% miss rate on the HTML document that gates TTFB for every first paint.
At the edge, every mature CDN emits a status header (CF-Cache-Status, X-Cache, Fastly-Debug, or similar). Sample it in your CDN logs and segment by route: static hashed assets should sit above 95% HIT, document responses depend on your freshness model, and /api/* may be deliberately uncacheable. A MISS followed by HIT is normal cold-cache warming; a sustained MISS or EXPIRED/REVALIDATED churn on assets you expect to be immutable points at a header bug or a cache-busting query string.
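As a rough illustration of segmenting by route, the sketch below groups log records by asset class and reports the HIT ratio for each class. The record shape ({ path, cacheStatus }), the class names, and the rule that only an exact HIT counts as a hit are assumptions made for the example; status strings vary between CDNs, so adapt the matching to the header your provider emits.

```javascript
// Assumed log record shape: { path: string, cacheStatus: string }.
function assetClass(path) {
  // Same hashed-filename convention the page uses (for example app-9f3a2c.js).
  if (/-[0-9a-f]{6,}\.(js|css|woff2|png|jpg|svg|avif|webp)$/.test(path)) return 'hashed-static';
  if (path.startsWith('/api/')) return 'api';
  if (path.endsWith('/') || path.endsWith('.html')) return 'document';
  return 'other';
}

function hitRatioByClass(records) {
  const totals = {};
  for (const { path, cacheStatus } of records) {
    const cls = assetClass(path);
    const bucket = (totals[cls] ??= { hits: 0, total: 0 });
    bucket.total += 1;
    // Assumption: only an exact HIT counts; EXPIRED, REVALIDATED, MISS and so on do not.
    if (String(cacheStatus ?? '').toUpperCase() === 'HIT') bucket.hits += 1;
  }
  const ratios = {};
  for (const [cls, { hits, total }] of Object.entries(totals)) {
    ratios[cls] = hits / total;
  }
  return ratios;
}
```

Reading the result per class is the point: a healthy hashed-static ratio next to a poor document ratio is exactly the case an aggregate number hides.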
At the browser, the field signal is the PerformanceResourceTiming entry. A transferSize of 0 with a non-zero decodedBodySize is a disk/memory cache hit; deliveryType === "cache" (where supported) confirms it. This is how you measure repeat-visit behavior in production rather than guessing from a warm DevTools session.
Then separate edge vs origin. TTFB measured at the browser is the sum of DNS, connection, edge processing, and — on a miss — the full origin fetch. WebPageTest run in cold-cache and warm-cache modes isolates the two: if cold TTFB is 600ms and warm is 40ms, your origin is slow and your edge cache is doing its job; if both are 600ms, the edge is missing and you have a header or key problem. The same split applies to repeat-visit LCP — a warm browser cache should make the second view's LCP element paint with no blocking fetch, so if warm-load LCP barely improves over cold, the browser layer is not caching the critical resources and the cause is almost always a missing or too-short max-age on the LCP image, font, or hero stylesheet.
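The cold/warm comparison above can be written down as a small decision rule. This is a minimal sketch, assuming the 200ms target from this page as the cutoff; the function name and the result labels are hypothetical, and real diagnosis should look at several runs rather than one pair of numbers.

```javascript
// Classify a cold-cache / warm-cache TTFB pair (milliseconds).
function diagnoseTtfb(coldMs, warmMs, targetMs = 200) {
  const coldOk = coldMs <= targetMs;
  const warmOk = warmMs <= targetMs;
  if (coldOk && warmOk) return 'healthy';
  // Warm is fast, cold is slow: the edge serves hits, origin is the slow part.
  if (!coldOk && warmOk) return 'origin-slow-edge-ok';
  // Both slow: the edge is not answering, so suspect headers or the cache key.
  if (!coldOk && !warmOk) return 'edge-missing';
  // Warm slower than cold is not explained by caching; re-measure.
  return 'inconclusive';
}
```

With the numbers from the text, a 600ms cold and 40ms warm run comes back as an origin problem, while 600ms on both runs points at the edge.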
Finally, anchor everything in RUM. Lab numbers locate the bottleneck, but the p75 field TTFB and p75 repeat-visit LCP across real PoPs, devices, and networks are what actually ship. A lab session on a fast machine over a warm connection will systematically understate the misses your real users hit out in the slow tail of the device and network distribution. Aggregate cache status by CDN PoP and device tier so you can tell a single failing region from a global regression — a cache that is healthy in your home region but cold in a distant PoP is invisible in lab data and obvious in field p95. Read p50 to confirm the common path is fast, p75 because that is the boundary Core Web Vitals scores against, and p95 to catch the failing-region and quota-eviction cases that p75 smooths over.
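To make the percentile reading concrete, here is one possible way to aggregate field TTFB samples by PoP. The sample shape ({ pop, ttfb }) is an assumption, and the percentile uses the simple nearest-rank method; RUM vendors may interpolate differently, so treat the numbers as comparable only within one method.

```javascript
// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Assumed sample shape: { pop: string, ttfb: number } with ttfb in milliseconds.
function ttfbByPop(samples) {
  const groups = new Map();
  for (const { pop, ttfb } of samples) {
    if (!groups.has(pop)) groups.set(pop, []);
    groups.get(pop).push(ttfb);
  }
  const summary = {};
  for (const [pop, values] of groups) {
    summary[pop] = {
      p50: percentile(values, 50),
      p75: percentile(values, 75),
      p95: percentile(values, 95),
    };
  }
  return summary;
}
```

A PoP whose p50 and p75 look fine but whose p95 is an order of magnitude higher is the failing-tail pattern described above.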
Architecture Layer 1: HTTP Cache-Control Headers and the CDN Edge
The network layer is where the largest TTFB wins live, because a request answered at the edge never pays origin latency. The contract that governs it is Cache-Control, and the discipline is matching directive to asset class. Get this right and the edge caching configuration does the heavy lifting; get it wrong and you either serve stale HTML or hammer origin on every request.
The clean split is content-addressable static assets vs. mutable documents. Hashed bundles (app-9f3a2c.js) can never change content under a stable name, so they earn public, max-age=31536000, immutable — a one-year freshness window with no revalidation, which is exactly what immutable cache headers for hashed assets are designed for. The HTML document that references them is mutable and gets a short or zero max-age plus a validator (ETag or Last-Modified), so a new deploy is picked up promptly while still allowing conditional 304 Not Modified responses.
Two additions separate hobby setups from production ones. The first is s-maxage, which gives the shared edge cache a different (usually longer) freshness window than the browser, so the edge absorbs traffic while clients revalidate more often. The second is the stale-* family, stale-while-revalidate and stale-if-error, which turns the edge from a binary fresh/expired gate into a graceful one: it can serve a slightly stale body instantly and refresh in the background, or keep serving cached content when origin returns a 5xx. That outage case is the focus of configuring stale-if-error for origin outages.
The other edge concern is the cache key. By default the key is the URL, but Vary expands it. Vary: Accept-Encoding is correct and necessary so the edge stores Brotli and gzip variants separately. Vary: User-Agent is almost always a mistake — it fragments the key across thousands of UA strings and craters your hit ratio. Auth-sensitive or personalized responses need an explicit, narrow key dimension (a normalized cookie or a custom header), never the raw Cookie header, which fragments per session. The practical test is cardinality: any header you add to the key multiplies the number of distinct cached objects per URL by its number of distinct values, so a two-value Accept-Encoding is fine and a high-cardinality User-Agent or session cookie is fatal. When you genuinely must vary on something high-cardinality — a logged-in vs. anonymous split, say — normalize it to a low-cardinality derived value at the edge (a single is-authenticated boolean) before it enters the key.
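The cardinality rule can be sketched as a key-building function that collapses each high-cardinality input to a small derived value before it enters the key. Everything named here is illustrative: the request shape, the `session` cookie name, and the key format are assumptions, and the Accept-Encoding handling ignores q-values for brevity.

```javascript
// Collapse Accept-Encoding to one of three values instead of the raw header string.
function normalizeEncoding(acceptEncoding = '') {
  const offered = acceptEncoding
    .toLowerCase()
    .split(',')
    .map((token) => token.trim().split(';')[0]);
  if (offered.includes('br')) return 'br';
  if (offered.includes('gzip')) return 'gzip';
  return 'identity';
}

// Assumed request shape: { url, headers } with lower-cased header names.
function edgeCacheKey({ url, headers }) {
  const encoding = normalizeEncoding(headers['accept-encoding']);
  // Hypothetical cookie name: only its presence enters the key, never its value.
  const auth = /(?:^|;\s*)session=/.test(headers.cookie ?? '') ? 'auth' : 'anon';
  // User-Agent is deliberately left out of the key.
  return `${url}|enc=${encoding}|${auth}`;
}
```

Under these assumptions each URL has at most six cached variants (three encodings times two auth states), regardless of how many sessions or browsers request it.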
It is also worth distinguishing the two ways the edge keeps freshness honest. max-age/s-maxage is expiration-based: the object is fresh until its timer runs out, no origin contact needed. A validator (ETag/Last-Modified) is validation-based: when an object expires the edge sends a conditional request, and origin answers with a cheap 304 Not Modified if nothing changed, saving the body transfer but not the round trip. The combination — short expiration plus a validator plus stale-while-revalidate — gives you the best of all three: most requests are served fresh from the edge with no origin contact, expired ones are served stale-but-instant while a background revalidation runs, and the revalidation itself is a cheap 304 when content is unchanged.
# Origin headers consumed by the CDN. Hashed assets vs. the mutable document.
location ~* "-[0-9a-f]{6,}\.(js|css|woff2|png|jpg|svg|avif|webp)$" {
    add_header Cache-Control "public, max-age=31536000, immutable";
    add_header Vary "Accept-Encoding"; # store br/gzip variants separately
    add_header X-Content-Type-Options "nosniff";
    access_log off;
}
location = /index.html {
    # trade-off: s-maxage=300 lets the edge absorb traffic for 5 min, but a hotfix
    # that must appear instantly needs an explicit purge (see invalidation below).
    # Do NOT raise this for content that ships on a human-urgent timeline.
    add_header Cache-Control "public, max-age=0, s-maxage=300, stale-while-revalidate=60, stale-if-error=86400";
    etag on;
}
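The validation path (a conditional request answered with 304 Not Modified) is handled by the server or CDN in practice, but the logic is small enough to sketch. The function below is an illustration rather than any particular server's implementation: the header-object shape and the return shape are assumptions, and it uses the weak comparison that If-None-Match calls for, so a W/ prefix is ignored when matching.

```javascript
// Drop the weak-validator prefix so W/"v1" and "v1" compare as equal.
function stripWeak(tag) {
  return tag.startsWith('W/') ? tag.slice(2) : tag;
}

// Assumed shapes: requestHeaders uses lower-cased names; etag is the current
// validator for the resource, quotes included (for example '"v1"').
function respondConditionally(requestHeaders, body, etag) {
  const candidates = (requestHeaders['if-none-match'] ?? '')
    .split(',') // simplification: assumes no commas inside an ETag value
    .map((tag) => tag.trim())
    .filter(Boolean);
  const unchanged =
    candidates.includes('*') ||
    candidates.some((tag) => stripWeak(tag) === stripWeak(etag));
  if (unchanged) {
    // Nothing changed: headers only, no body transfer.
    return { status: 304, headers: { ETag: etag }, body: null };
  }
  return { status: 200, headers: { ETag: etag }, body };
}
```

The 304 branch still costs a round trip to whoever runs this check, which is why it pairs well with stale-while-revalidate: the round trip happens in the background.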
Architecture Layer 2: Service Worker and Client-Side Caches
The Service Worker is the only layer you fully control on the client, and it is what makes a repeat visit feel instant and an offline visit possible at all. It sits between the page and the network, intercepting fetch events and answering from the Cache API. The architectural decision is per-request: which Service Worker caching strategy applies to which destination.
The mapping that holds for most apps: cache-first for content-hashed static assets (they are immutable, so the cache is always correct and you skip the network entirely), network-first with a cache fallback for navigation/HTML (you want the latest document but a cached one beats a connection-error page), and stale-while-revalidate for things like avatars or non-critical JSON where instant-but-slightly-stale is the right trade. Choosing between the first and last for an app shell is the exact subject of SWR vs cache-first Service Worker for React SPAs.
Three operational constraints decide whether this layer helps or hurts INP and reliability. First, respond fast: register the fetch listener early and keep the routing logic before event.respondWith() trivial — heavy synchronous work there adds latency to every request, and because the SW thread is shared, a slow handler delays every concurrent fetch, not just one. Second, bound storage: the Cache API shares an origin quota with IndexedDB and other storage, so apply explicit expiration (max entries + max age) and check navigator.storage.estimate(), or you will eventually hit QuotaExceededError and the browser will silently evict entries you assumed were permanent — a cache-first asset that has been evicted becomes a silent network fetch you never see in the SW logs. Third, manage the lifecycle: a bad SW can pin a stale app shell across deploys because the old worker keeps controlling open clients until they all navigate away. Version your cache names, clean old caches in activate, decide deliberately between skipWaiting() (immediate but risks mismatched assets in an open tab) and waiting for natural client turnover, and have a story for when a cache returns nothing — diagnosing that gap is the focus of debugging Service Worker cache misses in production.
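// Per-destination routing. cache-first for immutable assets, network-first for navigations.
self.addEventListener('fetch', (event) => {
  const { request } = event;
  if (/-[0-9a-f]{6,}\.(js|css|woff2)$/.test(new URL(request.url).pathname)) {
    // trade-off: cache-first is only safe for content-hashed names. Apply it to a
    // non-hashed URL and users will be stuck on stale code until the cache is purged.
    event.respondWith(
      caches.match(request).then((hit) => hit || fetch(request).then((res) => {
        // Only store successful responses, so an error page is never pinned in the cache.
        if (res.ok) {
          const copy = res.clone();
          // waitUntil keeps the worker alive until the cache write has finished.
          event.waitUntil(caches.open('static-v3').then((c) => c.put(request, copy)));
        }
        return res;
      }))
    );
  } else if (request.mode === 'navigate') {
    event.respondWith(
      // Network first. On failure, use a cached copy of this page if one exists
      // (for example, precached at install), then fall back to the offline page.
      fetch(request).catch(() =>
        caches.match(request).then((hit) => hit || caches.match('/offline.html'))
      )
    );
  }
});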
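The "max entries + max age" expiration mentioned above comes down to deciding which entries to drop. A minimal sketch of that decision follows. The Cache API does not record when an entry was stored, so the cachedAt timestamps are assumed to be tracked separately (for instance in IndexedDB); the entry shape and function name are hypothetical.

```javascript
// Assumed entry shape: { url: string, cachedAt: number } with cachedAt in epoch ms.
// Returns the URLs that should be deleted from the cache.
function selectEvictions(entries, { maxEntries, maxAgeMs, now }) {
  const isExpired = (entry) => now - entry.cachedAt > maxAgeMs;
  const expired = entries.filter(isExpired);
  // Among entries still within max age, keep the newest maxEntries.
  const live = entries
    .filter((entry) => !isExpired(entry))
    .sort((a, b) => b.cachedAt - a.cachedAt);
  const overflow = live.slice(maxEntries);
  return [...expired, ...overflow].map((entry) => entry.url);
}
```

In a worker, the returned URLs would be passed to cache.delete() after each write; running the check on write rather than on read keeps it out of the fetch fast path.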
Architecture Layer 3: Invalidation and Freshness with Stale-While-Revalidate
The hardest part of caching is not storing things — it is knowing when a stored thing is wrong. Aggressive TTLs and immutable only pay off if you have a deterministic way to ship new content without serving a single stale byte. That is the role of an explicit invalidation model, the subject of the cache invalidation patterns guide.
The foundation is content-addressable hashing: a build emits filenames that change if and only if their bytes change. New deploys produce new asset URLs, so old and new can coexist in every cache with no conflict, and you never need to purge an asset by name. The only thing that must be invalidated on deploy is the mutable HTML entry point that references the new hashes — a tiny, surgical purge instead of a directory-wide one. The subtle failure here is editing a file's content without its hash changing (bad build config, hand-edited dist), covered in invalidating immutable hashed assets safely.
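The "changes if and only if the bytes change" property is easy to see in miniature. The sketch below derives a filename from file contents; a non-cryptographic FNV-1a hash stands in for whatever hash a real bundler uses, and the function names are hypothetical. It illustrates the naming property only and is not a substitute for the bundler's own hashing.

```javascript
// 32-bit FNV-1a over a byte array, rendered as 8 hex characters.
function fnv1aHex(bytes) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of bytes) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32-bit
  }
  return hash.toString(16).padStart(8, '0');
}

// Insert the content hash between the stem and the extension: app.js -> app-<hash>.js
function hashedName(filename, bytes) {
  const dot = filename.lastIndexOf('.');
  const stem = dot === -1 ? filename : filename.slice(0, dot);
  const ext = dot === -1 ? '' : filename.slice(dot);
  return `${stem}-${fnv1aHex(bytes)}${ext}`;
}
```

Because the name is a pure function of the bytes, rebuilding unchanged source yields the same URL (the cached copy stays valid), and a hand-edited file whose name is not regenerated is exactly the failure case described above.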
For everything that isn't content-addressable — HTML, API responses, fragments — you need targeted invalidation, and the production-grade tool is tag/surrogate-key purging. You attach a Surrogate-Key (or Cache-Tag) header listing the logical entities a response depends on (product-123, homepage), then purge by tag when those entities change. One write invalidates exactly the responses that referenced it, across every PoP, with no over-purge stampede — the deploy-time workflow in purging CDN cache by tag on deploy. The mental model worth internalizing: a URL purge says "this address is stale," while a tag purge says "everything that depended on this fact is stale," and the second matches how content actually changes — one product edit can touch a listing page, a search result, and a sitemap, and a single tag purge invalidates all three without your having to enumerate their URLs. The cost is discipline at write time: every response must carry the complete set of tags for the entities it embeds, because an untagged dependency is a stale response that no purge will ever reach.
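The difference between a URL purge and a tag purge is clearest in a toy model. The class below is an in-memory stand-in for an edge cache, written only to show the bookkeeping; the class and method names are hypothetical and no real CDN exposes this API.

```javascript
// Toy edge cache that indexes stored responses by the tags they carry.
class TaggedCache {
  constructor() {
    this.responses = new Map(); // url -> body
    this.urlsByTag = new Map(); // tag -> Set of urls
  }

  put(url, body, tags) {
    this.responses.set(url, body);
    for (const tag of tags) {
      if (!this.urlsByTag.has(tag)) this.urlsByTag.set(tag, new Set());
      this.urlsByTag.get(tag).add(url);
    }
  }

  get(url) {
    return this.responses.get(url);
  }

  // Remove every response that declared a dependency on the tag; returns the purged URLs.
  purgeTag(tag) {
    const urls = this.urlsByTag.get(tag) ?? new Set();
    for (const url of urls) this.responses.delete(url);
    this.urlsByTag.delete(tag);
    return [...urls];
  }
}
```

Storing a search page with an empty tag list and then purging product-123 leaves that page in place, which is the untagged-dependency failure in miniature: the purge cannot reach what was never tagged.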
Layered on top is stale-while-revalidate, which reframes freshness from a hard cutoff to a soft one. Within the max-age the response is fresh. Within the trailing stale-while-revalidate window it is served immediately while a background fetch refreshes the cache, so the user never waits on revalidation and TTFB stays flat even as content updates. Deciding whether to drive this with the SWR Cache-Control directive at the edge or with explicit Service Worker logic is exactly the comparison in SWR Cache-Control vs Service Worker revalidation. The one caveat: if the refreshed payload differs in rendered size, reserve DOM space and use CSS containment so the background swap does not cause a layout shift and push CLS over 0.1.
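The fresh, stale-while-revalidate, and stale-if-error windows can be expressed as a small classifier over the Cache-Control header and the age of the stored response. This is a simplified sketch for a shared cache, assuming s-maxage takes precedence over max-age and ignoring directives such as no-store or must-revalidate; the function names and result labels are hypothetical.

```javascript
// Parse "a, b=1, c=2" into { a: true, b: 1, c: 2 }. Quoted values are not handled.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(',')) {
    const [name, value] = part.trim().toLowerCase().split('=');
    if (name) directives[name] = value === undefined ? true : Number(value);
  }
  return directives;
}

// Decide what a shared cache may do with a stored response of the given age (seconds).
function edgeFreshness(header, ageSeconds, originHealthy = true) {
  const cc = parseCacheControl(header);
  const freshFor = cc['s-maxage'] ?? cc['max-age'] ?? 0;
  if (ageSeconds < freshFor) return 'fresh';
  const staleFor = ageSeconds - freshFor;
  if (originHealthy && staleFor < (cc['stale-while-revalidate'] ?? 0)) {
    return 'serve-stale-and-revalidate'; // instant response, background refresh
  }
  if (!originHealthy && staleFor < (cc['stale-if-error'] ?? 0)) {
    return 'serve-stale-on-error'; // origin is failing, keep serving the cached copy
  }
  return 'must-fetch'; // the request waits on origin
}
```

With s-maxage=300 and stale-while-revalidate=60, a response aged 330 seconds is still served instantly, while one aged 400 seconds waits on origin unless origin is failing and stale-if-error applies.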
Monitoring, Alerting, and CI Gates
A caching architecture decays silently. A query string sneaks into an asset URL, a Vary header gets widened, a build stops hashing a chunk — and the hit ratio quietly erodes while nothing errors. Monitoring is what makes that visible before it becomes a TTFB regression in the field.
The runtime signal is a RUM beacon that ships cache status alongside the timing metrics. From PerformanceResourceTiming you can classify each resource as a browser-cache hit, and you can correlate document TTFB and repeat-visit LCP against the edge's cache-status header. Aggregate by PoP and device tier so a single failing region is distinguishable from a platform-wide problem, and alert when the document or static-asset miss rate crosses a threshold (a miss rate above ~15% on assets you expect to be immutable is a header bug, not traffic).
// RUM: classify each resource as cache hit/miss and beacon with the document TTFB.
const nav = performance.getEntriesByType('navigation')[0];
new PerformanceObserver((list) => {
  const sample = list.getEntries().map((e) => ({
    url: e.name,
    // transferSize 0 + body present => served from browser cache. Cross-origin
    // resources without Timing-Allow-Origin report 0 for both, so they read as misses.
    browserHit: e.transferSize === 0 && e.decodedBodySize > 0,
    dur: Math.round(e.duration),
  }));
  // trade-off: sendBeacon queues the request so it can outlive the page, but it gives
  // no response, so you can't confirm or retry a dropped beacon. Don't use it for
  // data you must not lose.
  const ttfb = nav ? Math.round(nav.responseStart) : null;
  navigator.sendBeacon('/rum/cache', JSON.stringify({ ttfb, sample }));
}).observe({ type: 'resource', buffered: true });
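On the aggregation side, telling a single failing region from a platform-wide problem can be approximated with a per-PoP threshold check. The sketch below uses the roughly 15% miss-rate threshold from the text; the input shape and the rule that "more than half of PoPs failing" means global are arbitrary assumptions to tune against your own traffic.

```javascript
// Assumed input: { [pop]: missRate } with miss rates between 0 and 1.
function classifyMissRegression(missRateByPop, threshold = 0.15) {
  const pops = Object.keys(missRateByPop);
  const failing = pops.filter((pop) => missRateByPop[pop] > threshold);
  if (failing.length === 0) return { scope: 'healthy', failing };
  // Assumption: a majority of PoPs over threshold is treated as a global regression.
  const scope = failing.length > pops.length / 2 ? 'global' : 'regional';
  return { scope, failing };
}
```

A regional result usually points at one cold or misconfigured PoP, while a global one points at a header, key, or build change.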
The pre-production gate is CI. Lighthouse CI can assert TTFB and LCP budgets so a regression blocks the merge, and a synthetic check after deploy can assert that hashed assets return immutable and the document returns a validator. The combination — RUM for the field, Lighthouse CI for the gate, synthetic header checks for correctness — is the same budgeting discipline detailed in the best Lighthouse CI setup for frontend pipelines.
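The synthetic header check can be as small as a function that takes a path and its response headers and returns a list of problems. This is a sketch under the page's own conventions (hex-hashed filenames, one-year max-age, ETag or Last-Modified on documents); the function names and message strings are hypothetical, and fetching the headers after deploy is left to the caller.

```javascript
const ONE_YEAR_SECONDS = 31536000;

// Extract the browser max-age without matching the tail of s-maxage.
function browserMaxAge(cacheControl) {
  const match = /(?:^|,)\s*max-age=(\d+)/.exec(cacheControl);
  return match ? Number(match[1]) : null;
}

// Assumed headers shape: plain object with lower-cased header names.
function auditCacheHeaders(path, headers) {
  const cc = (headers['cache-control'] ?? '').toLowerCase();
  const immutable = /(?:^|,)\s*immutable\s*(?:,|$)/.test(cc);
  const problems = [];
  if (/-[0-9a-f]{6,}\.[a-z0-9]+$/.test(path)) {
    if (!immutable) problems.push('hashed asset is missing immutable');
    if ((browserMaxAge(cc) ?? 0) < ONE_YEAR_SECONDS) problems.push('hashed asset max-age is under one year');
  } else {
    if (!headers.etag && !headers['last-modified']) problems.push('document has no validator');
    if (immutable) problems.push('mutable document is marked immutable');
  }
  return problems;
}
```

A deploy check would call this for the entry document and a sample of asset URLs and fail the pipeline if any list is non-empty.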
Reference Implementations
Vite build: deterministic content hashing
// vite.config.ts: every emitted file is content-addressable.
import { defineConfig } from 'vite';
export default defineConfig({
  build: {
    rollupOptions: {
      output: {
        // note: Rollup 4 emits base64url hash characters by default. If server or
        // Service Worker rules match hashes with a hex-only pattern, set
        // hashCharacters: 'hex' here, or widen the pattern.
        entryFileNames: 'assets/[name]-[hash].js',
        chunkFileNames: 'assets/[name]-[hash].js',
        assetFileNames: 'assets/[name]-[hash][extname]',
      },
    },
    // trade-off: sourcemap:false shrinks the deploy, but you lose readable prod
    // stack traces. Set 'hidden' if you upload maps to an error tracker instead.
    sourcemap: false,
  },
});
- Config: stable [name], content [hash]; only changed files get new URLs.
- Outcome: safe immutable headers and surgical invalidation (HTML only).
Tag-based invalidation on entity change
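// Response path: attach surrogate keys so the edge indexes this response by entity.
res.setHeader('Surrogate-Key', `product-${id} catalog`);
res.setHeader('Cache-Control', 'public, s-maxage=600, stale-while-revalidate=60');

// Write path (a separate handler, run when the entity changes): purge by tag, which
// invalidates only what referenced the entity.
// trade-off: tag purging is near-instant but needs disciplined key hygiene; a
// missing tag means stale content survives, so prefer over-tagging to under-tagging.
// The URL below is a placeholder: real purge APIs differ per CDN and need credentials.
const purge = await fetch(`https://api.cdn.example/purge`, {
  method: 'POST',
  headers: { 'Surrogate-Key': `product-${id}` },
});
if (!purge.ok) throw new Error(`Tag purge failed: HTTP ${purge.status}`);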
- Config: one tag per logical entity the response depends on.
- Outcome: content edits propagate globally without an over-purge stampede.
Edge SWR with graceful origin degradation
# Document response: fresh briefly, stale-served while refreshing, stale-served on origin 5xx.
add_header Cache-Control "public, max-age=0, s-maxage=120, stale-while-revalidate=300, stale-if-error=86400";
# trade-off: stale-if-error=86400 keeps the site up during an outage but can serve
# day-old content — shorten it for pages where staleness is worse than a 503.
- Config: short s-maxage, generous stale-while-revalidate and stale-if-error.
- Outcome: flat TTFB during updates and survivable origin outages.
Service Worker cache versioning and cleanup
const CACHE = 'shell-v4';
self.addEventListener('activate', (event) => {
  // trade-off: deleting old caches on activate frees quota, but a too-aggressive
  // match can wipe a cache the new SW still needs. Gate deletion on a name prefix.
  event.waitUntil(
    caches.keys().then((keys) =>
      Promise.all(keys.filter((k) => k.startsWith('shell-') && k !== CACHE).map((k) => caches.delete(k)))
    )
  );
});
- Config: monotonic version in the cache name; prefix-scoped cleanup.
- Outcome: no stale app shell pinned across deploys; bounded storage.
Common Pitfalls
- Uniform max-age across all asset classes. A single TTL either serves stale HTML or wastes the long cache life that hashed assets could have. Split by content-addressability.
- Widening the cache key with Vary: User-Agent or raw Cookie. Both fragment the key into near-uniqueness and collapse the edge hit ratio. Keep Vary to Accept-Encoding plus narrow, normalized dimensions.
- Cache-busting query strings on hashed assets. Appending ?v=123 to an already-hashed filename creates a new cache key every time the value changes, so byte-identical files are refetched and the immutable entry is wasted.
- Cache-first Service Worker on non-hashed URLs. Users get pinned to stale code with no recovery path short of a manual purge. Reserve cache-first for content-addressable names only.
- Directory-wide CDN purges instead of tag or URL purges. A blanket purge triggers a global MISS storm that stampedes origin precisely when traffic is live. Purge by surrogate key.
- No cache versioning or activate cleanup in the SW. Old caches accumulate until QuotaExceededError, and a stale shell can survive deploys indefinitely.
- Ignoring INP during hydration on a cache hit. A fast cached document still blocks if hydration runs a long task; pair caching with code splitting so the main thread stays under the 50ms task budget.
- Background revalidation that shifts layout. An SWR swap whose payload differs in size causes CLS unless DOM space is reserved with explicit dimensions and containment.
FAQ
How do I split responsibility between the CDN edge and a Service Worker?
Treat the edge as the authoritative shared cache for immutable assets and the document, and the Service Worker as the client-side runtime fallback and repeat-visit accelerator. Set immutable on hashed assets at the edge, use SW cache-first for those same assets so repeat visits skip the network entirely, and use SW network-first for navigations so users still get a page during connection failures. The two layers reinforce each other; they only diverge if your SW caches non-hashed URLs without a version story.
What is the safest cache-invalidation workflow for zero-downtime deploys?
Make every static asset content-addressable so old and new versions coexist in all caches with no conflict. On deploy, the only thing you invalidate is the mutable HTML entry point that references the new hashes — a surgical purge, not a directory purge. For non-hashed content (HTML, API responses), attach surrogate keys and purge by tag when the underlying entity changes. This eliminates full-cache purges and the MISS storms they cause.
Why is TTFB still high when my CDN reports a high cache hit ratio?
Read the ratio per asset class. A high aggregate hit ratio often hides a low hit rate on the HTML document, which is the response that gates TTFB for every first paint. Check the document's edge cache-status header specifically; if it is consistently MISS or EXPIRED, the cause is usually a missing s-maxage, a Vary/cookie key that fragments per request, or a validator that always revalidates against a slow origin.
Does stale-while-revalidate hurt Core Web Vitals?
It helps TTFB and LCP by serving cached bytes immediately and refreshing in the background, so the user never waits on revalidation. It does not move INP, which depends on main-thread work rather than network delivery. The only risk is CLS: if the refreshed payload renders at a different size than the cached one, the background swap can shift layout. Reserve space with explicit dimensions and CSS containment, and the swap is invisible.
Why is INP degrading despite excellent cache hit ratios?
Cache hits optimize network delivery, not main-thread work. INP regressions almost always come from long tasks during hydration, Service Worker activation, or heavy JSON parsing — none of which a cache fixes. Audit tasks over 50ms, defer non-critical scripts, and keep the SW fetch handler's pre-response logic trivial so it never adds latency to a request it is meant to accelerate.
Related
- HTTP cache-control headers explained: directive-by-directive freshness model for documents and immutable assets.
- CDN edge caching configuration: keys, s-maxage, and PoP-level hit-ratio tuning.
- Service Worker caching strategies: choosing cache-first, network-first, and SWR per request destination.
- Cache invalidation patterns: tag/surrogate-key purging and safe hashed-asset invalidation.
- Measuring LCP with Chrome DevTools: confirming repeat-visit LCP wins in the timeline.
- JavaScript bundle optimization and code splitting: keeping post-cache hydration under the long-task budget.
- Image and media optimization: the largest cacheable payloads on most pages.