Debugging Service Worker Cache Misses in Production

This is a focused troubleshooting guide within service worker caching strategies, part of the wider work on advanced caching and CDN architecture.

Service worker cache misses in production manifest as sudden latency spikes, increased origin load, and degraded Core Web Vitals. Unlike local development, production introduces complex variables: CDN edge routing, dynamic URL parameters, and strict scope boundaries. A healthy worker holds a cache hit rate above 85%; a miss rate over 15% warrants investigation, and over 20% signals a path-normalization or scope failure. The workflow below moves from rapid diagnosis to root-cause analysis to ranked, paste-ready fixes, then verification.

[Figure: Cache miss triage. Maps a production cache miss symptom (miss rate > 15%) to four named root causes and the fix for each: URL drift → normalize keys; Scope → widen scope; Vary header → strip Vary; Eviction → version names. First-install misses are expected; filter them by controller presence. Fix the dominant cause first; do not normalize and rescope at once.]

Rapid Diagnosis: DevTools Checklist

Run this sequence before touching code so you fix the right cause.

  1. Application > Cache Storage. Verify the expected cache names exist. Missing keys mean premature eviction or a failed put().
  2. Network tab, Size column. (service worker) = served from cache; (disk cache) = bypassed the worker into the HTTP cache; (network) = a miss or explicit bypass.
  3. Initiator column. Apply the sw filter. Rows showing fetch instead of sw mean the worker failed to register or activate.
  4. chrome://serviceworker-internals/. Confirm Registration status and the Scope path.
  5. Console. Watch for DOMException: QuotaExceededError during cache population and for unhandled rejections in event.respondWith().
  6. Controller check. Confirm navigator.serviceWorker.controller is non-null; if it is null, the page is uncontrolled and every request is a miss by definition.
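Steps 1 and 6 of the checklist can be scripted so the check is repeatable across sessions. The sketch below is a minimal, assumed helper: the names auditCacheStorage and expectedNames are hypothetical, and the CacheStorage object and controller are passed in as arguments so the function can be pointed at the page's caches and navigator.serviceWorker.controller from the DevTools console.

```javascript
// Hypothetical helper: reports which expected caches are missing, how many
// entries each existing cache holds, and whether the page is controlled.
async function auditCacheStorage(cacheStorage, controller, expectedNames) {
  const present = await cacheStorage.keys(); // names of all caches for this origin
  const missing = expectedNames.filter((name) => !present.includes(name));
  const entryCounts = {};
  for (const name of present) {
    const cache = await cacheStorage.open(name);
    entryCounts[name] = (await cache.keys()).length; // stored requests in this cache
  }
  const controlled = controller !== null && controller !== undefined;
  return { controlled, missing, entryCounts };
}
```

In a page, a call might look like auditCacheStorage(caches, navigator.serviceWorker.controller, ['v1']).then(console.log), where 'v1' stands in for whatever cache names your worker is expected to create.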

Instrument the same signal in the field so alerting is data-driven rather than anecdotal.

javascript
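const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // skip the beacon's own request, otherwise every report triggers another
    if (entry.initiatorType === 'beacon') continue;
    // transferSize === 0 with encodedBodySize > 0 means no bytes crossed the
    // network: the response came from the worker's cache or the HTTP cache
    const isCacheHit = entry.transferSize === 0 && entry.encodedBodySize > 0;
    // cross-origin entries without Timing-Allow-Origin report 0 for both
    // sizes, so they cannot be classified; skip them instead of counting a miss
    const isUnclassifiable = entry.transferSize === 0 && entry.encodedBodySize === 0;
    if (navigator.serviceWorker.controller && !isCacheHit && !isUnclassifiable) {
      navigator.sendBeacon('/rum/miss', JSON.stringify({ url: entry.name }));
    }
  }
});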
observer.observe({ type: 'resource', buffered: true });
// trade-off: only fires once a controller exists, so it under-counts the very
// first visit. That is intentional — counting first-install misses would
// drown the real regression signal. Do not remove the controller guard.

Alert when the miss rate exceeds 15% for 5 consecutive minutes, and escalate when it exceeds 25% for 10 minutes. Compute the rate as (network-initiator requests / total controlled requests) * 100 so it reflects only traffic the worker had a chance to serve. The five-minute window matters: shorter windows fire on the normal post-deploy dip as users pick up the new worker, generating noise that trains the team to ignore the alert.

The goal of this checklist is not just to confirm there is a problem but to assign it to one of the four root causes below before you write a line of remediation. A miss that shows (network) with a clean 200 is a routing problem, while one that shows a red net::ERR_FAILED is a network or CORS problem masquerading as a cache miss and needs an entirely different fix.
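The miss-rate formula and the two alert windows can be sketched as plain functions. This is a minimal illustration, not a monitoring integration: the function names are hypothetical, and it assumes the RUM pipeline already produces one miss-rate sample per minute, oldest first.

```javascript
// Miss rate as defined in this guide: network-served requests over all
// requests made while the page was controlled, as a percentage.
function missRate(networkRequests, controlledRequests) {
  if (controlledRequests === 0) return 0; // no controlled traffic, nothing to rate
  return (networkRequests / controlledRequests) * 100;
}

// True only if the most recent `minutes` per-minute samples ALL exceed the
// threshold, which is what "for N consecutive minutes" requires.
function sustainedAbove(samples, threshold, minutes) {
  if (samples.length < minutes) return false;
  return samples.slice(-minutes).every((rate) => rate > threshold);
}

// Applies the two thresholds from the text: > 25% for 10 minutes escalates,
// > 15% for 5 minutes alerts.
function alertLevel(samples) {
  if (sustainedAbove(samples, 25, 10)) return 'escalate';
  if (sustainedAbove(samples, 15, 5)) return 'alert';
  return 'ok';
}
```

Requiring every sample in the window to exceed the threshold, rather than the window average, is what keeps a single post-deploy dip from paging anyone.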

Root Cause Analysis

1. URL normalization drift

caches.match() is exact-match. Browsers treat https://example.com/app and https://example.com/app/ as distinct keys, and query parameters like ?utm_source=google or ?v=1.2.3 mint a fresh entry per variant. Tracking IDs appended at runtime therefore bypass the cache entirely — the single most common production miss.
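To confirm drift before changing the worker, it can help to group the URLs from the miss beacon by a normalized form and look for keys that arrive under several spellings. The sketch below assumes the same tracking-parameter list used elsewhere in this guide; normalizedKey and findDrift are hypothetical names, and the list should be adjusted to your own traffic.

```javascript
// Assumed tracking-parameter list; extend it to match your own traffic.
const TRACKING_PARAMS = ['utm_source', 'utm_medium', 'utm_campaign', 'fbclid'];

// Collapses tracking params and a trailing slash into one canonical string.
function normalizedKey(rawUrl) {
  const url = new URL(rawUrl);
  TRACKING_PARAMS.forEach((p) => url.searchParams.delete(p));
  if (url.pathname.length > 1) url.pathname = url.pathname.replace(/\/$/, '');
  return url.toString();
}

// Returns only the keys that were requested under more than one spelling.
function findDrift(missedUrls) {
  const groups = new Map();
  for (const raw of missedUrls) {
    const key = normalizedKey(raw);
    if (!groups.has(key)) groups.set(key, new Set());
    groups.get(key).add(raw);
  }
  return [...groups]
    .filter(([, variants]) => variants.size > 1)
    .map(([key, variants]) => ({ key, variants: [...variants] }));
}
```

A key with many variants is a strong hint that normalization, not scope or eviction, is the dominant cause. Note that a versioning parameter such as ?v=1.2.3 is deliberately left in the key here, since it usually identifies a genuinely different resource.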

2. Scope mismatch

Scope governs which pages the worker controls. A worker registered at /app/sw.js defaults to the scope /app/ and controls only pages under it. Requests issued by pages outside that path never reach the fetch handler, so they always hit the network. Widening scope beyond the script's own directory requires the server to send the Service-Worker-Allowed header.
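Whether a given page falls inside a worker's scope can be checked without loading it. The sketch below mirrors the rule that the default scope is the directory containing the worker script and that a page is controlled when its URL begins with the scope URL; the function names are hypothetical.

```javascript
// Default scope (the widest allowed without Service-Worker-Allowed) is the
// directory that contains the worker script.
function defaultScope(scriptUrl) {
  return new URL('./', scriptUrl).href;
}

// A page is in scope when its URL, ignoring the fragment, starts with the
// scope URL. Note the trailing slash: /app is NOT inside the scope /app/.
function isInScope(scopeUrl, pageUrl) {
  const page = new URL(pageUrl);
  page.hash = '';
  return page.href.startsWith(new URL(scopeUrl).href);
}
```

Running the list of routes with systematic misses through isInScope is a quick way to separate a scope problem from a key-normalization problem.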

3. Vary header fragmentation

CDNs append Vary: Accept-Encoding or Vary: User-Agent. When the response varies, the browser can treat each variant as a separate cache entry, producing misses that look random. This is an edge-versus-worker alignment issue as much as a worker bug.

4. Eviction from missing versioning

Clearing all cache storage on every update, instead of rotating versioned names, guarantees a cold cache after each deploy. With a footprint over 50MB, writes may also fail with QuotaExceededError, and under storage pressure the browser can evict the origin's caches without raising any error.
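Quota pressure can be checked before it turns into failed writes. A minimal sketch, assuming the StorageManager API (navigator.storage.estimate()) is available in the target browsers; storageHeadroom and the 0.8 warning ratio are illustrative choices, not values from this guide.

```javascript
// `storageManager` is navigator.storage in a page or worker. estimate()
// resolves with approximate `usage` and `quota` values in bytes.
async function storageHeadroom(storageManager, warnRatio = 0.8) {
  const { usage, quota } = await storageManager.estimate();
  const ratio = quota > 0 ? usage / quota : 0;
  return { usage, quota, ratio, underPressure: ratio >= warnRatio };
}
```

Reporting the ratio alongside the miss beacon makes it possible to tell eviction-driven misses apart from the other three causes in field data.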

These four causes are not equally likely, and the order in which you investigate them matters. URL normalization drift accounts for the largest share of production misses because tracking parameters are appended by code outside your control (analytics scripts, ad networks, email link decorators), so it should be your first hypothesis whenever the miss rate is high but the worker is plainly registered and in scope. Scope mismatch produces a different signature: a systematic miss for an entire path prefix rather than a scattered one. It is easy to spot in the Network tab because every request from pages under the uncontrolled prefix shows (network) while everything else is healthy. Vary fragmentation looks random and is the cause you reach for only after the first two are ruled out. Eviction announces itself through the console (QuotaExceededError) and through caches that exist immediately after a deploy and then vanish. Diagnosing the signature before writing a fix prevents the classic mistake of normalizing keys to solve what was actually a scope problem.

Step-by-Step Resolution

Fix 1 — Normalize the cache key (highest impact)

Strip tracking params and trailing slashes before matching. This alone often recovers most of a regressed hit rate.

javascript
self.addEventListener('fetch', (event) => {
  // cache.put() rejects non-GET requests, so let everything else pass through
  if (event.request.method !== 'GET') return;

  const url = new URL(event.request.url);
  ['utm_source', 'utm_medium', 'utm_campaign', 'fbclid'].forEach((p) =>
    url.searchParams.delete(p)
  );
  url.pathname = url.pathname.replace(/\/$/, '');
  const key = url.toString(); // normalized key used for both match and put

  event.respondWith(
    caches.match(key).then((cached) =>
      cached || fetch(event.request).then((res) => {
        if (res.ok) {
          const clone = res.clone();
          // waitUntil keeps the worker alive until the write completes
          event.waitUntil(caches.open('v1').then((cache) => cache.put(key, clone)));
        }
        return res;
      })
    )
  );
});
// trade-off: only the four listed tracking params are removed from the key.
// Never add a semantically meaningful param (?page=2, ?id=42) to that list:
// stripping it would serve the wrong cached body for those endpoints.

Expected outcome: recovers tracking-param misses; on feeds with heavy UTM traffic this typically lifts the hit rate by 10–25 points and cuts origin requests proportionally.

Fix 2 — Correct the registration scope

Register at the root and let the server authorize it.

javascript
navigator.serviceWorker.register('/sw.js', { scope: '/' });
// requires response header on /sw.js:  Service-Worker-Allowed: /
// trade-off: a root scope means the worker now intercepts EVERY request,
// including third-party widgets and admin routes. If parts of the app must
// never be cached, keep a narrower scope and exclude them explicitly instead.

Expected outcome: pages previously outside the worker's scope are now controlled, so their requests reach the fetch handler, eliminating the systematic miss for those routes.

Fix 3 — Guard cacheability and normalize Vary

Respect no-store/private semantics and avoid fragmenting on Vary.

javascript
function isCacheable(response) {
  // directive names are case-insensitive, so compare in lowercase
  const cc = (response.headers.get('Cache-Control') || '').toLowerCase();
  if (cc.includes('no-store') || cc.includes('private')) return false;
  return true;
}
// trade-off: honoring no-store correctly EXCLUDES those responses from the
// cache, so endpoints that send it stay network-bound. That is correct for
// security; do not "fix" a miss by force-caching a private response.

Expected outcome: removes random Vary-driven misses while preventing accidental caching of sensitive responses.
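The guard above decides what may be stored, but it does not by itself stop Vary from splitting lookups. One option, sketched here under the assumption that the varying header (for example Accept-Encoding) does not change the meaning of the body for your clients, is the ignoreVary option of Cache.match(). The helper names are hypothetical; the second function reflects that cache.put() rejects a response whose Vary header contains *.

```javascript
// `cache` is a Cache or CacheStorage object, e.g. the result of
// caches.open('v1'). ignoreVary: true makes match() skip the Vary header
// comparison, so one stored response satisfies requests whose
// Accept-Encoding or User-Agent differ.
function matchIgnoringVary(cache, request) {
  return cache.match(request, { ignoreVary: true });
}

// cache.put() rejects with a TypeError when the response carries "Vary: *",
// so check before attempting to store.
function canStoreDespiteVary(response) {
  const vary = response.headers.get('Vary') || '';
  return !vary.split(',').some((value) => value.trim() === '*');
}
```

Ignoring Vary is only safe when every variant is interchangeable for the client. If the origin genuinely serves different bodies per header value, align the CDN's Vary configuration instead.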

Fix 4 — Rotate versioned cache names

Replace blanket clearing with version-based eviction in activate.

javascript
const CURRENT_CACHE = 'v2';
self.addEventListener('activate', (event) => {
  event.waitUntil(
    caches.keys().then((keys) =>
      Promise.all(keys.filter((k) => k !== CURRENT_CACHE).map((k) => caches.delete(k)))
    )
  );
});
// trade-off: this purges every non-current cache, including runtime API caches,
// so the first post-deploy visit pays a full round-trip. Whitelist long-lived
// runtime caches if that cold-start cost matters for your traffic.

Expected outcome: a deploy evicts only superseded cache versions rather than all of cache storage; repeat-visit TTFB returns to the sub-200ms cache-served range after the first navigation.
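Where the cold-start cost of purging runtime caches matters, the activate handler can keep an explicit list of caches that survive a deploy. This is a sketch of that variant, not a replacement prescribed by this guide: the cache name 'runtime-images' and the helper staleCacheNames are hypothetical.

```javascript
const CURRENT_CACHE = 'v2';
// Hypothetical long-lived runtime cache that should survive version bumps.
const KEEP = new Set([CURRENT_CACHE, 'runtime-images']);

// Pure helper: every cache name that is not explicitly kept is stale.
function staleCacheNames(allNames, keep) {
  return allNames.filter((name) => !keep.has(name));
}

// Register the handler only inside a service worker, so the helper above can
// also be loaded and unit-tested elsewhere.
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('activate', (event) => {
    event.waitUntil(
      caches.keys().then((names) =>
        Promise.all(staleCacheNames(names, KEEP).map((name) => caches.delete(name)))
      )
    );
  });
}
```

The cost of the keep-list is that anything on it must be invalidated some other way, for example by expiring entries individually, since a version bump no longer clears it.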

Fix 5 — Mask residual misses with stale-while-revalidate

For read-heavy routes, return cached content instantly while refreshing in the background. This is the same pattern documented in depth under stale-while-revalidate implementation.

javascript
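self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  // cache.put() rejects non-GET requests, so only GETs take the SWR path
  if (event.request.method !== 'GET' || !url.pathname.startsWith('/api/')) return;

  event.respondWith(
    caches.match(event.request).then((cached) => {
      const network = fetch(event.request).then((res) => {
        if (res.ok) {
          // clone synchronously, before the page starts consuming the body
          const clone = res.clone();
          event.waitUntil(caches.open('v1').then((c) => c.put(event.request, clone)));
        }
        return res;
      });
      if (cached) {
        // keep the worker alive for the refresh; a failed refresh is ignored
        // because the cached copy has already been served
        event.waitUntil(network.catch(() => {}));
        return cached;
      }
      return network;
    })
  );
});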
// trade-off: SWR serves one-revision-old data on the first paint after a change.
// Never apply it to balances, cart totals, or auth — there a stale value is a
// correctness bug, not a cosmetic delay.

Expected outcome: a cache miss no longer blocks render; perceived latency drops to cache-served speed even while the background fetch repopulates the entry.

Verification

Validate immediately after deploy, before declaring the regression closed.

  • Lighthouse, Fast 3G. Confirm "Serve static assets with an efficient cache policy" passes and that (service worker) appears in the Network tab's Size column for all critical paths.
  • CI assertion. Fail the build if the worker audit regresses:
javascript
module.exports = {
  ci: { assert: { assertions: { 'service-worker': 'error', 'uses-long-cache-ttl': ['error', { maxLength: 0 }] } } },
};
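// trade-off: uses-long-cache-ttl at error severity will flag any short-TTL
// asset, including intentionally short-lived HTML. Scope it to static asset
// audits or downgrade to warn if your shell is deliberately revalidated.
// assumption: the service-worker assertion needs a Lighthouse version that
// still ships that audit; releases that dropped the PWA category do not.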
  • RUM field check. Watch the live miss-rate beacon return below 15% at the p75 over the next full deploy cycle.
  • Before/after diff. Compare origin request volume and repeat-visit TTFB; a successful fix shows origin load dropping and TTFB returning to the sub-200ms cache-served band.

If miss rates spike above 20% within 15 minutes of the change, roll back by bumping the cache version and letting activate purge stale versions. Keep the rollback to a single, reversible lever — the cache version string — so an on-call engineer can recover without redeploying application code. Document the exact version bump and the expected recovery curve (hit rate should climb back toward the 85% floor within one warm-up cycle) in a runbook, because a cache regression discovered at 2am is not the moment to reverse-engineer how your own invalidation works. For the broader routing and strategy context behind these fixes, return to service worker caching strategies.