Offloading Work to Web Workers with Comlink: Protecting INP by Moving Heavy Work Off the Main Thread

This guide extends the interactivity engineering in Core Web Vitals & Measurement to the case where a task is simply too expensive to run on the main thread at all, no matter how you schedule it.

There is a hard ceiling on what cooperative scheduling can buy you. Slicing a long task into chunks and yielding between them keeps a click responsive only while the total work fits the frame budget across a handful of yields. When a single operation (parsing a 4MB JSON payload, hashing a file, decoding and resizing an image, diffing two large trees, or building a search index) costs 300ms or more of pure CPU, there is no chunk boundary cheap enough to hide it. Every millisecond it runs is a millisecond the main thread cannot run an event listener or paint a frame. Interaction to Next Paint (INP) reports the slowest interaction of the visit, discounting only one outlier per 50 interactions, so on a typical visit a single such operation overlapping a click is enough to fail the 200ms "good" boundary. The only structural fix is to move that CPU off the main thread entirely, onto a Web Worker, and communicate the result back.

Raw Web Workers make that move painful: you wire up postMessage, hand-roll a message-type protocol, correlate responses to requests, and serialize everything by hand. Comlink (1.1KB gzipped) wraps the worker boundary in a Proxy so a worker's exported functions look like ordinary async methods on the main thread — you await worker.parseJSON(text) and Comlink handles the message round-trip, the request/response correlation, and the structured-clone marshalling underneath. This guide walks the full loop: decide what is worth offloading, set up Comlink in Vite or webpack, manage worker lifecycle and pooling, use Transferable objects to avoid copy cost, and measure that INP actually moved in the field rather than assuming it did.

Problem Framing: When Scheduling Stops Being Enough

The decision between scheduling and offloading turns on one question: is the total work acceptable, or is its shape wrong? If a 220ms operation is acceptable in aggregate and only hurts because it runs as one unbroken block, splitting it with scheduler.yield() is the cheaper fix — no serialization cost, no worker to manage. But if the operation is genuinely expensive and runs while users interact, yielding only makes it interruptible, not cheap; the total wall-clock cost still competes with rendering across every chunk boundary. A worker removes that competition: the CPU runs on a different thread, the main thread stays at a 0ms long-task profile during the job, and input is processed within budget the entire time.

The honest trade is latency for throughput-isolation. A worker call carries fixed overhead — message dispatch, structured clone of the arguments, and the clone of the result coming back — typically 1–5ms for small payloads but scaling with payload size. For a 300ms job that overhead is noise; for a 2ms job it is pure loss. Offload work whose CPU cost dwarfs the marshalling cost, and keep cheap, latency-sensitive work on the main thread.
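The arithmetic behind that rule can be sketched in a few lines. The helper names and the 10% threshold below are illustrative assumptions, not figures taken from Comlink or any browser API; substitute numbers you have measured.

```javascript
// Fraction of a worker call's total latency spent on marshalling rather than compute.
function overheadShare(computeMs, marshalMs) {
  return marshalMs / (computeMs + marshalMs);
}

// Hypothetical rule: offload only when marshalling is a small slice of the call.
// maxShare = 0.1 is an assumed threshold, tune it against your own measurements.
function worthOffloading(computeMs, marshalMs, maxShare = 0.1) {
  return overheadShare(computeMs, marshalMs) <= maxShare;
}

console.log(worthOffloading(300, 5)); // true: 5 / 305 is under 2% overhead
console.log(worthOffloading(2, 2));   // false: half the call is overhead
```

The point of the sketch is only that the decision is a ratio, not an absolute: the same 5ms of overhead is negligible on a 300ms job and dominant on a 2ms one.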

Prerequisites: Versions, Packages, and Build Flags

This guide assumes Comlink 4.4+, a bundler with first-class worker support (Vite 5+ or webpack 5+), and a browser baseline that supports module workers (new Worker(url, { type: 'module' })) — Chromium 80+, Firefox 114+, Safari 15+. Install Comlink:

bash

# trade-off: Comlink adds ~1.1KB gzipped and an async hop to every worker call;
# if you only ever call one fire-and-forget worker function, raw postMessage is
# leaner — see the comparison page linked below before adopting it everywhere.
npm install comlink

No build flags are required for Vite. For webpack 5, confirm output.workerChunkLoading is left at its default and that you are on a version with native new Worker(new URL(...)) parsing (5.27+). If you target browsers without module-worker support, you will need a classic-worker fallback build; that is the only configuration that meaningfully changes the setup below.

1. Environment Setup: Wiring Comlink Through Your Bundler

The worker file exposes its API with Comlink.expose(); the main thread wraps the worker with Comlink.wrap() to get the typed proxy.

javascript

// heavy.worker.js
import * as Comlink from 'comlink';

const api = {
  parseJSON(text) {
    return JSON.parse(text); // runs off the main thread
  },
  buildSearchIndex(records) {
    // expensive tokenization + inverted index construction;
    // indexRecords is assumed to be defined elsewhere in this worker module
    return indexRecords(records);
  },
};

Comlink.expose(api);
// trade-off: every exposed method's arguments and return value are structured-cloned
// across the boundary, so exposing a method that returns a 50MB object just moves the
// copy cost rather than removing it — return Transferables or a slim summary instead.

The critical detail for both Vite and webpack is constructing the worker with the new URL(..., import.meta.url) form so the bundler can statically discover the worker file, fingerprint it, and emit it as a separate chunk:

javascript

// main.js — Vite and webpack 5 both understand this exact form
import * as Comlink from 'comlink';

// The new URL(..., import.meta.url) pattern lets the bundler resolve and hash the
// worker as its own asset. A bare string path would NOT be bundled correctly.
const worker = new Worker(new URL('./heavy.worker.js', import.meta.url), {
  type: 'module',
});
export const heavy = Comlink.wrap(worker);
// trade-off: this creates the worker eagerly at module load, paying ~5-15ms of worker
// startup during page init — lazy-create it on first use if the work is rarely needed.

With that in place the rest of the app calls await heavy.parseJSON(text) as if it were local. Pair this build configuration with your broader splitting strategy in dynamic imports and route-based splitting, so the worker chunk loads on the route that needs it rather than at first paint.

2. Capture a Baseline: Prove the Work Is the Bottleneck

Before moving anything, confirm the operation is what inflates INP. Record real-user interactions with the web-vitals attribution build, which splits each interaction into input delay, processing duration, and presentation delay, and correlate the slow ones with long-task entries.

javascript

import { onINP } from 'web-vitals/attribution';

onINP(({ value, attribution }) => {
  // processingDuration ballooning + a long-task entry over the same script
  // is the signature of a CPU-bound handler that belongs in a worker.
  navigator.sendBeacon('/rum/inp', JSON.stringify({
    value,
    processing: attribution.processingDuration,
    script: attribution.longAnimationFrameEntries?.[0]?.scripts?.[0]?.sourceURL,
  }));
  // trade-off: attribution build is ~2KB heavier than the core web-vitals build;
  // ship it to a traffic sample, not 100% of sessions, once you trust the signal.
});

In the lab, open the Performance panel under 4x CPU throttling, run the interaction, and read the Main track, where long tasks are flagged in red. A single task wider than 200ms whose flame chart is dominated by one function (JSON.parse, a hashing routine, an image decode) is an offload candidate. Record the worst interaction's INP and the width of that task; every later step is judged against this number. The deeper replay-and-rank workflow lives in profiling event handlers for INP.

3. Isolate the Bottleneck: What Is Actually Worth Offloading

Not all heavy work is worker-friendly. The candidates worth moving share two properties: they are CPU-bound (not waiting on I/O), and they take a self-contained input to a self-contained output without touching the DOM.

JSON parsing and serialization. JSON.parse of a multi-megabyte payload is the canonical case; it is synchronous and uninterruptible. Covered end-to-end in moving heavy JSON parsing off the main thread.
Cryptography and hashing. File checksums, client-side encryption, and password-derivation functions are pure CPU and trivially offloaded.
Image processing. Decoding, resizing, format conversion, and pixel manipulation via OffscreenCanvas belong in a worker; the canvas can be transferred so the worker paints without touching the DOM.
Diffing and reconciliation. Comparing two large object trees or computing a structural diff (document editors, sync engines) is CPU-bound and parallelizable.
Search-index construction. Building an inverted index or fuzzy-search structure over thousands of records is a classic load-time stall that a worker hides completely.

What does not belong in a worker: anything that reads or mutates the DOM (workers have no DOM), tiny operations where clone overhead exceeds compute, and work that is really I/O-bound (a fetch does not block the main thread, so moving it to a worker buys nothing). Combine the baseline from step 2 with the overhead rule from the problem framing: if one function dominates a long task and its CPU time far exceeds the cost of marshalling its input and result, offload it.

4. Apply the Fix: Transferables, Lifecycle, and Pooling

Transferable objects: move bytes instead of copying them

By default every argument and return value is structured-cloned — a deep copy across the boundary. For large binary payloads that copy can itself cost tens of milliseconds. Transferable objects (ArrayBuffer, MessagePort, ImageBitmap, OffscreenCanvas) are instead moved: ownership transfers to the other thread in near-constant time and the sender loses access. Wrap them with Comlink.transfer():

javascript

import * as Comlink from 'comlink';
import { heavy } from './main.js'; // the wrapped proxy exported in step 1

async function hashFile(file) {
  const buffer = await file.arrayBuffer();
  // Comlink.transfer marks the ArrayBuffer to be MOVED, not cloned: the 40MB
  // buffer crosses in ~0ms instead of being deep-copied. This assumes the worker
  // also exposes a sha256(buffer) method alongside the ones shown in step 1.
  const digest = await heavy.sha256(Comlink.transfer(buffer, [buffer]));
  return digest;
  // trade-off: after transfer the local `buffer` is detached (byteLength 0); if the
  // main thread still needs the bytes, copy them first (buffer.slice(0)) or you'll
  // read an empty buffer.
}

Strings and plain objects are not transferable — they are always cloned. That is the core constraint behind the JSON case: passing a 4MB string to the worker still copies the string, so the win comes from moving the JSON.parse CPU off-thread, not from avoiding the copy. When you control the wire format, prefer binary payloads you can transfer.
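The difference between cloning and transferring can be observed without a worker at all. The sketch below uses structuredClone with its transfer option as a stand-in for the postMessage boundary, on the assumption that your runtime provides it (current browsers and Node 17+); the helper name encodeForTransfer is hypothetical.

```javascript
// Encode a string as UTF-8 bytes so the payload lives in an ArrayBuffer,
// which can be transferred, instead of a string, which can only be cloned.
function encodeForTransfer(text) {
  return new TextEncoder().encode(text).buffer;
}

const original = '{"items":[1,2,3]}';
const buffer = encodeForTransfer(original);
const sizeBefore = buffer.byteLength;

// Transfer: ownership moves to the "receiver" and the source buffer is detached.
const moved = structuredClone(buffer, { transfer: [buffer] });

console.log(buffer.byteLength === 0);                      // true: the sender lost access
console.log(moved.byteLength === sizeBefore);              // true: same bytes, no copy needed
console.log(new TextDecoder().decode(moved) === original); // true: decodes on the other side
```

On the worker side the bytes still have to be decoded and parsed, but that cost is paid off the main thread.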

Lifecycle and pooling: don't spawn a worker per call

Worker startup costs 5–15ms and a fresh JS context, so creating one per call destroys the benefit. Create the worker once and reuse the proxy. For workloads that can run several jobs concurrently — independent diffs, a batch of image resizes — a small pool of workers lets jobs run in parallel across cores while bounding memory:

javascript

import * as Comlink from 'comlink';

// The cap of 4 is an illustrative choice: hardwareConcurrency can report 8+ cores
// on phones that cannot spare the memory for that many JS contexts.
function createWorkerPool(size = Math.min(navigator.hardwareConcurrency || 4, 4)) {
  const slots = Array.from({ length: size }, () => {
    const worker = new Worker(new URL('./heavy.worker.js', import.meta.url), { type: 'module' });
    return { worker, proxy: Comlink.wrap(worker) }; // keep the raw Worker so it can be terminated
  });
  let next = 0;
  // trade-off: each worker is a full JS context (~1-3MB RAM + its own copy of the
  // bundle's worker chunk), so a pool of 8 on a low-end phone can trigger memory
  // pressure; keep the size capped and call terminate() once the pool goes idle.
  return {
    // simple round-robin; good enough when jobs have similar cost
    pick: () => slots[next++ % slots.length].proxy,
    terminate: () => slots.forEach(({ worker }) => worker.terminate()),
  };
}

const pool = createWorkerPool();
export const runDiff = (a, b) => pool.pick().diff(a, b);

For the cancellation discipline — aborting a stale worker job when the user re-interacts — apply the AbortController pattern from optimizing INP with scheduler.yield(); a worker job that nobody will read should be abandoned so it stops consuming a core.
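One way to make abandonment real is sketched below. An AbortSignal cannot be structured-cloned into a worker, and a synchronous job inside the worker cannot be interrupted from outside, so this sketch keeps the signal on the main thread and terminates the worker on abort, spawning a fresh one lazily on the next call. The spawn factory and its { api, terminate } shape are assumptions for illustration, and the approach only suits a worker that runs one job at a time, because terminating it also strands any other call in flight.

```javascript
// spawn() is a hypothetical factory returning { api, terminate }, for example
// { api: Comlink.wrap(worker), terminate: () => worker.terminate() }.
function createAbortableRunner(spawn) {
  let handle = null; // created lazily, replaced after an abort

  return function run(call, signal) {
    if (signal.aborted) return Promise.reject(signal.reason);
    if (!handle) handle = spawn();
    const mine = handle;

    return new Promise((resolve, reject) => {
      const onAbort = () => {
        mine.terminate();                   // stops the job and frees the core
        if (handle === mine) handle = null; // the next run() spawns a fresh worker
        reject(signal.reason);
      };
      signal.addEventListener('abort', onAbort, { once: true });
      Promise.resolve()
        .then(() => call(mine.api))
        .then(resolve, reject)
        .finally(() => signal.removeEventListener('abort', onAbort));
    });
  };
}

// Usage sketch (names are illustrative):
// const run = createAbortableRunner(spawnHeavy);
// const controller = new AbortController();
// run((api) => api.diff(a, b), controller.signal).catch(() => {});
// controller.abort(); // the user re-interacted, so kill the stale job
```

The cost of this design is a worker restart (the 5-15ms startup again) after every abort, so it fits jobs that are long enough for the saved CPU to outweigh the respawn.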

Deconstructing the Cost: Where the Milliseconds Actually Go

A worker round-trip is not free, and pretending it is leads to offloading work that gets slower. The total latency of await heavy.fn(input) decomposes into measurable phases, each with its own budget against the interaction:

Argument clone (post) — structured-cloning input into the message. Scales with payload size; effectively 0ms for Transferables, but tens of milliseconds for a multi-megabyte plain object. Budget: keep under 5ms by transferring binary or slimming the payload.
Dispatch + queue — the message crossing the boundary and the worker dequeuing it. Roughly 1ms, fixed.
Compute — the actual CPU work, now off the main thread. This is the entire point; it no longer counts against INP at all.
Result clone (return) — cloning the worker's return value back. The most-missed cost: a worker that parses JSON and returns the whole object pays a second deep clone on the way back. Budget: return a slim summary or a Transferable, not the full structure.

The failure pattern is symmetric copies: a handler that was 300ms of CPU becomes 40ms of argument clone + 300ms off-thread compute + 60ms of result clone, and the two clones — running on the main thread — reintroduce 100ms of blocking you thought you removed. Measure the clone phases explicitly, not just the wall-clock of the call.
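A rough way to put numbers on the clone phases is to time structuredClone on the thread that pays for them. This is a proxy, not an exact measurement: postMessage serializes on the sender and deserializes on the receiver, while structuredClone does both on one thread, so it overstates each side's share. The helper name and the synthetic payload are assumptions for illustration.

```javascript
// Time a structured clone of `value` on the current thread, keeping the best
// of several runs to reduce noise from GC and JIT warm-up.
function measureCloneMs(value, runs = 5) {
  let best = Infinity;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    structuredClone(value);
    best = Math.min(best, performance.now() - start);
  }
  return best;
}

// Synthetic payload: 50,000 small records versus a 20-record summary of them.
const payload = Array.from({ length: 50_000 }, (_, id) => ({
  id,
  name: `item-${id}`,
  sales: id % 97,
}));
const summary = payload.slice(0, 20);

console.log(`full clone: ${measureCloneMs(payload).toFixed(2)}ms`);
console.log(`slim clone: ${measureCloneMs(summary).toFixed(2)}ms`);
```

Run it against a realistic payload on a throttled or low-end device; the gap between the two lines is the budget you recover by returning a summary instead of the full structure.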

Advanced Diagnostics and Edge-Case Failure Modes

The result-clone trap. The single most common regression is moving compute off-thread while leaving a giant return value to clone back. If the worker's job is to reduce data (parse-then-filter, index-then-query), do the reduction inside the worker and return only what the UI needs. Returning the full parsed object often clones more than the parse saved.

Exposing callbacks across the boundary. Comlink can proxy functions you pass into the worker (Comlink.proxy(cb)) so the worker can call back for progress events. Each invocation is a full round-trip, so a progress callback fired per item floods the message queue and can block the main thread with reply handling — throttle progress to a few updates per second.
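Throttling is best applied inside the worker, before the proxied callback is invoked, so the dropped updates never cross the boundary. A minimal sketch follows; throttleProgress, the 250ms default, and the injectable clock are illustrative choices rather than anything Comlink provides.

```javascript
// Wrap a progress callback so it fires at most once per `intervalMs`.
// `now` is injectable so the behaviour can be checked without real timers.
function throttleProgress(callback, intervalMs = 250, now = () => performance.now()) {
  let last = -Infinity;
  return (done, total) => {
    const t = now();
    // Always deliver the final update; otherwise drop calls inside the interval.
    if (done === total || t - last >= intervalMs) {
      last = t;
      callback(done, total);
    }
  };
}

// Usage sketch inside the worker (addToIndex is a hypothetical helper):
// indexWithProgress(records, onProgress) {      // onProgress arrives as a Comlink proxy
//   const report = throttleProgress(onProgress);
//   records.forEach((record, i) => {
//     addToIndex(record);
//     report(i + 1, records.length);            // most calls never leave the worker
//   });
// }
```

With the default interval, a job that processes thousands of items per second sends about four messages per second instead of thousands.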

Module-worker support gaps. A browser without module-worker support ignores the type option and loads the script as a classic worker, where the first static import statement is a syntax error and the worker never starts. Nothing fails at the new Worker(url, { type: 'module' }) call site itself; the failure arrives later as an error event on the worker. Ship a classic-worker bundle behind feature detection rather than discovering the failure in the field.
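Feature detection here usually relies on a small trick: an engine that supports module workers reads the type property of the options object, and one that does not never touches it. The sketch below follows that commonly used approach; the function name and the injectable constructor (there only to make the check testable outside a browser) are assumptions.

```javascript
// Detect module-worker support by observing whether the constructor reads
// the `type` option. Engines without support never trigger the getter.
function supportsModuleWorkers(WorkerCtor = globalThis.Worker) {
  if (typeof WorkerCtor !== 'function') return false;
  let supported = false;
  const options = {
    get type() {
      supported = true;
      return 'module';
    },
  };
  try {
    // An empty script is enough; the worker is terminated immediately.
    new WorkerCtor('data:text/javascript,', options).terminate();
  } catch {
    // A throw does not change what the getter observed.
  }
  return supported;
}

// Usage sketch (the bundle names are hypothetical):
// const workerUrl = supportsModuleWorkers() ? moduleWorkerUrl : classicWorkerUrl;
```

Run the check once at startup and cache the result, since each call briefly constructs a worker.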

Serialization of non-cloneable values. Structured clone cannot copy functions or DOM nodes; passing one, even as a nested property, throws a DataCloneError at the boundary. Class instances fail more quietly: they arrive as plain objects with their prototype and methods stripped, and Error subclasses lose their custom fields. Keep the worker API surface to plain data and Transferables; if you need a class back, return its data and rehydrate on the main thread.
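The behaviour at the boundary can be checked without a worker, since structuredClone applies the same algorithm. The Money class below is a hypothetical example used only to show what survives the clone and how rehydration might look.

```javascript
class Money {
  constructor(amount, currency) {
    this.amount = amount;
    this.currency = currency;
  }
  format() {
    return `${this.amount.toFixed(2)} ${this.currency}`;
  }
  // Rebuild a real instance from the plain data that survives the clone.
  static fromData({ amount, currency }) {
    return new Money(amount, currency);
  }
}

const cloned = structuredClone(new Money(12.5, 'EUR'));
console.log(cloned instanceof Money);          // false: the prototype is gone
console.log(typeof cloned.format);             // "undefined": methods did not cross
console.log(Money.fromData(cloned).format());  // "12.50 EUR" after rehydration

let errorName = null;
try {
  structuredClone({ onDone() {} }); // a function as an own property
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // "DataCloneError"
```

The quiet case is the more dangerous one: nothing throws, and the bug only appears later when code on the receiving side calls a method that no longer exists.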

Debugging across threads. A thrown error inside the worker surfaces on the main thread as a rejected promise, but the stack trace points into the worker bundle. Source-map the worker chunk in your build, and in DevTools select the worker's context in the Sources panel to set breakpoints — the main-thread debugger will not stop inside worker code by default.

Reference Implementations

These three patterns cover most production offloading work. Each is paste-ready and annotated with the configuration assumption, the trade-off, and the outcome you should observe.

A lazily created, single-purpose worker

Most apps need one worker created on first use, not at page load. This wrapper defers worker creation until the first call, so a feature nobody touches never pays startup cost, and memoizes the proxy thereafter.

javascript

import * as Comlink from 'comlink';

let proxy; // memoized after first use
export function getHeavy() {
  if (!proxy) {
    const worker = new Worker(new URL('./heavy.worker.js', import.meta.url), { type: 'module' });
    proxy = Comlink.wrap(worker);
  }
  return proxy;
}
// usage: const data = await getHeavy().parseJSON(text);
// trade-off: lazy creation adds ~5-15ms to the FIRST interaction that needs it; if the
// work is on a critical path the user hits immediately, create the worker eagerly during
// idle time with requestIdleCallback instead so the cost is paid before the click.

Outcome: zero worker startup cost on pages that never invoke the feature; one-time 5–15ms cost folded into the first use otherwise. This is the default shape for product code.

Transferring an OffscreenCanvas for image work

Image decode, resize, and re-encode are pure CPU and a textbook offload. OffscreenCanvas is transferable, so the worker can rasterize without ever touching the DOM and hand back an ImageBitmap or a Blob.

javascript

// main.js
import * as Comlink from 'comlink';
import { getHeavy } from './get-heavy.js'; // hypothetical path to the lazy wrapper above

// canvasEl is assumed to be a reference to an existing <canvas> element
const offscreen = canvasEl.transferControlToOffscreen();
// transfer ownership of the canvas to the worker; the main thread no longer draws to it
await getHeavy().renderThumbnail(Comlink.transfer(offscreen, [offscreen]), srcUrl);
// trade-off: once transferred, the main thread cannot draw to this canvas at all, so only
// use transferControlToOffscreen for canvases the worker fully owns, not shared UI canvases.

Outcome: image rasterization that previously blocked the main thread for 80–250ms runs entirely off-thread; the visible long task disappears and INP for any concurrent interaction holds under budget.

A summarizing worker that returns only what the UI needs

The highest-leverage pattern is doing the reduction inside the worker so the return clone stays tiny. A worker that parses, filters, and projects returns kilobytes instead of megabytes.

javascript

// heavy.worker.js
import * as Comlink from 'comlink';

Comlink.expose({
  topProducts(text, limit) {
    const all = JSON.parse(text);                 // heavy parse, off-thread
    return all.items
      .sort((a, b) => b.sales - a.sales)
      .slice(0, limit)                            // reduce BEFORE returning
      .map(({ id, name, sales }) => ({ id, name, sales }));
  },
});
// trade-off: if the UI later needs the full dataset (e.g. for client-side re-sorting),
// returning only the top N forces a second worker round-trip — return a slim shape only
// when the UI genuinely consumes a slim shape.

Outcome: the return-clone phase drops from tens of milliseconds (cloning the whole parsed object) to near zero, eliminating the most common offloading regression where the result clone reintroduces the stall you removed.

Common Pitfalls

Returning the full parsed object. The single most frequent regression — the worker parses off-thread but clones a multi-megabyte result back, reintroducing main-thread blocking. Reduce inside the worker.
Spawning a worker per call. Worker startup is 5–15ms and a fresh context; creating one per invocation makes the abstraction slower than staying on the main thread. Reuse or pool.
Calling .json() before the worker. Using res.json() parses on the main thread before the payload ever reaches the worker. Fetch as .text() and parse inside the worker.
Forgetting Transferables for binary data. Passing a large ArrayBuffer without Comlink.transfer() deep-copies it across the boundary; wrap it so it moves in near-constant time.
Offloading I/O-bound work. A fetch does not block the main thread, so moving it into a worker buys nothing and adds round-trip overhead. Offload CPU, not I/O.
Over-fine-grained calls. Because the Comlink proxy hides the boundary, it is easy to call it in a tight loop, paying clone + dispatch cost per iteration. Batch the work into one coarse call.
Missing worker source maps. Without them, a worker exception surfaces as an unreadable rejected promise. Emit source maps for the worker chunk in your build.
Unbounded pools on low-end devices. Each worker is a full JS context with its own copy of the worker bundle; a large pool on a budget phone triggers memory pressure. Cap pool size and terminate idle workers.
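The .json() pitfall in the list above is the easiest to fix in code. A minimal sketch, assuming api is a Comlink proxy (or anything else with an async parseJSON(text) method); the function name and the injectable fetchImpl parameter are illustrative.

```javascript
// `fetchImpl` is injectable only so the sketch can be exercised without a network.
async function loadParsed(url, api, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const text = await res.text(); // read the body as a string, no main-thread parse
  return api.parseJSON(text);    // JSON.parse runs inside the worker
}

// Usage sketch: const data = await loadParsed('/api/catalog', getHeavy());
```

The string is still cloned on the way in and the parsed result on the way out, so this pairs naturally with a worker method that reduces the data before returning it.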

Validation and Budgeting in CI

Offloading is proven by the long-task profile and field INP after it ships, not by the architecture diagram. Assert both. In Lighthouse CI, fail the build when Total Blocking Time regresses, since TBT is the lab proxy that drops when the long task leaves the main thread:

javascript

// lighthouserc.js
module.exports = {
  ci: {
    assert: {
      assertions: {
        // TBT should fall sharply once the heavy task runs off-thread
        'total-blocking-time': ['error', { maxNumericValue: 200 }],
        'mainthread-work-breakdown': ['warn', { maxNumericValue: 2000 }],
      },
    },
  },
};
// trade-off: Lighthouse measures TBT during page load only, so it will look great even
// if you introduced a 200ms result-clone stall that blocks during a later interaction;
// pair it with a scripted interaction check that asserts no single task exceeds 50ms.

Add a Playwright or Puppeteer script that performs the interaction while observing longtask entries (PerformanceLongTaskTiming), and assert that none is recorded; every such entry is, by definition, a task over 50ms. This catches the result-clone regression that page-load TBT can miss, and it catches a worker that was never actually created (the work silently ran on the main thread because the bundler failed to emit the chunk). Run it in the same pipeline stage as Lighthouse. Finally, confirm field INP at the p75 in your RUM dashboard before and after; treat the deploy as proven only when the field number moves, since the marshalling cost is hardware-sensitive and low-end devices show it most. The lab gate blocks the regression from merging; the field check confirms the win on real phones. The full CI harness is covered in the Lighthouse CI setup for frontend pipelines.
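The scripted check can be split into an in-page collector and a pure assertion helper, sketched below. Both function names, the settle window, and the selector in the usage comment are assumptions; the longtask entry type is only exposed by Chromium-based browsers, so run this gate there.

```javascript
// Given long-task durations collected during an interaction, return the ones
// that break the budget. Every longtask entry is already over 50ms, so with
// the default budget any recorded entry is a violation.
function longTaskViolations(durations, budgetMs = 50) {
  return durations.filter((ms) => ms > budgetMs);
}

// In-page collector, written to be handed to a browser context. It resolves
// with the durations of long tasks observed during the next `settleMs`.
function collectLongTasks(settleMs = 1000) {
  return new Promise((resolve) => {
    const durations = [];
    const observer = new PerformanceObserver((list) => {
      for (const entry of list.getEntries()) durations.push(entry.duration);
    });
    observer.observe({ type: 'longtask' });
    setTimeout(() => {
      observer.disconnect();
      resolve(durations);
    }, settleMs);
  });
}

// Usage sketch with Playwright (selector is hypothetical):
// const pending = page.evaluate(collectLongTasks, 1000);
// await page.click('#import-button');
// const violations = longTaskViolations(await pending);
// expect(violations).toEqual([]);
```

Starting the collector before the click and awaiting it afterwards keeps the observation window wrapped around the interaction, which is what lets the check see a result-clone stall that page-load metrics miss.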

Optimizing INP with scheduler.yield() — the cheaper fix when work fits the budget but its shape is wrong; choose it before reaching for a worker.
Moving heavy JSON parsing off the main thread — the canonical parse-stall scenario diagnosed and fixed end to end.
Comlink vs raw postMessage for workers — the decision matrix for which worker abstraction to adopt.
Profiling event handlers for INP — locating and ranking the slow interaction before you decide what to offload.
Dynamic imports and route-based splitting — loading the worker chunk on the route that needs it instead of at first paint.