Best Lighthouse CI Setup for Frontend Pipelines

This is the implementation companion to Understanding Core Web Vitals Thresholds and the broader Core Web Vitals & Measurement practice area: it takes the LCP < 2.5s, INP < 200ms, and CLS < 0.1 boundaries and turns them into a Lighthouse CI gate that actually blocks regressions instead of quietly reporting them.

Most pipelines run Lighthouse as a passive scorer — it produces a number, the number drifts, nobody notices until CrUX flags the origin a quarter later. The fix is to make the boundary a hard exit code: deterministic throttling so runs are comparable, exact maxNumericValue assertions on the metrics that matter, and a CI step whose failure blocks the merge.

Lighthouse CI gate flow Build, serve, collect three runs, assert against budgets, then block or pass the merge. From build to merge gate Build + serve static or SSR Collect x3 simulate throttle Assert budgets median of runs Block or pass exit code gate LCP ≤ 2500ms · TBT ≤ 200ms (INP proxy) · CLS ≤ 0.1 Three runs, median score — one run guarantees flake.

Rapid Diagnosis: Is Your Current Gate Actually Gating?

Before rebuilding, confirm which failure mode you have. Run this checklist against your existing config:

  • Does the CI step fail the build on a regression? If lhci assert is missing or runs with continue-on-error, the exit code is swallowed and nothing is gated.
  • Is numberOfRuns set to 1? A single run cannot absorb GC spikes or background-task noise; you will get false failures and false passes in equal measure.
  • Is throttlingMethod 'devtools' or unset? DevTools throttling rides the host CPU, so scores swing with whatever the runner is doing.
  • Are assertions using shorthand keys like lcp or cls? Lighthouse silently ignores keys that are not full audit IDs, so the gate passes everything.
  • Is the runner under-provisioned? Under ~2 vCPU / 4GB, host CPU starvation inflates timing metrics regardless of code quality.

Any "yes" above means the gate is decorative. The sections below resolve each in order of impact.

Root Cause Analysis: Why Default Lighthouse CI Fails in CI

Default Lighthouse assumes stable hardware and consistent network — CI runners violate both. Four named failure modes dominate:

  • Single-run variance. One execution cannot average out OS scheduling and garbage-collection spikes, so the metric you assert on is whichever value the runner happened to produce.
  • DevTools throttling mismatch. throttlingMethod: 'devtools' throttles relative to the host CPU; ephemeral runners with different neighbors produce different "throttled" results.
  • Missing assertion budgets. Without explicit maxNumericValue thresholds, Lighthouse falls back to composite scoring, which masks an incremental LCP creep behind a still-green overall number.
  • Silent key typos. Shorthand metric aliases are not validated; an assertion on lcp is dropped without warning, so the budget never runs.

Align the gate with the Core Web Vitals threshold boundaries rather than chasing a 100/100 score that third-party variance makes mathematically unreachable.

Step-by-Step Resolution

The fixes below are ordered by impact: throttling determinism first (it removes the largest noise source), then exact assertions, then pipeline wiring.

1. Force deterministic throttling and provision the runner

Switch throttlingMethod to 'simulate'. Simulation models the network and CPU mathematically from a single trace, so the score no longer depends on the runner's live conditions. Give the runner at least 2 vCPUs and 4GB RAM, and pass the mandatory headless flags.

bash
# Reproduce the CI profile locally to confirm determinism before wiring it in.
npx lighthouse http://localhost:3000/ \
  --preset=desktop \
  --throttling-method=simulate \
  --output=json --output-path=./ci-trace.json \
  --chrome-flags='--headless --no-sandbox --disable-gpu --disable-dev-shm-usage'
# trade-off: 'simulate' is reproducible but models a fixed device — it will
# not catch GC or thermal stalls a real low-end phone hits. Don't use it as
# your only signal for a known field regression; pair it with CrUX p75.

Expected outcome: run-to-run score spread collapses from double digits to ~1–2 points, eliminating most false PR failures.

2. Assert exact metric budgets with full audit keys

Assertions are the gate. Use maxNumericValue for timing metrics and minScore for categories, and always use full Lighthouse audit IDs — largest-contentful-paint, cumulative-layout-shift, total-blocking-time — never shorthand.

json
{
  "ci": {
    "collect": {
      "numberOfRuns": 3,
      "settings": { "preset": "desktop", "throttlingMethod": "simulate" },
      "url": ["https://preview.example.com/"]
    },
    "assert": {
      "assertions": {
        "categories:performance": ["error", { "minScore": 0.9 }],
        "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
        "total-blocking-time": ["error", { "maxNumericValue": 200 }],
        "cumulative-layout-shift": ["error", { "maxNumericValue": 0.1 }]
      }
    },
    "upload": { "target": "temporary-public-storage" }
  }
}

Trade-off: total-blocking-time is the lab stand-in for INP — Lighthouse cannot measure INP without real interactions. Gate TBT at 200ms here, but confirm the field INP p75 separately in CrUX; a green TBT does not prove a passing INP.

Expected outcome: an LCP that drifts from 2.3s to 2.6s now fails the build explicitly instead of hiding behind a composite score.

3. Wire the assertion into the pipeline with a hard exit code

@lhci/cli exits with code 1 when an error-level assertion fails. Run it after the build so that failure maps directly to a blocked PR status check.

yaml
name: Lighthouse CI
on: [pull_request]
jobs:
  lighthouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - name: Build and serve
        run: npm run build && npx serve -s build -l 3000 &
      - name: Run Lighthouse CI
        run: |
          npm install -g @lhci/cli
          lhci autorun --config=./lighthouserc.json
          # trade-off: a single inline serve is fine for a static build; for
          # SSR use startServerCommand so collection waits for readiness —
          # racing `serve &` against collection yields empty first runs.

Expected outcome: a regression converts to a failing required check, so the author fixes it before merge rather than after deploy.

Verification

Confirm the gate works by regressing on purpose and watching it fail, then check the field signal afterward.

bash
# Before/after: capture the LCP numeric value across runs from the report.
lhci autorun --config=./lighthouserc.json | grep -i "largest-contentful-paint"
# Expect: exit 1 and a printed assertion failure once LCP > 2500ms.
# trade-off: grepping CLI output is quick but brittle across @lhci versions;
# for durable history, read the JSON in .lighthouseci/ or the upload dashboard.

Three checks close the loop: the before/after diff of the asserted metric in the run output, the CI exit code flipping to 1 on a planted regression, and the field check in CrUX — the lab gate is only trustworthy if its boundaries track the real-user p75 you defend in Understanding Core Web Vitals Thresholds.

Scaling and Maintenance

For monorepos, inject preview URLs from CI environment variables instead of hardcoding them, and tier severity by route: error with strict budgets on critical paths (homepage, checkout), warn with relaxed thresholds on secondary routes. Cache the @lhci/cli binary across workflows to cut overhead, and tighten budgets incrementally — start a new threshold at warn, watch the dashboard trend for two sprints, then promote it to error and reduce maxNumericValue by 5–10% per release. When the gate flags bundle bloat as the root cause of an LCP or TBT regression, drive the fix from webpack bundle analysis techniques to find the offending chunk before adjusting the budget.