Use Cases

Who uses InsightTestBench

Every role in the QA loop, one bench.

Manually maintained test suites need three people: a QA engineer who writes Playwright, a developer who fixes the suite every time the app changes, and an on-call engineer who guesses whether a failure is the test or the app. The bench shoulders most of that work.

QA manager

Own the suite without writing code

Most QA managers know the app inside-out but don't want to live inside a Playwright codebase. The bench is the surface they actually want: paste a brief, review the auto-generated feature tree, edit cases by chat, run on demand, and review failures with root-cause analysis (RCA) attached. No code, no CI YAML, no waiting for a dev to land a test PR.

Day-to-day workflows

Bootstrap a project from a brief — get a feature tree + ~100 cases in 15 min
Chat with the agent: "add a negative case for the password-empty path on login"
Run a single feature after a dev pushes a fix
Compare last night's nightly to the previous one — what regressed?
Hit "Vision regen" on a feature when the UI changes

Developer

Get an actionable verdict on every failure

When a regression run goes red, the dev wants to know: is this my problem or the test's problem? The bench's RCA agent reads the case spec, the per-step execution log, the page observations, and the error, then gives a verdict. It also attaches a suggested fix and a confidence score, so you can prioritize.

What the dev sees

Failure classified: test_design, product_bug, or environmental
Suggested fix: "bump wait_ms to 4000 — KPI cards take longer to render"
Side-by-side screenshot diff vs the last passing run
The exact assertion that failed + actual vs expected
Console errors captured during the case
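
As a rough illustration of how a dev might use the classification and confidence score to prioritize, here is a minimal sketch. The `Verdict` shape, the 0.0 to 1.0 confidence range, the 0.7 threshold, and the sample case names are all assumptions for illustration, not the bench's actual data model.

```python
from dataclasses import dataclass

# The three failure categories named in the list above.
CATEGORIES = ("test_design", "product_bug", "environmental")


@dataclass
class Verdict:
    """Hypothetical shape of one RCA verdict."""
    case: str
    category: str
    confidence: float  # assumed range: 0.0 to 1.0
    suggested_fix: str


def triage(verdicts, min_confidence=0.7):
    """Return likely product bugs at or above the threshold, most confident first."""
    for v in verdicts:
        if v.category not in CATEGORIES:
            raise ValueError(f"unknown category: {v.category}")
    likely_bugs = [
        v for v in verdicts
        if v.category == "product_bug" and v.confidence >= min_confidence
    ]
    return sorted(likely_bugs, key=lambda v: v.confidence, reverse=True)


# Sample data, invented for this sketch.
verdicts = [
    Verdict("dashboard-kpi", "test_design", 0.9, "bump wait_ms to 4000"),
    Verdict("checkout-total", "product_bug", 0.8, "check tax rounding"),
    Verdict("login-empty-password", "product_bug", 0.95, "restore validation"),
    Verdict("export-csv", "product_bug", 0.4, "inspect download handler"),
    Verdict("search-filter", "environmental", 0.85, "retry after deploy"),
]

for v in triage(verdicts):
    print(f"{v.case}: {v.suggested_fix} (confidence {v.confidence})")
```

Filtering on category first keeps test-design and environmental failures out of the dev's queue; those can be routed to whoever owns the suite or the environment.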

CI integration: trigger runs from your pipeline via the REST API; the bench returns SSE events you can pipe into your build log.
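
To show what piping SSE events into a build log could look like, here is a small parser for the standard `text/event-stream` framing. The event names (`case_finished`, `run_finished`) and the JSON payload fields are hypothetical; the bench's real event schema and endpoint paths are not specified here, so the sketch runs against sample lines rather than a live connection.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            # Comment line, often used as a keep-alive.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Sample stream, invented for this sketch.
sample = [
    "event: case_finished\n",
    'data: {"case": "login-empty-password", "status": "failed"}\n',
    "\n",
    ": keep-alive\n",
    "event: run_finished\n",
    'data: {"passed": 41, "failed": 1}\n',
    "\n",
]

events = [(name, json.loads(payload)) for name, payload in parse_sse(sample)]
for name, payload in events:
    print(f"[bench] {name}: {payload}")
```

In a pipeline, `lines` would come from iterating over a streaming HTTP response instead of a list, and the build step could exit non-zero when the final event reports failures.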

On-call / SRE

Get a Slack ping only when it matters

The bench schedules regression runs on the cadence you pick: every hour, daily at 02:00, whatever. The webhook can fire on every run, only on failures, or only on regressions (a case that was passing yesterday and is failing now), with the RCA agent's verdict already attached. Your on-call only gets paged when something actually broke.
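
The difference between the three trigger modes can be sketched as below. The mode names (`every_run`, `failures`, `regressions`) and the case-to-status dictionaries are assumptions made for illustration; the bench's own configuration keys may differ.

```python
def find_regressions(baseline, current):
    """Cases that passed in the baseline run and fail in the current run."""
    return sorted(
        case for case, status in current.items()
        if status == "failed" and baseline.get(case) == "passed"
    )


def should_fire(mode, baseline, current):
    """Decide whether a webhook fires for this run under the given mode."""
    if mode == "every_run":
        return True
    if mode == "failures":
        return any(status == "failed" for status in current.values())
    if mode == "regressions":
        return bool(find_regressions(baseline, current))
    raise ValueError(f"unknown mode: {mode}")


# Sample runs, invented for this sketch.
baseline = {"login": "passed", "checkout": "failed", "search": "passed"}
current = {"login": "passed", "checkout": "failed", "search": "passed"}

# "checkout" was already failing, so it counts as a failure but not a regression.
print(should_fire("failures", baseline, current))     # True
print(should_fire("regressions", baseline, current))  # False
```

The regressions mode is the quiet one: a known, long-standing failure keeps the run red without paging anyone again.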

Out-of-the-box webhook payload

summary_md — markdown-rendered run digest for Slack
regressions[] — each with name, error, RCA cause + suggested fix
fixes_count — cases that flipped from failing → passing
baseline run id — for one-click comparison in the UI
Slack-formatted, generic JSON: drop it into any incoming webhook
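
For teams that want to post-process the payload themselves, here is a hedged sketch of a consumer that turns it into a Slack incoming-webhook message body (a JSON object with a `text` field). The key names `baseline_run_id`, `cause`, and `suggested_fix`, plus all sample values, are assumptions; only `summary_md`, `regressions`, and `fixes_count` are taken from the list above.

```python
def to_slack_message(payload):
    """Build a Slack incoming-webhook body from a bench-style payload."""
    lines = [payload["summary_md"]]
    for reg in payload["regressions"]:
        lines.append(
            f"- {reg['name']}: {reg['error']} "
            f"(cause: {reg['cause']}; fix: {reg['suggested_fix']})"
        )
    lines.append(
        f"Fixed since baseline {payload['baseline_run_id']}: {payload['fixes_count']}"
    )
    return {"text": "\n".join(lines)}


# Sample payload, invented for this sketch.
payload = {
    "summary_md": "*Nightly run*: 41 passed, 1 failed",
    "regressions": [
        {
            "name": "dashboard-kpi",
            "error": "timeout waiting for KPI cards",
            "cause": "test_design",
            "suggested_fix": "bump wait_ms to 4000",
        }
    ],
    "fixes_count": 2,
    "baseline_run_id": "run-0123",  # hypothetical id format
}

message = to_slack_message(payload)
print(message["text"])
```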

IT / DevOps

Run it in your environment

For platform engineering, IT, and security teams. Install the bench via Helm (k8s) or docker-compose (single VM). Wire up Okta, your S3, your Postgres, your model endpoints. Add worker boxes as you scale. Standard ops tooling all the way down, with no proprietary control plane to learn.

Operational footprint

Single docker-compose stack: bench API + UI + MySQL + worker
Persistent volumes for projects, screenshots, sample data, run history
JWT auth out of the box; OAuth / SSO is a future integration point
Per-environment config: different URLs, login flows, and credentials for each stage
Bench stores env-var names only; secrets live in your environment
Reverse-proxy behind Caddy / nginx / Traefik when exposing publicly
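
The "env-var names only" idea can be sketched as follows: the stored config holds the name of each variable, and the secret is looked up at run time. The config layout, key names, URLs, and variable names here are hypothetical, chosen only to illustrate the pattern.

```python
import os

# Hypothetical per-environment config. Note that it contains variable
# names such as STAGING_QA_PASSWORD, never the secret values themselves.
ENVIRONMENTS = {
    "staging": {
        "base_url": "https://staging.example.test",
        "login_url": "https://staging.example.test/login",
        "username_env": "STAGING_QA_USER",
        "password_env": "STAGING_QA_PASSWORD",
    },
    "production": {
        "base_url": "https://app.example.test",
        "login_url": "https://app.example.test/login",
        "username_env": "PROD_QA_USER",
        "password_env": "PROD_QA_PASSWORD",
    },
}


def resolve(env_name, environ=os.environ):
    """Return URLs plus credentials read from the surrounding environment."""
    cfg = ENVIRONMENTS[env_name]
    missing = [
        cfg[key] for key in ("username_env", "password_env")
        if cfg[key] not in environ
    ]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "base_url": cfg["base_url"],
        "login_url": cfg["login_url"],
        "username": environ[cfg["username_env"]],
        "password": environ[cfg["password_env"]],
    }


# A stand-in mapping is used here so the sketch runs without real secrets.
fake_environ = {"STAGING_QA_USER": "qa-bot", "STAGING_QA_PASSWORD": "not-a-real-secret"}
staging = resolve("staging", fake_environ)
print(staging["base_url"], staging["username"])
```

Because only names are persisted, rotating a credential means changing the variable in the deployment environment, not editing anything the bench stores.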

What kind of apps get tested?

Anything with a browser-reachable UI. A few patterns the bench handles particularly well:

Internal portals

Admin / ops dashboards

Multi-page apps with sidebars, tabbed workbenches, and complex filtering. The crawl traverses menu trees and the bootstrap builds a feature plan per major surface.

Customer SaaS

Signup + onboarding flows

Form-heavy flows where every field has validation rules and every step has a happy path + several error paths. Profile sweeps generate the negative cases automatically.

E-commerce

Cart + checkout

Multi-step funnels with state that persists across pages. Session-mode test execution keeps the login + cart state across cases the way a real user would.

Multi-environment

Promote staging → production

Same suite, different env. Per-env URLs, login URLs, and persona credentials. Compare a staging run to the last green production run before shipping.

Responsive design

Desktop / tablet / phone

Bootstrap auto-generates a Responsive plan that replays the same UX cases at each viewport breakpoint. It captures screenshots at every viewport for diff review.
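
Conceptually, replaying the same cases at each breakpoint is a cross product of cases and viewports. The sketch below assumes three illustrative viewport sizes and a simple dict per planned run; the bench's actual breakpoints and plan format are not documented here.

```python
from itertools import product

# Assumed viewport sizes (width, height); real breakpoints may differ.
VIEWPORTS = {
    "desktop": (1440, 900),
    "tablet": (768, 1024),
    "phone": (390, 844),
}


def expand_responsive_plan(case_names, viewports=VIEWPORTS):
    """Pair every case with every viewport, one planned run per pair."""
    plan = []
    for case, (label, (width, height)) in product(case_names, viewports.items()):
        plan.append({
            "case": case,
            "viewport": label,
            "width": width,
            "height": height,
            # Hypothetical screenshot naming, one file per case and viewport.
            "screenshot": f"{case}__{label}.png",
        })
    return plan


plan = expand_responsive_plan(["login", "checkout"])
for item in plan:
    print(item["case"], item["viewport"], item["screenshot"])
```

Two cases across three viewports produce six planned runs, each with its own screenshot to diff against the previous run at the same size.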

UX audit

Per-page critique

Bootstrap also produces a UX-audit plan. Each case captures the page at every viewport and the executor's summary step renders a per-page critique: layout, clarity, accessibility, info density.

All bench primitives are open §15 bundles — see insightworker-app-samples for the source.