  • contact@verticalserve.com

Use Cases

How real teams use InsightTestBench to keep regression coverage trustworthy through every release.

Who uses InsightTestBench

Every role in the QA loop, one bench.

A manually maintained test suite needs three people: a QA engineer who writes Playwright, a developer who fixes the suite every time the app changes, and an on-call engineer who guesses whether a failure is the test or the app. The bench shoulders most of that work.

QA manager

Own the suite without writing code

Most QA managers know the app inside-out but don't want to live inside a Playwright codebase. The bench is the surface they actually want: paste a brief, review the auto-generated feature tree, chat-edit cases, run on demand, review failures with RCA. No code, no CI YAML, no waiting for a dev to land your test PR.

Day-to-day workflows
  • Bootstrap a project from a brief — get a feature tree + ~100 cases in 15 min
  • Chat with the agent: "add a negative case for the password-empty path on login"
  • Run a single feature after a dev pushes a fix
  • Compare last night's nightly to the previous one — what regressed?
  • Hit "Vision regen" on a feature when the UI changes

Developer

Get an actionable verdict on every failure

When a regression run goes red, the dev wants to know: is this my problem or the test's problem? The bench's RCA agent reads the case spec, the per-step execution log, the page observations, and the error, then tells you which it is. The verdict comes with a suggested fix and a confidence score, so you can prioritize.

What the dev sees
  • Failure classified: test_design, product_bug, or environmental
  • Suggested fix: "bump wait_ms to 4000 — KPI cards take longer to render"
  • Side-by-side screenshot diff vs the last passing run
  • The exact assertion that failed + actual vs expected
  • Console errors captured during the case
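
One way a team might act on these verdicts is to route each failure to an owner and work the most confident ones first. The sketch below is an illustration only: the `Verdict` shape, the owner mapping, and the assumption that confidence is a number between 0 and 1 are hypothetical, not the bench's documented schema. Only the three classification names come from the list above.

```python
from typing import Dict, List, NamedTuple


class Verdict(NamedTuple):
    case: str
    classification: str  # "test_design", "product_bug", or "environmental"
    confidence: float    # assumed here to lie in [0, 1]


# Hypothetical routing table: who looks at which class of failure.
OWNER = {"product_bug": "developer", "test_design": "qa", "environmental": "ops"}


def triage(verdicts: List[Verdict]) -> Dict[str, List[str]]:
    """Group failing cases by owner, most confident verdicts first."""
    queues: Dict[str, List[str]] = {}
    for v in sorted(verdicts, key=lambda v: v.confidence, reverse=True):
        queues.setdefault(OWNER[v.classification], []).append(v.case)
    return queues


# Made-up verdicts for illustration.
queues = triage([
    Verdict("kpi-cards-render", "test_design", 0.82),
    Verdict("checkout-total", "product_bug", 0.91),
    Verdict("login-sso", "environmental", 0.40),
    Verdict("cart-badge", "product_bug", 0.55),
])
print(queues)
```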

CI integration: trigger runs from your pipeline via the REST API; the bench returns SSE events you can pipe into your build log.
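
As a rough illustration of the build-log side, the sketch below parses a Server-Sent Events stream using the standard `event:` / `data:` line framing. The event names (`case_finished`, `run_finished`) and the JSON fields are assumptions made for this example, not the bench's documented schema. The HTTP request that triggers a run is left out because the endpoint path is not given here.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                  # a blank line ends the current event
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):      # comment or keep-alive line
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):   # one leading space is not part of the value
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream; event names and payload fields are assumptions.
sample = [
    "event: case_finished\n",
    'data: {"case": "login-empty-password", "status": "failed"}\n',
    "\n",
    ": keep-alive\n",
    "event: run_finished\n",
    'data: {"passed": 41, "failed": 1}\n',
    "\n",
]

events = list(parse_sse(sample))
for ev in events:
    print(f"[bench] {ev['event']}: {json.loads(ev['data'])}")
```

In a real pipeline the `sample` list would be replaced by the line iterator of a streaming HTTP response.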

On-call / SRE

Get a Slack ping only when it matters

The bench schedules regression runs on the cadence you pick: every hour, daily at 02:00, whatever. The webhook can fire on every run, only on failures, or only on regressions (a case that was passing yesterday and is failing now). The RCA agent's verdict is already attached, so your on-call only gets paged when something actually broke.
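
The three trigger modes can be pictured as a small decision function. This is a minimal sketch under stated assumptions: the mode names (`every_run`, `failures`, `regressions`) and the `passed` / `failed` status strings are hypothetical labels chosen for the example, and a regression is taken to mean "passed in the previous run, fails in the current one".

```python
from typing import Dict, List


def regressions(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Cases that passed in the previous run and fail in the current one."""
    return sorted(
        name for name, status in current.items()
        if status == "failed" and previous.get(name) == "passed"
    )


def should_notify(mode: str, previous: Dict[str, str], current: Dict[str, str]) -> bool:
    """Decide whether a finished run should fire the webhook."""
    if mode == "every_run":
        return True
    if mode == "failures":
        return any(status == "failed" for status in current.values())
    if mode == "regressions":
        return bool(regressions(previous, current))
    raise ValueError(f"unknown mode: {mode}")


# Made-up run results: "checkout" was already failing, "login" is newly broken.
previous = {"login": "passed", "checkout": "failed", "search": "passed"}
current = {"login": "failed", "checkout": "failed", "search": "passed"}

print(regressions(previous, current))
print(should_notify("regressions", previous, current))
```

With this rule, a case that was already failing in the previous run does not count as a regression, which is what keeps a known-red case from paging on-call every night.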

Out-of-the-box webhook payload
  • summary_md — markdown-rendered run digest for Slack
  • regressions[] — each with name, error, RCA cause + suggested fix
  • fixes_count — cases that flipped from failing → passing
  • baseline run id — for one-click comparison in the UI
  • Slack-formatted or generic JSON: drop into any incoming webhook
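
For consumers of the generic JSON form, a handler might flatten the payload into a short text digest. The sketch below assumes key names that are not confirmed above: `baseline_run_id` for the baseline run id, and `rca_cause` / `suggested_fix` inside each regression entry. The sample values are made up.

```python
def format_digest(payload: dict) -> str:
    """Flatten a webhook payload into a plain-text digest for a log or ticket."""
    lines = [payload.get("summary_md", "").strip()]
    for reg in payload.get("regressions", []):
        lines.append(f"- {reg['name']}: {reg['error']}")
        lines.append(f"  cause: {reg['rca_cause']} | fix: {reg['suggested_fix']}")
    lines.append(f"fixed since baseline: {payload.get('fixes_count', 0)}")
    if payload.get("baseline_run_id"):
        lines.append(f"baseline run: {payload['baseline_run_id']}")
    return "\n".join(line for line in lines if line)


# Hypothetical payload; nested key names are assumptions for illustration.
payload = {
    "summary_md": "**Nightly**: 41 passed, 1 failed",
    "regressions": [
        {
            "name": "kpi-cards-render",
            "error": "assertion timed out",
            "rca_cause": "test_design",
            "suggested_fix": "bump wait_ms to 4000",
        }
    ],
    "fixes_count": 2,
    "baseline_run_id": "run-0041",
}

digest = format_digest(payload)
print(digest)
```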

IT / DevOps

Run it in your environment

For platform engineering, IT, and security teams. Install the bench via Helm (k8s) or docker-compose (single VM). Wire up Okta, your S3, your Postgres, your model endpoints. Add worker boxes as you scale. It is standard ops tooling all the way down, with no proprietary control plane to learn.

Operational footprint
  • Single docker-compose stack: bench API + UI + MySQL + worker
  • Persistent volumes for projects, screenshots, sample data, run history
  • JWT auth out of the box; OAuth / SSO is a future integration point
  • Config keyed per environment: different URLs, login flows, and credentials per stage
  • Bench stores env-var names only; secrets live in your environment
  • Reverse-proxy behind Caddy / nginx / Traefik when exposing publicly
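
The "env-var names only" idea can be sketched as follows: the stored config holds only the names of environment variables, and the actual secrets are looked up at run time. The config keys (`username_env`, `password_env`), the URLs, and the variable names are hypothetical and chosen for illustration; the bench's real config format is not shown here.

```python
import os
from typing import Dict, Mapping

# Hypothetical per-environment config: URLs plus env-var *names*, never values.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "staging": {
        "base_url": "https://staging.example.test",
        "login_url": "https://staging.example.test/login",
        "username_env": "STAGING_QA_USER",
        "password_env": "STAGING_QA_PASSWORD",
    },
    "production": {
        "base_url": "https://app.example.test",
        "login_url": "https://app.example.test/login",
        "username_env": "PROD_QA_USER",
        "password_env": "PROD_QA_PASSWORD",
    },
}


def resolve(env_name: str, environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return run-time settings, reading secrets from the process environment."""
    cfg = ENVIRONMENTS[env_name]
    missing = [cfg[k] for k in ("username_env", "password_env") if cfg[k] not in environ]
    if missing:
        raise KeyError(f"unset environment variables: {', '.join(missing)}")
    return {
        "base_url": cfg["base_url"],
        "login_url": cfg["login_url"],
        "username": environ[cfg["username_env"]],
        "password": environ[cfg["password_env"]],
    }


# Demo with a fake environment mapping so no real secret is needed.
fake_environ = {"STAGING_QA_USER": "qa-bot", "STAGING_QA_PASSWORD": "not-a-real-secret"}
settings = resolve("staging", fake_environ)
print(settings["base_url"], settings["username"])
```

Because the stored config never contains the secret itself, it can be exported or backed up without leaking credentials, and a missing variable fails loudly before any test runs.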

What kind of apps get tested?

Anything with a browser-reachable UI. A few patterns the bench handles particularly well:

Internal portals
Admin / ops dashboards

Multi-page apps with sidebars, tabbed workbenches, and complex filtering. The crawl traverses menu trees and the bootstrap builds a feature plan per major surface.

Customer SaaS
Signup + onboarding flows

Form-heavy flows where every field has validation rules and every step has a happy path + several error paths. Profile sweeps generate the negative cases automatically.

E-commerce
Cart + checkout

Multi-step funnels with state that persists across pages. Session-mode test execution keeps the login + cart state across cases the way a real user would.

Multi-environment
Promote staging → production

Same suite, different env. Per-env URLs, login URLs, and persona credentials. Compare a staging run to the last green production run before shipping.

Responsive design
Desktop / tablet / phone

Bootstrap auto-generates a Responsive plan that replays the same UX cases at each viewport breakpoint and captures screenshots at every viewport for diff review.

UX audit
Per-page critique

Bootstrap also produces a UX-audit plan. Each case captures the page at every viewport and the executor's summary step renders a per-page critique: layout, clarity, accessibility, info density.

All bench primitives are open §15 bundles — see insightworker-app-samples for the source.

Bring regression coverage back to your team — without another SaaS.

InsightTestBench runs in your environment, on your network, against your apps. Self-host in one command. Talk to us if you want help wiring it to your CI.

Self-hosted • Multi-env • Vision-grounded • Self-explaining failures • Scheduled regression watchdog