The Playbook

The industry converged.
Here's the proof.

Stripe, Spotify, Uber, Meta, Coinbase, and Ramp arrived at the same production patterns independently. We tracked every engineering blog, case study, and production report. This is what they found.

9 companies tracked
79 articles analyzed
8 convergent patterns

01 Readiness Assessment

How ready is your team for agentic delivery?

7 questions. 2 minutes. A clear answer. Score your team on the DORA AI Capabilities that determine whether agents amplify performance — or amplify dysfunction.

AI Stance

Does your org have a clear AI coding strategy — not just 'use Copilot'?

02 The 8 Harness Patterns

Non-negotiable regardless of stack.

These patterns repeat across every production agent deployment. Stripe, Coinbase, Ramp, Anthropic, and OpenAI converged on them independently.

Pattern 01

Progressive Disclosure

Map first, depth on demand.

CLAUDE.md is a ~100-line map pointing to a docs/ directory. Do NOT give agents everything upfront. OpenAI tried a monolithic AGENTS.md; it failed in 4 documented ways: context crowding, non-guidance, instant rot, and unverifiability. mini-SWE-agent proves simplicity wins. Minimum orientation first, then depth on demand.

Sources: Anthropic Claude Code, Open SWE, mini-SWE-agent

Used by: Stripe, Spotify, Anthropic
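
A minimal sketch of how to keep the map honest, assuming a Python CI step (check_claude_md.py is a hypothetical name; the 100-line budget is the pattern's own rule of thumb): fail the build when CLAUDE.md outgrows its budget or points at docs that no longer exist.

```python
# check_claude_md.py: hypothetical CI guard for a progressive-disclosure map.
# Fails when CLAUDE.md outgrows its line budget or references docs/ files
# that no longer exist (the "instant rot" failure mode).
import re
import sys
from pathlib import Path

LINE_BUDGET = 100  # the pattern's rule of thumb; tune per repo

def main() -> int:
    text = Path("CLAUDE.md").read_text()
    errors = []

    if len(text.splitlines()) > LINE_BUDGET:
        errors.append(
            f"CLAUDE.md is {len(text.splitlines())} lines "
            f"(budget: {LINE_BUDGET}). Move detail into docs/."
        )

    # Every docs/ path mentioned in the map must resolve on disk.
    for ref in set(re.findall(r"docs/[\w./-]+", text)):
        if not Path(ref.rstrip(".")).exists():
            errors.append(f"CLAUDE.md points at a missing file: {ref}")

    print("\n".join(errors), file=sys.stderr)
    return 1 if errors else 0

if __name__ == "__main__":
    sys.exit(main())
```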
Pattern 02

Git Worktree Isolation

One agent, one worktree — always.

Parallel agents WILL conflict without filesystem-level isolation. Each agent gets its own branch, directory, and environment. Validated in isolation before merge. Stripe Minions, Open SWE, and Anthropic all converged on this independently — it is not optional.

Sources: Anthropic, Open SWE, Stripe Minions, Coinbase

Used by: Stripe, Ramp, Coinbase
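
A minimal sketch of the isolation step, assuming a plain local checkout. `git worktree add` is standard git; the branch and directory naming scheme is illustrative.

```python
# spawn_worktree.py: hypothetical sketch of one-agent-one-worktree.
# Each agent run gets its own branch and directory, so parallel agents
# can never write to the same working copy.
import subprocess
from pathlib import Path

def spawn_worktree(repo: Path, agent_id: str, base: str = "main") -> Path:
    branch = f"agent/{agent_id}"              # illustrative naming scheme
    workdir = repo.parent / f"wt-{agent_id}"
    # Create a new branch off `base`, checked out in its own directory.
    subprocess.run(
        ["git", "worktree", "add", "-b", branch, str(workdir), base],
        cwd=repo, check=True,
    )
    return workdir  # run the agent with cwd=workdir; merge only after validation

def remove_worktree(repo: Path, workdir: Path) -> None:
    # Tear down once the branch is merged or abandoned.
    subprocess.run(["git", "worktree", "remove", str(workdir)], cwd=repo, check=True)
```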
Pattern 03

Context Pre-hydration

The agent should never search for context it needs.

Before an agent run starts, the orchestrator pulls ALL relevant context: Jira/Linear tickets, linked docs, code search results, Slack thread context, PR history. Stripe Minions: orchestrator scans the invocation thread for links and pre-fetches everything. Ramp: Linear tickets as structured context source.

Sources: Stripe Minions, Ramp Inspect

Used by: Stripe, Spotify, Meta
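
A minimal sketch of the pre-hydration step. fetch() is a stand-in for your real clients (Jira/Linear API, doc fetcher, code search); the point is that every link resolves before the agent sees the task.

```python
# prehydrate.py: hypothetical sketch of context pre-hydration.
# The orchestrator resolves every link in the invocation message and
# bundles the results into the prompt, so the agent never has to search.
import re

URL_RE = re.compile(r"https?://\S+")

def fetch(url: str) -> str:
    # Stand-in for real clients: ticket APIs, doc fetchers, code search.
    raise NotImplementedError

def prehydrate(invocation_text: str) -> str:
    sections = []
    for url in URL_RE.findall(invocation_text):
        try:
            sections.append(f"## Context from {url}\n{fetch(url)}")
        except Exception as exc:
            # Surface the gap explicitly instead of letting the agent hunt.
            sections.append(f"## Context from {url}\nUNAVAILABLE: {exc}")
    return "\n\n".join(sections)

# The assembled bundle is prepended to the task prompt:
#   prompt = prehydrate(slack_message) + "\n\n# Task\n" + task_description
```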
Pattern 04

Deterministic + Agentic Nodes

Guardrails where it matters.

A state machine alternating between deterministic nodes (git clone, lint, format, test — unit-tested, reliable) and agentic subtask nodes (LLM-driven code generation, refactoring). Deterministic nodes provide guardrails; agentic nodes provide flexibility. Coinbase: separate data nodes from LLM nodes.

Sources: Stripe Minions, Coinbase Cloudbot, LangChain

Used by: Coinbase, Ramp, Anthropic
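
A minimal sketch of the alternating structure. run_agent() is a stand-in for your agent runtime, and the specific commands (ruff, pytest) are illustrative; the deterministic steps are ordinary, unit-testable subprocess calls.

```python
# pipeline.py: hypothetical sketch of deterministic nodes as guardrails
# around agentic subtasks.
import subprocess

def sh(*cmd: str) -> None:
    # Deterministic node: plain subprocess call that fails loudly.
    subprocess.run(cmd, check=True)

def run_agent(task: str) -> None:
    # Agentic node: stand-in for an LLM-driven subtask.
    raise NotImplementedError

def run_pipeline(ticket: str) -> None:
    sh("git", "pull", "--ff-only")     # deterministic: sync
    run_agent(f"Implement: {ticket}")  # agentic: code generation
    sh("ruff", "check", "--fix", ".")  # deterministic: lint/format
    tests = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    if tests.returncode != 0:          # deterministic check, agentic repair
        run_agent(f"Fix these test failures:\n{tests.stdout}")
        sh("pytest", "-q")             # deterministic: re-verify before merge
```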
Pattern 05

Spec First

If agents can't read it, it doesn't exist.

Agents are blind to Slack, Docs, and knowledge in people's heads. Specs, requirements, and constraints must be machine-readable files in the repo. Feature lists as JSON (not Markdown). Documentation is for agents, not just humans. DORA 2025 confirms: AI-Accessible Internal Data is a top-7 capability for AI success.

Sources: DORA 2025, Anthropic, Open SWE

Used by: Stripe, Uber, GitHub
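
A minimal sketch of a machine-readable feature list. The schema here is illustrative, not a standard; what matters is that it lives in the repo as JSON the agent can load.

```python
# specs/features.json lives in the repo; schema and feature are illustrative.
import json
from pathlib import Path

FEATURES = [
    {
        "id": "FEAT-142",  # hypothetical feature
        "title": "Export invoices as CSV",
        "status": "in_progress",
        "constraints": [
            "no new runtime dependencies",
            "must stream, not buffer",
        ],
        "acceptance": [
            "GET /invoices/export returns text/csv",
            "1M rows complete in under 30s",
        ],
    },
]

Path("specs").mkdir(exist_ok=True)
Path("specs/features.json").write_text(json.dumps(FEATURES, indent=2))

# Agents and CI checks load the same file: no Slack, no tribal knowledge.
features = json.loads(Path("specs/features.json").read_text())
```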
Pattern 06

Mechanical Architecture Enforcement

Linters replace human review at scale.

Custom linters + structural tests + CI replace human review. Enforce invariants (dependency directions, boundaries, data validation), not implementations. Linter errors include remediation instructions formatted for agent context injection. At agent throughput, corrections are cheap and waiting is expensive.

Sources: Stripe Minions, Open SWE, Netflix Paved Roads

Used by: Uber, Meta, Stripe
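
A minimal sketch of one such invariant check: a dependency-direction linter whose error messages double as remediation text an orchestrator can inject into agent context. The layer names and rule are illustrative.

```python
# lint_layers.py: hypothetical sketch. Enforce dependency direction,
# and phrase each violation as an instruction an agent can act on.
import ast
import sys
from pathlib import Path

# Illustrative rule: code under core/ must not import from api/.
FORBIDDEN = {"core": ["api"]}

def check(path: Path) -> list[str]:
    layer = path.parts[0]
    banned = FORBIDDEN.get(layer, [])
    errors = []
    for node in ast.walk(ast.parse(path.read_text())):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [a.name for a in node.names]
            if isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            for name in names:
                if name.split(".")[0] in banned:
                    errors.append(
                        f"{path}:{node.lineno}: {layer}/ must not import {name}. "
                        f"REMEDIATION: move the shared logic into {layer}/, or "
                        f"invert the dependency behind an interface."
                    )
    return errors

if __name__ == "__main__":
    errors = [e for p in Path(".").rglob("*.py") for e in check(p)]
    print("\n".join(errors), file=sys.stderr)
    sys.exit(1 if errors else 0)
```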
Pattern 07

Integrated Feedback Loops

Quality bounded by feedback quality.

Close the loop as tightly as possible. Linter fires at edit time, not after CI. Puppeteer MCP for real browser verification — Anthropic found agents mark features 'complete' that don't work in the browser. Full observability exposed TO agents, not just humans. The tighter the loop, the higher the autonomous PR acceptance rate.

Sources: Anthropic, Ramp Inspect, Open SWE

Used by: Spotify, Ramp, Meta, Coinbase
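
A minimal sketch of the tightest loop: a post-edit hook that lints the file the agent just touched and hands the errors straight back. Hook wiring and payload shape vary by runtime; both are assumptions here, as is the choice of ruff.

```python
# post_edit_hook.py: hypothetical sketch. Lint fires at edit time,
# and the errors go back to the agent, not just to CI.
import json
import subprocess
import sys

def main() -> None:
    event = json.load(sys.stdin)  # assumed payload: {"file": "path/edited.py"}
    edited = event["file"]
    result = subprocess.run(
        ["ruff", "check", edited],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        # Whatever we print here is injected into the agent's next turn.
        print(f"Lint failed on {edited}. Fix before continuing:\n{result.stdout}")

if __name__ == "__main__":
    main()
```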
Pattern 08

Agent Governance

Control who does what, with what permissions.

Which agents can spawn other agents, with what permissions, and with what audit trail. Spawning restrictions prevent agent explosions. Hook-based policy enforcement (exit code 2) blocks banned actions and injects corrections. Scoped credentials ensure a code-writing agent can't deploy to production. Every commit traces back to its agent session log.

Sources: AWS Frontier Agents, GitHub Actions 2026, OpenAI Agent Monitoring, Snyk Agent Security

Used by: AWS, GitHub, OpenAI, Snyk
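
A minimal sketch of hook-based policy enforcement under the convention the pattern cites: a blocking hook exits with code 2, and its stderr is injected back as a correction. The payload shape and banned-command list are assumptions.

```python
# pre_tool_hook.py: hypothetical sketch. Block banned actions and
# inject a correction. Exit code 2 = blocked; stderr goes to the agent.
import json
import sys

BANNED_PREFIXES = ("kubectl apply", "terraform apply", "aws ", "git push --force")

def main() -> None:
    event = json.load(sys.stdin)  # assumed payload shape
    command = event.get("tool_input", {}).get("command", "")
    for banned in BANNED_PREFIXES:
        if command.startswith(banned):
            print(
                f"BLOCKED: '{banned}' is outside this agent's permissions. "
                "Open a change request instead; deploys run through CI only.",
                file=sys.stderr,
            )
            sys.exit(2)  # blocking exit: action refused, correction injected
    sys.exit(0)          # allow

if __name__ == "__main__":
    main()
```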

03 The Landscape

Who's doing what. And where.

Six companies. Six SDLC phases. One clear convergence pattern. Green means production-deployed. Yellow means active or partial. The gaps are where SPEQD installs next.

SDLC phases: Code Generation · Code Review · Testing · Security · Deploy & CI/CD · Incident Response

Stripe: Minions · Agent Council · Selective CI · Compliance · Auto-merge
Spotify: Honk · LLM Judge · Feedback Loops
Uber: uSpec · uReview
Meta: REA · Diff Risk · JiT Tests · Mutation
Coinbase: Cloudbot · Agent Council · Compliance · Risk-merge
Ramp: Inspect · Closed-loop · Self-verify

Legend: Production · Active / Partial · Not disclosed

Data compiled from 79 engineering blogs, case studies, and production reports. Last updated March 2026.

04 ROI Calculator

What's harness engineering worth to your team?

Plug in your numbers. See what the data says. Projections based on production benchmarks from Stripe, Uber, and Ramp — not theory.

Team size: 50
Avg hours per PR cycle: 12h
Deploys per week: 10

48,000 dev-hours saved / year
24 engineers' worth of capacity
18% projected merge-rate lift
4 weeks to first agent PR with SPEQD
Based on production data from Stripe (1,000+ PRs/week), Uber (39 dev-years saved/year), and Ramp (50%+ agent PRs).
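
A minimal sketch of the capacity conversion behind figures like these. The savings model itself is SPEQD's; the only arithmetic assumed here is the standard approximation of ~2,000 working hours per engineer-year.

```python
# roi_sketch.py: hypothetical sketch of the calculator's capacity math.
# Only the hours-to-engineers conversion is shown; the savings model
# that produces the 48,000-hour figure is not reproduced here.
HOURS_PER_ENGINEER_YEAR = 2_000  # standard approximation

def capacity_equivalent(dev_hours_saved_per_year: float) -> float:
    return dev_hours_saved_per_year / HOURS_PER_ENGINEER_YEAR

print(capacity_equivalent(48_000))  # -> 24.0 engineers of capacity
```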

05 Stack Selector

Your constraints. Your stack. Our opinion.

4 questions to a personalized agent infrastructure recommendation. Based on what we've seen work across regulated fintech, high-scale consumer, and enterprise SaaS.

Do you need multi-client or multi-tenant isolation?

This determines your AI gateway and security posture.

06 Agent Ecosystem

The model is irrelevant. The harness is everything.

Princeton/Stanford's research demonstrated a 64% improvement from environment design alone — same model, same task, same compute. The only variable was the environment.

Fork

Proven base exists close to your needs.

Stripe forked Block's Goose.

Highest control, highest maintenance.

Compose

Good open-source base, need to move fast.

Ramp composed on OpenCode.

Lower maintenance, some upstream coupling.

Build

Unique constraints: security, compliance, integration.

Coinbase built Cloudbot custom.

Highest cost — only when Fork/Compose can't meet requirements.

Agent Framework

Open SWE (LangChain) · Tier 1: LangGraph-based. Slack/Linear/GitHub integrations out of the box. Captures the Stripe/Ramp/Coinbase convergent patterns. MIT licensed.
mini-SWE-agent · Tier 1: Princeton/Stanford. >74% SWE-bench. ~100 lines of Python. Used by Meta, NVIDIA, IBM. Best when simplicity is paramount.
Custom Harness · Tier 1: Build only the harness layer on top of the base agent. For unique constraints: security, compliance, deep integration. Coinbase Cloudbot is the reference.

Agent Observability

LangSmith Enterprise · Tier 1: Managed LLM tracing and evaluation. Coinbase adopted it company-wide. Every tool call, retrieval, and decision is traced.
Langfuse · Tier 1: Self-hosted, open source. Best when client data sensitivity requires on-prem. Full control; no data leaves the environment.
Grafana Stack · Tier 2: LogQL/PromQL/TraceQL for agent observability. Best when the client already has Grafana/Prometheus infrastructure.

AI Gateway

LiteLLM Proxy · Tier 1: PII stripping, per-client routing, audit logging, spend tracking. Non-negotiable for multi-client engagements.

Context Engineering

CLAUDE.md / context.md · Tier 1: Progressive disclosure: a ~100-line map pointing to a docs/ directory. Per-engagement context.md with architecture, domain vocabulary, and forbidden patterns.

Browser Verification

Puppeteer MCP · Tier 1: Mandatory. Anthropic found agents consistently mark features 'complete' that don't work. Agents must verify end-to-end in real browsers.

Adoption

Slack Invocation Surface · Tier 2: The Ramp pattern: Slack as the agent invocation surface, with results visible in shared channels. Track 'humans prompting' as a metric.

This is what SPEQD codifies for your team.

Get Your Delivery Blueprint