The Playbook
The industry converged.
Here's the proof.
Stripe, Spotify, Uber, Meta, Coinbase, and Ramp arrived at the same production patterns independently. We tracked every engineering blog, case study, and production report. This is what they found.
01 Readiness Assessment
How ready is your team for agentic delivery?
7 questions. 2 minutes. A clear answer. Score your team on the DORA AI Capabilities that determine whether agents amplify performance — or amplify dysfunction.
Does your org have a clear AI coding strategy — not just 'use Copilot'?
02 The 8 Harness Patterns
Non-negotiable regardless of stack.
These patterns repeat across every production agent deployment. Stripe, Coinbase, Ramp, Anthropic, OpenAI converged on them independently.
Pattern 01
Progressive Disclosure
Map first, depth on demand.
CLAUDE.md is a ~100-line map pointing to a docs/ directory. Do NOT give agents everything upfront. OpenAI tried a monolithic AGENTS.md — it failed in four documented ways: context crowding, non-guidance, instant rot, and no way to verify compliance. mini-SWE-agent proves simplicity wins. Minimum orientation first, then depth on demand.
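The map-not-manual rule is mechanically checkable. A minimal sketch of a CI check for the two failure modes that killed the monolithic file — context crowding and instant rot (the `check_map` name, the 120-line budget, and the `docs/` link convention are illustrative assumptions, not from any cited codebase):

```python
import re

MAX_LINES = 120  # budget: the map should stay near ~100 lines

def check_map(map_text: str, repo_files: set[str]) -> list[str]:
    """Flag a map that has grown into a manual (context crowding)
    or whose pointers no longer resolve (instant rot)."""
    problems = []
    if len(map_text.splitlines()) > MAX_LINES:
        problems.append(f"map is {len(map_text.splitlines())} lines; budget is {MAX_LINES}")
    # every docs/ link in the map must point at a file that still exists
    for target in re.findall(r"\]\((docs/[^)]+)\)", map_text):
        if target not in repo_files:
            problems.append(f"dangling pointer: {target}")
    return problems
```

Run it on every PR and the map can't rot silently.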
Sources: Anthropic Claude Code, Open SWE, mini-SWE-agent
Used by: Stripe, Spotify, Anthropic
Pattern 02
Git Worktree Isolation
One agent, one worktree — always.
Parallel agents WILL conflict without filesystem-level isolation. Each agent gets its own branch, directory, and environment. Validated in isolation before merge. Stripe Minions, Open SWE, and Anthropic all converged on this independently — it is not optional.
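The recipe reduces to a branch, a directory, and a teardown per agent. A hedged sketch of an orchestrator planning the git commands without running them (`worktree_plan` and the `agent/<id>` branch convention are assumptions for illustration):

```python
from pathlib import Path

def worktree_plan(repo: Path, agent_id: str, base: str = "main") -> list[list[str]]:
    """One agent, one worktree: its own branch and directory,
    validated in isolation before any merge."""
    branch = f"agent/{agent_id}"
    worktree = repo.parent / f"{repo.name}-{agent_id}"
    return [
        # create an isolated checkout on a fresh branch off `base`
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree), base],
        # (agent edits and deterministic checks run inside `worktree`)
        # tear down once the branch is merged or abandoned
        ["git", "-C", str(repo), "worktree", "remove", str(worktree)],
    ]
```

Filesystem-level isolation means two agents can touch the same file without ever seeing each other's half-finished edits.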
Sources: Anthropic, Open SWE, Stripe Minions, Coinbase
Used by: Stripe, Ramp, Coinbase
Pattern 03
Context Pre-hydration
The agent should never search for context it needs.
Before an agent run starts, the orchestrator pulls ALL relevant context: Jira/Linear tickets, linked docs, code search results, Slack thread context, PR history. Stripe Minions: orchestrator scans the invocation thread for links and pre-fetches everything. Ramp: Linear tickets as structured context source.
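The Stripe Minions move — scan the invocation thread for links, pre-fetch everything — fits in a few lines. A sketch with the fetcher injected, since the real one fans out to Jira, Linear, Slack, and code search (`hydrate` is an illustrative name, not Stripe's API):

```python
import re

LINK_RE = re.compile(r"https?://\S+")

def hydrate(thread_messages: list[str], fetch) -> dict[str, str]:
    """Pull every linked resource BEFORE the agent starts,
    so it never burns turns searching for its own context."""
    context: dict[str, str] = {}
    for message in thread_messages:
        for url in LINK_RE.findall(message):
            if url not in context:          # fetch each resource once
                context[url] = fetch(url)   # fetch() is injected: ticket API, doc API, etc.
    return context
```

The assembled dict becomes the opening context block of the agent run.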
Sources: Stripe Minions, Ramp Inspect
Used by: Stripe, Spotify, Meta
Pattern 04
Deterministic + Agentic Nodes
Guardrails where it matters.
A state machine alternating between deterministic nodes (git clone, lint, format, test — unit-tested, reliable) and agentic subtask nodes (LLM-driven code generation, refactoring). Deterministic nodes provide guardrails; agentic nodes provide flexibility. Coinbase: separate data nodes from LLM nodes.
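One way to sketch the alternation — every LLM step sandwiched between reliable, unit-tested steps (the `Node`/`execute` shape is illustrative, not Coinbase's or Stripe's actual API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Node:
    name: str
    kind: str                       # "deterministic" (unit-tested tool) or "agentic" (LLM-driven)
    run: Callable[[dict], dict]     # takes pipeline state, returns updated state

def execute(pipeline: list[Node], state: dict) -> dict:
    """Walk the pipeline in order; deterministic nodes act as guardrails
    around the flexible-but-unreliable agentic nodes."""
    for node in pipeline:
        state = node.run(state)
        state.setdefault("trace", []).append((node.name, node.kind))
    return state
```

The trace doubles as an audit log: you can see exactly which state transitions an LLM was responsible for.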
Sources: Stripe Minions, Coinbase Cloudbot, LangChain
Used by: Coinbase, Ramp, Anthropic
Pattern 05
Spec First
If agents can't read it, it doesn't exist.
Agents are blind to Slack, Docs, and knowledge in people's heads. Specs, requirements, and constraints must be machine-readable files in the repo. Feature lists as JSON (not Markdown). Documentation is for agents, not just humans. DORA 2025 confirms: AI-Accessible Internal Data is a top-7 capability for AI success.
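What "feature lists as JSON" means in practice: each feature carries an id, a description, and machine-checkable acceptance criteria, and the loader rejects anything underspecified. A hedged sketch (the schema here is a plausible assumption, not a standard):

```python
import json

REQUIRED = {"id", "description", "acceptance"}

def load_features(spec_json: str) -> list[dict]:
    """Parse a machine-readable feature list and reject entries an
    agent couldn't act on. Prose specs can't be validated; JSON can."""
    features = json.loads(spec_json)
    for feature in features:
        missing = REQUIRED - feature.keys()
        if missing:
            raise ValueError(f"underspecified feature {feature.get('id', '?')}: missing {sorted(missing)}")
    return features
```

The same file drives both the agent's task queue and the verification step that checks acceptance criteria.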
Sources: DORA 2025, Anthropic, Open SWE
Used by: Stripe, Uber, GitHub
Pattern 06
Mechanical Architecture Enforcement
Linters replace human review at scale.
Custom linters + structural tests + CI replace human review. Enforce invariants (dependency directions, boundaries, data validation), not implementations. Linter errors include remediation instructions formatted for agent context injection. At agent throughput, corrections are cheap and waiting is expensive.
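The key detail is the error message: it tells the agent what to do instead, phrased for direct injection into its context. A minimal structural lint for one hypothetical invariant (the `app/core` / `app.ui` dependency direction is invented for illustration):

```python
import re

FORBIDDEN_IMPORT = re.compile(r"from\s+app\.ui\b")  # hypothetical invariant: core never imports UI

def lint_dependency_direction(path: str, source: str) -> list[str]:
    """Enforce the invariant, not the implementation. Each violation is a
    remediation instruction ready to feed back into the agent's context."""
    errors = []
    for lineno, line in enumerate(source.splitlines(), 1):
        if path.startswith("app/core/") and FORBIDDEN_IMPORT.search(line):
            errors.append(
                f"{path}:{lineno}: core may not depend on app.ui. "
                "Move the shared type into app/core and import it from there."
            )
    return errors
```

At agent throughput this loop runs hundreds of times a day, which is exactly why the correction has to be self-serve.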
Sources: Stripe Minions, Open SWE, Netflix Paved Roads
Used by: Uber, Meta, Stripe
Pattern 07
Integrated Feedback Loops
Quality bounded by feedback quality.
Close the loop as tightly as possible. Linter fires at edit time, not after CI. Puppeteer MCP for real browser verification — Anthropic found agents mark features 'complete' that don't work in the browser. Full observability exposed TO agents, not just humans. The tighter the loop, the higher the autonomous PR acceptance rate.
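The shape of a tight loop: verifier output goes straight back into the agent's next attempt, and the run only escapes the loop verified. A sketch with both the agent step and the verifier injected (`tight_loop` is an illustrative name, not any vendor's API):

```python
def tight_loop(agent_step, verify, max_rounds: int = 3):
    """Feed verifier findings back into the agent at edit time,
    instead of letting them surface after CI (or after merge)."""
    feedback: list[str] = []
    code = None
    for _ in range(max_rounds):
        code = agent_step(feedback)   # agent sees the previous round's findings
        feedback = verify(code)       # linter, tests, browser check, etc.
        if not feedback:
            return code, True         # verified before it ever reaches review
    return code, False                # escalate to a human with the findings attached
```

Swap `verify` for a real browser check and you get Anthropic's fix for agents marking broken features 'complete'.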
Sources: Anthropic, Ramp Inspect, Open SWE
Used by: Spotify, Ramp, Meta, Coinbase
Pattern 08
Agent Governance
Control who does what, with what permissions.
Which agents can spawn other agents, with what permissions, and with what audit trail. Spawning restrictions prevent agent explosions. Hook-based policy enforcement (exit code 2) blocks banned actions and injects corrections. Scoped credentials ensure a code-writing agent can't deploy to production. Every commit traces back to its agent session log.
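A pre-tool-use policy hook is small: exit code 2 blocks the action, and the stderr message is the correction injected back into the agent. A sketch assuming a simple substring denylist (the banned commands here are examples, not a recommended policy):

```python
import sys

BANNED = ("git push --force", "terraform apply")  # example denylist; real policy is per-role

def policy_check(command: str) -> int:
    """Hook contract: exit 0 allows the action; exit 2 blocks it and the
    stderr message becomes the correction the agent sees next turn."""
    for banned in BANNED:
        if banned in command:
            print(f"blocked: '{banned}' requires a human-approved session", file=sys.stderr)
            return 2
    return 0
```

Pair this with scoped credentials and session logging and every commit traces back to an agent that provably stayed inside its lane.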
Sources: AWS Frontier Agents, GitHub Actions 2026, OpenAI Agent Monitoring, Snyk Agent Security
Used by: AWS, GitHub, OpenAI, Snyk
03 The Landscape
Who's doing what. And where.
Six companies. Six SDLC phases. One clear convergence pattern. Green means production-deployed. Yellow means active or partial. The gaps are where SPEQD installs next.
| Company | Code Generation | Code Review | Testing | Security | Deploy & CI/CD | Incident Response |
|---|---|---|---|---|---|---|
| Stripe | Minions | Agent Council | Selective CI | Compliance | Auto-merge | |
| Spotify | Honk | LLM Judge | Feedback Loops | | | |
| Uber | uSpec | uReview | | | | |
| Meta | REA | Diff Risk | JiT Tests | Mutation | | |
| Coinbase | Cloudbot | Agent Council | | Compliance | Risk-merge | |
| Ramp | Inspect | Closed-loop | Self-verify | | | |
Data compiled from 79 engineering blogs, case studies, and production reports. Last updated March 2026.
04 ROI Calculator
What's harness engineering worth to your team?
Plug in your numbers. See what the data says. Projections based on production benchmarks from Stripe, Uber, and Ramp — not theory.
Based on production data from Stripe (1,000+ PRs/week), Uber (39 dev-years saved/year), and Ramp (50%+ agent PRs).
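For intuition, a deliberately naive version of the arithmetic behind such a calculator — agent-authored PRs treated as reclaimed capacity, discounted by the review overhead they still consume. Every number and the model itself are placeholders you should replace with your own data:

```python
def roi_estimate(devs: int, loaded_cost_per_dev: float,
                 agent_pr_share: float, review_overhead: float = 0.15) -> float:
    """Illustrative only: reclaimed capacity = headcount cost scaled by the
    share of PRs agents author, minus the human review tax on those PRs."""
    reclaimed = devs * loaded_cost_per_dev * agent_pr_share
    return reclaimed * (1 - review_overhead)
```

At Ramp-like numbers (50%+ agent PRs) the dominant term is your review overhead, which is why Patterns 06 and 07 exist.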
05 Stack Selector
Your constraints. Your stack. Our opinion.
4 questions to a personalized agent infrastructure recommendation. Based on what we've seen work across regulated fintech, high-scale consumer, and enterprise SaaS.
Do you need multi-client or multi-tenant isolation?
This determines your AI gateway and security posture.
06 Agent Ecosystem
The model is irrelevant. The harness is everything.
Princeton/Stanford's research demonstrated a 64% improvement from environment design alone — same model, same task, same compute. The only variable was the environment.
Fork
Proven base exists close to your needs.
Stripe forked Block's Goose.
Highest control, highest maintenance.
Compose
Good open-source base, need to move fast.
Ramp composed on OpenCode.
Lower maintenance, some upstream coupling.
Build
Unique constraints: security, compliance, integration.
Coinbase built Cloudbot custom.
Highest cost — only when Fork/Compose can't meet requirements.