Harness engineering for AI-assisted teams
A model alone does not know your architecture, conventions, or definition of done. Our open-source harnesses surround it with stack-specific context, disciplined workflows, independent review, tests, and hard gates, so teams can move faster without letting the codebase drift.
What harness engineering means
A coding-agent harness makes the team's implicit standards explicit: architecture boundaries, coding conventions, test strategy, security constraints, and the definition of done. It guides work before code exists, senses problems after generation, and improves as recurring failures become new controls.
This follows the guide, sensor, and steering-loop framing described in Birgitta Böckeler's harness engineering article on MartinFowler.com.
Give every agent the same repository context, architecture rules, workflows, and acceptance standards before it starts generating code.
Use tests, linters, type checks, structural analysis, and independent AI review to find violations before human review becomes the first feedback loop.
When the same issue appears twice, improve the guide, sensor, or gate. The harness becomes organizational memory instead of another static rules file.
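The "appears twice" rule above can be sketched as a small recurrence check. This is a minimal illustration, not the harness's actual implementation: the `Finding` shape, the rule names, and the `recurringRules` helper are all hypothetical.

```typescript
// Hypothetical shape for a finding reported by a reviewer, CI, or a human.
interface Finding {
  rule: string; // e.g. "no-cross-layer-import" (illustrative name)
  source: "review" | "ci" | "human";
}

// Return the rules that have been violated at least `threshold` times.
// These are the candidates for a new guide, sensor, or gate.
function recurringRules(findings: Finding[], threshold = 2): string[] {
  const counts: Record<string, number> = {};
  for (const f of findings) {
    counts[f.rule] = (counts[f.rule] ?? 0) + 1;
  }
  return Object.keys(counts).filter((rule) => (counts[rule] ?? 0) >= threshold);
}
```

Feeding this from review output and CI logs would turn repeated findings into a short list of controls to add, which is the sense in which the harness acts as organizational memory.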
The harness family
The safety model, workflow discipline, review pattern, distribution CLI, and measurement loop stay consistent. The stack knowledge changes.
Shared core
priority-ordered profile · approval gates · spec and TDD workflows · independent reviewers · deterministic controls · multi-agent-tool fan-out · versioned updates · eval baselines
@tierone/llm-harness-fullstack
For monorepos combining a NestJS API, React web app, and shared contracts. This is the most documented edition and the source of the evidence shown below.
View fullstack source
@tierone/llm-harness-react
For frontend repositories that need React architecture, state, routing, forms, accessibility, performance, Vitest, and Playwright guidance without backend context.
View React source
@tierone/llm-harness-nest
For backend repositories that need clean architecture, authorization, persistence, transactions, Node.js operations, and API verification without frontend guidance.
View NestJS source
New editions can add a stack's conventions, skills, review rubrics, and eval cases while keeping the same control model.
Choose by repository shape, not by agent vendor. Every edition can target the same supported agent tools through ruler.
The system design
Every edition combines feedforward guidance, computational and inferential feedback, and deterministic release controls. The exact skills and review rubrics change with the stack; the architecture does not.
A compact, priority-ordered operating profile routes the agent into the right depth only when the work needs it.
Seven one-shot agents inspect the artifact, not the implementer's confidence. Each owns one concern and returns a binding verdict.
CI, pre-commit hooks, and agent permission rules hold the line when a model misses an instruction.
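One way to read "binding verdict" is that a single rejection, or a missing review, blocks the change regardless of what the implementing agent reports. The sketch below assumes that reading. The `Review` shape, the concern names, and the `gate` function are hypothetical and are not taken from the harness source.

```typescript
type Verdict = "approve" | "reject";

// Hypothetical result returned by one single-concern reviewer.
interface Review {
  concern: string; // e.g. "architecture", "security" (illustrative names)
  verdict: Verdict;
}

interface GateResult {
  pass: boolean;
  blockers: string[];
}

// Deterministic gate: every expected concern must be present and approved.
// No model memory is involved, so the outcome is the same on every run.
function gate(reviews: Review[], expectedConcerns: string[]): GateResult {
  const blockers: string[] = [];
  for (const concern of expectedConcerns) {
    const review = reviews.find((r) => r.concern === concern);
    if (!review) {
      blockers.push(`${concern}: missing review`);
    } else if (review.verdict === "reject") {
      blockers.push(`${concern}: rejected`);
    }
  }
  return { pass: blockers.length === 0, blockers };
}
```

Treating a missing review as a blocker is a design choice in this sketch: a reviewer that silently fails to run should not count as approval.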
Engineering outcomes
The model remains probabilistic. The harness narrows the acceptable solution space, catches drift earlier, and makes quality visible before code ships.
Shared context, workflows, and acceptance criteria reduce output variance. Human review starts with a smaller, more consistent, and more reviewable diff.
Clean Architecture boundaries, dependency rules, cohesion, and separation of concerns are taught before implementation and checked after it.
Explicit naming, small changes, tests, DRY, SOLID, and KISS become review criteria. Reviewers reject duplication, accidental complexity, and brute-force fixes.
Type checks, tests, linters, structural assertions, fresh-context review, and eval baselines reveal whether the system still meets its engineering bar.
The same source can target Claude Code, Copilot, Codex, Cursor, and Windsurf, so changing agent tools does not mean rebuilding the engineering system.
Measured, not believed
These committed results come from @tierone/llm-harness-fullstack. Each focused edition uses the same eval machinery and publishes its own baselines as it matures.
Sonnet-class baseline for loading the right skills from real and paraphrased prompts.
Twenty-one cases, repeated three times, under the full operating profile.
Seeded regressions caught, proving the eval suite notices deleted or weakened gates.
Adherence at roughly 90k filler tokens. This is the reason deterministic gates are part of the design.
The weak number matters. Instruction-following degrades under context pressure. The harness does not hide that failure mode; it designs around it with CI, permission denies, and approval gates that do not depend on model memory.
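A repeated eval run like the one described above (twenty-one cases, three repeats, so 63 runs) reduces to a pass rate that can be compared with a committed baseline. The following is a rough sketch under assumed names: `RunResult`, `passRate`, `regressed`, and the default tolerance are illustrative and do not describe the harness's real eval machinery.

```typescript
// Hypothetical record for one execution of one eval case.
interface RunResult {
  caseId: string;
  repeat: number;
  passed: boolean;
}

// Fraction of runs that passed, across all cases and repeats.
function passRate(results: RunResult[]): number {
  if (results.length === 0) {
    throw new Error("no eval results to aggregate");
  }
  const passedCount = results.filter((r) => r.passed).length;
  return passedCount / results.length;
}

// Flag a regression when the current rate falls below the committed
// baseline by more than the tolerance (an assumed default of 0.02).
function regressed(current: number, baseline: number, tolerance = 0.02): boolean {
  return current < baseline - tolerance;
}
```

Repeating each case matters because the model is probabilistic: a single pass or fail says little, while a rate over repeats can be tracked against a baseline.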
Adopt without faith
Do not roll this across the organization because the architecture sounds right. Pilot it where the result can be observed.
Pick an active repository, choose the edition that matches its shape, capture the baseline, instrument the pilot, and make a scale-or-stop decision from the delta.
Read the full adoption playbook
One active repo, three to five engineers, and a baseline window captured before the harness changes anything.
Fill in repo conventions, copy the deterministic gates, and generate each agent tool's native config.
Measure cycle time, review rounds, caught findings, escaped defects, gate events, and engineer sentiment.
The decision is based on observed tradeoffs. A failed pilot costs about two developer-days, not an organization-wide migration.
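The scale-or-stop decision can be framed as a comparison between the baseline window and the pilot window. The sketch below is one possible framing, not the playbook's method: the three metrics chosen, the "lower is better" assumption, and the 10% improvement threshold are all assumptions made for illustration.

```typescript
// Hypothetical subset of the pilot metrics; lower is better for each.
type Metric = "cycleTimeHours" | "reviewRounds" | "escapedDefects";
type Metrics = Record<Metric, number>;

const METRICS: Metric[] = ["cycleTimeHours", "reviewRounds", "escapedDefects"];

// Relative change per metric: negative means the pilot improved on baseline.
function relativeDelta(baseline: Metrics, pilot: Metrics): Metrics {
  const change = (b: number, p: number): number => {
    if (b === 0) return p > 0 ? Infinity : 0;
    return (p - b) / b;
  };
  return {
    cycleTimeHours: change(baseline.cycleTimeHours, pilot.cycleTimeHours),
    reviewRounds: change(baseline.reviewRounds, pilot.reviewRounds),
    escapedDefects: change(baseline.escapedDefects, pilot.escapedDefects),
  };
}

// Scale only if nothing got worse and at least one metric improved
// by the assumed minimum (10% by default); otherwise stop.
function decide(delta: Metrics, minImprovement = 0.1): "scale" | "stop" {
  const values = METRICS.map((m) => delta[m]);
  const anyWorse = values.some((v) => v > 0);
  const anyBetter = values.some((v) => v <= -minImprovement);
  return !anyWorse && anyBetter ? "scale" : "stop";
}
```

A real decision would also weigh caught findings, gate events, and engineer sentiment, which do not reduce cleanly to a single threshold.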
The obvious objections
Code is cheap. Operators are made.
Engineering is what humans do. Code is what tools produce.