Is your codebase agent-ready? A scoring rubric
Give the same coding agent two repositories and you’ll get two completely different experiences. In one, it lands a clean, tested change in minutes. In the other, it flails — editing the wrong module, breaking things it can’t see, burning tokens re-discovering context that should have been written down.
The difference isn’t the agent. It’s the codebase. Here’s the rubric we use in a readiness audit: the same six dimensions we score in the written report.
1. Module boundaries
Agents work best when a task maps to a bounded region of code. If changing one behavior means touching nine files across the repo with no obvious seam, an agent will miss some of them — and so will a human.
Score high if: clear module/package boundaries, explicit public interfaces, and low coupling. Score low if: a big ball of mud where everything imports everything.
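One rough way to put a number on coupling is to build the import graph and count how many modules depend on each one. The sketch below assumes a Python codebase and takes module sources as in-memory strings; the module names (billing, users, db) are hypothetical, and a real audit would walk the files on disk and handle relative imports.

```python
import ast


def import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the set of in-repo modules it imports."""
    graph: dict[str, set[str]] = {name: set() for name in sources}
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            for target in targets:
                # Keep only imports that point at other modules inside the repo.
                if target in sources and target != name:
                    graph[name].add(target)
    return graph


def fan_in(graph: dict[str, set[str]]) -> dict[str, int]:
    """Count how many in-repo modules import each module."""
    counts = {name: 0 for name in graph}
    for deps in graph.values():
        for dep in deps:
            counts[dep] += 1
    return counts


# Hypothetical three-module repo, given as source strings.
sources = {
    "billing": "import db\nimport users\n",
    "users": "import db\n",
    "db": "import os\n",
}
graph = import_graph(sources)
counts = fan_in(graph)
```

A module with high fan-in is a shared seam that deserves an explicit interface. A graph where nearly every module imports nearly every other is the ball of mud.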
2. Test coverage — and test speed
Tests are how an agent knows it didn’t break anything. No tests means no feedback loop, which means an agent (or a junior with an agent) ships confidently broken code. Slow tests are nearly as bad: if the suite takes 20 minutes, the agent can’t iterate.
Score high if: meaningful coverage on critical paths and a fast, runnable suite. Score low if: coverage is decorative or the suite is too slow to run on every change.
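Suite speed is easy to measure directly. As a minimal sketch, the hypothetical helper below runs a test command and reports whether it passed and whether it finished inside a time budget. The 60-second budget is an assumed threshold, not a standard, and the command shown is a stand-in for your real test runner.

```python
import subprocess
import sys
import time


def timed_run(cmd: list[str], budget_seconds: float) -> tuple[bool, bool, float]:
    """Run a command; return (passed, within_budget, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True)
    elapsed = time.perf_counter() - start
    return result.returncode == 0, elapsed <= budget_seconds, elapsed


# Stand-in command; in a real repo this would be the project's test command.
passed, within_budget, elapsed = timed_run(
    [sys.executable, "-c", "pass"], budget_seconds=60.0
)
```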
3. Type strictness
Types are free, machine-checked documentation. In a strict, well-typed codebase an agent gets immediate signal when it misuses an API. In an untyped or loosely-typed one, errors only surface at runtime — too late for the agent to self-correct.
Score high if: strict mode on, few escape hatches. Score low if: types are advisory and any is everywhere.
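Escape hatches can be counted. The sketch below is a crude, text-level tally for a Python codebase, where the hatches are Any annotations, cast() calls, and "# type: ignore" comments (the TypeScript analogues would be any and @ts-ignore). The pattern list is our assumption about what to count, and a regex pass like this will miscount anything inside strings or docstrings.

```python
import re

# Assumed set of escape hatches worth counting; adjust per codebase.
PATTERNS = {
    "Any": re.compile(r"\bAny\b"),
    "cast": re.compile(r"\bcast\("),
    "type: ignore": re.compile(r"#\s*type:\s*ignore"),
}


def count_escape_hatches(code: str) -> dict[str, int]:
    """Tally escape-hatch occurrences, skipping import lines."""
    counts = {label: 0 for label in PATTERNS}
    for line in code.splitlines():
        if line.lstrip().startswith(("import ", "from ")):
            continue  # importing Any is not itself an escape hatch
        for label, pattern in PATTERNS.items():
            counts[label] += len(pattern.findall(line))
    return counts


sample = '''
from typing import Any, cast

def load(raw: Any) -> dict:
    return cast(dict, raw)  # type: ignore
'''
hatches = count_escape_hatches(sample)
```

Tracked per module, the tally shows where the top offenders are.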
4. Documentation and context files
This is where most codebases lose the most points, and where remediation is cheapest. Agents read context. A good CLAUDE.md / rules file, accurate READMEs, and architecture notes mean the agent starts every task already oriented. Without them it re-derives your conventions from scratch every time — slowly, and often wrongly.
Score high if: there’s a maintained context file and the docs match reality. Score low if: the only documentation is the code, and the tribal knowledge is in people’s heads.
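“Docs match reality” can be spot-checked mechanically. As one hedged example, the hypothetical helper below pulls backticked paths out of a context file and reports the ones that no longer exist in the repo. The backtick convention and the file layout in the demo are assumptions for illustration.

```python
import re
import tempfile
from pathlib import Path


def stale_paths(repo: Path, context_file: str = "CLAUDE.md") -> list[str]:
    """Return backticked paths in the context file that are missing from the repo."""
    text = (repo / context_file).read_text(encoding="utf-8")
    # Treat backticked tokens that contain a slash as repo-relative paths.
    candidates = re.findall(r"`([\w./-]+)`", text)
    paths = [c for c in candidates if "/" in c]
    return [p for p in paths if not (repo / p).exists()]


# Demo against a throwaway repo with one valid and one stale reference.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "src").mkdir()
    (repo / "src" / "billing.py").write_text("", encoding="utf-8")
    (repo / "CLAUDE.md").write_text(
        "Billing lives in `src/billing.py`. Run `scripts/test.sh` before committing.\n",
        encoding="utf-8",
    )
    missing = stale_paths(repo)
```

A check like this catches only dead references. Whether the conventions the file describes are still followed takes a human read.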
5. Spec coverage
Agents are dramatically more reliable when the task is specified. Codebases that capture intent — specs, ADRs, well-written issues — give agents a target. Codebases where requirements live only in Slack threads force the agent to guess.
6. MCP integration potential
How much of the work needs systems outside the repo — the database, the deploy pipeline, Datadog, Linear? If agents can reach those safely over the Model Context Protocol, whole classes of task open up. If not, the agent is boxed into the text of the repo.
Scoring it
We score each dimension and weight by impact. The output isn’t a vanity grade; it’s a prioritized roadmap. The cheapest, highest-leverage fixes are usually the same:
- Write the context file. (Hours of work, compounding returns.)
- Make the test suite fast and runnable. (Unlocks the agent feedback loop.)
- Turn on strict types and fix the top offenders. (Free guardrails.)
The expensive structural work — splitting the monolith, backfilling specs — comes later, and only where the score says it’ll actually pay off.
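To make the arithmetic concrete, here is a minimal sketch of weighted scoring. The 0 to 5 scale, the weights, and the example scores are all hypothetical, and ranking by weight times gap is one simple choice that ignores the cost of each fix, which a real roadmap would also factor in.

```python
from dataclasses import dataclass

MAX_SCORE = 5  # assumed scale: 0 (worst) to 5 (best)


@dataclass(frozen=True)
class Dimension:
    name: str
    score: int     # 0..MAX_SCORE
    weight: float  # relative impact; values here are illustrative


def overall(dims: list[Dimension]) -> float:
    """Weighted average score across all dimensions."""
    total_weight = sum(d.weight for d in dims)
    return sum(d.score * d.weight for d in dims) / total_weight


def roadmap(dims: list[Dimension]) -> list[str]:
    """Order dimensions by weighted gap to the maximum score, largest first."""
    ranked = sorted(dims, key=lambda d: d.weight * (MAX_SCORE - d.score), reverse=True)
    return [d.name for d in ranked]


# Hypothetical audit results.
dims = [
    Dimension("Module boundaries", score=3, weight=2),
    Dimension("Test coverage and speed", score=2, weight=3),
    Dimension("Type strictness", score=4, weight=2),
    Dimension("Documentation and context files", score=1, weight=3),
    Dimension("Spec coverage", score=2, weight=1),
    Dimension("MCP integration potential", score=4, weight=1),
]
total = overall(dims)
priorities = roadmap(dims)
```

With these made-up numbers the context file and the test suite come out on top, which matches the pattern described above.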
Why quantify it at all?
Because “our codebase isn’t ready” is a feeling, and feelings don’t get budget. A score does. It turns an anxious hunch into a plan with a sequence and an estimate, and it tells you which AI-coding workflows to standardize now versus which to wait on.
If you want yours scored, that’s exactly what a readiness audit delivers: a written report and a roadmap, in two to three weeks.