# Context7 Public-Docs Reference Arm

This note summarizes the Context7 comparison used by the Java R&D demo. It is a
public-docs reference arm, not a broad Context7 product-quality ranking.

## What Was Compared

The benchmark used the same task catalog and deterministic patch scorer across
three arms:

| Arm | Context available to the coding agent | Guardrail-ablated accepted patches | Full-rubric accepted patches |
| --- | --- | ---: | ---: |
| K2 MCP | Generated guide rules, versioned docs, selected source files, and neighboring tests through K2 collections, agents, and MCP retrieval. | 98 / 100 | 96 / 100 |
| Repo-only baseline | Local checkout, local search, and model context. No external MCP retrieval. | 96 / 100 | 31 / 100 |
| Context7 public-docs MCP | Public library documentation through Context7. No private guide corpus, source corpus, or neighboring test corpus. | 52 / 100 | 24 / 100 |
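The acceptance columns above come from running one deterministic scorer over every arm's patches. As a minimal sketch of that tally (the verdict schema, arm names, and field names here are illustrative assumptions, not the demo's actual code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PatchVerdict:
    """One task's deterministic scoring result for a single arm (hypothetical schema)."""
    task_id: str
    arm: str                      # e.g. "k2-mcp", "repo-only", "context7-public-docs"
    guardrail_ablated_ok: bool    # accepted when guardrail checks are ablated
    full_rubric_ok: bool          # accepted under the full scoring rubric

def acceptance_counts(verdicts: list[PatchVerdict]) -> dict[str, tuple[int, int, int]]:
    """Return {arm: (ablated_accepted, full_rubric_accepted, total_tasks)}."""
    counts: dict[str, tuple[int, int, int]] = {}
    for v in verdicts:
        a, f, n = counts.get(v.arm, (0, 0, 0))
        counts[v.arm] = (a + v.guardrail_ablated_ok, f + v.full_rubric_ok, n + 1)
    return counts

# Tiny illustrative run with made-up verdicts:
verdicts = [
    PatchVerdict("T1", "k2-mcp", True, True),
    PatchVerdict("T2", "k2-mcp", True, False),
    PatchVerdict("T1", "repo-only", True, False),
]
print(acceptance_counts(verdicts))
# {'k2-mcp': (2, 1, 2), 'repo-only': (1, 0, 1)}
```

Because the scorer is deterministic, the same patch set always yields the same counts, which is what makes the three arms directly comparable.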

## How To Interpret It

The Context7 arm answers a narrow question: whether public documentation alone
solves this project-specific Java feature-development task set. It should not
be read as a general negative claim about Context7, and it does not evaluate
Context7 private deployments.

The Context7 arm consumed more agent tokens than the repo-only baseline because
the public-docs MCP tool returned additional documentation context that the
local-only arm never received. That is a context-scope result, not evidence that
Context7 degrades agents: broader public documentation alone did not supply the
private guide, source, and neighboring-test evidence this task set requires.

The result is consistent with the demo thesis: for proprietary Java
development workflows, public docs are useful but incomplete. The coding agent
needs the team's guide rules, local source patterns, neighboring tests, and
version-specific docs in one governed context path.

## Pilot Recommendation

For a customer pilot, compare K2 against the customer's real current workflow,
including any existing search, RAG, IDE assistant, Context7 setup, or internal
documentation tool. Freeze tasks and scoring rules before running the pilot, and
report accepted patches, reviewer rework, focused tests, wall-clock time, token
use, K2 retrieval cost, and source citations.
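One way to make "freeze tasks and scoring rules before running the pilot" concrete is to hash a manifest of both before any arm runs. This is a hedged sketch; the manifest fields, metric names, and hashing approach are assumptions for illustration, not the demo's tooling:

```python
import hashlib
import json

def freeze_manifest(task_ids: list[str], scoring_rules: dict) -> dict:
    """Build a pilot manifest whose digest is recorded before any arm runs,
    so tasks and scoring rules cannot drift mid-pilot."""
    payload = {"tasks": sorted(task_ids), "scoring": scoring_rules}
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    return {**payload, "frozen_sha256": digest}

# Hypothetical pilot: two tasks, full rubric, the report metrics listed above.
manifest = freeze_manifest(
    ["JAVA-001", "JAVA-002"],
    {"rubric": "full", "metrics": [
        "accepted_patches", "reviewer_rework", "focused_tests",
        "wall_clock_time", "token_use", "k2_retrieval_cost", "source_citations",
    ]},
)
print(manifest["frozen_sha256"])  # stable as long as tasks and rules are unchanged
```

Publishing the digest alongside the pilot results lets both sides verify afterward that no task or scoring rule changed once the runs began.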
