K² Coding Context Demo: Java R&D Case Study

K² ships scoped context for coding agents

Coding agents need source-role-separated context before they edit. This case study shows how your coding agent, whether Codex, Claude Code, or Cursor, can use K² corpora, named agents, a Knowledge Feed, and a pipeline to retrieve private guides, source, tests, architecture notes, and guardrails with citations. The benchmark is supporting evidence; the reusable architecture is the product.

Core K² primitives: collections (separately indexed corpora; the demo keeps guide rules, versioned docs, and Java source/tests in different collections so filters can preserve source role), Agents (named retrieval/synthesis workers with bounded instructions and corpus access; the demo uses guide, docs, code, and architect agents), the Knowledge Feed (which moves repeated findings from a source agent into a target corpus; the demo feed promotes recurring REST handler source findings into guide material), and the Pipeline (which declares how corpora, agents, feeds, and subscriptions connect so the topology can be inspected and re-run).

Architecture first

K² connects your coding agent to scoped Java evidence.

Inspect the topology first: private knowledge is indexed by role, routed through bounded K² agents, exposed over MCP, and consumed by Codex, Claude Code, Cursor, or another coding agent before it edits Java. The benchmark below is supporting evidence; the architecture is the part a customer would replicate.

  • Customer knowledge: guides, docs, code, tests
  • K² platform: Collections, Agents, Feed, Pipeline
  • MCP server: scoped evidence with citations
  • Coding agent: plans, edits, and explains
  • Java patch: focused tests and review trail
Source-role-separated context architecture

K² keeps guides, docs, code, and tests separated, asks bounded agents in a declared order, and lets a Knowledge Feed promote repeated source findings back into durable guidance.

[Diagram: K² coding context architecture. Customer corpora (Guides, Docs, Code, Tests) feed named K² agents (Guide Agent, Docs Agent, Code Agent, Test Anchors, Architect agent), which are wired through a Pipeline Spec and MCP server to the coding agent. A Knowledge Feed routes recurring code findings back into guide material.]
The core insight: source-role-separated context

Coding agents need to know whether retrieved text is a rule, an API contract, implementation precedent, or a test expectation. K² preserves that role through collections, metadata filters, named agents, and citations instead of flattening every source into one undifferentiated prompt.

  • Guides set constraints: Confluence guardrails, review rules, migration notes, and team conventions.
  • Docs set API contracts: version-pinned public or private documentation for framework behavior.
  • Code shows precedent: classes, packages, handlers, message types, and neighboring implementation patterns.
  • Tests define acceptance: focused checks and reviewable verification commands before a patch is claimed.
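The four roles above can be made concrete with a minimal sketch. This is illustrative Python, not the real K² data model: the `Snippet` shape, the `source_kind` values, and the citations are assumptions chosen to show how a retrieved snippet can carry its role all the way into the coding agent's context.

```python
from dataclasses import dataclass

# Assumed role taxonomy: each source kind maps to the function it plays
# in the coding agent's prompt (rule, contract, precedent, acceptance).
ROLES = {"guide": "rule", "doc": "API contract", "code": "precedent", "test": "acceptance"}

@dataclass
class Snippet:
    source_kind: str   # "guide" | "doc" | "code" | "test"
    citation: str      # e.g. a Confluence page or a file path
    text: str

    def role(self) -> str:
        return ROLES[self.source_kind]

# Two snippets with different roles stay distinguishable instead of being
# flattened into one undifferentiated prompt.
context = [
    Snippet("guide", "confluence/rest-guardrails", "Handlers follow the team REST conventions."),
    Snippet("test", "src/test/.../HandlerTest.java", "asserts the expected payload shape"),
]
for s in context:
    print(f"[{s.role()}] {s.citation}")
```

The design point is only that the role tag survives retrieval, so the agent can weigh a rule differently from a precedent.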
What scoped retrieval looks like in a customer demo

Before a coding agent edits Java, K² should answer operational questions like these with citations from the customer's private guides, source, tests, and architecture notes.

  • Ask K²: How do we add a controller in module X?
  • Ask K²: Which DTO pattern should this endpoint use?
  • Ask K²: Which neighboring implementation should we mirror?
  • Ask K²: Which test should be extended?
  • Ask K²: Which Confluence rule applies before the coding agent edits?
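A hedged sketch of what such a cited answer could look like on the wire. The tool name `k2.ask` and the field names here are illustrative assumptions, not the real K² MCP schema; the point is the shape: a scoped question goes in, and every answer comes back with per-corpus citations a reviewer can audit.

```python
# Hypothetical request the coding agent might send over MCP
# (tool and field names are assumptions for illustration only).
request = {
    "tool": "k2.ask",
    "question": "How do we add a controller in module X?",
    "filters": {"framework": "flink", "version": "2.2"},
}

# Hypothetical response: answer text plus citations that preserve source role.
response = {
    "answer": "Mirror the existing REST handler pattern in the runtime module.",
    "citations": [
        {"corpus": "java-rd-guides", "source_kind": "guide"},
        {"corpus": "flink-code-2.2", "source_kind": "code"},
    ],
}

# A reviewer-side check: no citation may arrive without a corpus and a role.
assert all("corpus" in c and "source_kind" in c for c in response["citations"])
print(len(response["citations"]))
```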
K² platform value map

Each card maps a K² feature to the development value shown in this demo.

Collections

Separate corpora for guides, docs, and code

K² keeps Confluence-style guidance, versioned documentation, source, and tests queryable without mixing their roles.

  • Demo assets: java-rd-guides, flink-docs-2.2, flink-code-2.2.
  • Customer value: the agent knows whether evidence is a rule, API doc, implementation, or test.
Metadata filters

K² docs concept: metadata filters constrain retrieval by fields such as framework, version, source_kind, module, package, class_name, and path before your coding agent receives context.

Precise retrieval for legacy Java

Framework, version, source kind, module, package, class, API surface, and path metadata keep answers scoped.

  • Demo filter examples target Flink 2.2 REST handlers and tests.
  • Customer value: fewer irrelevant snippets and less prompt waste.
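The filtering idea can be sketched in a few lines. This is not K²'s query API; it is a toy in-memory filter over the metadata fields named above, with file paths invented for illustration, showing why version and source-kind constraints keep a Flink 2.2 question from pulling in Flink 1.x or unrelated framework snippets.

```python
# Toy corpus: metadata-tagged snippets (paths and versions are illustrative).
snippets = [
    {"framework": "flink", "version": "2.2", "source_kind": "code",
     "path": "runtime/rest/handler/job/JobVertexWatermarksHandler.java"},
    {"framework": "flink", "version": "1.17", "source_kind": "code",
     "path": "runtime/rest/handler/job/LegacyHandler.java"},
    {"framework": "spring", "version": "6.1", "source_kind": "doc",
     "path": "docs/web.md"},
]

def apply_filters(items, **required):
    """Keep only items whose metadata matches every required field."""
    return [i for i in items if all(i.get(k) == v for k, v in required.items())]

# Scope a query to Flink 2.2 source before any semantic ranking happens.
hits = apply_filters(snippets, framework="flink", version="2.2", source_kind="code")
print(len(hits))  # 1: only the Flink 2.2 REST handler survives
```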
Hybrid search

K² docs concept: hybrid retrieval combines dense semantic search with sparse exact matching. That matters for Java because class names and method names must be found exactly.

Semantic questions plus exact symbols

Dense retrieval helps with conceptual questions, while sparse matching protects exact Java class and method names.

  • Demo symbols: JobVertexWatermarksHandler, MessageQueryParameter.
  • Customer value: code references survive even when names are obscure.
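One common way to combine dense and sparse rankings is reciprocal rank fusion (RRF); whether K² uses RRF specifically is not stated here, so treat this as a generic sketch of the technique. A document that both the semantic ranking and the exact-symbol ranking surface rises to the top.

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1/(k + rank + 1) per doc."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Dense retrieval answers the conceptual question; sparse matching protects
# the exact Java symbols mentioned in the demo.
dense  = ["watermark_concepts.md", "JobVertexWatermarksHandler.java", "metrics.md"]
sparse = ["JobVertexWatermarksHandler.java", "MessageQueryParameter.java"]

fused = rrf([dense, sparse])
print(fused[0])  # JobVertexWatermarksHandler.java: both rankings agree on it
```

The constant `k=60` is the conventional RRF damping value; it keeps any single list from dominating the fused order.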
Agents

Specialized context workers

Guide, docs, code, and architect agents each answer a bounded part of the development question.

  • Demo agents return guardrails, release docs, source/test anchors, and a cited plan.
  • Customer value: the coding agent receives structured context instead of a pile of text.
Knowledge Feed

Turn repeated discoveries into durable guidance

The feed loop can promote recurring source findings back into the guide corpus for future sessions.

  • Demo feed: Flink REST Guide Feed from code agent to java-rd-guides.
  • Customer value: institutional knowledge improves instead of being rediscovered every time.
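The promotion loop can be sketched as a counting rule. The threshold of three recurrences is an assumption for illustration, not a documented K² default; the mechanism shown is simply that findings the code agent keeps rediscovering get written into the guide corpus once.

```python
from collections import Counter

PROMOTE_AFTER = 3  # assumed threshold, not a real K² setting

findings = Counter()
guide_corpus = []

def record(finding: str):
    """Count a code-agent finding; promote it to guide material on repetition."""
    findings[finding] += 1
    if findings[finding] == PROMOTE_AFTER:
        guide_corpus.append(finding)  # durable guidance for future sessions

# The same REST-handler pattern surfaces in three sessions; a one-off does not.
for _ in range(3):
    record("REST handlers in this module follow the shared registration pattern")
record("one-off detail")

print(len(guide_corpus))  # 1: only the recurring finding was promoted
```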
Pipeline + MCP

Auditable route from K² to your coding agent

The Pipeline Spec shows the topology; MCP serves that same scoped context to the coding-agent workflow before it edits.

  • Demo result: answer excerpts, code diff excerpts, and verification artifacts are linked.
  • Customer value: reviewers can inspect what evidence influenced the patch.
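Because the topology is declared rather than implicit, it can be validated mechanically. The spec below is a hypothetical plain-data rendering (field names are illustrative, not K²'s real Pipeline Spec schema) using the demo's corpus and agent names, with a check that every agent and feed references declared resources.

```python
# Hypothetical pipeline spec as plain data; only the corpus names come from the demo.
spec = {
    "collections": ["java-rd-guides", "flink-docs-2.2", "flink-code-2.2"],
    "agents": {
        "guide-agent": ["java-rd-guides"],
        "docs-agent": ["flink-docs-2.2"],
        "code-agent": ["flink-code-2.2"],
    },
    "feeds": [{"from": "code-agent", "to": "java-rd-guides"}],
}

def validate(spec):
    """Every agent reads declared collections; every feed connects declared nodes."""
    cols = set(spec["collections"])
    for agent, reads in spec["agents"].items():
        assert set(reads) <= cols, f"{agent} reads an undeclared collection"
    for feed in spec["feeds"]:
        assert feed["from"] in spec["agents"], "feed source is not a declared agent"
        assert feed["to"] in cols, "feed target is not a declared collection"
    return True

print(validate(spec))  # True: the demo topology is internally consistent
```

This is the inspectability claim in miniature: a reviewer can re-run the check before trusting what the pipeline routed.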
Benchmark evidence, led by the ablated control

On the dimensions that exclude guide-compliance scoring, K² is narrowly ahead of the repo-only baseline and materially ahead of public-docs-only context. The full rubric then shows the additional lift from retrieving guide rules that the baseline does not have.

  • 98 / 100: K² MCP, guardrail-ablated accepted patches
  • 96 / 100: repo-only baseline, guardrail-ablated accepted patches
  • 52 / 100: Context7 public-docs MCP, guardrail-ablated accepted patches
  • +65 pts: full-rubric guide-compliance lift versus repo-only baseline

Full-rubric accepted patches were K² 96/100, repo-only baseline 31/100, and Context7 public-docs MCP 24/100. Read that gap as a guide-retrieval result: K² retrieved the same Confluence-style guardrails that the full scorer checks.

Token math is agent-side: retrieved snippets count once they enter the coding-agent prompt, but K² platform retrieval, ingestion, storage, and subscription costs are reported separately below. Do not quote this as a broad Context7 ranking or expected customer outcome.

Methodology and ablation

The circularity risk is explicit: K² retrieves guide rules and the full rubric rewards guide compliance. The table therefore reports the full score beside a guardrail-ablated pass rate, where guide-compliance failures are removed from pass/fail attribution.

Scoring rubric

Component and weight:

  • Focused tests + build verification: 40%
  • Expected files/modules touched: 25%
  • Required behavior or diff-pattern checks: 15%
  • Confluence/internal guide compliance: 10%
  • Review scope and safety: 10%
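The rubric above can be checked arithmetically: the weights sum to 100%, and a patch's score is the weight-sum of the components it passes. The component key names below are shorthand I introduce for the sketch; the weights are the ones in the table.

```python
# Rubric weights from the table above (key names are shorthand).
WEIGHTS = {
    "focused_tests_and_build": 0.40,
    "expected_files_touched": 0.25,
    "behavior_or_diff_checks": 0.15,
    "guide_compliance": 0.10,
    "review_scope_and_safety": 0.10,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights cover the full rubric

def score(passed: set) -> float:
    """Sum the weights of the components a patch passed."""
    return sum(w for name, w in WEIGHTS.items() if name in passed)

# A patch that passes everything except guide compliance still scores 0.90,
# which is why the guardrail-ablated frame can diverge from the full rubric.
print(round(score(set(WEIGHTS) - {"guide_compliance"}), 2))  # 0.9
```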

Guardrail-ablated versus full pass rate

  • K² MCP (project guides, source, tests, and versioned docs through K²): 98 / 100 ablated, 96 / 100 full
  • Repo-only baseline (local checkout and model memory without an external context service): 96 / 100 ablated, 31 / 100 full
  • Context7 public-docs MCP (public documentation through Context7, without private guide/source/test corpora): 52 / 100 ablated, 24 / 100 full
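A quick arithmetic check ties the table back to the headline numbers: the "+65 pts" lift is the full-rubric gap over the repo-only baseline, while the ablated gap is only 2 points.

```python
# Scores from the table above.
full = {"k2": 96, "repo_only": 31, "context7": 24}
ablated = {"k2": 98, "repo_only": 96, "context7": 52}

print(full["k2"] - full["repo_only"])        # 65: the "+65 pts" full-rubric lift
print(ablated["k2"] - ablated["repo_only"])  # 2: the narrow ablated code-quality lead
```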

Authorship/freeze disclosure

The public artifact does not independently prove that task authors, guide authors, and scorer authors were blind to K² outputs before freezing. The defensible public claim is therefore narrower: K² improved this guide-retrieval-heavy benchmark, and customer-specific claims require a frozen customer replication before indexing or running either arm.

Cost model

  • Agent-token numbers count prompt and completion tokens captured by the benchmark runner. Retrieved K² snippets are included once they enter the agent prompt.
  • K² platform cost is not hidden in the token-savings number. It includes ingestion, retrieval queries, storage, and subscription.
  • Illustrative benchmark-scale platform allocation: Pro tier at $249/month for this demo corpus and run.
  • If full-rubric accepted patches are the customer-relevant outcome because guide violations create review rework, K² platform allocation is $249 / 96 = $2.59 per full-rubric accepted patch before model-token cost.
  • If the guardrail-ablated frame is used as the raw code-quality denominator, K² adds 2 incremental ablated accepted patches over the repo-only baseline, or about $124.50 per incremental ablated patch before model-token effects.
  • That is the point of reporting both frames: the public run shows a narrow ablated code-quality lead and a large guide-compliance/review-rework lead.
  • Break-even against the repo-only baseline on the full-rubric outcome occurs when blended agent-token price exceeds roughly $0.51 per million tokens saved.
  • At Claude Sonnet-style input pricing around $3 per million tokens, the 5.04M agent-token savings per full-rubric accepted patch is about $15.12, roughly 5.8x the benchmark-scale K² platform allocation.
  • These numbers do not include developer time. One avoided review or re-prompt hour at a $150/hour loaded engineering rate dwarfs the token and platform costs combined.

Full-rubric formula: K² cost per accepted patch = 1.55M agent tokens times the model-token rate plus $2.59 platform allocation; repo-only baseline = 6.59M agent tokens times the model-token rate.
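The cost-model arithmetic above can be reproduced directly from the stated figures ($249/month platform allocation, 96 full-rubric accepted patches, 1.55M vs 6.59M agent tokens, ~$3 per million tokens); this worked check only restates numbers already in the text.

```python
platform_monthly = 249.00          # Pro-tier benchmark-scale allocation
full_accepted = 96                 # K² full-rubric accepted patches
platform_per_patch = platform_monthly / full_accepted
print(round(platform_per_patch, 2))        # 2.59 dollars per accepted patch

token_savings_m = 6.59 - 1.55      # million agent tokens saved per accepted patch
print(round(token_savings_m, 2))           # 5.04

# Break-even blended token price: savings value equals platform allocation.
breakeven = platform_per_patch / token_savings_m
print(round(breakeven, 2))                 # 0.51 dollars per million tokens

# At Sonnet-style ~$3/M input pricing, the token savings alone:
savings_at_sonnet = token_savings_m * 3.00
print(round(savings_at_sonnet, 2))         # 15.12 dollars per accepted patch
print(round(savings_at_sonnet / platform_per_patch, 1))  # 5.8x the allocation

# Incremental-ablated frame: 2 extra ablated patches over the baseline.
print(round(platform_monthly / 2, 2))      # 124.5 dollars per incremental patch
```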

The economic case should also count developer time. If K² saves even one review or re-prompt hour per accepted patch, the labor savings exceed both token cost and benchmark-scale platform allocation by a large multiple.

Reproducibility statement

Task definitions, scorer configurations, prompt templates, selected raw responses, patch artifacts, and demo asset manifests are published in the repository. An external reviewer can rerun the public benchmark with the same coding-agent model, a K² API key, and the published bundle; customer-specific claims still require customer-frozen tasks and customer-owned corpora.

Two ways to act on the demo

Developers need a short reproducible setup. Enterprise buyers need a controlled pilot against their current coding-agent workflow. The same architecture supports both paths.

Pilot ask

The customer-facing claim should be earned on customer assets. Use this public benchmark to scope a bounded replication, not as a forecast for a financial-services Java application.

Enterprise path

Run the same workflow on customer assets.

  • Pick one representative Java module and one Confluence page tree.
  • Freeze 5-10 real feature-development tasks before the pilot run.
  • Run the same coding agent, for example Codex or Claude Code, with and without K² retrieval.
  • Score accepted patches, review rework, focused tests, token use, wall-clock time, and retrieval cost.
Design partners wanted

Validate on customer code before making any customer-specific claim.

  • This is a public benchmark and demo bundle, not a named customer replication.
  • Design partners should freeze tasks, expected files, guide checks, and scorer logic before indexing.
  • Publish customer-approved relevance findings, even if the customer name remains anonymized.
K² for coding agents, not only Java

Java R&D is the inaugural case study because regulated teams often have the richest guide and compliance corpus. The same collections, agents, feed, pipeline, and MCP pattern is language-agnostic.

  • Available now: Java R&D case study. Regulated enterprise services with Confluence-heavy guardrails.
  • Planned Q3 2026: TypeScript case study. Internal component library and product-platform conventions.
  • Planned Q3 2026: Python case study. Data/ML platform workflows with notebooks, pipelines, and model-serving code.