Technical Memo


1. Core capabilities

1.1. Architectural understanding:

Most doc tools still rely on technical writers and developers to translate the complexity of code into natural language. To enable a self-maintaining system, we start by truly understanding system architecture through:

  • Distinguishing user-facing services from internal implementations and utilities
  • Dependency mapping within the codebase and across an organization's repos
  • Change classification that identifies whether a change is breaking, significant, incremental, or cosmetic from the product's perspective

Not every code change requires a documentation update, and not every documentation need can be detected from code alone. Renaming an internal variable or refactoring a helper function doesn't need doc changes. But changing a public API parameter, modifying request/response schemas, or altering authentication flows does.
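
For illustration, a minimal sketch of how such change classification might work, assuming a simple rule-based pass over each changed entity; the category names follow this memo, but the fields and heuristics below are illustrative assumptions rather than our production rules:

  # Illustrative sketch: classify a code change by its documentation impact.
  from dataclasses import dataclass

  @dataclass
  class Change:
      symbol: str              # fully qualified name, e.g. "api.Client.create_user"
      is_public: bool          # part of a user-facing surface?
      signature_changed: bool  # parameters, request/response schema, or auth flow changed
      behavior_changed: bool   # semantics changed without a contract change
      docs_reference_it: bool  # some doc page mentions this entity

  def classify(change: Change) -> str:
      if not change.is_public and not change.docs_reference_it:
          return "cosmetic"      # internal rename/refactor: no doc update needed
      if change.signature_changed:
          return "breaking"      # public contract changed: docs must be updated
      if change.behavior_changed:
          return "significant"   # same contract, different behavior: review docs
      return "incremental"       # minor public change: low-priority doc check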

1.2. Bidirectional code-doc understanding:

Traditional tools work in one direction. We map both paths:

From code to docs: We identify which docs reference specific functions, APIs, or patterns through entity-level tracking (not just file-level) and use community detection to understand related functionality clusters.

From docs to code: We identify which code implements documented behaviors, validate that examples and specifications match reality, and detect documented features that no longer exist. This is enabled by structural documentation analysis that performs:

  • Intent recognition: We understand what each doc page aims to accomplish and what it should capture in an ideal world
  • Coverage calculation: We identify gaps between what docs promise and what they deliver
  • Audience understanding: We use the docs themselves to understand the intended reader (developer, operator, end-user)

When you add a new retry_policy parameter to an API client, we find all docs that show example usage of that client. When documentation promises three types of signature validation, we verify that the code still offers exactly those three, and flag any case where the implemented algorithm differs from what's documented.
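
A minimal sketch of what entity-level (rather than file-level) tracking can look like, assuming a simple inverted index from code entities to the doc pages that mention them; the paths, entity names, and helper below are hypothetical:

  # Illustrative sketch: entity-level mapping between code symbols and doc pages.
  from collections import defaultdict

  # doc page -> code entities it references (functions, classes, parameters)
  doc_index = {
      "docs/quickstart.md":    {"api.Client", "api.Client.create_user"},
      "docs/api-reference.md": {"api.Client.create_user", "api.Client.retry_policy"},
      "docs/migration.md":     {"auth.oauth1.sign"},
  }

  # Invert to: code entity -> doc pages that mention it
  entity_to_docs = defaultdict(set)
  for page, entities in doc_index.items():
      for entity in entities:
          entity_to_docs[entity].add(page)

  def find_affected_docs(changed_entities: set[str]) -> set[str]:
      """Given entities touched by a change, return the doc pages to re-check."""
      affected = set()
      for entity in changed_entities:
          affected |= entity_to_docs.get(entity, set())
      return affected

  # Adding a retry_policy parameter to the client flags the API reference page.
  print(find_affected_docs({"api.Client.retry_policy"}))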

1.3. Multi-source context expansion:

Curated documentation cannot be created just from code. For example, design decisions that are crucial to understanding 'why' the system evolved in a certain way are often not documented anywhere and don't live in git. To fill such gaps, we aggregate context from:

  • Conversation archives: Slack/Teams channels where design decisions occur
  • Meeting recordings: Video transcripts from architecture reviews and design sessions
  • Issue trackers: Support tickets and bug reports revealing user pain points
  • Community forums: User questions and feedback often point to unclear documentation

The 'why' behind code changes rarely lives in commit messages or even PRs. Context is distributed across systems and modalities; we centralize it. Code shows that you implemented rate limiting with exponential backoff; that's the 'what'. The 'why', that it was chosen after a production incident where linear backoff caused cascading failures, lives in a Slack thread, a postmortem doc, and a Zoom recording.

Similarly, when an engineer mentions in Slack "we're deprecating OAuth1 because of security concerns raised in the Q3 review", that context enriches the documentation explaining the migration path to OAuth2. We capture both the 'what' and the 'why' to create documentation that actually helps teams make decisions.
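
As a sketch of what this aggregation can produce, the record below shows one possible normalized shape for context pulled from a non-code source; the field names and example values are illustrative assumptions:

  # Illustrative sketch: normalizing context from conversations, meetings, and
  # issues into one record that can be attached to code entities or doc pages.
  from dataclasses import dataclass

  @dataclass
  class ContextRecord:
      source: str          # "slack" | "meeting" | "issue" | "forum"
      reference: str       # channel, ticket id, or recording id
      entities: list[str]  # related code entities or doc pages
      summary: str         # the extracted 'why'

  record = ContextRecord(
      source="slack",
      reference="#platform-auth thread",
      entities=["auth.oauth1", "docs/migration.md"],
      summary="Deprecating OAuth1 due to security concerns raised in the Q3 review; "
              "document the migration path to OAuth2.",
  )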

1.4. Intelligent prioritization:

Once gaps are identified, we prioritize them based on:

  • Git activity: Frequently changing code needs fresher docs
  • Issue signals: Docs or code components that generate high support ticket volume
  • Community questions: What users repeatedly ask in forums
  • Competitive intelligence: Gaps where competitor docs excel
  • Dependency centrality: Core components leveraged across services (e.g., auth) have higher user impact

Teams have limited bandwidth. If your authentication module changed 47 times last quarter and generates 30% of support tickets, while a rarely used admin utility hasn't been touched in a year, we'll flag the auth docs as high priority. We direct effort to highest-ROI documentation work.
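
A minimal sketch of how such prioritization might combine these signals into one score, assuming each signal is normalized to [0, 1]; the weights and signal names below are illustrative, and in practice they are configurable (see 5.3):

  # Illustrative sketch: scoring a documentation gap with weighted signals.
  WEIGHTS = {
      "git_activity": 0.3,           # change frequency of the underlying code
      "issue_signal": 0.3,           # share of support tickets touching this area
      "community_questions": 0.2,    # repeated questions in forums
      "competitive_gap": 0.1,        # areas where competitor docs excel
      "dependency_centrality": 0.1,  # how many services depend on this component
  }

  def priority_score(signals: dict[str, float]) -> float:
      """Each signal is normalized to [0, 1]; a higher score means fix this doc first."""
      return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

  auth_docs = priority_score({"git_activity": 0.9, "issue_signal": 0.8,
                              "dependency_centrality": 1.0})
  admin_util_docs = priority_score({"git_activity": 0.05, "issue_signal": 0.02})
  # auth_docs ≈ 0.61, admin_util_docs ≈ 0.02: the auth docs surface first.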

1.5. Automated differential updates:

We generate fixes for identified documentation gaps through:

  • Contextual suggestions: Generating doc updates that maintain existing style and voice
  • Confidence scoring: LLM-generated suggestions include reliability metrics
  • Minimal verbosity: Making targeted updates rather than generating verbose AI slop

When a function signature changes from create_user(email, name) to create_user(email, name, role="user"), we don't just detect the diff. We generate suggestions for the quickstart guide ("Add the optional role parameter"), the API reference (update the parameter table), and the migration guide (explain the backward-compatible default). Documentation maintenance becomes a background process.

Approval workflow: All suggestions surface for human review; workflows that sustain a high acceptance rate can be progressively automated over time.
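
A minimal sketch of the shape a generated suggestion might take before review; the field names, values, and threshold below are illustrative assumptions rather than the actual schema:

  # Illustrative sketch: a targeted suggestion with a confidence score, queued for review.
  from dataclasses import dataclass

  @dataclass
  class DocSuggestion:
      doc_path: str
      anchor: str          # section or heading the edit applies to
      proposed_edit: str   # minimal, targeted change rather than a rewrite
      rationale: str       # the code change that triggered it
      confidence: float    # 0.0-1.0 reliability estimate for the generated text
      status: str = "pending_review"

  suggestion = DocSuggestion(
      doc_path="docs/quickstart.md",
      anchor="Creating a user",
      proposed_edit='Mention the optional role parameter (defaults to "user").',
      rationale='create_user(email, name) -> create_user(email, name, role="user")',
      confidence=0.92,
  )

  # Workflows that sustain a high acceptance rate can raise this threshold over time.
  AUTO_APPROVE_THRESHOLD = 0.95
  if suggestion.confidence >= AUTO_APPROVE_THRESHOLD:
      suggestion.status = "auto_approved"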

2. Architectural principles

2.1. Algorithmic first, LLM last:

As codebases and documentation expand and component interdependencies grow, purely LLM-based approaches become computationally intractable. A codebase with 5000+ files and interconnected documentation across multiple services creates a context space that LLMs cannot reliably navigate without losing nuance.

We avoid the 'AI slop' problem by making graph analysis, pattern matching, and statistical methods do the heavy lifting. LLMs validate and generate only after algorithmic filtering establishes deterministic, verifiable results. This means our core intelligence capabilities produce explainable outputs that don't hallucinate or drift.

Our algorithmic foundation parses, preprocesses, and filters, ensuring only what truly requires LLM attention gets it. When LLMs do generate suggestions, they include confidence scoring and always surface through human approval workflows. You're never trusting unreliable AI output to reach your documentation without review.

Result: A significant cost reduction even for large codebases compared with naive LLM-everything approaches, with higher accuracy and deterministic results.
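
A minimal sketch of the ordering this implies: deterministic filtering decides what, if anything, reaches the LLM. All names below, including the stub standing in for the generation step, are illustrative assumptions:

  # Illustrative sketch of 'algorithmic first, LLM last'.
  def algorithmic_filter(changes, entity_to_docs):
      """Graph lookups and rule-based checks: cheap, deterministic, explainable."""
      work_items = []
      for change in changes:
          docs = entity_to_docs.get(change["symbol"], set())
          if change["public"] and docs:      # only public, documented entities pass
              work_items.append((change, docs))
      return work_items

  def generate_with_llm(change, docs):
      """Stub for the LLM step; in practice it returns confidence-scored suggestions."""
      return [{"doc": d, "symbol": change["symbol"], "confidence": 0.9} for d in docs]

  changes = [
      {"symbol": "utils._format_name", "public": False},     # filtered out: no LLM cost
      {"symbol": "api.Client.create_user", "public": True},
  ]
  entity_to_docs = {"api.Client.create_user": {"docs/quickstart.md"}}

  for change, docs in algorithmic_filter(changes, entity_to_docs):
      print(generate_with_llm(change, docs))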

2.2. Understanding over indexing:

Instead of indexing blindly, we have built semantic models using:

  • Topic vectors across documentation
  • Function-level call graphs and dependency networks
  • Community detection (clustering techniques) for related code
  • Hashing for near-duplicate detection across repos (sketched below)
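
For illustration, a minimal sketch of near-duplicate detection using word-shingle hashing; a production system would more likely use MinHash or SimHash, and the example texts below are hypothetical:

  # Illustrative sketch: flag near-duplicate guidance across repos via shingle hashing.
  def shingles(text: str, k: int = 5) -> set[int]:
      words = text.lower().split()
      return {hash(" ".join(words[i:i + k])) for i in range(max(1, len(words) - k + 1))}

  def similarity(a: str, b: str) -> float:
      sa, sb = shingles(a), shingles(b)
      return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

  doc_a = "Configure the client with an API key and an optional retry policy."
  doc_b = "Configure the client with an API key and an optional retry policy setting."
  print(similarity(doc_a, doc_b))  # a high score flags likely duplicated guidance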

2.3. Cross-service knowledge layer:

Documentation silos fragment organizational knowledge. We build a unified layer that maintains a single source of truth with consistent understanding across microservices, libraries, and tools.

The system is dependency-aware, understanding how services interact and where docs must align, and version-conscious, tracking documentation across API versions and deployment environments.

Future direction: This evolves into a queryable context layer that maintains your entire product knowledge graph, always current.
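
One possible, simplified shape for such a layer, assuming a small in-memory structure; the services, docs, and helper below are hypothetical and only meant to show the dependency- and version-aware lookups involved:

  # Illustrative sketch: a dependency-aware, version-conscious knowledge layer.
  knowledge_layer = {
      "services": {
          "auth-service":    {"depends_on": [],               "api_versions": ["v1", "v2"]},
          "billing-service": {"depends_on": ["auth-service"], "api_versions": ["v1"]},
      },
      "docs": {
          "docs/auth/v2/tokens.md":   {"service": "auth-service",    "api_version": "v2"},
          "docs/billing/invoices.md": {"service": "billing-service", "api_version": "v1"},
      },
  }

  def docs_impacted_by(service: str) -> list[str]:
      """Docs for the service itself plus docs of every service that depends on it."""
      dependents = [s for s, meta in knowledge_layer["services"].items()
                    if service in meta["depends_on"]]
      scope = {service, *dependents}
      return [path for path, meta in knowledge_layer["docs"].items()
              if meta["service"] in scope]

  # A change in auth-service puts both the auth docs and the billing docs in scope.
  print(docs_impacted_by("auth-service"))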

3. Differentiated approach

Traditional tools → Our approach:

  • Generate docs from scratch → Update existing docs with intelligent prioritization
  • File-level matching → Entity- and intent-level understanding
  • Static analysis → Continuous background process
  • Single source (code or docs) → Multi-source context aggregation
  • Equal priority for everything → Smart prioritization by impact
  • Detection only → Detection + generation + updates
  • Per-repo silo → Cross-service knowledge layer

4. Positioning in your stack

We're complementary to your existing tools, not a replacement:

  • Knowledge aggregators (Guru, Glean): They make information discoverable through search. We ensure that information stays accurate and current through automated detection and updates.
  • Documentation platforms (Confluence, Notion): They provide the interface for documentation. We automatically update content within them, detecting when docs need refreshing based on code, system, and contextual changes.
  • Code documentation generators (auto-README tools): They explain 'what' code does at a point in time. We maintain the 'why' behind decisions and keep all documentation synchronized as systems evolve.

5. Integration model

5.1. Non-invasive:

Works with existing repos; no restructuring required. Plugs into your codebase using GitHub Actions, and we handle the heavy lifting in the background.

5.2. Analysis modes:

Cold start (initial repository analysis)

The first run performs deep analysis of your entire codebase and documentation corpus, building the foundational semantic models, dependency graphs, and entity mappings. This establishes the baseline understanding of your architecture and documentation structure. It also identifies all existing gaps and proposed fixes, prioritized on various parameters, including your audience's needs.

Warm start (incremental updates)

Subsequent runs leverage stored representations and focus only on changed files and their dependency neighborhoods. This makes everyday analysis fast and cost-effective.

Per-release analysis

Run comprehensive checks when cutting releases to ensure all customer-facing documentation reflects the release state. Generates a prioritized list of doc updates needed before shipping.

Everyday analysis

Continuous or nightly runs that catch documentation drift early. Integrates into your regular development workflow with minimal overhead.

Triggered analysis

Event-driven runs, such as manual triggers, provide just-in-time feedback when developers touch high-impact code.
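
To make the warm-start and triggered modes concrete, here is a minimal sketch of scoping a run to changed files plus their dependency neighborhood; the file names and graph below are hypothetical:

  # Illustrative sketch: a warm-start run expands from changed files to dependents
  # instead of re-analyzing the whole repository.
  dependency_graph = {
      "auth/token.py":    ["auth/client.py", "billing/charge.py"],  # file -> dependents
      "auth/client.py":   ["docs_examples/quickstart.py"],
      "utils/strings.py": [],
  }

  def warm_start_scope(changed: set[str]) -> set[str]:
      scope, frontier = set(changed), list(changed)
      while frontier:
          current = frontier.pop()
          for dependent in dependency_graph.get(current, []):
              if dependent not in scope:
                  scope.add(dependent)
                  frontier.append(dependent)
      return scope

  # Touching auth/token.py pulls in auth/client.py, billing/charge.py, and the
  # quickstart example, while utils/strings.py stays out of scope.
  print(warm_start_scope({"auth/token.py"}))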

5.3. Configurable:

Tune prioritization weights for your domain (e.g., weight API documentation more heavily than internal utils). Control LLM usage vs. cost tradeoffs based on your budget and accuracy needs. Select which data sources to include (start with code + docs, expand to Slack, issues, meetings).
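
As a sketch of what that configuration might look like, under the assumption of a simple key-value format (the keys and values below are illustrative, not the actual schema):

  # Illustrative sketch: tuning weights, LLM budget, and data sources.
  config = {
      "prioritization_weights": {
          "api_documentation": 2.0,   # weight public API docs more heavily
          "internal_utils": 0.3,
      },
      "llm": {
          "max_monthly_budget_usd": 200,     # cap spend; algorithmic checks still run
          "min_confidence_to_suggest": 0.7,  # below this, flag the gap without a draft
      },
      "sources": ["code", "docs"],  # expand later: "slack", "issues", "meetings"
  }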

6. Measurable outcomes

  • Reduction in documentation lag: Automated detection and updates shift documentation maintenance from reactive sprint work to a continuous background process
  • Developer efficiency: Context-aware suggestions reduce time spent on doc review and updates per PR
  • Support deflection: Proactive documentation accuracy reduces tickets related to outdated or incorrect documentation
  • Knowledge retention: Capture design context that would otherwise live only in people's heads
  • Reduced integration failures: Cross-service dependency tracking prevents breaking changes and maintains documentation alignment across services
  • Improved onboarding: New hires ramp up faster with accurate, comprehensive documentation that reflects current system state and captures institutional knowledge

7. What we're building toward

Beyond a documentation tool, we are building the infrastructure for knowledge continuity. Your codebase is the system of record, your documentation is the interface layer, and context from conversations, decisions, and usage becomes structured knowledge; everything stays synchronized automatically. As your product evolves across services, teams, and time, documentation evolves with it.
