Project Context & Domains

ShipLens doesn't analyze commits in a vacuum. Before any commit analysis happens, it builds a deep understanding of your project's structure, conventions, and critical domains.

Project Understanding Phase

When a project is connected, ShipLens runs a one-time understanding phase:

1. Crawl

The repository structure is crawled to map:

  • Directory hierarchy
  • File types and distribution
  • Key configuration files
  • Module boundaries
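
The crawl step above can be sketched as a simple directory walk that tallies file types and flags known configuration files. This is a hypothetical illustration of the idea, not ShipLens's actual crawler; the `known_configs` list is an assumption.

```python
import os
from collections import Counter
from pathlib import Path

def crawl_repo(root: str) -> dict:
    """Map directory hierarchy, file-type distribution, and key config files."""
    file_types = Counter()
    dirs = []
    config_files = []
    # Illustrative set of filenames treated as "key configuration files".
    known_configs = {"mix.exs", "package.json", "Dockerfile", "config.exs"}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip vendored / VCS directories so they don't skew the map.
        dirnames[:] = [d for d in dirnames if d not in {".git", "node_modules", "deps"}]
        dirs.append(os.path.relpath(dirpath, root))
        for name in filenames:
            file_types[Path(name).suffix or "(no ext)"] += 1
            if name in known_configs:
                config_files.append(os.path.join(dirpath, name))
    return {"dirs": dirs, "file_types": dict(file_types), "configs": config_files}
```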

2. Sample

Representative files are selected from each domain/directory to understand:

  • Coding patterns and conventions
  • Architectural style
  • Language idioms
  • Framework usage
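
One way to picture the sampling step: group files by top-level directory and pick a small, seeded random sample from each group. A minimal sketch under assumed grouping rules; ShipLens's real selection heuristics are not documented here.

```python
import random
from collections import defaultdict

def sample_files(paths, per_dir=2, seed=0):
    """Pick up to `per_dir` representative files from each top-level directory."""
    by_dir = defaultdict(list)
    for p in paths:
        top = p.split("/", 1)[0] if "/" in p else "(root)"
        by_dir[top].append(p)
    rng = random.Random(seed)  # seeded so sampling is reproducible
    return {d: rng.sample(fs, min(per_dir, len(fs))) for d, fs in sorted(by_dir.items())}
```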

3. Domain Map

Domains (logical areas of the codebase) are identified and assigned criticality levels:

Criticality   Description                       Examples
Critical      Core business logic, security     Authentication, payments, data encryption
High          Important features, data access   API controllers, database layer, user management
Medium        Standard features                 UI components, utilities, middleware
Low           Supporting code                   Logging, formatting, configuration
Trivial       Non-functional                    Documentation, CI config, style files

Domain criticality can be:

  • Auto-detected by the LLM based on code content and naming patterns
  • CTO-overridden to reflect business knowledge the LLM can't infer
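
The precedence rule above (CTO override beats auto-detection) is simple enough to sketch. The function name and the "medium" fallback for unmapped domains are illustrative assumptions, not ShipLens's documented behavior.

```python
# Criticality levels from the table above, highest first.
LEVELS = ["critical", "high", "medium", "low", "trivial"]

def resolve_criticality(domain, auto_detected, cto_overrides):
    """CTO-set overrides win over LLM auto-detection.

    `auto_detected` and `cto_overrides` map domain name -> level string.
    Unmapped domains fall back to "medium" (an assumption for this sketch).
    """
    if domain in cto_overrides:
        return cto_overrides[domain]
    return auto_detected.get(domain, "medium")
```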

4. Embed & Index

Project knowledge is embedded using the all-MiniLM-L6-v2 model (384-dimensional vectors) and stored in PostgreSQL via pgvector.

This creates a searchable vector store that the commit analysis LLM can query for relevant context.
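
In production this lookup runs inside PostgreSQL (pgvector exposes distance operators such as `<=>` for cosine distance, so a query can `ORDER BY embedding <=> $1 LIMIT k`). The sketch below shows the same nearest-neighbor idea in plain Python with toy low-dimensional vectors; the real store uses 384-dimensional all-MiniLM-L6-v2 embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=3):
    """store: list of (snippet, vector) pairs; returns the k most similar snippets."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [snippet for snippet, _ in ranked[:k]]
```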

How Context Is Used in Analysis

During commit analysis, the pipeline:

  1. Takes the commit's affected files and areas
  2. Retrieves relevant project context from the vector store (capped at 2,000 characters)
  3. Includes this context in the LLM prompt
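
Steps 2 and 3 amount to concatenating the retrieved snippets, most relevant first, until the 2,000-character cap would be exceeded. A minimal sketch, assuming snippets arrive pre-ranked; the constant matches the cap described in this document, the rest is illustrative.

```python
MAX_CONTEXT_CHARS = 2_000  # standard-analysis cap described above

def build_context(snippets, limit=MAX_CONTEXT_CHARS):
    """Join ranked snippets into one context block without exceeding the cap."""
    parts, used = [], 0
    for s in snippets:
        if used + len(s) > limit:
            break  # dropping lower-ranked snippets keeps only the most relevant
        parts.append(s)
        used += len(s) + 1  # +1 budgets for the joining newline
    return "\n".join(parts)
```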

This means the LLM knows:

  • What the changed module does in the broader architecture
  • What conventions the project follows
  • How critical the affected domain is
  • What related code looks like

Context Cap

Project context sent to the LLM is capped at 2,000 characters for standard analysis. This is intentional:

  • Keeps LLM costs predictable
  • Prevents context window bloat
  • Forces context to be the most relevant snippets
  • Deep analysis (Sonnet agentic mode) can explore further with tools

Domain Criticality in Scoring

Domain criticality directly affects commit scores through three paths:

  1. Impact estimation — When the LLM doesn't provide an explicit impact score, domain criticality is used as a proxy (critical = 4, high = 3, medium = 2, low = 1)

  2. V1 domain multipliers — In the legacy scoring engine, domain criticality applies a direct multiplier to the score (critical = 1.5×, trivial = 0.3×)

  3. LLM awareness — The LLM sees domain criticality in its context and factors it into complexity and impact assessments
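
Paths 1 and 2 can be expressed as two small lookup tables. The critical/high/medium/low proxy values and the 1.5×/0.3× multipliers come from the text above; the fallbacks for unlisted levels (impact 1, multiplier 1.0) are assumptions made for this sketch.

```python
# Impact proxy used when the LLM omits an explicit impact score (values from the text).
IMPACT_PROXY = {"critical": 4, "high": 3, "medium": 2, "low": 1}

# V1 legacy multipliers; only the endpoints are documented, so
# intermediate levels are assumed to be 1.0 here.
V1_MULTIPLIER = {"critical": 1.5, "trivial": 0.3}

def estimate_impact(llm_impact, criticality):
    """Prefer the LLM's explicit impact score; fall back to domain criticality."""
    if llm_impact is not None:
        return llm_impact
    return IMPACT_PROXY.get(criticality, 1)  # assumption: unlisted levels map to 1

def apply_v1_multiplier(score, criticality):
    """Legacy V1 engine: scale the score directly by domain criticality."""
    return score * V1_MULTIPLIER.get(criticality, 1.0)
```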

Local Embeddings

All embeddings run locally using Bumblebee (Elixir ML framework):

Property     Value
Model        all-MiniLM-L6-v2
Dimensions   384
Runtime      Local (no API calls)
Storage      pgvector extension in PostgreSQL

Benefits:

  • Zero embedding cost — no external API charges
  • Data sovereignty — project code never leaves your infrastructure for embeddings
  • Low latency — no network round-trip for context retrieval
  • Adequate quality — 384-dim vectors provide good semantic similarity for code search

Built with intelligence, not surveillance.