Project Context & Domains
ShipLens doesn't analyze commits in a vacuum. Before any commit analysis happens, it builds a deep understanding of your project's structure, conventions, and critical domains.
Project Understanding Phase
When a project is connected, ShipLens runs a one-time understanding phase:
1. Crawl
The repository structure is crawled to map:
- Directory hierarchy
- File types and distribution
- Key configuration files
- Module boundaries
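The crawl step above can be sketched as follows. This is an illustrative Python sketch of the logic (the actual implementation is Elixir); the `crawl` function name and the set of recognized config files are assumptions, not the real API.

```python
# Illustrative sketch of the crawl step: walk the repository and map
# directories, file-type distribution, and key configuration files.
from collections import Counter
from pathlib import Path

def crawl(root):
    # Hypothetical set of config files worth flagging during the crawl.
    config_names = {"mix.exs", "package.json", "Dockerfile", ".gitignore"}
    tree = {"dirs": set(), "file_types": Counter(), "configs": []}
    for path in Path(root).rglob("*"):
        rel = str(path.relative_to(root))
        if path.is_dir():
            tree["dirs"].add(rel)
        else:
            # Count by extension; extensionless files count by name.
            tree["file_types"][path.suffix or path.name] += 1
            if path.name in config_names:
                tree["configs"].append(rel)
    return tree
```

Module boundaries would be inferred from the resulting directory map in a later pass; the sketch only gathers the raw structure.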
2. Sample
Representative files are selected from each domain/directory to understand:
- Coding patterns and conventions
- Architectural style
- Language idioms
- Framework usage
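One plausible heuristic for the sampling step, sketched in Python (the real implementation is Elixir, and the actual selection strategy is not specified here): pick up to `k` files per directory, preferring larger files on the assumption that they carry more code to learn patterns from.

```python
# Hypothetical sketch of the sampling step: up to `k` representative
# files per directory, largest first. Names and heuristic are assumed.
from pathlib import Path

def sample_files(root, k=3):
    samples = {}
    for dir_path in [Path(root), *Path(root).rglob("*")]:
        if not dir_path.is_dir():
            continue
        files = sorted(
            (p for p in dir_path.iterdir() if p.is_file()),
            key=lambda p: p.stat().st_size,
            reverse=True,
        )
        if files:
            samples[str(dir_path)] = [f.name for f in files[:k]]
    return samples
```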
3. Domain Map
Domains (logical areas of the codebase) are identified and assigned criticality levels:
| Criticality | Description | Examples |
|---|---|---|
| Critical | Core business logic, security | Authentication, payments, data encryption |
| High | Important features, data access | API controllers, database layer, user management |
| Medium | Standard features | UI components, utilities, middleware |
| Low | Supporting code | Logging, formatting, configuration |
| Trivial | Non-functional | Documentation, CI config, style files |
Domain criticality can be:
- Auto-detected by the LLM based on code content and naming patterns
- CTO-overridden to reflect business knowledge the LLM can't infer
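The precedence rule above is simple: a CTO override always beats the LLM's auto-detected level. A minimal Python sketch (function and default are illustrative; the real system is Elixir):

```python
# Sketch of domain criticality resolution: CTO overrides win, otherwise
# fall back to the LLM's auto-detected level. 'medium' default is assumed.
CRITICALITY_LEVELS = ["trivial", "low", "medium", "high", "critical"]

def resolve_criticality(domain, auto_detected, cto_overrides):
    if domain in cto_overrides:
        return cto_overrides[domain]
    return auto_detected.get(domain, "medium")
```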
4. Embed & Index
Project knowledge is embedded using the all-MiniLM-L6-v2 model (384-dimensional vectors) and stored in PostgreSQL via pgvector.
This creates a searchable vector store that the commit analysis LLM can query for relevant context.
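Conceptually, the vector-store query is a nearest-neighbor search by cosine similarity. A toy Python illustration (in production this runs against 384-dimensional all-MiniLM-L6-v2 vectors via pgvector; the 3-dimensional vectors and function names here are stand-ins):

```python
# Toy illustration of vector-store retrieval: rank stored snippets by
# cosine similarity to the query embedding and return the top k.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, store, k=2):
    """store: list of (snippet, embedding) pairs."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [snippet for snippet, _ in ranked[:k]]
```

pgvector performs the equivalent ranking inside PostgreSQL with its distance operators, so the application never has to load all embeddings into memory.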
How Context Is Used in Analysis
During commit analysis, the pipeline:
- Takes the commit's affected files and areas
- Retrieves relevant project context from the vector store (capped at 2,000 characters)
- Includes this context in the LLM prompt
This means the LLM knows:
- What the changed module does in the broader architecture
- What conventions the project follows
- How critical the affected domain is
- What related code looks like
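The retrieval-and-cap step can be sketched as a greedy packer: snippets arrive in relevance order and are appended until the next one would exceed the 2,000-character cap. This Python sketch is illustrative (the real implementation is Elixir):

```python
# Sketch of context assembly: pack relevance-ordered snippets into the
# prompt context, stopping before the 2,000-character cap is exceeded.
CONTEXT_CAP = 2_000

def build_context(snippets, cap=CONTEXT_CAP):
    parts = []
    for snippet in snippets:
        candidate = parts + [snippet]
        if len("\n".join(candidate)) > cap:
            break  # greedy: stop at the first snippet that doesn't fit
        parts = candidate
    return "\n".join(parts)
```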
Context Cap
Project context sent to the LLM is capped at 2,000 characters for standard analysis. This is intentional:
- Keeps LLM costs predictable
- Prevents context window bloat
- Forces context to be the most relevant snippets

- For commits that need more, deep analysis (Sonnet agentic mode) can explore beyond the cap with tools
Domain Criticality in Scoring
Domain criticality directly affects commit scores through three paths:
- Impact estimation — when the LLM doesn't provide an explicit impact score, domain criticality is used as a proxy (critical = 4, high = 3, medium = 2, low = 1)
- V1 domain multipliers — in the legacy scoring engine, domain criticality applies a direct multiplier to the score (critical = 1.5×, trivial = 0.3×)
- LLM awareness — the LLM sees domain criticality in its context and factors it into its complexity and impact assessments
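The first two paths can be expressed directly from the values in the text. A minimal Python sketch (function names are illustrative; the real engine is Elixir, and multipliers for levels other than critical and trivial are assumed to be 1.0):

```python
# Sketch of the two numeric scoring paths that read domain criticality.
# Values come from the documentation; names are hypothetical.
IMPACT_PROXY = {"critical": 4, "high": 3, "medium": 2, "low": 1}
V1_MULTIPLIER = {"critical": 1.5, "trivial": 0.3}  # others assumed 1.0

def impact_score(llm_impact, criticality):
    """Prefer the LLM's explicit impact score; otherwise use criticality."""
    if llm_impact is not None:
        return llm_impact
    return IMPACT_PROXY.get(criticality, 1)

def v1_score(base_score, criticality):
    """Legacy engine: criticality applies a direct multiplier."""
    return base_score * V1_MULTIPLIER.get(criticality, 1.0)
```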
Local Embeddings
All embeddings run locally using Bumblebee (an Elixir ML framework):
| Property | Value |
|---|---|
| Model | all-MiniLM-L6-v2 |
| Dimensions | 384 |
| Runtime | Local (no API calls) |
| Storage | pgvector extension in PostgreSQL |
Benefits:
- Zero embedding cost — no external API charges
- Data sovereignty — project code never leaves your infrastructure for embeddings
- Low latency — no network round-trip for context retrieval
- Adequate quality — 384-dim vectors provide good semantic similarity for code search
