Project Context & Domains
ShipLens doesn't analyze commits in a vacuum. Before any commit analysis happens, it builds a deep understanding of your project's structure, conventions, and critical domains.
Project Understanding Phase
When a project is connected, ShipLens runs a one-time understanding phase:
1. Crawl
The repository structure is crawled to map:
- Directory hierarchy
- File types and distribution
- Key configuration files
- Module boundaries
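The crawl step above can be sketched as follows. This is an illustrative Python sketch of the logic (the actual implementation is Elixir); the `crawl` function name and the set of recognized config files are assumptions, not the real API.

```python
# Illustrative sketch of the crawl step: walk the repository and map
# directories, file-type distribution, and key configuration files.
from collections import Counter
from pathlib import Path

def crawl(root):
    # Hypothetical set of config files worth flagging during the crawl.
    config_names = {"mix.exs", "package.json", "Dockerfile", ".gitignore"}
    tree = {"dirs": set(), "file_types": Counter(), "configs": []}
    for path in Path(root).rglob("*"):
        rel = str(path.relative_to(root))
        if path.is_dir():
            tree["dirs"].add(rel)
        else:
            # Count by extension; extensionless files count by name.
            tree["file_types"][path.suffix or path.name] += 1
            if path.name in config_names:
                tree["configs"].append(rel)
    return tree
```

Module boundaries would be inferred from the resulting directory map in a later pass; the sketch only gathers the raw structure.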
2. Sample
Representative files are selected from each domain/directory to understand:
- Coding patterns and conventions
- Architectural style
- Language idioms
- Framework usage
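One plausible heuristic for the sampling step, sketched in Python (the real implementation is Elixir, and the actual selection strategy is not specified here): pick up to `k` files per directory, preferring larger files on the assumption that they carry more code to learn patterns from.

```python
# Hypothetical sketch of the sampling step: up to `k` representative
# files per directory, largest first. Names and heuristic are assumed.
from pathlib import Path

def sample_files(root, k=3):
    samples = {}
    for dir_path in [Path(root), *Path(root).rglob("*")]:
        if not dir_path.is_dir():
            continue
        files = sorted(
            (p for p in dir_path.iterdir() if p.is_file()),
            key=lambda p: p.stat().st_size,
            reverse=True,
        )
        if files:
            samples[str(dir_path)] = [f.name for f in files[:k]]
    return samples
```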
3. Domain Map
Domains (logical areas of the codebase) are identified and assigned criticality levels:
| Criticality | Description | Examples |
|---|---|---|
| Critical | Core business logic, security | Authentication, payments, data encryption |
| High | Important features, data access | API controllers, database layer, user management |
| Medium | Standard features | UI components, utilities, middleware |
| Low | Supporting code | Logging, formatting, configuration |
| Trivial | Non-functional | Documentation, CI config, style files |
Domain criticality can be:
- Auto-detected by the LLM based on code content and naming patterns
- CTO-overridden to reflect business knowledge the LLM can't infer
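The precedence rule above is simple: a CTO override always beats the LLM's auto-detected level. A minimal Python sketch (function and default are illustrative; the real system is Elixir):

```python
# Sketch of domain criticality resolution: CTO overrides win, otherwise
# fall back to the LLM's auto-detected level. 'medium' default is assumed.
CRITICALITY_LEVELS = ["trivial", "low", "medium", "high", "critical"]

def resolve_criticality(domain, auto_detected, cto_overrides):
    if domain in cto_overrides:
        return cto_overrides[domain]
    return auto_detected.get(domain, "medium")
```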
4. Embed & Index
Project knowledge is embedded using the all-MiniLM-L6-v2 model (384-dimensional vectors) and stored in PostgreSQL via pgvector.
This creates a searchable vector store that the commit analysis LLM can query for relevant context.
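Conceptually, the vector-store query is a nearest-neighbor search by cosine similarity. A toy Python illustration (in production this runs against 384-dimensional all-MiniLM-L6-v2 vectors via pgvector; the 3-dimensional vectors and function names here are stand-ins):

```python
# Toy illustration of vector-store retrieval: rank stored snippets by
# cosine similarity to the query embedding and return the top k.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, store, k=2):
    """store: list of (snippet, embedding) pairs."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [snippet for snippet, _ in ranked[:k]]
```

pgvector performs the equivalent ranking inside PostgreSQL with its distance operators, so the application never has to load all embeddings into memory.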
How Context Is Used in Analysis
During commit analysis, the pipeline:
- Takes the commit's affected files and areas
- Retrieves relevant project context from the vector store (capped at 2,000 characters)
- Includes this context in the LLM prompt
This means the LLM knows:
- What the changed module does in the broader architecture
- What conventions the project follows
- How critical the affected domain is
- What related code looks like
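The retrieval-and-cap step can be sketched as a greedy packer: snippets arrive in relevance order and are appended until the next one would exceed the 2,000-character cap. This Python sketch is illustrative (the real implementation is Elixir):

```python
# Sketch of context assembly: pack relevance-ordered snippets into the
# prompt context, stopping before the 2,000-character cap is exceeded.
CONTEXT_CAP = 2_000

def build_context(snippets, cap=CONTEXT_CAP):
    parts = []
    for snippet in snippets:
        candidate = parts + [snippet]
        if len("\n".join(candidate)) > cap:
            break  # greedy: stop at the first snippet that doesn't fit
        parts = candidate
    return "\n".join(parts)
```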
Context Cap
Project context sent to the LLM is capped at 2,000 characters for standard analysis. This is intentional:
- Keeps LLM costs predictable
- Prevents context window bloat
- Forces context to be the most relevant snippets

- For commits that need more, deep analysis (Sonnet agentic mode) can explore beyond the cap with tools
Domain Criticality in Scoring
Domain criticality directly affects commit scores through three paths:
- Impact estimation — when the LLM doesn't provide an explicit impact score, domain criticality is used as a proxy (critical = 4, high = 3, medium = 2, low = 1)
- V1 domain multipliers — in the legacy scoring engine, domain criticality applies a direct multiplier to the score (critical = 1.5×, trivial = 0.3×)
- LLM awareness — the LLM sees domain criticality in its context and factors it into its complexity and impact assessments
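The first two paths can be expressed directly from the values in the text. A minimal Python sketch (function names are illustrative; the real engine is Elixir, and multipliers for levels other than critical and trivial are assumed to be 1.0):

```python
# Sketch of the two numeric scoring paths that read domain criticality.
# Values come from the documentation; names are hypothetical.
IMPACT_PROXY = {"critical": 4, "high": 3, "medium": 2, "low": 1}
V1_MULTIPLIER = {"critical": 1.5, "trivial": 0.3}  # others assumed 1.0

def impact_score(llm_impact, criticality):
    """Prefer the LLM's explicit impact score; otherwise use criticality."""
    if llm_impact is not None:
        return llm_impact
    return IMPACT_PROXY.get(criticality, 1)

def v1_score(base_score, criticality):
    """Legacy engine: criticality applies a direct multiplier."""
    return base_score * V1_MULTIPLIER.get(criticality, 1.0)
```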
Local Embeddings
All embeddings run locally using Bumblebee (an Elixir ML framework):
| Property | Value |
|---|---|
| Model | all-MiniLM-L6-v2 |
| Dimensions | 384 |
| Runtime | Local (no API calls) |
| Storage | pgvector extension in PostgreSQL |
Benefits:
- Zero embedding cost — no external API charges
- Data sovereignty — project code never leaves your infrastructure for embeddings
- Low latency — no network round-trip for context retrieval
- Adequate quality — 384-dim vectors provide good semantic similarity for code search
