LLM Integration

ShipLens uses Anthropic's Claude models for all AI-powered features. The integration is designed for cost efficiency, transparency, and reliability.

Model Selection

| Model | Used For | Cost (per 1M tokens) |
| --- | --- | --- |
| Claude Haiku | Standard commit analysis, triage context | Input: $0.80 / Output: $4.00 |
| Claude Sonnet | Deep analysis, weekly digests, 1:1 reports | Input: $3.00 / Output: $15.00 |

Why Two Models?

The dual-model approach balances cost and quality:

  • Haiku handles ~70-80% of commits (standard depth) at very low cost. For a typical commit analysis, this costs $0.001–0.003.
  • Sonnet is reserved for tasks where quality matters most: security-sensitive code, generated reports, and agentic deep analysis.

This typically results in a 10–20× cost reduction compared to using Sonnet for everything.
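As a sketch, the routing decision and per-call cost math might look like the following (the module, function names, and depth atoms are illustrative, not the actual ShipLens API; pricing comes from the table above):

```elixir
defmodule ModelRouter do
  # Illustrative sketch of depth-based model routing.
  # USD per 1M tokens, from the pricing table above.
  @pricing %{
    haiku: %{input: 0.80, output: 4.00},
    sonnet: %{input: 3.00, output: 15.00}
  }

  # Standard-depth commits go to Haiku; deep analysis and reports use Sonnet.
  def model_for(:standard), do: :haiku
  def model_for(depth) when depth in [:deep, :digest, :one_on_one], do: :sonnet

  # Estimated cost in USD for a given token usage.
  def cost_usd(model, input_tokens, output_tokens) do
    %{input: in_rate, output: out_rate} = Map.fetch!(@pricing, model)
    input_tokens / 1_000_000 * in_rate + output_tokens / 1_000_000 * out_rate
  end
end
```

For example, `ModelRouter.cost_usd(:haiku, 2_000, 500)` works out to about $0.0036, which is why routing the bulk of commits to Haiku keeps costs low.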

Integration Architecture

Client Abstraction

The LLM client (llm/client.ex) provides two main functions:

| Function | Description |
| --- | --- |
| `complete(prompt, model)` | Standard prompt → response |
| `complete_with_tools(prompt, tools, model)` | Prompt with tool definitions for agentic use |

Both return a Response struct containing:

  • content — The LLM's response text
  • model — Which model was used
  • input_tokens / output_tokens — Token counts
  • cost_usd — Calculated cost
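A minimal sketch of that struct and a call site (field names come from the list above; the struct definition and the `{:ok, response}` return shape are assumptions, not the actual ShipLens code):

```elixir
defmodule LLM.Response do
  # Sketch of the response struct described above; field names are from
  # this page, but this is not the real definition in llm/client.ex.
  defstruct [:content, :model, :input_tokens, :output_tokens, :cost_usd]
end

# Hypothetical usage, assuming complete/2 returns {:ok, %LLM.Response{}}:
#
#   {:ok, resp} = LLM.Client.complete("Summarize this diff: ...", :haiku)
#   resp.cost_usd  # cost computed from the token counts
```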

Cost Tracking

Every LLM call has its cost tracked and stored:

  • Commit reports store cost_usd per analysis
  • Weekly digests store cost_usd per generation
  • 1:1 reports store cost_usd per report

This enables full cost visibility and budgeting.
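Because every record carries a cost_usd field, budgeting reduces to a sum. A hypothetical Ecto query (the `CommitReport` and `Repo` module names and schema fields are assumptions):

```elixir
defmodule CostReport do
  import Ecto.Query

  # Hypothetical: total analysis spend since a given date.
  # `CommitReport` and `Repo` are assumed names; the real schemas may differ.
  def spend_since(date) do
    from(r in CommitReport,
      where: r.inserted_at >= ^date,
      select: coalesce(sum(r.cost_usd), 0.0)
    )
    |> Repo.one()
  end
end
```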

Prompt Engineering

All prompts are centralized in llm/prompts.ex and share common design principles:

Structured JSON Output

Every analysis prompt requests a specific JSON schema:

```json
{
  "commit_type": "feat",
  "summary": "...",
  "areas_affected": ["auth", "api"],
  "complexity": 3,
  "impact": 4,
  "quality_signals": ["has_tests", "clean_patterns"],
  "risk_signals": ["touches_auth"],
  "slop_dimensions": {
    "verbosity": 1,
    "unnecessary_comments": 0,
    "over_engineering": 0,
    "defensive_bloat": 0,
    "style_mismatch": 0,
    "redundancy": 0,
    "scope_creep": 0
  },
  "confidence": 0.85
}
```
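Decoding that output might look like this sketch. It assumes the Jason library (the de facto Elixir JSON decoder, not confirmed by this page), and its fallback mirrors the invalid-JSON behavior described under Error Handling:

```elixir
defmodule AnalysisParser do
  # Sketch: decode the structured JSON output, falling back gracefully.
  # Key names match the schema above; this is not the real ShipLens parser.
  def parse(raw) do
    case Jason.decode(raw) do
      {:ok, %{"commit_type" => _, "confidence" => conf} = parsed}
      when is_number(conf) ->
        {:ok, parsed}

      _ ->
        # Malformed or partial output: callers fall back to heuristic scoring.
        {:error, :invalid_json}
    end
  end
end
```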

Project Context Injection

Standard analysis prompts include project context from the vector store:

```
You are analyzing a commit in the context of this project:
[2000 chars of relevant project context from RAG]

Commit: [sha]
Message: [message]
Diff: [compressed diff]
```
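Assembling that prompt is straightforward string interpolation. A sketch (the module name and commit fields are illustrative, not the real llm/prompts.ex):

```elixir
defmodule PromptSketch do
  # Sketch: build the standard analysis prompt shown above, truncating
  # the RAG context to 2000 characters as described.
  def standard_prompt(project_context, commit) do
    """
    You are analyzing a commit in the context of this project:
    #{String.slice(project_context, 0, 2000)}

    Commit: #{commit.sha}
    Message: #{commit.message}
    Diff: #{commit.diff}
    """
  end
end
```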

Agentic Deep Analysis

Deep analysis prompts give the LLM access to codebase tools:

| Tool | Capability |
| --- | --- |
| `read_file(path)` | Read any file in the repository |
| `search_codebase(query)` | Search for patterns across files |
| `list_directory(path)` | List directory contents |
| `get_file_at_commit(path, sha)` | Read a file at a specific commit |

The agentic loop runs for up to 10 turns with a $0.50 cost cap per commit.
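The loop's two stopping rules can be sketched as follows. The module, the `run_turn` callback, and the state shape are hypothetical; only the 10-turn limit and $0.50 cap come from this page:

```elixir
defmodule DeepAnalysis do
  # Sketch of the agentic loop's stopping rules: a 10-turn limit and a
  # $0.50 per-commit cost cap, as described above.
  @max_turns 10
  @cost_cap_usd 0.50

  def loop(state, run_turn, turn \\ 1, spent \\ 0.0) do
    cond do
      turn > @max_turns ->
        {:done, state}

      spent >= @cost_cap_usd ->
        # Cost cap exceeded: stop and return partial results.
        {:partial, state}

      true ->
        # run_turn.(state) stands in for one complete_with_tools call;
        # it returns the updated state plus that call's cost in USD.
        {state, cost} = run_turn.(state)
        loop(state, run_turn, turn + 1, spent + cost)
    end
  end
end
```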

Self-Reflection

After initial analysis, a self-reflection step can refine the LLM's assessment:

  • Reconsider confidence level
  • Verify gaming flag appropriateness
  • Check for overlooked quality/risk signals
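The reflection pass amounts to re-submitting the initial analysis with those three checks. A sketch (the prompt wording, module name, and the `LLM.Client.complete/2` call shape are illustrative assumptions):

```elixir
defmodule Reflection do
  # Sketch of the self-reflection pass described above.
  def reflect(analysis_json, model \\ :sonnet) do
    prompt = """
    Review your previous analysis and refine it if needed:
    #{analysis_json}

    1. Reconsider the confidence level.
    2. Verify that any gaming flags are appropriate.
    3. Check for overlooked quality or risk signals.

    Return the same JSON schema, updated.
    """

    LLM.Client.complete(prompt, model)
  end
end
```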

Error Handling

| Scenario | Behavior |
| --- | --- |
| API timeout | Retry with exponential backoff (Oban handles this) |
| Rate limit | Job re-queued for later processing |
| Invalid JSON response | Parsed best-effort; fallback to heuristic scoring |
| Cost cap exceeded (deep) | Analysis stops, returns partial results |
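The retry and re-queue behaviors map naturally onto an Oban worker. A sketch, assuming the queue name, attempt count, and `analyze/1` helper (Oban retries failed jobs with exponential backoff by default, and `{:snooze, seconds}` re-queues a job for later):

```elixir
defmodule AnalyzeCommitWorker do
  # Sketch: queue name and max_attempts are assumptions, not ShipLens config.
  use Oban.Worker, queue: :analysis, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"commit_sha" => sha}}) do
    case analyze(sha) do
      {:ok, _report} ->
        :ok

      # Rate limited: re-queue the job for later processing.
      {:error, :rate_limited} ->
        {:snooze, 60}

      # Any other error (e.g. a timeout) triggers Oban's
      # exponential-backoff retry.
      {:error, reason} ->
        {:error, reason}
    end
  end
end
```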

Typical Costs

| Operation | Per Unit | Monthly (10-person team) |
| --- | --- | --- |
| Standard commit analysis | $0.001–0.003 | $2–15 |
| Deep commit analysis | $0.01–0.50 | $1–10 |
| Weekly digest | $0.003–0.01 | $0.10–0.40 |
| 1:1 report | $0.01–0.05 | $1–5 |
| Total | | $4–30/month |

Built with intelligence, not surveillance.