# Architecture
ShipLens is built on Elixir/OTP with Phoenix LiveView, designed for concurrency, fault tolerance, and real-time updates.
## Tech Stack
| Layer | Technology | Why |
|---|---|---|
| Language | Elixir (OTP) | Concurrency, fault tolerance, Broadway pipelines |
| Database | PostgreSQL + pgvector | Relational data + vector similarity for RAG |
| LLM | Claude Haiku & Sonnet | Haiku for cost-efficient analysis; Sonnet for deep analysis and generative work (e.g. digests) |
| Embeddings | Bumblebee (all-MiniLM-L6-v2) | Local inference, 384-dimensional vectors, no external API |
| Frontend | Phoenix LiveView | Real-time UI without a separate SPA |
| Job Queue | Oban | Reliable background processing with PostgreSQL-backed queues |
| Dev/Deploy | Nix | Reproducible environments via flake.nix and direnv |
## Multi-Tenant Structure
**Company** is the tenant root. Everything — teams, contributors, projects, reports — belongs to a company. Companies have a plan field (free, starter, business, enterprise) for future billing.
**Teams** are sub-groups within a company. A contributor belongs to one team. Weekly digests can be generated company-wide or per-team.
**Contributors** have multiple identities (git emails) that map to a single person. This handles the common case of developers using different email addresses across repositories.
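This hierarchy can be sketched as Ecto schemas. All module and field names below are illustrative assumptions, not ShipLens's actual code:

```elixir
# Sketch of the tenancy hierarchy (names are illustrative).
defmodule ShipLens.Company do
  use Ecto.Schema

  schema "companies" do
    field :name, :string
    field :plan, Ecto.Enum, values: [:free, :starter, :business, :enterprise]
    has_many :teams, ShipLens.Team
  end
end

defmodule ShipLens.Team do
  use Ecto.Schema

  schema "teams" do
    belongs_to :company, ShipLens.Company
    has_many :contributors, ShipLens.Contributor
  end
end

defmodule ShipLens.Contributor do
  use Ecto.Schema

  schema "contributors" do
    field :name, :string
    belongs_to :team, ShipLens.Team
    # Multiple git emails resolve to one person.
    has_many :identities, ShipLens.ContributorIdentity
  end
end

defmodule ShipLens.ContributorIdentity do
  use Ecto.Schema

  schema "contributor_identities" do
    field :git_email, :string
    belongs_to :contributor, ShipLens.Contributor
  end
end
```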
## Data Flow
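At a high level — a sketch assembled from the jobs and design decisions described in this document:

```
Git provider webhook
  → WebhookProcessWorker (queue: webhooks)
  → CommitWorker (queue: commits): deterministic triage → LLM analysis (Haiku/Sonnet)
  → Scoring (cheap, re-runnable with any weight configuration)
  → Digests & alerts (DigestGenerateJob, SilentContributorJob, CommitPatternAlertJob)
```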
## Key Design Decisions
### Why Separate Analysis from Scoring?
LLM analysis is expensive and slow. Scoring is cheap and instant. By separating them:
- Analysis runs once per commit (cost: ~$0.001–0.01 per commit)
- Scoring can be re-run unlimited times with different configurations
- CTOs can experiment with weights without burning API credits
- Historical re-scoring takes seconds, not hours
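The key property is that scoring is a pure function over the stored analysis. A minimal sketch — the analysis map and weight keys are assumptions, not ShipLens's schema:

```elixir
# Illustrative only: scoring as a pure, re-runnable function over a
# persisted analysis. No LLM call happens here.
defmodule Scoring do
  @doc "Weighted sum over analysis dimensions; re-run with any weights."
  def score(analysis, weights) do
    weights
    |> Enum.map(fn {dim, w} -> w * Map.get(analysis, dim, 0.0) end)
    |> Enum.sum()
  end
end

# The expensive LLM analysis was stored once per commit; experimenting
# with different weights touches no API:
analysis = %{complexity: 0.7, test_discipline: 0.4, risk: 0.2}
Scoring.score(analysis, %{complexity: 0.5, test_discipline: 0.3, risk: 0.2})
```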
### Why Deterministic Triage?
Before calling any LLM, every commit passes through rule-based triage. This saves significant cost:
- Empty merge commits → shallow (no LLM)
- CI/CD-only changes → shallow (no LLM)
- Docs-only changes → shallow (no LLM)
- Security-related → deep (Sonnet, more expensive but warranted)
- Everything else → standard (Haiku, cost-efficient)
Roughly 20–30% of commits in a typical repository are shallow, which eliminates LLM cost for those commits entirely.
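The rules above map naturally onto Elixir pattern matching. A sketch — the shape of the commit map, the file heuristics, and the security regex are all assumptions for illustration:

```elixir
# Rule-based triage sketch; runs before any LLM call and returns the
# analysis tier. Field names and heuristics are illustrative.
defmodule Triage do
  # Empty merge commit → shallow, no LLM.
  def tier(%{merge?: true, diff: ""}), do: :shallow

  def tier(%{files: files} = commit) do
    cond do
      Enum.all?(files, &ci_file?/1) -> :shallow       # CI/CD-only change
      Enum.all?(files, &docs_file?/1) -> :shallow     # docs-only change
      security_related?(commit) -> :deep              # routed to Sonnet
      true -> :standard                               # routed to Haiku
    end
  end

  defp ci_file?(path),
    do: String.starts_with?(path, ".github/") or path == ".gitlab-ci.yml"

  defp docs_file?(path),
    do: String.ends_with?(path, ".md") or String.starts_with?(path, "docs/")

  defp security_related?(%{message: msg}),
    do: msg =~ ~r/security|CVE|vulnerab/i
end
```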
### Why Local Embeddings?
We use Bumblebee with the all-MiniLM-L6-v2 model (384 dimensions) for embeddings instead of an external API:
- No additional API cost — embeddings run locally
- No data leaves your infrastructure — project context stays local
- Low latency — no network round-trip
- Good enough quality — 384-dim vectors provide strong semantic similarity for code context retrieval
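Loading the model and serving embeddings locally looks roughly like this with Bumblebee. The library calls are real Bumblebee/Nx APIs, but treat the specific options as an assumption about ShipLens's configuration:

```elixir
# Sketch: local sentence embeddings via Bumblebee (no external API).
{:ok, model_info} =
  Bumblebee.load_model({:hf, "sentence-transformers/all-MiniLM-L6-v2"})

{:ok, tokenizer} =
  Bumblebee.load_tokenizer({:hf, "sentence-transformers/all-MiniLM-L6-v2"})

serving =
  Bumblebee.Text.text_embedding(model_info, tokenizer,
    output_pool: :mean_pooling,
    embedding_processor: :l2_norm
  )

# Returns a 384-dimensional Nx tensor, suitable for a pgvector column.
%{embedding: embedding} = Nx.Serving.run(serving, "defmodule Billing do ...")
Nx.size(embedding)
#=> 384
```

In production the serving would typically be started under the supervision tree (`Nx.Serving` child spec) so requests are batched across processes rather than run ad hoc as above.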
### Why Oban for Job Processing?
Oban uses PostgreSQL as its job queue backend, which means:
- No additional infrastructure (Redis, RabbitMQ, etc.)
- Transactional job creation — jobs are created in the same transaction as related data
- Built-in retry, dead-letter queues, and scheduling
- Job introspection through SQL queries
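Transactional job creation is the standout feature. `Oban.insert/3` composes with `Ecto.Multi`, so the job commits or rolls back with its data. A sketch — `Oban.insert/3` is real Oban API, while the worker, changeset, and repo names are illustrative:

```elixir
# Sketch: enqueue analysis in the same transaction as the commit row.
alias Ecto.Multi

Multi.new()
|> Multi.insert(:commit, commit_changeset)
|> Oban.insert(:analyze, fn %{commit: commit} ->
  ShipLens.Workers.CommitWorker.new(%{commit_id: commit.id})
end)
|> ShipLens.Repo.transaction()
# If inserting the commit fails, the job is never enqueued — no orphaned
# jobs pointing at rows that don't exist.
```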
## Background Jobs
| Job | Purpose | Queue |
|---|---|---|
| CommitWorker | Process an individual commit (triage → analysis) | commits |
| DigestGenerateJob | Generate weekly digest content via Sonnet | digests |
| SilentContributorJob | Detect inactive contributors | alerts |
| CommitPatternAlertJob | Detect high fix ratio and low test discipline | alerts |
| RefreshTokenWorker | Refresh git provider OAuth tokens | tokens |
| SyncReposWorker | Sync repositories from git providers | sync |
| WebhookProcessWorker | Process incoming webhook events | webhooks |
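Each of these follows the standard `Oban.Worker` shape. A minimal sketch of one — `use Oban.Worker` and the `perform/1` callback are real Oban API; the queue name comes from the table above, and the context-module calls in the body are hypothetical:

```elixir
# Illustrative worker skeleton; the ShipLens.Commits functions are
# assumptions standing in for the real pipeline steps.
defmodule ShipLens.Workers.CommitWorker do
  use Oban.Worker, queue: :commits, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"commit_id" => commit_id}}) do
    commit_id
    |> ShipLens.Commits.get!()     # load the commit
    |> ShipLens.Commits.triage()   # deterministic tier: shallow/standard/deep
    |> ShipLens.Commits.analyze()  # LLM call only for standard/deep tiers
  end
end
```

Returning `:ok` (or raising) from `perform/1` is what drives Oban's built-in retry and dead-letter behavior mentioned above.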
