# Architecture
ShipLens is built on Elixir/OTP with Phoenix LiveView, designed for concurrency, fault tolerance, and real-time updates.
## Tech Stack
| Layer | Technology | Why |
|---|---|---|
| Language | Elixir (OTP) | Concurrency, fault tolerance, Broadway pipelines |
| Database | PostgreSQL + pgvector | Relational data + vector similarity for RAG |
| LLM | Claude Haiku & Sonnet | Haiku for cost-efficient analysis; Sonnet for deep analysis and generative work (e.g. digests) |
| Embeddings | Bumblebee (all-MiniLM-L6-v2) | Local inference, 384-dimensional vectors, no external API |
| Frontend | Phoenix LiveView | Real-time UI without a separate SPA |
| Job Queue | Oban | Reliable background processing with PostgreSQL-backed queues |
| Dev/Deploy | Nix | Reproducible environments via flake.nix and direnv |
## Multi-Tenant Structure
**Company** is the tenant root. Everything — teams, contributors, projects, reports — belongs to a company. Companies have a plan field (free, starter, business, enterprise) for future billing.
**Teams** are sub-groups within a company. A contributor belongs to one team. Weekly digests can be generated company-wide or per-team.
**Contributors** have multiple identities (git emails) that map to a single person. This handles the common case of developers using different email addresses across repositories.
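This hierarchy can be sketched as Ecto schemas. All module and field names below are illustrative assumptions, not ShipLens's actual code:

```elixir
# Sketch of the tenancy hierarchy (names are illustrative).
defmodule ShipLens.Company do
  use Ecto.Schema

  schema "companies" do
    field :name, :string
    field :plan, Ecto.Enum, values: [:free, :starter, :business, :enterprise]
    has_many :teams, ShipLens.Team
  end
end

defmodule ShipLens.Team do
  use Ecto.Schema

  schema "teams" do
    belongs_to :company, ShipLens.Company
    has_many :contributors, ShipLens.Contributor
  end
end

defmodule ShipLens.Contributor do
  use Ecto.Schema

  schema "contributors" do
    field :name, :string
    belongs_to :team, ShipLens.Team
    # Multiple git emails resolve to one person.
    has_many :identities, ShipLens.ContributorIdentity
  end
end

defmodule ShipLens.ContributorIdentity do
  use Ecto.Schema

  schema "contributor_identities" do
    field :git_email, :string
    belongs_to :contributor, ShipLens.Contributor
  end
end
```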
## Data Flow
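At a high level — a sketch assembled from the jobs and design decisions described in this document:

```
Git provider webhook
  → WebhookProcessWorker (queue: webhooks)
  → CommitWorker (queue: commits): deterministic triage → LLM analysis (Haiku/Sonnet)
  → Scoring (cheap, re-runnable with any weight configuration)
  → Digests & alerts (DigestGenerateJob, SilentContributorJob, CommitPatternAlertJob)
```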
## Key Design Decisions
### Why Separate Analysis from Scoring?
LLM analysis is expensive and slow. Scoring is cheap and instant. By separating them:
- Analysis runs once per commit (cost: ~$0.001–0.01 per commit)
- Scoring can be re-run unlimited times with different configurations
- CTOs can experiment with weights without burning API credits
- Historical re-scoring takes seconds, not hours
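The key property is that scoring is a pure function over the stored analysis. A minimal sketch — the analysis map and weight keys are assumptions, not ShipLens's schema:

```elixir
# Illustrative only: scoring as a pure, re-runnable function over a
# persisted analysis. No LLM call happens here.
defmodule Scoring do
  @doc "Weighted sum over analysis dimensions; re-run with any weights."
  def score(analysis, weights) do
    weights
    |> Enum.map(fn {dim, w} -> w * Map.get(analysis, dim, 0.0) end)
    |> Enum.sum()
  end
end

# The expensive LLM analysis was stored once per commit; experimenting
# with different weights touches no API:
analysis = %{complexity: 0.7, test_discipline: 0.4, risk: 0.2}
Scoring.score(analysis, %{complexity: 0.5, test_discipline: 0.3, risk: 0.2})
```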
### Why Deterministic Triage?
Before calling any LLM, every commit passes through rule-based triage. This saves significant cost:
- Empty merge commits → shallow (no LLM)
- CI/CD-only changes → shallow (no LLM)
- Docs-only changes → shallow (no LLM)
- Security-related → deep (Sonnet, more expensive but warranted)
- Everything else → standard (Haiku, cost-efficient)
Roughly 20–30% of commits in a typical repository are shallow, which eliminates LLM cost for those commits entirely.
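The rules above map naturally onto Elixir pattern matching. A sketch — the shape of the commit map, the file heuristics, and the security regex are all assumptions for illustration:

```elixir
# Rule-based triage sketch; runs before any LLM call and returns the
# analysis tier. Field names and heuristics are illustrative.
defmodule Triage do
  # Empty merge commit → shallow, no LLM.
  def tier(%{merge?: true, diff: ""}), do: :shallow

  def tier(%{files: files} = commit) do
    cond do
      Enum.all?(files, &ci_file?/1) -> :shallow       # CI/CD-only change
      Enum.all?(files, &docs_file?/1) -> :shallow     # docs-only change
      security_related?(commit) -> :deep              # routed to Sonnet
      true -> :standard                               # routed to Haiku
    end
  end

  defp ci_file?(path),
    do: String.starts_with?(path, ".github/") or path == ".gitlab-ci.yml"

  defp docs_file?(path),
    do: String.ends_with?(path, ".md") or String.starts_with?(path, "docs/")

  defp security_related?(%{message: msg}),
    do: msg =~ ~r/security|CVE|vulnerab/i
end
```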
### Why Local Embeddings?
We use Bumblebee with the all-MiniLM-L6-v2 model (384 dimensions) for embeddings instead of an external API:
- No additional API cost — embeddings run locally
- No data leaves your infrastructure — project context stays local
- Low latency — no network round-trip
- Good enough quality — 384-dim vectors provide strong semantic similarity for code context retrieval
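Loading the model and serving embeddings locally looks roughly like this with Bumblebee. The library calls are real Bumblebee/Nx APIs, but treat the specific options as an assumption about ShipLens's configuration:

```elixir
# Sketch: local sentence embeddings via Bumblebee (no external API).
{:ok, model_info} =
  Bumblebee.load_model({:hf, "sentence-transformers/all-MiniLM-L6-v2"})

{:ok, tokenizer} =
  Bumblebee.load_tokenizer({:hf, "sentence-transformers/all-MiniLM-L6-v2"})

serving =
  Bumblebee.Text.text_embedding(model_info, tokenizer,
    output_pool: :mean_pooling,
    embedding_processor: :l2_norm
  )

# Returns a 384-dimensional Nx tensor, suitable for a pgvector column.
%{embedding: embedding} = Nx.Serving.run(serving, "defmodule Billing do ...")
Nx.size(embedding)
#=> 384
```

In production the serving would typically be started under the supervision tree (`Nx.Serving` child spec) so requests are batched across processes rather than run ad hoc as above.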
### Why Oban for Job Processing?
Oban uses PostgreSQL as its job queue backend, which means:
- No additional infrastructure (Redis, RabbitMQ, etc.)
- Transactional job creation — jobs are created in the same transaction as related data
- Built-in retry, dead-letter queues, and scheduling
- Job introspection through SQL queries
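Transactional job creation is the standout feature. `Oban.insert/3` composes with `Ecto.Multi`, so the job commits or rolls back with its data. A sketch — `Oban.insert/3` is real Oban API, while the worker, changeset, and repo names are illustrative:

```elixir
# Sketch: enqueue analysis in the same transaction as the commit row.
alias Ecto.Multi

Multi.new()
|> Multi.insert(:commit, commit_changeset)
|> Oban.insert(:analyze, fn %{commit: commit} ->
  ShipLens.Workers.CommitWorker.new(%{commit_id: commit.id})
end)
|> ShipLens.Repo.transaction()
# If inserting the commit fails, the job is never enqueued — no orphaned
# jobs pointing at rows that don't exist.
```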
## Background Jobs
| Job | Purpose | Queue |
|---|---|---|
| CommitWorker | Process an individual commit (triage → analysis) | commits |
| DigestGenerateJob | Generate weekly digest content via Sonnet | digests |
| SilentContributorJob | Detect inactive contributors | alerts |
| CommitPatternAlertJob | Detect high fix ratio and low test discipline | alerts |
| RefreshTokenWorker | Refresh git provider OAuth tokens | tokens |
| SyncReposWorker | Sync repositories from git providers | sync |
| WebhookProcessWorker | Process incoming webhook events | webhooks |
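Each of these follows the standard `Oban.Worker` shape. A minimal sketch of one — `use Oban.Worker` and the `perform/1` callback are real Oban API; the queue name comes from the table above, and the context-module calls in the body are hypothetical:

```elixir
# Illustrative worker skeleton; the ShipLens.Commits functions are
# assumptions standing in for the real pipeline steps.
defmodule ShipLens.Workers.CommitWorker do
  use Oban.Worker, queue: :commits, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"commit_id" => commit_id}}) do
    commit_id
    |> ShipLens.Commits.get!()     # load the commit
    |> ShipLens.Commits.triage()   # deterministic tier: shallow/standard/deep
    |> ShipLens.Commits.analyze()  # LLM call only for standard/deep tiers
  end
end
```

Returning `:ok` (or raising) from `perform/1` is what drives Oban's built-in retry and dead-letter behavior mentioned above.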
