Architecture

ShipLens is built on Elixir/OTP with Phoenix LiveView, designed for concurrency, fault tolerance, and real-time updates.

Tech Stack

Layer      | Technology                   | Why
-----------|------------------------------|------------------------------------------------------------
Language   | Elixir (OTP)                 | Concurrency, fault tolerance, Broadway pipelines
Database   | PostgreSQL + pgvector        | Relational data + vector similarity for RAG
LLM        | Claude Haiku & Sonnet        | Haiku for cost-efficient analysis, Sonnet for deep/generative
Embeddings | Bumblebee (all-MiniLM-L6-v2) | Local inference, 384-dimensional vectors, no external API
Frontend   | Phoenix LiveView             | Real-time UI without a separate SPA
Job Queue  | Oban                         | Reliable background processing with PostgreSQL-backed queues
Dev/Deploy | Nix                          | Reproducible environments via flake.nix and direnv

Multi-Tenant Structure

Company is the tenant root. Everything — teams, contributors, projects, reports — belongs to a company. Companies have a plan field (free, starter, business, enterprise) for future billing.

Teams are sub-groups within a company. A contributor belongs to one team. Weekly digests can be generated company-wide or per-team.

Contributors have multiple identities (git emails) that map to a single person. This handles the common case of developers using different email addresses across repositories.
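A minimal sketch of the identity mapping, assuming a plain email-to-contributor lookup (module and data shapes here are illustrative, not the actual ShipLens schema):

```elixir
# Hypothetical sketch: resolving a commit author's git email to one
# contributor. Names and data shapes are illustrative only.
defmodule IdentityResolver do
  @doc """
  Given known identities (git email => contributor id), resolve a
  commit's author email to a contributor, or :unknown.
  """
  def resolve(identities, email) do
    Map.get(identities, String.downcase(email), :unknown)
  end
end

identities = %{
  "jane@work.example" => :jane,
  "jane@personal.example" => :jane,
  "bob@work.example" => :bob
}

# Both of Jane's emails resolve to the same contributor.
IdentityResolver.resolve(identities, "Jane@Personal.example")
# => :jane
```

In practice the lookup would hit an identities table, but the invariant is the same: many emails, one person.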

Data Flow

Key Design Decisions

Why Separate Analysis from Scoring?

LLM analysis is expensive and slow. Scoring is cheap and instant. By separating them:

  • Analysis runs once per commit (cost: ~$0.001–0.01 per commit)
  • Scoring can be re-run unlimited times with different configurations
  • CTOs can experiment with weights without burning API credits
  • Historical re-scoring takes seconds, not hours
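The separation above can be sketched as a pure function over stored analysis facts: the expensive LLM pass writes the facts once, and scoring is just a weighted sum that can be re-applied with any configuration (field names and weights here are hypothetical, not the actual ShipLens scoring model):

```elixir
# Illustrative sketch: analysis results are stored once per commit;
# scoring is a cheap pure function over them, so different weight
# configurations can be re-applied instantly without LLM calls.
defmodule Scoring do
  def score(analysis, weights) do
    analysis
    |> Enum.map(fn {dimension, value} -> value * Map.get(weights, dimension, 0) end)
    |> Enum.sum()
  end
end

# Hypothetical stored analysis facts for one commit.
analysis = %{complexity: 0.8, risk: 0.2, tests: 1.0}

# Two experiments with different weights reuse the same stored analysis.
conservative = Scoring.score(analysis, %{complexity: 0.5, risk: 0.3, tests: 0.2})
test_heavy   = Scoring.score(analysis, %{complexity: 0.2, risk: 0.2, tests: 0.6})
```

Because `score/2` never touches an API, historical re-scoring is bounded by database reads, not LLM latency.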

Why Deterministic Triage?

Before calling any LLM, every commit passes through rule-based triage. This saves significant cost:

  • Empty merge commits → shallow (no LLM)
  • CI/CD-only changes → shallow (no LLM)
  • Docs-only changes → shallow (no LLM)
  • Security-related → deep (Sonnet, more expensive but warranted)
  • Everything else → standard (Haiku, cost-efficient)

Roughly 20–30% of commits in a typical repository are triaged as shallow, eliminating the LLM cost for that fraction entirely.
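The triage rules above can be expressed as pattern matching over commit metadata. This is a minimal sketch, assuming hypothetical commit fields (`merge?`, `files`, `message`); the real rules in ShipLens may differ:

```elixir
# Sketch of rule-based triage; commit fields and rules are illustrative.
defmodule Triage do
  # Empty merge commits never reach an LLM.
  def route(%{merge?: true, additions: 0, deletions: 0}), do: :shallow

  def route(%{files: files} = commit) do
    cond do
      Enum.all?(files, &ci_or_docs?/1) -> :shallow   # no LLM
      security_related?(commit)        -> :deep      # Sonnet
      true                             -> :standard  # Haiku
    end
  end

  defp ci_or_docs?(path) do
    String.starts_with?(path, [".github/", "docs/"]) or String.ends_with?(path, ".md")
  end

  defp security_related?(%{message: message}) do
    String.match?(message, ~r/\b(security|cve|vulnerab)/i)
  end
end

Triage.route(%{merge?: false, files: ["docs/guide.md"], message: "update docs"})
# => :shallow
```

Because the rules are deterministic, the same commit always routes to the same tier, which keeps costs predictable.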

Why Local Embeddings?

We use Bumblebee with the all-MiniLM-L6-v2 model (384 dimensions) for embeddings instead of an external API:

  • No additional API cost — embeddings run locally
  • No data leaves your infrastructure — project context stays local
  • Low latency — no network round-trip
  • Good enough quality — 384-dim vectors provide strong semantic similarity for code context retrieval
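Retrieval over these vectors reduces to cosine similarity, which pgvector computes in SQL; the underlying math is simple enough to sketch in pure Elixir (illustration only, the real lookup happens in the database):

```elixir
# Pure-Elixir sketch of cosine similarity, the metric applied to the
# 384-dimensional MiniLM embeddings during retrieval.
defmodule Cosine do
  def similarity(a, b) do
    dot = Enum.zip_with(a, b, &(&1 * &2)) |> Enum.sum()
    dot / (norm(a) * norm(b))
  end

  defp norm(v), do: :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end))
end

Cosine.similarity([1.0, 0.0], [1.0, 0.0])
# => 1.0
```

Identical vectors score 1.0, orthogonal vectors 0.0; project context chunks are ranked by this score against the query embedding.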

Why Oban for Job Processing?

Oban uses PostgreSQL as its job queue backend, which means:

  • No additional infrastructure (Redis, RabbitMQ, etc.)
  • Transactional job creation — jobs are created in the same transaction as related data
  • Built-in retry, dead-letter queues, and scheduling
  • Job introspection through SQL queries
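Transactional job creation looks roughly like this with `Ecto.Multi` and `Oban.insert/3` (a sketch assuming hypothetical module names like `ShipLens.Repo` and `ShipLens.Workers.CommitWorker`, not the actual source):

```elixir
# Sketch: the commit row and the job that processes it are committed in
# one transaction, so a job can never reference a rolled-back commit.
# Module names are assumptions; requires an Ecto + Oban application.
alias Ecto.Multi

Multi.new()
|> Multi.insert(:commit, commit_changeset)
|> Oban.insert(:job, fn %{commit: commit} ->
  ShipLens.Workers.CommitWorker.new(%{"commit_id" => commit.id})
end)
|> ShipLens.Repo.transaction()
```

If the transaction rolls back, neither the commit nor the job exists, which removes a whole class of orphaned-job bugs.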

Background Jobs

Job                   | Purpose                                       | Queue
----------------------|-----------------------------------------------|---------
CommitWorker          | Process individual commit (triage → analysis) | commits
DigestGenerateJob     | Generate weekly digest content via Sonnet     | digests
SilentContributorJob  | Detect inactive contributors                  | alerts
CommitPatternAlertJob | Detect high fix ratio, low test discipline    | alerts
RefreshTokenWorker    | Refresh git provider OAuth tokens             | tokens
SyncReposWorker       | Sync repositories from git providers          | sync
WebhookProcessWorker  | Process incoming webhook events               | webhooks
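The shape of such a worker, using the standard `Oban.Worker` behaviour (module names and internals here are hypothetical, not the actual CommitWorker):

```elixir
# Hypothetical sketch of a worker like CommitWorker above; requires an
# Oban application. Internals are assumptions, not the real source.
defmodule ShipLens.Workers.CommitWorker do
  use Oban.Worker, queue: :commits, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"commit_id" => commit_id}}) do
    commit = ShipLens.Repo.get!(ShipLens.Commits.Commit, commit_id)

    # Triage first; only non-shallow commits reach an LLM.
    case ShipLens.Triage.route(commit) do
      :shallow -> :ok
      tier -> ShipLens.Analysis.run(commit, tier)
    end
  end
end
```

Returning `:ok` marks the job complete; raising or returning an error triggers Oban's built-in retry with backoff.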

Built with intelligence, not surveillance.