Multi-Branch Discovery

Most engineering teams don't work exclusively on main. Feature branches, release branches, hotfix branches — real work happens everywhere. Analyzing only the default branch means missing commits that haven't been merged yet, or worse, attributing merged work solely to the person who clicked the merge button.

The Problem

A typical workflow:

  1. Engineer creates feature/auth-refactor, makes 15 commits over 3 days
  2. Engineer opens a PR, gets reviews, makes 5 more commits
  3. PR is squash-merged into main as a single commit

If ShipLens only scans main, it sees one commit. The 20 commits of actual work — including the iterative refinement from code review — are invisible.

Worse: if the PR hasn't merged yet, the engineer's work doesn't exist in the system at all.

The Solution: Mixed Model Scanning

ShipLens uses a mixed model: the default branch is always analyzed, and branch discovery adds recently active branches on top of it.

Branch Discovery

During sync, ShipLens fetches all remote branches and filters to those with activity within the configured window (default: 30 days). This avoids scanning stale branches that would add cost without value.

| Parameter | Default | Description |
| --- | --- | --- |
| branch_activity_window | 30 days | Only scan branches with commits newer than this |
| branch_exclude_patterns | ["dependabot/*", "renovate/*"] | Glob patterns for branches to skip |
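
The filtering described above can be sketched roughly as follows. This is an illustration only, not ShipLens's implementation: the `select_branches` function and its input shape (branch name mapped to the timestamp of its newest commit) are assumptions, and Python's `fnmatch` lets `*` match across `/`, which may differ from the glob semantics ShipLens actually uses. The default branch is always scanned, so it would be handled outside this filter.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase

# Defaults taken from the parameter table above.
BRANCH_ACTIVITY_WINDOW = timedelta(days=30)
BRANCH_EXCLUDE_PATTERNS = ["dependabot/*", "renovate/*"]


def select_branches(branches, now, window=BRANCH_ACTIVITY_WINDOW,
                    exclude=BRANCH_EXCLUDE_PATTERNS):
    """Return the sorted names of branches that should be scanned.

    `branches` maps branch name -> datetime of its newest commit
    (a hypothetical input shape chosen for this sketch).
    """
    cutoff = now - window
    selected = []
    for name, last_commit in branches.items():
        if any(fnmatchcase(name, pattern) for pattern in exclude):
            continue  # matches an exclude pattern, e.g. a bot branch
        if last_commit > cutoff:
            selected.append(name)  # active within the window
    return sorted(selected)


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
branches = {
    "feature/auth-refactor": now - timedelta(days=2),
    "release/1.4": now - timedelta(days=45),
    "dependabot/npm/lodash": now - timedelta(days=1),
}
print(select_branches(branches, now))  # ['feature/auth-refactor']
```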

What Gets Scanned

| Branch Type | Scanned? | Notes |
| --- | --- | --- |
| Default (main, master) | Always | Full history within the sync window |
| Feature branches | Yes | If active within the activity window |
| Release branches | Yes | If active within the activity window |
| Bot branches | No | Excluded by default patterns |

Rebase-Resilient Deduplication

The hardest part of multi-branch scanning is deduplication. The same logical commit can appear with different SHAs after a rebase or force push. ShipLens handles this with a multi-signal matching strategy:

Deduplication Signals

| Signal | Weight | How It Works |
| --- | --- | --- |
| SHA match | Exact | Same SHA = same commit. No ambiguity. |
| Patch ID | Strong | Git's patch-id computes a hash of the diff content, ignoring metadata. Two commits with different SHAs but identical diffs get the same patch ID. |
| Author + timestamp + message | Moderate | Catches commits that were cherry-picked with minor diff changes. |
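
As a rough illustration of the patch ID signal, the sketch below pipes the output of `git show` into `git patch-id --stable`, which prints lines of the form `<patch-id> <commit-sha>`. The helper names `parse_patch_id` and `patch_id_for` are hypothetical, and how ShipLens invokes Git internally is not specified here.

```python
import subprocess


def parse_patch_id(output):
    """Extract the patch ID from `git patch-id` output.

    The output has the form "<patch-id> <commit-sha>". Returns None when
    the output is empty, which happens for commits that carry no diff.
    """
    line = output.strip()
    return line.split()[0] if line else None


def patch_id_for(sha, repo="."):
    """Compute the patch ID of one commit in the repository at `repo`."""
    shown = subprocess.run(
        ["git", "-C", repo, "show", sha],
        check=True, capture_output=True,
    ).stdout  # raw bytes, so non-UTF-8 diffs pass through untouched
    result = subprocess.run(
        ["git", "-C", repo, "patch-id", "--stable"],
        input=shown, check=True, capture_output=True,
    )
    return parse_patch_id(result.stdout.decode("ascii"))
```

Two commits that differ only in SHA, for example the same change before and after a rebase, should yield the same value from `patch_id_for`.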

The deduplication algorithm:

  1. Exact SHA match — If we've already seen this SHA, skip it.
  2. Patch ID match — Compute the patch ID and check against known patch IDs. If matched, record the association but don't re-analyze.
  3. Fuzzy match — If the author, timestamp (within 60 seconds), and first line of the commit message match an existing commit, flag it as a likely duplicate for review.

Fuzzy matches are flagged but not automatically deduplicated — they're surfaced in the sync log for verification.
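
The three steps can be sketched as below. This is a minimal in-memory illustration under stated assumptions, not ShipLens's code: the `Commit` and `Deduplicator` names, the result labels, and the linear scan for fuzzy matches are all invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Commit:
    sha: str
    patch_id: Optional[str]
    author: str
    authored_at: datetime
    message: str


class Deduplicator:
    def __init__(self, fuzzy_window=timedelta(seconds=60)):
        self.fuzzy_window = fuzzy_window
        self.by_sha = {}       # SHA -> Commit, for commits that were recorded
        self.by_patch_id = {}  # patch ID -> SHA of the first commit seen with it
        self.aliases = {}      # duplicate SHA -> original SHA (patch ID matches)
        self.flagged = []      # (new SHA, existing SHA) pairs left for review

    def classify(self, commit):
        """Return 'seen', 'patch_duplicate', 'fuzzy_flagged', or 'new'."""
        # Step 1: exact SHA match.
        if commit.sha in self.by_sha or commit.sha in self.aliases:
            return "seen"
        # Step 2: patch ID match. Record the association, skip re-analysis.
        if commit.patch_id and commit.patch_id in self.by_patch_id:
            self.aliases[commit.sha] = self.by_patch_id[commit.patch_id]
            return "patch_duplicate"
        # Step 3: fuzzy match on author, timestamp, and first message line.
        subject = commit.message.split("\n", 1)[0]
        result = "new"
        for known in self.by_sha.values():
            if (known.author == commit.author
                    and known.message.split("\n", 1)[0] == subject
                    and abs(known.authored_at - commit.authored_at)
                    <= self.fuzzy_window):
                self.flagged.append((commit.sha, known.sha))
                result = "fuzzy_flagged"
                break
        # Fuzzy matches are only flagged, so the commit is still recorded.
        self.by_sha[commit.sha] = commit
        if commit.patch_id:
            self.by_patch_id.setdefault(commit.patch_id, commit.sha)
        return result
```

A real implementation would likely index commits by author and subject rather than scanning every known commit; the linear scan here just keeps the sketch short.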

Branch Metadata

Each commit stores its branch context:

| Field | Description |
| --- | --- |
| branch | The branch where this commit was first discovered |
| is_merge | Whether this is a merge commit |
| merged_into | The target branch if this commit was part of a merged PR |

This metadata enables branch-aware reporting without losing the simplicity of a flat commit timeline.
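
To make the idea concrete, here is a small sketch of a flat timeline that carries these three fields, plus one branch-aware view derived from it. The `CommitRecord` class and `group_by_branch` helper are hypothetical, and representing an unmerged commit with `merged_into=None` is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommitRecord:
    sha: str
    branch: str                        # branch where the commit was first discovered
    is_merge: bool                     # True for merge commits
    merged_into: Optional[str] = None  # assumed None until the PR merges


def group_by_branch(timeline):
    """Build a branch-aware view from a flat, ordered list of commits."""
    grouped = defaultdict(list)
    for record in timeline:
        grouped[record.branch].append(record.sha)
    return dict(grouped)


timeline = [
    CommitRecord("a1", "feature/auth-refactor", False),
    CommitRecord("b2", "main", False),
    CommitRecord("c3", "feature/auth-refactor", False, merged_into="main"),
]
print(group_by_branch(timeline))
# {'feature/auth-refactor': ['a1', 'c3'], 'main': ['b2']}
```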

Commit Attribution

A commit is attributed to its author, regardless of which branch it lives on or who merged it. Specifically:

  • Author (from git log --format=%an) is used for attribution, not committer
  • Merge commits are attributed to the merger but flagged as merges and typically receive shallow analysis
  • Squash merge commits on the default branch are attributed to the merger; the original branch commits (if scanned) are attributed to their respective authors

This means a contributor's work is visible even before their PR merges — their feature branch commits appear in reports as soon as the branch is scanned.
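
The author/committer distinction can be illustrated with a short sketch. It assumes log lines produced by `git log --format=%H%x00%an%x00%cn` (commit SHA, author name, committer name, separated by NUL bytes); the `attribution` helper and the sample line are made up for the example.

```python
LOG_FORMAT = "%H%x00%an%x00%cn"  # SHA, author name, committer name


def attribution(log_line):
    """Return (sha, attributed_name) for one line of git log output
    produced with LOG_FORMAT."""
    sha, author, _committer = log_line.rstrip("\n").split("\x00")
    return sha, author  # the committer is deliberately ignored


# A cherry-picked commit: Dana wrote it, Lee applied it to another branch.
print(attribution("9f2c1e7\x00Dana\x00Lee\n"))  # ('9f2c1e7', 'Dana')
```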

Merge Commit Handling

Merge commits receive special treatment:

| Merge Type | Handling |
| --- | --- |
| Regular merge | Analyzed at shallow depth (no LLM). Files changed = 0 for empty merges. |
| Squash merge | Analyzed normally — it contains the actual diff. Linked to the originating PR. |
| Fast-forward | Not a merge commit at all — the commits simply appear on the target branch. |

Regular merge commits are kept in the system for timeline completeness but contribute minimally to scoring. The real analytical value is in the individual commits on the source branch.
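
One way to tell these cases apart is by parent count: in Git, a regular merge commit has two or more parents, while squash merges and fast-forwarded commits each have exactly one. The sketch below maps that to the handling in the table. The `merge_handling` function and its return labels are hypothetical, and a commit labeled "normal" here could still be triaged to shallow depth later.

```python
def merge_handling(parent_count):
    """Map a commit's parent count to the handling described above."""
    if parent_count >= 2:
        return "shallow"  # regular merge: kept for the timeline, no LLM
    return "normal"       # squash merge or ordinary commit: carries a real diff


print(merge_handling(2))  # shallow
print(merge_handling(1))  # normal
```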

TIP

Multi-branch scanning increases the number of commits analyzed, which increases LLM costs. The triage system mitigates this — many branch commits (WIP commits, fixups, merge commits) will be triaged to shallow depth at zero LLM cost.
