Multi-Branch Discovery
Most engineering teams don't work exclusively on main. Feature branches, release branches, hotfix branches — real work happens everywhere. Analyzing only the default branch means missing commits that haven't been merged yet, or worse, attributing merged work solely to the person who clicked the merge button.
The Problem
A typical workflow:
- Engineer creates `feature/auth-refactor`, makes 15 commits over 3 days
- Engineer opens a PR, gets reviews, makes 5 more commits
- PR is squash-merged into `main` as a single commit
If ShipLens only scans main, it sees one commit. The 20 commits of actual work — including the iterative refinement from code review — are invisible.
Worse: if the PR hasn't merged yet, the engineer's work doesn't exist in the system at all.
The Solution: Mixed Model Scanning
ShipLens uses a mixed model that combines default branch analysis with branch discovery:
Branch Discovery
During sync, ShipLens fetches all remote branches and filters to those with activity within the configured window (default: 30 days). This avoids scanning stale branches that would add cost without value.
| Parameter | Default | Description |
|---|---|---|
| `branch_activity_window` | 30 days | Only scan branches with commits newer than this |
| `branch_exclude_patterns` | `["dependabot/*", "renovate/*"]` | Glob patterns for branches to skip |
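The branch filter can be sketched roughly as follows. This is an illustration only, not ShipLens's actual implementation: the `Branch` type and the `select_branches` function are hypothetical names, and only the two parameters and their defaults come from the table above. It also assumes glob patterns behave like Python's `fnmatch`, where `*` matches across `/`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Branch:
    """Hypothetical record for a remote branch."""
    name: str
    last_commit_at: datetime  # timestamp of the newest commit on the branch


def select_branches(
    branches,
    now,
    branch_activity_window=timedelta(days=30),
    branch_exclude_patterns=("dependabot/*", "renovate/*"),
):
    """Return branches that are not excluded and have recent activity."""
    cutoff = now - branch_activity_window
    selected = []
    for branch in branches:
        # Skip bot branches and anything else matching an exclude glob.
        if any(fnmatchcase(branch.name, p) for p in branch_exclude_patterns):
            continue
        # Skip stale branches: newest commit is older than the window.
        if branch.last_commit_at < cutoff:
            continue
        selected.append(branch)
    return selected


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
candidates = [
    Branch("feature/auth-refactor", now - timedelta(days=3)),
    Branch("release/1.2", now - timedelta(days=45)),
    Branch("dependabot/npm/lodash", now - timedelta(days=1)),
]
active = select_branches(candidates, now)
```

With these inputs only `feature/auth-refactor` is kept: the release branch is outside the 30-day window and the Dependabot branch matches an exclude pattern.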
What Gets Scanned
| Branch Type | Scanned? | Notes |
|---|---|---|
| Default (`main`, `master`) | Always | Full history within the sync window |
| Feature branches | Yes | If active within the activity window |
| Release branches | Yes | If active within the activity window |
| Bot branches | No | Excluded by default patterns |
Rebase-Resilient Deduplication
The hardest part of multi-branch scanning is deduplication. The same logical commit can appear with different SHAs after a rebase or force push. ShipLens handles this with a multi-signal matching strategy:
Deduplication Signals
| Signal | Weight | How It Works |
|---|---|---|
| SHA match | Exact | Same SHA = same commit. No ambiguity. |
| Patch ID | Strong | Git's patch-id computes a hash of the diff content, ignoring metadata. Two commits with different SHAs but identical diffs get the same patch ID. |
| Author + timestamp + message | Moderate | Catches commits that were cherry-picked with minor diff changes. |
The deduplication algorithm:
- Exact SHA match — If we've already seen this SHA, skip it.
- Patch ID match — Compute the patch ID and check against known patch IDs. If matched, record the association but don't re-analyze.
- Fuzzy match — If the author, timestamp (within 60 seconds), and first line of the commit message match an existing commit, flag it as a likely duplicate for review.
Fuzzy matches are flagged but not automatically deduplicated — they're surfaced in the sync log for verification.
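The three steps can be sketched as a small classifier. This is a simplified illustration under stated assumptions, not the ShipLens implementation: the `Commit`, `Match`, and `Deduplicator` names are hypothetical, the patch ID is assumed to be precomputed (for example by `git patch-id`), and commits with no patch ID, such as empty merges, skip the second step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Match(Enum):
    NEW = "new"
    EXACT_SHA = "exact_sha"
    PATCH_ID = "patch_id"
    FUZZY = "fuzzy"


@dataclass(frozen=True)
class Commit:
    sha: str
    patch_id: Optional[str]  # assumed precomputed; None for empty diffs
    author: str
    authored_at: datetime
    message: str


class Deduplicator:
    def __init__(self, fuzzy_window=timedelta(seconds=60)):
        self.fuzzy_window = fuzzy_window
        self.by_sha = {}
        self.by_patch_id = {}
        self.by_author_subject = {}  # (author, first message line) -> commits
        self.aliases = {}            # duplicate SHA -> SHA of first-seen commit

    def classify(self, commit):
        # Step 1: exact SHA match, nothing more to do.
        if commit.sha in self.by_sha:
            return Match.EXACT_SHA
        # Step 2: same diff under a different SHA; record the association.
        known = self.by_patch_id.get(commit.patch_id) if commit.patch_id else None
        if known is not None:
            self.aliases[commit.sha] = known.sha
            self.by_sha[commit.sha] = commit
            return Match.PATCH_ID
        # Step 3: fuzzy match on author, first message line, and timestamp.
        lines = commit.message.splitlines()
        key = (commit.author, lines[0] if lines else "")
        fuzzy = any(
            abs(commit.authored_at - other.authored_at) <= self.fuzzy_window
            for other in self.by_author_subject.get(key, [])
        )
        # Fuzzy matches are only flagged, so the commit is still registered.
        self.by_sha[commit.sha] = commit
        if commit.patch_id:
            self.by_patch_id.setdefault(commit.patch_id, commit)
        self.by_author_subject.setdefault(key, []).append(commit)
        return Match.FUZZY if fuzzy else Match.NEW


t0 = datetime(2024, 1, 1, 12, 0, 0)
dedup = Deduplicator()
original = Commit("aaa", "p1", "Ada", t0, "Fix login\n\nDetails")
rebased = Commit("bbb", "p1", "Ada", t0 + timedelta(hours=1), "Fix login")
tweaked = Commit("ccc", "p2", "Ada", t0 + timedelta(seconds=30), "Fix login")
results = [dedup.classify(c) for c in (original, original, rebased, tweaked)]
```

Here `results` is `[NEW, EXACT_SHA, PATCH_ID, FUZZY]`: the rebased commit is linked to the original through its patch ID, while the tweaked commit is only flagged and still stored as its own commit.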
Branch Metadata
Each commit stores its branch context:
| Field | Description |
|---|---|
| `branch` | The branch where this commit was first discovered |
| `is_merge` | Whether this is a merge commit |
| `merged_into` | The target branch if this commit was part of a merged PR |
This metadata enables branch-aware reporting without losing the simplicity of a flat commit timeline.
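As a rough sketch of how these fields support branch-aware views over a flat timeline, consider the record below. Only `branch`, `is_merge`, and `merged_into` come from the table; the `CommitRecord` type, the `sha` and `author` fields, the `unmerged_by_branch` helper, and the choice to represent "not merged yet" as `None` are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommitRecord:
    sha: str
    author: str
    branch: str                        # branch where the commit was first discovered
    is_merge: bool                     # True for merge commits
    merged_into: Optional[str] = None  # assumed None until the PR is merged


def unmerged_by_branch(timeline):
    """Group SHAs of non-merge commits that have not been merged yet, by branch."""
    groups = defaultdict(list)
    for record in timeline:
        if not record.is_merge and record.merged_into is None:
            groups[record.branch].append(record.sha)
    return dict(groups)


timeline = [
    CommitRecord("a1", "Ada", "feature/auth-refactor", False),
    CommitRecord("a2", "Ada", "feature/auth-refactor", False),
    CommitRecord("b1", "Bob", "feature/search", False, merged_into="main"),
    CommitRecord("m1", "Bob", "main", True),
]
in_flight = unmerged_by_branch(timeline)
```

The timeline stays a flat list; the per-branch view of in-flight work is derived on demand from the stored fields.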
Commit Attribution
A commit is attributed to its author, regardless of which branch it lives on or who merged it. Specifically:
- Author (from `git log --format=%an`) is used for attribution, not committer
- Merge commits are attributed to the merger but flagged as merges and typically receive shallow analysis
- Squash merge commits on the default branch are attributed to the merger; the original branch commits (if scanned) are attributed to their respective authors
This means a contributor's work is visible even before their PR merges — their feature branch commits appear in reports as soon as the branch is scanned.
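The author-versus-committer distinction can be illustrated by parsing `git log` output. The sketch below assumes the log was produced with the format string shown in `LOG_FORMAT`, which uses Git's standard placeholders (`%H` hash, `%an` author name, `%cn` committer name, `%P` parent hashes, `%x1f` as a field separator). The `attribute` function is a hypothetical helper, and the sample output is hand-written rather than captured from a real repository.

```python
from collections import defaultdict

# Intended for: git log --format=<LOG_FORMAT>
LOG_FORMAT = "%H%x1f%an%x1f%cn%x1f%P"


def attribute(log_output):
    """Map each author to their commit SHAs and collect merge-commit SHAs."""
    by_author = defaultdict(list)
    merges = set()
    for line in log_output.split("\n"):
        if not line:
            continue
        sha, author, _committer, parents = line.split("\x1f")
        # Attribution uses the author field; the committer is ignored.
        by_author[author].append(sha)
        # A commit with more than one parent is a merge commit.
        if len(parents.split()) > 1:
            merges.add(sha)
    return dict(by_author), merges


sample = "\n".join([
    "\x1f".join(["m1", "Bob", "Bob", "c2 f2"]),  # merge commit created by Bob
    "\x1f".join(["f2", "Ada", "Bob", "f1"]),     # Ada's commit, rebased by Bob
    "\x1f".join(["f1", "Ada", "Ada", "c1"]),
])
by_author, merges = attribute(sample)
```

In the sample, commit `f2` stays attributed to Ada even though Bob is its committer, and the merge commit `m1` is attributed to Bob and flagged as a merge.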
Merge Commit Handling
Merge commits receive special treatment:
| Merge Type | Handling |
|---|---|
| Regular merge | Analyzed at shallow depth (no LLM). Files changed = 0 for empty merges. |
| Squash merge | Analyzed normally — it contains the actual diff. Linked to the originating PR. |
| Fast-forward | Not a merge commit at all — the commits simply appear on the target branch. |
Regular merge commits are kept in the system for timeline completeness but contribute minimally to scoring. The real analytical value is in the individual commits on the source branch.
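One way to express the table above in code is a classifier keyed on parent count. This is a sketch under assumptions, not ShipLens's logic: the `MergeKind` enum, the `pr_number` and `on_default_branch` inputs, and the rule that a single-parent, PR-linked commit on the default branch is treated as a squash merge are all hypothetical.

```python
from enum import Enum
from typing import Optional, Sequence


class MergeKind(Enum):
    REGULAR_MERGE = "regular_merge"
    SQUASH_MERGE = "squash_merge"
    ORDINARY = "ordinary"  # includes commits that arrived via fast-forward


def classify_commit(
    parents: Sequence[str],
    pr_number: Optional[int] = None,
    on_default_branch: bool = False,
) -> MergeKind:
    # Two or more parents: a true merge commit.
    if len(parents) > 1:
        return MergeKind.REGULAR_MERGE
    # Assumption: a single-parent commit on the default branch that is
    # linked to a PR carries the squashed diff of that PR.
    if on_default_branch and pr_number is not None:
        return MergeKind.SQUASH_MERGE
    # Fast-forwarded commits have no merge commit and look like any other.
    return MergeKind.ORDINARY


def forced_shallow(kind: MergeKind) -> bool:
    """Regular merges skip LLM analysis; other kinds go through normal triage."""
    return kind is MergeKind.REGULAR_MERGE


kinds = [
    classify_commit(["p1", "p2"]),
    classify_commit(["p1"], pr_number=42, on_default_branch=True),
    classify_commit(["p1"]),
]
```

The PR-link heuristic would also label rebase-merged commits as squash merges, so a real implementation would likely need a stronger signal from the hosting platform.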
TIP
Multi-branch scanning increases the number of commits analyzed, which increases LLM costs. The triage system mitigates this — many branch commits (WIP commits, fixups, merge commits) will be triaged to shallow depth at zero LLM cost.
