Multi-Branch Discovery

Most engineering teams don't work exclusively on main. Feature branches, release branches, hotfix branches — real work happens everywhere. Analyzing only the default branch means missing commits that haven't been merged yet, or worse, attributing merged work solely to the person who clicked the merge button.

The Problem

A typical workflow:

1. Engineer creates `feature/auth-refactor` and makes 15 commits over 3 days
2. Engineer opens a PR, gets reviews, and makes 5 more commits
3. The PR is squash-merged into `main` as a single commit

If ShipLens only scans main, it sees one commit. The 20 commits of actual work — including the iterative refinement from code review — are invisible.

Worse: if the PR hasn't merged yet, the engineer's work doesn't exist in the system at all.

The Solution: Mixed Model Scanning

ShipLens uses a mixed model: it always analyzes the default branch, and it additionally discovers and scans branches with recent activity.

Branch Discovery

During sync, ShipLens fetches all remote branches and filters to those with activity within the configured window (default: 30 days). This avoids scanning stale branches that would add cost without value.

Parameter	Default	Description
`branch_activity_window`	30 days	Only scan branches with commits newer than this
`branch_exclude_patterns`	`["dependabot/", "renovate/"]`	Glob patterns for branches to skip

What Gets Scanned

Branch Type	Scanned?	Notes
Default (`main`, `master`)	Always	Full history within the sync window
Feature branches	Yes	If active within the activity window
Release branches	Yes	If active within the activity window
Bot branches	No	Excluded by default patterns
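The selection rules in the two tables above can be sketched as a small filter. This is an illustration under stated assumptions, not ShipLens's actual implementation: `RemoteBranch` and `select_branches` are hypothetical names, and because the default patterns end in `/`, the sketch treats a pattern as matching either as a glob or as a plain name prefix.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class RemoteBranch:  # hypothetical record for one remote branch
    name: str
    last_commit_at: datetime  # timestamp of the newest commit on the branch


def is_excluded(name, patterns):
    # Assumption: a pattern matches as a glob OR as a simple prefix,
    # so "dependabot/" excludes "dependabot/npm/lodash".
    return any(fnmatchcase(name, p) or name.startswith(p) for p in patterns)


def select_branches(
    branches,
    default_branch="main",
    branch_activity_window=timedelta(days=30),
    branch_exclude_patterns=("dependabot/", "renovate/"),
    now=None,
):
    """Return the names of the branches that should be scanned."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - branch_activity_window
    selected = []
    for branch in branches:
        if branch.name == default_branch:
            selected.append(branch.name)  # the default branch is always scanned
        elif is_excluded(branch.name, branch_exclude_patterns):
            continue  # bot branches are skipped
        elif branch.last_commit_at > cutoff:
            selected.append(branch.name)  # active within the window
    return selected
```

With the defaults, a feature branch last touched 3 days ago is selected, while a release branch idle for 45 days and any `dependabot/...` branch are skipped.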

Rebase-Resilient Deduplication

The hardest part of multi-branch scanning is deduplication. The same logical commit can appear with different SHAs after a rebase or force push. ShipLens handles this with a multi-signal matching strategy:

Deduplication Signals

Signal	Weight	How It Works
SHA match	Exact	Same SHA = same commit. No ambiguity.
Patch ID	Strong	Git's `patch-id` command hashes the diff content and ignores commit metadata (author, date, message) as well as line numbers and whitespace. Two commits with different SHAs but identical diffs get the same patch ID.
Author + timestamp + message	Moderate	Catches commits that were cherry-picked with minor diff changes.

The deduplication algorithm:

1. Exact SHA match: if we've already seen this SHA, skip it.
2. Patch ID match: compute the patch ID and check it against known patch IDs. If it matches, record the association but don't re-analyze.
3. Fuzzy match: if the author, timestamp (within 60 seconds), and first line of the commit message match an existing commit, flag it as a likely duplicate for review.

Fuzzy matches are flagged but not automatically deduplicated — they're surfaced in the sync log for verification.
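The three steps above can be sketched as follows. This is a minimal illustration, not ShipLens's actual code: the `Commit` and `Deduplicator` names and the return values are hypothetical, and patch IDs are assumed to be computed elsewhere (for example with `git patch-id`) and passed in as strings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commit:  # hypothetical commit record
    sha: str
    patch_id: str  # assumed to be precomputed, e.g. via `git patch-id`
    author: str
    authored_at: datetime
    message: str


def first_line(message):
    return message.split("\n", 1)[0]


class Deduplicator:
    FUZZY_WINDOW = timedelta(seconds=60)

    def __init__(self):
        self.by_sha = {}       # SHA -> commit that was analyzed
        self.by_patch_id = {}  # patch ID -> commit that was analyzed
        self.aliases = {}      # duplicate SHA -> SHA of the analyzed commit
        self.flagged = []      # (new SHA, likely original SHA) for review

    def check(self, commit):
        """Return 'sha', 'patch_id', 'fuzzy', or 'new' for a commit."""
        # Step 1: exact SHA match, nothing to do.
        if commit.sha in self.by_sha or commit.sha in self.aliases:
            return "sha"
        # Step 2: same diff under a new SHA; record the link, skip analysis.
        original = self.by_patch_id.get(commit.patch_id)
        if original is not None:
            self.aliases[commit.sha] = original.sha
            return "patch_id"
        # Step 3: fuzzy match on author, timestamp, and subject line.
        result = "new"
        for known in self.by_sha.values():
            if (
                known.author == commit.author
                and first_line(known.message) == first_line(commit.message)
                and abs(known.authored_at - commit.authored_at) <= self.FUZZY_WINDOW
            ):
                self.flagged.append((commit.sha, known.sha))
                result = "fuzzy"
                break
        # Fuzzy matches are only flagged, so the commit is still recorded.
        self.by_sha[commit.sha] = commit
        self.by_patch_id[commit.patch_id] = commit
        return result
```

Only the `'sha'` and `'patch_id'` outcomes suppress analysis in this sketch. A `'fuzzy'` result keeps the commit and adds a pair to `flagged`, mirroring the rule that fuzzy matches are surfaced for verification rather than merged automatically.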

Branch Metadata

Each commit stores its branch context:

Field	Description
`branch`	The branch where this commit was first discovered
`is_merge`	Whether this is a merge commit
`merged_into`	The target branch if this commit was part of a merged PR

This metadata enables branch-aware reporting without losing the simplicity of a flat commit timeline.
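As an illustration of how a flat list of commits can still answer branch-level questions, here is a minimal sketch. Only `branch`, `is_merge`, and `merged_into` come from the table above; the `CommitRecord` class, its `sha` and `author` fields, and both helper functions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommitRecord:  # hypothetical shape of a stored commit
    sha: str
    author: str
    branch: str                        # branch where the commit was first discovered
    is_merge: bool = False             # whether this is a merge commit
    merged_into: Optional[str] = None  # target branch, if part of a merged PR


def commits_by_branch(records):
    """Group a flat commit timeline by discovery branch."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.branch].append(record.sha)
    return dict(grouped)


def unmerged_work(records, default_branch="main"):
    """SHAs found on non-default branches that have not been merged yet."""
    return [
        r.sha
        for r in records
        if r.branch != default_branch and r.merged_into is None
    ]
```

The timeline stays a plain list. Branch-aware views are derived on demand from the three metadata fields.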

Commit Attribution

A commit is attributed to its author, regardless of which branch it lives on or who merged it. Specifically:

- Author (from `git log --format=%an`) is used for attribution, not committer
- Merge commits are attributed to the merger but flagged as merges and typically receive shallow analysis
- Squash merge commits on the default branch are attributed to the merger; the original branch commits (if scanned) are attributed to their respective authors

This means a contributor's work is visible even before their PR merges — their feature branch commits appear in reports as soon as the branch is scanned.
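To make the author-versus-committer distinction concrete, the sketch below parses tab-separated `git log` output and counts commits per author. It is an illustration only: the format string `--format=%H%x09%an%x09%cn%x09%P` (hash, author name, committer name, parent hashes) and both function names are assumptions, not necessarily what ShipLens runs internally.

```python
from collections import Counter

# Assumed invocation: git log --format=%H%x09%an%x09%cn%x09%P
# (%x09 is a tab; %P lists parent hashes separated by spaces)


def parse_log_line(line):
    """Split one log line into the fields used for attribution."""
    sha, author, committer, parents = line.rstrip("\n").split("\t")
    return {
        "sha": sha,
        "author": author,        # used for attribution
        "committer": committer,  # recorded, but never used for attribution
        "is_merge": len(parents.split()) > 1,
    }


def commits_per_author(lines):
    """Count commits by author name, ignoring who committed or merged them."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[parse_log_line(line)["author"]] += 1
    return counts
```

A commit authored by one engineer but committed by another (for example after a rebase by a maintainer) is counted for the author in this sketch.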

Merge Commit Handling

Merge commits receive special treatment:

Merge Type	Handling
Regular merge	Analyzed at `shallow` depth (no LLM). Files changed = 0 for empty merges.
Squash merge	Analyzed normally — it contains the actual diff. Linked to the originating PR.
Fast-forward	Not a merge commit at all — the commits simply appear on the target branch.

Regular merge commits are kept in the system for timeline completeness but contribute minimally to scoring. The real analytical value is in the individual commits on the source branch.
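The table above can be read as a small decision rule. The sketch below is one hypothetical way to express it: `classify_commit`, the `linked_pr` argument, and the `"normal"` label are assumptions, and only the `shallow` depth name comes from the table. A squash merge has a single parent, so parent count alone cannot identify it; the sketch assumes the PR link is known from elsewhere.

```python
def classify_commit(parent_count, linked_pr=None):
    """Return (kind, analysis_depth) for a commit.

    parent_count: number of parent commits (2 or more means a true merge).
    linked_pr:    hypothetical PR identifier for squash merges, else None.
    """
    if parent_count >= 2:
        # Regular merge: kept for timeline completeness, no LLM analysis.
        return ("regular_merge", "shallow")
    if linked_pr is not None:
        # Squash merge: carries the real diff, so it is analyzed normally.
        return ("squash_merge", "normal")
    # Fast-forwarded commits look like any other single-parent commit.
    return ("ordinary", "normal")
```

Fast-forward merges need no branch of their own here, since they create no merge commit and the original commits simply appear on the target branch.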

TIP

Multi-branch scanning increases the number of commits analyzed, which increases LLM costs. The triage system mitigates this — many branch commits (WIP commits, fixups, merge commits) will be triaged to shallow depth at zero LLM cost.
