Gaming Detection
ShipLens includes five pattern detectors that identify potentially artificial commit patterns. These detectors operate on a strict principle: flags only, never penalties.
Philosophy
Any scoring system creates incentives. Some engineers might — consciously or not — adapt their commit patterns to inflate scores. Gaming detection exists to surface these patterns for human review, not to automate punishment.
Important
Gaming flags are informational. They never automatically reduce scores, block commits, or trigger negative consequences. They surface patterns that managers should investigate in a 1:1 conversation.
The Five Detectors
1. Commit Splitting
What it detects: Multiple commits to overlapping files in a short time window, suggesting a single logical change was artificially split into multiple commits.
| Parameter | Value |
|---|---|
| Time window | 30 minutes |
| Minimum commits | 3 |
How it works:
- Group commits by contributor within a 30-minute sliding window
- For each group of 3+ commits, check for file overlap
- If multiple commits touch the same files within the window, flag all of them
Why this matters: Splitting a single feature into many small commits can inflate volume-based metrics. However, some developers legitimately make rapid-fire commits as part of their workflow — context matters.
2. Add-Remove Cycles
What it detects: Paired commits where one adds lines and a follow-up removes a significant portion from the same files.
| Parameter | Value |
|---|---|
| Minimum lines added | 20 |
| Removal threshold | 50% of lines added |
How it works:
- For each pair of commits from the same contributor:
  - Commit A adds ≥ 20 lines to a set of files
  - Commit B removes ≥ 50% of those added lines from the same files
- If both conditions hold, flag both commits
Pattern example:
```text
Commit A: Add 100 lines to auth.ex       ← flagged
Commit B: Remove 60 lines from auth.ex   ← flagged
```

Why this matters: This pattern can indicate code being added just to be removed, inflating both volume and churn metrics. It can also be legitimate iteration — the flag prompts investigation.
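The pair check can be sketched as follows, assuming an illustrative per-commit diffstat of `{path: (lines_added, lines_removed)}` rather than the actual ShipLens schema:

```python
def detect_add_remove_cycles(commits, min_added=20, removal_ratio=0.5):
    """Flag pairs where a later commit removes >= `removal_ratio` of the
    lines an earlier commit by the same author added to the same files.
    `commits` is time-ordered; each dict has sha, author, and a `files`
    map of {path: (added, removed)}."""
    flagged = set()
    for i, first in enumerate(commits):
        for later in commits[i + 1:]:
            if first["author"] != later["author"]:
                continue
            shared = first["files"].keys() & later["files"].keys()
            added = sum(first["files"][f][0] for f in shared)
            removed = sum(later["files"][f][1] for f in shared)
            if added >= min_added and removed >= removal_ratio * added:
                flagged.update({first["sha"], later["sha"]})
    return flagged
```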
3. False Refactors
What it detects: Commits labeled as refactors or style changes that are actually just formatting or whitespace changes, claiming credit for significant work.
| Parameter | Value |
|---|---|
| Minimum lines changed | 50 |
| Trigger keywords | format, whitespace, indent, rename, style |
How it works:
- Check if commit type is `refactor` or `style`
- Check if lines changed > 50
- Check if commit message or summary contains formatting keywords
- If all three conditions are met, flag the commit
Why this matters: A "refactor" that's actually just running a code formatter shouldn't be scored the same as a genuine architectural refactor. The flag distinguishes between meaningful and cosmetic changes.
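The three-condition check is simple enough to show directly. A minimal Python sketch, with assumed field names (`type`, `lines_changed`, `message`) standing in for the real commit record:

```python
FORMAT_KEYWORDS = {"format", "whitespace", "indent", "rename", "style"}

def is_false_refactor(commit, min_lines=50, keywords=FORMAT_KEYWORDS):
    """All three conditions must hold: the commit is typed refactor or
    style, it changes more than `min_lines` lines, and its message
    mentions a formatting keyword."""
    message = commit["message"].lower()
    return (
        commit["type"] in ("refactor", "style")
        and commit["lines_changed"] > min_lines
        and any(word in message for word in keywords)
    )
```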
4. Abnormal Frequency
What it detects: Days where a contributor's commit count is dramatically higher than their personal average.
| Parameter | Value |
|---|---|
| Multiplier threshold | 3× daily average |
| Minimum history | 2 days |
How it works:
- Calculate the contributor's average daily commit count across the analysis period
- For each day where `daily_count ≥ average × 3`, flag all commits on that day
Example:
```text
Average: 4 commits/day
Monday:  3 commits    ← normal
Tuesday: 15 commits   ← flagged (15 ≥ 4 × 3 = 12)
```

Why this matters: Occasional high-output days are normal (release day, hackathon). Persistent abnormal frequency might indicate artificial inflation. The 3× threshold is generous to avoid false positives.
5. Trivial Splits
What it detects: Many tiny commits on the same day touching the same domain, suggesting a single small change was split into trivial pieces.
| Parameter | Value |
|---|---|
| Maximum lines per commit | 5 |
| Minimum commits to flag | 5 |
How it works:
- Group commits by contributor, day, and domain
- For each group, count commits where `total_lines ≤ 5`
- If there are ≥ 5 such commits in the same group, flag all of them
Example:
```text
09:00 - fix typo in user.ex (2 lines)      ← flagged
09:05 - fix typo in user.ex (1 line)       ← flagged
09:10 - rename var in user.ex (3 lines)    ← flagged
09:15 - fix spacing in user.ex (1 line)    ← flagged
09:20 - fix comment in user.ex (2 lines)   ← flagged
```

Why this matters: Each of these could have been a single commit. Splitting them artificially inflates commit count without adding meaningful value.
How Flags Are Stored
Gaming flags are stored as a list of strings on the CommitReport:
```elixir
gaming_flags: ["commit_splitting", "add_remove_cycle", "false_refactor",
               "abnormal_frequency", "trivial_splits"]
```

A single commit can have multiple flags. Flags are surfaced in:
- Contributor profiles — Summary of flags by type
- Weekly digests — Risk signals section
- 1:1 reports — Topics to explore (when patterns are persistent)
- Alerts page — Aggregated view across all contributors
False Positive Considerations
Every detector can produce false positives in legitimate scenarios:
| Detector | Legitimate Scenario |
|---|---|
| Commit splitting | TDD red-green-refactor cycle |
| Add-remove cycles | Iterative prototyping with rollbacks |
| False refactors | Genuine formatting migration (e.g., adopting a new linter) |
| Abnormal frequency | Release day, hackathon, catching up after vacation |
| Trivial splits | Git bisect-friendly commit strategy |
This is exactly why flags never auto-penalize. They're conversation starters, not verdicts.
