AI Slop Detection
As AI coding assistants become ubiquitous, a new quality problem emerges: AI slop — code that is technically functional but unnecessarily verbose, over-engineered, or stylistically foreign to the project.
ShipLens measures AI slop across seven dimensions, producing a Slop Index from 0.0 (clean) to 1.0 (pure slop).
Why Detect AI Slop?
AI-generated code often exhibits specific anti-patterns:
- Verbosity — More code than needed, unnecessary intermediate variables
- Comment pollution — Obvious comments restating what the code does
- Over-engineering — Abstractions nobody asked for, factory-for-one patterns
- Defensive bloat — Try-catches on safe paths, null checks on already-validated values
- Style mismatch — Foreign conventions that don't match the project's idioms
These patterns aren't bugs — the code usually works. But they increase maintenance burden, obscure intent, and erode codebase consistency.
The Seven Dimensions
Each dimension is scored 0–5 by the LLM during commit analysis:
| Dimension | Score 0 | Score 5 |
|---|---|---|
| Verbosity | Concise, no wasted lines | Could be written in half the code |
| Unnecessary Comments | No redundant comments | Comments restating every line |
| Over-Engineering | Right level of abstraction | Factory-for-one, premature abstractions |
| Defensive Bloat | Error handling where needed | Try-catches on safe paths everywhere |
| Style Mismatch | Matches project conventions | Foreign idioms, inconsistent naming |
| Redundancy | DRY, uses existing utilities | Copy-paste blocks, reimplements stdlib |
| Scope Creep | Stays within commit scope | Unrelated changes mixed in |
Computing the Slop Index
The seven dimension scores are summed and divided by the maximum possible total:

slop_index = (verbosity + unnecessary_comments + over_engineering + defensive_bloat + style_mismatch + redundancy + scope_creep) / 35

Where 35 is the theoretical maximum (7 dimensions × 5 max per dimension).
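The computation can be sketched as follows. This is an illustrative Python sketch, not ShipLens's actual implementation; the dimension key names are assumptions based on the table above.

```python
# Illustrative sketch: the Slop Index is the sum of the seven
# 0-5 dimension scores, normalized by the 35-point maximum.

DIMENSIONS = [
    "verbosity", "unnecessary_comments", "over_engineering",
    "defensive_bloat", "style_mismatch", "redundancy", "scope_creep",
]

def slop_index(scores: dict) -> float:
    """Sum the seven dimension scores and divide by 35; missing keys count as 0."""
    total = sum(scores.get(dim, 0) for dim in DIMENSIONS)
    return round(total / 35, 2)

# The sloppy-commit example below scores 4+5+3+4+1+2+0 = 19, and 19/35 ≈ 0.54
sloppy = {"verbosity": 4, "unnecessary_comments": 5, "over_engineering": 3,
          "defensive_bloat": 4, "style_mismatch": 1, "redundancy": 2,
          "scope_creep": 0}
```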
Labels
| Range | Label | Interpretation |
|---|---|---|
| ≤ 0.20 | Clean | Code shows no signs of AI slop |
| ≤ 0.40 | Minor | Some indicators present, likely acceptable |
| ≤ 0.60 | Notable | Multiple dimensions affected, worth reviewing |
| ≤ 0.80 | Significant | Strong AI slop signals across several dimensions |
| > 0.80 | Pure Slop | Pervasive quality issues, likely unreviewed AI output |
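The thresholds above map directly to a label lookup. A minimal sketch, assuming the boundaries are inclusive as written in the table:

```python
def slop_label(index: float) -> str:
    """Map a Slop Index (0.0-1.0) to its human-readable label."""
    if index <= 0.20:
        return "Clean"
    elif index <= 0.40:
        return "Minor"
    elif index <= 0.60:
        return "Notable"
    elif index <= 0.80:
        return "Significant"
    return "Pure Slop"
```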
Three-Layer Assessment
Slop detection uses three complementary layers to maximize accuracy:
Layer 1: Heuristic Pre-Signals
Before the LLM even looks at the code, statistical signals hint at potential slop:
| Heuristic | Threshold | Indicates |
|---|---|---|
| Lines per file | > 100 added lines/file | Possible verbosity |
| Add/remove ratio | > 5:1 additions to deletions | Possible bloat (all-new code, no cleanup) |
| New file ratio | > 70% of files are new | Possible over-engineering (too many new modules) |
These signals are passed to the LLM as context ("pay attention to potential verbosity") but don't directly set scores.
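The pre-signal checks can be sketched from raw diff statistics. Function and signal names here are illustrative assumptions; the thresholds follow the table above:

```python
def heuristic_presignals(lines_added: int, lines_removed: int,
                         files_changed: int, new_files: int) -> list:
    """Return the names of triggered pre-signals for a commit's diff stats."""
    signals = []
    # > 100 added lines per file suggests verbosity
    if files_changed > 0 and lines_added / files_changed > 100:
        signals.append("possible_verbosity")
    # > 5:1 add/remove ratio suggests bloat; zero deletions counts as triggered
    if lines_added > 0 and lines_added > 5 * lines_removed:
        signals.append("possible_bloat")
    # > 70% new files suggests over-engineering
    if files_changed > 0 and new_files / files_changed > 0.7:
        signals.append("possible_over_engineering")
    return signals
```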
Layer 2: LLM Assessment
During standard or deep analysis, the LLM evaluates all seven dimensions based on the actual code diff. It provides:
- A 0–5 score per dimension
- A text rationale explaining its assessment
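The assessment therefore arrives as structured data: a score per dimension plus a rationale. A hedged sketch of validating that shape, with clamping as a safety net (field names are assumptions, not ShipLens's actual schema):

```python
DIMENSIONS = [
    "verbosity", "unnecessary_comments", "over_engineering",
    "defensive_bloat", "style_mismatch", "redundancy", "scope_creep",
]

def parse_assessment(raw: dict) -> dict:
    """Clamp each LLM-reported dimension score into 0-5 and keep the rationale."""
    scores = {d: max(0, min(5, int(raw.get(d, 0)))) for d in DIMENSIONS}
    return {"scores": scores, "rationale": raw.get("rationale", "")}
```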
Layer 3: Statistical Floors
After the LLM assessment, statistical safeguards ensure minimum scores when heuristics strongly disagree:
| Heuristic Signal | Floor Applied |
|---|---|
| Very high lines per file | Verbosity ≥ floor value |
| Very high add/remove ratio | Over-engineering ≥ floor value |
| Very high new file ratio | Over-engineering ≥ floor value |
Additionally, the LLM's rationale text is scanned for slop-related keywords; when one is found, the corresponding dimension score is bumped to at least 2.
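Both safeguards can be sketched as post-processing passes over the LLM's scores. The floor value and keyword list below are illustrative assumptions; the document does not specify them:

```python
FLOOR = 2  # assumed floor value

def apply_floors(scores: dict, signals: set) -> dict:
    """Raise dimension scores to a minimum when heuristics strongly disagree."""
    out = dict(scores)
    if "very_high_lines_per_file" in signals:
        out["verbosity"] = max(out.get("verbosity", 0), FLOOR)
    if "very_high_add_remove_ratio" in signals or "very_high_new_file_ratio" in signals:
        out["over_engineering"] = max(out.get("over_engineering", 0), FLOOR)
    return out

# Illustrative keyword-to-dimension mapping
KEYWORD_BUMPS = {"verbose": "verbosity", "boilerplate": "verbosity",
                 "redundant": "redundancy"}

def apply_keyword_bumps(scores: dict, rationale: str) -> dict:
    """Bump a dimension to at least 2 when its keyword appears in the rationale."""
    out = dict(scores)
    text = rationale.lower()
    for keyword, dim in KEYWORD_BUMPS.items():
        if keyword in text:
            out[dim] = max(out.get(dim, 0), 2)
    return out
```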
Backfill Estimation
For commits analyzed before slop detection was added, or for shallow-depth commits, a conservative heuristic estimate is used:
| Condition | Estimated Score |
|---|---|
| lines_added > 500, lines_removed < 20 | verbosity: 3 |
| lines_added > 200, lines_removed < 10 | verbosity: 2 |
| lines_added > 100, lines_removed = 0 | verbosity: 1 |
| new_modules > 5, lines_added > 300 | over_engineering: 2 |
| new_modules > 3, lines_added > 200 | over_engineering: 1 |
Backfill-estimated slop indexes are capped at 0.5 to avoid false alarms — without LLM assessment, we stay conservative.
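The backfill rules above can be sketched directly, with the 0.5 cap applied to the resulting index (function name is illustrative):

```python
def backfill_estimate(lines_added: int, lines_removed: int,
                      new_modules: int) -> float:
    """Conservative slop-index estimate from diff stats alone, capped at 0.5."""
    scores = {"verbosity": 0, "over_engineering": 0}
    if lines_added > 500 and lines_removed < 20:
        scores["verbosity"] = 3
    elif lines_added > 200 and lines_removed < 10:
        scores["verbosity"] = 2
    elif lines_added > 100 and lines_removed == 0:
        scores["verbosity"] = 1
    if new_modules > 5 and lines_added > 300:
        scores["over_engineering"] = 2
    elif new_modules > 3 and lines_added > 200:
        scores["over_engineering"] = 1
    index = sum(scores.values()) / 35
    # Without an LLM assessment, stay conservative: never exceed 0.5
    return min(round(index, 2), 0.5)
```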
How Slop Data Is Used
Slop metrics appear in several places:
| Surface | What's Shown |
|---|---|
| Contributor profile | Avg slop index, slop rate, per-dimension breakdown |
| Weekly digest | Team-level slop trends, high-slop commits |
| 1:1 reports | Persistent slop patterns as coaching topics |
| Commit detail | Full dimension scores and rationale |
TIP
Slop scores are informational only — they do not affect commit scores or the developer composite score. They exist to help managers identify code quality patterns that might warrant a conversation or a code review process change.
Example: Comparing Clean vs. Sloppy Commits
Clean Commit
```elixir
# Adding a validation function
def validate_email(email) do
  Regex.match?(~r/^[^\s@]+@[^\s@]+\.[^\s@]+$/, email)
end
```

Slop scores: all dimensions 0. Index: 0.00 (clean).
Sloppy Commit
```elixir
# This module handles email validation for the application
defmodule MyApp.Validators.EmailValidator do
  @moduledoc """
  EmailValidator module provides email validation functionality.
  It validates email addresses using regex patterns.
  """

  # Regular expression pattern for email validation
  @email_regex ~r/^[^\s@]+@[^\s@]+\.[^\s@]+$/

  @doc """
  Validates an email address.

  ## Parameters
  - email: The email address to validate (String)

  ## Returns
  - true if the email is valid
  - false if the email is not valid
  """
  def validate_email(email) do
    try do
      # Check if email is nil or empty
      if is_nil(email) || email == "" do
        false
      else
        # Perform regex validation
        result = Regex.match?(@email_regex, email)
        # Return the result
        result
      end
    rescue
      # Handle any unexpected errors
      _error -> false
    end
  end
end
```

Slop scores: verbosity: 4, unnecessary_comments: 5, over_engineering: 3, defensive_bloat: 4, style_mismatch: 1, redundancy: 2, scope_creep: 0. Index: 0.54 (notable).
