AI Slop Detection
As AI coding assistants become ubiquitous, a new quality problem emerges: AI slop — code that is technically functional but unnecessarily verbose, over-engineered, or stylistically foreign to the project.
ShipLens measures AI slop across seven dimensions, producing a Slop Index from 0.0 (clean) to 1.0 (pure slop).
Why Detect AI Slop?
AI-generated code often exhibits specific anti-patterns:
- Verbosity — More code than needed, unnecessary intermediate variables
- Comment pollution — Obvious comments restating what the code does
- Over-engineering — Abstractions nobody asked for, factory-for-one patterns
- Defensive bloat — Try-catches on safe paths, null checks on already-validated values
- Style mismatch — Foreign conventions that don't match the project's idioms
These patterns aren't bugs — the code usually works. But they increase maintenance burden, obscure intent, and erode codebase consistency.
The Seven Dimensions
Each dimension is scored 0–5 by the LLM during commit analysis:
| Dimension | Score 0 | Score 5 |
|---|---|---|
| Verbosity | Concise, no wasted lines | Could be written in half the code |
| Unnecessary Comments | No redundant comments | Comments restating every line |
| Over-Engineering | Right level of abstraction | Factory-for-one, premature abstractions |
| Defensive Bloat | Error handling where needed | Try-catches on safe paths everywhere |
| Style Mismatch | Matches project conventions | Foreign idioms, inconsistent naming |
| Redundancy | DRY, uses existing utilities | Copy-paste blocks, reimplements stdlib |
| Scope Creep | Stays within commit scope | Unrelated changes mixed in |
Computing the Slop Index
The seven dimension scores are summed and divided by the maximum possible total:

slop_index = (verbosity + unnecessary_comments + over_engineering + defensive_bloat + style_mismatch + redundancy + scope_creep) / 35

Where 35 is the theoretical maximum (7 dimensions × 5 max per dimension).
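The computation can be sketched as follows. This is an illustrative Python sketch, not ShipLens's actual implementation; the dimension key names are assumptions based on the table above.

```python
# Illustrative sketch: the Slop Index is the sum of the seven
# 0-5 dimension scores, normalized by the 35-point maximum.

DIMENSIONS = [
    "verbosity", "unnecessary_comments", "over_engineering",
    "defensive_bloat", "style_mismatch", "redundancy", "scope_creep",
]

def slop_index(scores: dict) -> float:
    """Sum the seven dimension scores and divide by 35; missing keys count as 0."""
    total = sum(scores.get(dim, 0) for dim in DIMENSIONS)
    return round(total / 35, 2)

# The sloppy-commit example below scores 4+5+3+4+1+2+0 = 19, and 19/35 ≈ 0.54
sloppy = {"verbosity": 4, "unnecessary_comments": 5, "over_engineering": 3,
          "defensive_bloat": 4, "style_mismatch": 1, "redundancy": 2,
          "scope_creep": 0}
```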
Labels
| Range | Label | Interpretation |
|---|---|---|
| ≤ 0.20 | Clean | Code shows no signs of AI slop |
| ≤ 0.40 | Minor | Some indicators present, likely acceptable |
| ≤ 0.60 | Notable | Multiple dimensions affected, worth reviewing |
| ≤ 0.80 | Significant | Strong AI slop signals across several dimensions |
| > 0.80 | Pure Slop | Pervasive quality issues, likely unreviewed AI output |
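The thresholds above map directly to a label lookup. A minimal sketch, assuming the boundaries are inclusive as written in the table:

```python
def slop_label(index: float) -> str:
    """Map a Slop Index (0.0-1.0) to its human-readable label."""
    if index <= 0.20:
        return "Clean"
    elif index <= 0.40:
        return "Minor"
    elif index <= 0.60:
        return "Notable"
    elif index <= 0.80:
        return "Significant"
    return "Pure Slop"
```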
Three-Layer Assessment
Slop detection uses three complementary layers to maximize accuracy:
Layer 1: Heuristic Pre-Signals
Before the LLM even looks at the code, statistical signals hint at potential slop:
| Heuristic | Threshold | Indicates |
|---|---|---|
| Lines per file | > 100 added lines/file | Possible verbosity |
| Add/remove ratio | > 5:1 additions to deletions | Possible bloat (all-new code, no cleanup) |
| New file ratio | > 70% of files are new | Possible over-engineering (too many new modules) |
These signals are passed to the LLM as context ("pay attention to potential verbosity") but don't directly set scores.
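The pre-signal checks can be sketched from raw diff statistics. Function and signal names here are illustrative assumptions; the thresholds follow the table above:

```python
def heuristic_presignals(lines_added: int, lines_removed: int,
                         files_changed: int, new_files: int) -> list:
    """Return the names of triggered pre-signals for a commit's diff stats."""
    signals = []
    # > 100 added lines per file suggests verbosity
    if files_changed > 0 and lines_added / files_changed > 100:
        signals.append("possible_verbosity")
    # > 5:1 add/remove ratio suggests bloat; zero deletions counts as triggered
    if lines_added > 0 and lines_added > 5 * lines_removed:
        signals.append("possible_bloat")
    # > 70% new files suggests over-engineering
    if files_changed > 0 and new_files / files_changed > 0.7:
        signals.append("possible_over_engineering")
    return signals
```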
Layer 2: LLM Assessment
During standard or deep analysis, the LLM evaluates all seven dimensions based on the actual code diff. It provides:
- A 0–5 score per dimension
- A text rationale explaining its assessment
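The assessment therefore arrives as structured data: a score per dimension plus a rationale. A hedged sketch of validating that shape, with clamping as a safety net (field names are assumptions, not ShipLens's actual schema):

```python
DIMENSIONS = [
    "verbosity", "unnecessary_comments", "over_engineering",
    "defensive_bloat", "style_mismatch", "redundancy", "scope_creep",
]

def parse_assessment(raw: dict) -> dict:
    """Clamp each LLM-reported dimension score into 0-5 and keep the rationale."""
    scores = {d: max(0, min(5, int(raw.get(d, 0)))) for d in DIMENSIONS}
    return {"scores": scores, "rationale": raw.get("rationale", "")}
```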
Layer 3: Statistical Floors
After the LLM assessment, statistical safeguards ensure minimum scores when heuristics strongly disagree:
| Heuristic Signal | Floor Applied |
|---|---|
| Very high lines per file | Verbosity ≥ floor value |
| Very high add/remove ratio | Over-engineering ≥ floor value |
| Very high new file ratio | Over-engineering ≥ floor value |
Additionally, the LLM's rationale text is scanned for slop-related keywords; when one is found, the corresponding dimension score is bumped to at least 2.
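Both safeguards can be sketched as post-processing passes over the LLM's scores. The floor value and keyword list below are illustrative assumptions; the document does not specify them:

```python
FLOOR = 2  # assumed floor value

def apply_floors(scores: dict, signals: set) -> dict:
    """Raise dimension scores to a minimum when heuristics strongly disagree."""
    out = dict(scores)
    if "very_high_lines_per_file" in signals:
        out["verbosity"] = max(out.get("verbosity", 0), FLOOR)
    if "very_high_add_remove_ratio" in signals or "very_high_new_file_ratio" in signals:
        out["over_engineering"] = max(out.get("over_engineering", 0), FLOOR)
    return out

# Illustrative keyword-to-dimension mapping
KEYWORD_BUMPS = {"verbose": "verbosity", "boilerplate": "verbosity",
                 "redundant": "redundancy"}

def apply_keyword_bumps(scores: dict, rationale: str) -> dict:
    """Bump a dimension to at least 2 when its keyword appears in the rationale."""
    out = dict(scores)
    text = rationale.lower()
    for keyword, dim in KEYWORD_BUMPS.items():
        if keyword in text:
            out[dim] = max(out.get(dim, 0), 2)
    return out
```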
Backfill Estimation
For commits analyzed before slop detection was added, or for shallow-depth commits, a conservative heuristic estimate is used:
| Condition | Estimated Score |
|---|---|
| lines_added > 500, lines_removed < 20 | verbosity: 3 |
| lines_added > 200, lines_removed < 10 | verbosity: 2 |
| lines_added > 100, lines_removed = 0 | verbosity: 1 |
| new_modules > 5, lines_added > 300 | over_engineering: 2 |
| new_modules > 3, lines_added > 200 | over_engineering: 1 |
Backfill-estimated slop indexes are capped at 0.5 to avoid false alarms — without LLM assessment, we stay conservative.
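The backfill rules above can be sketched directly, with the 0.5 cap applied to the resulting index (function name is illustrative):

```python
def backfill_estimate(lines_added: int, lines_removed: int,
                      new_modules: int) -> float:
    """Conservative slop-index estimate from diff stats alone, capped at 0.5."""
    scores = {"verbosity": 0, "over_engineering": 0}
    if lines_added > 500 and lines_removed < 20:
        scores["verbosity"] = 3
    elif lines_added > 200 and lines_removed < 10:
        scores["verbosity"] = 2
    elif lines_added > 100 and lines_removed == 0:
        scores["verbosity"] = 1
    if new_modules > 5 and lines_added > 300:
        scores["over_engineering"] = 2
    elif new_modules > 3 and lines_added > 200:
        scores["over_engineering"] = 1
    index = sum(scores.values()) / 35
    # Without an LLM assessment, stay conservative: never exceed 0.5
    return min(round(index, 2), 0.5)
```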
How Slop Data Is Used
Slop metrics appear in several places:
| Surface | What's Shown |
|---|---|
| Contributor profile | Avg slop index, slop rate, per-dimension breakdown |
| Weekly digest | Team-level slop trends, high-slop commits |
| 1:1 reports | Persistent slop patterns as coaching topics |
| Commit detail | Full dimension scores and rationale |
TIP
Slop scores are informational only — they do not affect commit scores or the developer composite score. They exist to help managers identify code quality patterns that might warrant a conversation or a code review process change.
Example: Comparing Clean vs. Sloppy Commits
Clean Commit
```elixir
# Adding a validation function
def validate_email(email) do
  Regex.match?(~r/^[^\s@]+@[^\s@]+\.[^\s@]+$/, email)
end
```

Slop scores: all dimensions 0. Index: 0.00 (clean).
Sloppy Commit
```elixir
# This module handles email validation for the application
defmodule MyApp.Validators.EmailValidator do
  @moduledoc """
  EmailValidator module provides email validation functionality.
  It validates email addresses using regex patterns.
  """

  # Regular expression pattern for email validation
  @email_regex ~r/^[^\s@]+@[^\s@]+\.[^\s@]+$/

  @doc """
  Validates an email address.

  ## Parameters
  - email: The email address to validate (String)

  ## Returns
  - true if the email is valid
  - false if the email is not valid
  """
  def validate_email(email) do
    try do
      # Check if email is nil or empty
      if is_nil(email) || email == "" do
        false
      else
        # Perform regex validation
        result = Regex.match?(@email_regex, email)
        # Return the result
        result
      end
    rescue
      # Handle any unexpected errors
      _error -> false
    end
  end
end
```

Slop scores: verbosity: 4, unnecessary_comments: 5, over_engineering: 3, defensive_bloat: 4, style_mismatch: 1, redundancy: 2, scope_creep: 0. Index: 0.54 (notable).
