
AI Slop Detection

As AI coding assistants become ubiquitous, a new quality problem emerges: AI slop — code that is technically functional but unnecessarily verbose, over-engineered, or stylistically foreign to the project.

ShipLens measures AI slop across seven dimensions, producing a Slop Index from 0.0 (clean) to 1.0 (pure slop).

Why Detect AI Slop?

AI-generated code often exhibits specific anti-patterns:

  • Verbosity — More code than needed, unnecessary intermediate variables
  • Comment pollution — Obvious comments restating what the code does
  • Over-engineering — Abstractions nobody asked for, factory-for-one patterns
  • Defensive bloat — Try-catches on safe paths, null checks on already-validated values
  • Style mismatch — Foreign conventions that don't match the project's idioms

These patterns aren't bugs — the code usually works. But they increase maintenance burden, obscure intent, and erode codebase consistency.

The Seven Dimensions

Each dimension is scored 0–5 by the LLM during commit analysis:

| Dimension | Score 0 | Score 5 |
| --- | --- | --- |
| Verbosity | Concise, no wasted lines | Could be written in half the code |
| Unnecessary Comments | No redundant comments | Comments restating every line |
| Over-Engineering | Right level of abstraction | Factory-for-one, premature abstractions |
| Defensive Bloat | Error handling where needed | Try-catches on safe paths everywhere |
| Style Mismatch | Matches project conventions | Foreign idioms, inconsistent naming |
| Redundancy | DRY, uses existing utilities | Copy-paste blocks, reimplements stdlib |
| Scope Creep | Stays within commit scope | Unrelated changes mixed in |

Computing the Slop Index

$$\text{slop\_index} = \frac{\sum_{d=1}^{7} \operatorname{clamp}(s_d, 0, 5)}{35}$$

Where 35 is the theoretical maximum (7 dimensions × 5 max per dimension).
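The formula can be sketched in Elixir. This is an illustrative helper, not the actual ShipLens implementation; the module and function names are assumptions:

```elixir
defmodule SlopIndex do
  # 7 dimensions x 5 max per dimension
  @max_total 35

  @doc "Computes the slop index from a list of the seven dimension scores."
  def compute(scores) when length(scores) == 7 do
    scores
    |> Enum.map(fn s -> s |> max(0) |> min(5) end)  # clamp each score to 0..5
    |> Enum.sum()
    |> Kernel./(@max_total)
  end
end
```

For the sloppy-commit example later on this page, `SlopIndex.compute([4, 5, 3, 4, 1, 2, 0])` yields 19 / 35 ≈ 0.54.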

Labels

| Range | Label | Interpretation |
| --- | --- | --- |
| ≤ 0.20 | Clean | Code shows no signs of AI slop |
| ≤ 0.40 | Minor | Some indicators present, likely acceptable |
| ≤ 0.60 | Notable | Multiple dimensions affected, worth reviewing |
| ≤ 0.80 | Significant | Strong AI slop signals across several dimensions |
| > 0.80 | Pure Slop | Pervasive quality issues, likely unreviewed AI output |
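These thresholds map naturally onto Elixir guard clauses. A minimal sketch, with the module and function names as assumptions:

```elixir
defmodule SlopLabel do
  @doc "Maps a slop index (0.0..1.0) to its human-readable label."
  def label(index) when index <= 0.20, do: "Clean"
  def label(index) when index <= 0.40, do: "Minor"
  def label(index) when index <= 0.60, do: "Notable"
  def label(index) when index <= 0.80, do: "Significant"
  def label(_index), do: "Pure Slop"
end
```

For example, `SlopLabel.label(0.54)` returns `"Notable"`.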

Three-Layer Assessment

Slop detection uses three complementary layers to maximize accuracy:

Layer 1: Heuristic Pre-Signals

Before the LLM even looks at the code, statistical signals hint at potential slop:

| Heuristic | Threshold | Indicates |
| --- | --- | --- |
| Lines per file | > 100 added lines/file | Possible verbosity |
| Add/remove ratio | > 5:1 additions to deletions | Possible bloat (all-new code, no cleanup) |
| New file ratio | > 70% of files are new | Possible over-engineering (too many new modules) |

These signals are passed to the LLM as context ("pay attention to potential verbosity") but don't directly set scores.
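A sketch of how these pre-signals might be derived from raw diff statistics. The field names (`lines_added`, `lines_removed`, `files_changed`, `new_files`) are assumptions, not the actual ShipLens schema:

```elixir
defmodule SlopPreSignals do
  @doc "Flags heuristic pre-signals from a commit's diff statistics."
  def compute(stats) do
    %{
      # max/2 guards against division by zero on empty or all-new diffs
      possible_verbosity: stats.lines_added / max(stats.files_changed, 1) > 100,
      possible_bloat: stats.lines_added / max(stats.lines_removed, 1) > 5,
      possible_over_engineering: stats.new_files / max(stats.files_changed, 1) > 0.70
    }
  end
end
```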

Layer 2: LLM Assessment

During standard or deep analysis, the LLM evaluates all seven dimensions based on the actual code diff. It provides:

  • A 0–5 score per dimension
  • A text rationale explaining its assessment

Layer 3: Statistical Floors

After the LLM assessment, statistical safeguards ensure minimum scores when heuristics strongly disagree:

| Heuristic Signal | Floor Applied |
| --- | --- |
| Very high lines per file | Verbosity ≥ floor value |
| Very high add/remove ratio | Over-engineering ≥ floor value |
| Very high new file ratio | Over-engineering ≥ floor value |

Additionally, the LLM's rationale text is scanned for keywords; when a keyword matches, the corresponding dimension score is bumped to at least 2.
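In outline, the floor logic could look like the following. The floor value of 2 and the signal field names are illustrative assumptions:

```elixir
defmodule SlopFloors do
  @floor 2

  @doc "Raises dimension scores to a minimum when heuristics strongly disagree."
  def apply_floors(scores, signals) do
    scores
    |> maybe_floor(:verbosity, signals.very_high_lines_per_file)
    |> maybe_floor(:over_engineering, signals.very_high_add_remove_ratio)
    |> maybe_floor(:over_engineering, signals.very_high_new_file_ratio)
  end

  # Only raise the score, never lower it
  defp maybe_floor(scores, dim, true), do: Map.update!(scores, dim, &max(&1, @floor))
  defp maybe_floor(scores, _dim, false), do: scores
end
```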

Backfill Estimation

For commits analyzed before slop detection was added, or for shallow-depth commits, a conservative heuristic estimate is used:

| Condition | Estimated Score |
| --- | --- |
| `lines_added > 500, lines_removed < 20` | verbosity: 3 |
| `lines_added > 200, lines_removed < 10` | verbosity: 2 |
| `lines_added > 100, lines_removed = 0` | verbosity: 1 |
| `new_modules > 5, lines_added > 300` | over_engineering: 2 |
| `new_modules > 3, lines_added > 200` | over_engineering: 1 |

Backfill-estimated slop indexes are capped at 0.5 to avoid false alarms — without LLM assessment, we stay conservative.
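The backfill rules above, including the 0.5 cap, can be sketched as a single estimator. Module and field names are assumptions:

```elixir
defmodule SlopBackfill do
  @cap 0.5

  @doc "Conservative heuristic slop-index estimate for commits without an LLM assessment."
  def estimate(%{lines_added: a, lines_removed: r, new_modules: m}) do
    verbosity =
      cond do
        a > 500 and r < 20 -> 3
        a > 200 and r < 10 -> 2
        a > 100 and r == 0 -> 1
        true -> 0
      end

    over_engineering =
      cond do
        m > 5 and a > 300 -> 2
        m > 3 and a > 200 -> 1
        true -> 0
      end

    # Only two dimensions are estimated; the other five default to 0,
    # and the result is capped to stay conservative.
    min((verbosity + over_engineering) / 35, @cap)
  end
end
```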

How Slop Data Is Used

Slop metrics appear in several places:

| Surface | What's Shown |
| --- | --- |
| Contributor profile | Avg slop index, slop rate, per-dimension breakdown |
| Weekly digest | Team-level slop trends, high-slop commits |
| 1:1 reports | Persistent slop patterns as coaching topics |
| Commit detail | Full dimension scores and rationale |

TIP

Slop scores are informational only — they do not affect commit scores or the developer composite score. They exist to help managers identify code quality patterns that might warrant a conversation or a code review process change.

Example: Comparing Clean vs. Sloppy Commits

Clean Commit

```elixir
# Adding a validation function
def validate_email(email) do
  Regex.match?(~r/^[^\s@]+@[^\s@]+\.[^\s@]+$/, email)
end
```

Slop scores: all dimensions 0. Index: 0.00 (clean).

Sloppy Commit

```elixir
# This module handles email validation for the application
defmodule MyApp.Validators.EmailValidator do
  @moduledoc """
  EmailValidator module provides email validation functionality.
  It validates email addresses using regex patterns.
  """

  # Regular expression pattern for email validation
  @email_regex ~r/^[^\s@]+@[^\s@]+\.[^\s@]+$/

  @doc """
  Validates an email address.

  ## Parameters
    - email: The email address to validate (String)

  ## Returns
    - true if the email is valid
    - false if the email is not valid
  """
  def validate_email(email) do
    try do
      # Check if email is nil or empty
      if is_nil(email) || email == "" do
        false
      else
        # Perform regex validation
        result = Regex.match?(@email_regex, email)
        # Return the result
        result
      end
    rescue
      # Handle any unexpected errors
      _error -> false
    end
  end
end
```

Slop scores: verbosity: 4, unnecessary_comments: 5, over_engineering: 3, defensive_bloat: 4, style_mismatch: 1, redundancy: 2, scope_creep: 0. Index: 0.54 (notable).
