Commit Scoring (V2)

The scoring engine transforms commit analysis reports into a single numerical score (0–10). It's designed to be transparent, configurable, and independent of the LLM analysis step.

Design Principles

  1. Multiplicative core: A commit must be both complex and impactful to score high. A simple but impactful change (like a one-line security fix) gets only a modest core score, as does a complex but low-impact change (like a large refactor in a trivial area).

  2. Additive bonuses: Effort, quality signals, and risk signals add bonuses on top of the core score, each with a cap to prevent runaway inflation.

  3. Full transparency: Every score stores its individual components, so you can always see exactly why a commit scored the way it did.

  4. Re-scorable: Change the weights, switch presets, or create custom configs, then re-score your entire history instantly without re-analyzing.

The Formula

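$$
\text{score} = \min\Bigl(\underbrace{\frac{(C \times I) - 1}{24} \times 7}_{\text{normalized core}} + \underbrace{E}_{\text{effort}} + \underbrace{Q}_{\text{quality}} + \underbrace{R}_{\text{risk}},\ 10.0\Bigr)
$$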

Where:

  • C = complexity (1–5 scale, from LLM analysis)
  • I = impact (1–5 scale, from LLM analysis)
  • E = effort bonus (logarithmic, based on lines changed)
  • Q = quality bonus (count of quality signals × weight, capped)
  • R = risk bonus (count of risk signals × weight, capped)

Normalized Core

The core score uses a multiplicative relationship between complexity and impact:

$$
\text{core} = C \times I \quad (\text{range: 1 to 25})
$$

$$
\text{normalized} = \frac{\text{core} - 1}{24} \times 7.0 \quad (\text{range: 0.0 to 7.0})
$$

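Why multiplicative? An additive core would let a commit that is strong on only one dimension reach a mid-range score: C=5, I=1 sums to 6, the same as a balanced C=3, I=3. Multiplication ranks the lopsided commit (core 5) below the balanced one (core 9), so both dimensions must be present to score high.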

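| Complexity | Impact | Core | Normalized |
|---|---|---|---|
| 1 | 1 | 1 | 0.00 |
| 2 | 2 | 4 | 0.88 |
| 3 | 3 | 9 | 2.33 |
| 4 | 4 | 16 | 4.38 |
| 5 | 5 | 25 | 7.00 |
| 1 | 5 | 5 | 1.17 |
| 5 | 1 | 5 | 1.17 |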

Notice that a max-complexity/min-impact commit scores the same as a min-complexity/max-impact commit (1.17) — you need both to score high.
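The normalization can be reproduced in a few lines of Python. This is a minimal sketch: the function name `normalized_core` and its `max_core` parameter are illustrative assumptions, not names taken from the scoring engine.

```python
def normalized_core(complexity: int, impact: int, max_core: float = 7.0) -> float:
    """Map the product C x I (1..25) linearly onto the range 0.0..max_core.

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    if not (1 <= complexity <= 5 and 1 <= impact <= 5):
        raise ValueError("complexity and impact must be on the 1-5 scale")
    core = complexity * impact          # 1 to 25
    return (core - 1) / 24 * max_core   # 0.0 to 7.0 with the default max_core


# Print one row per (complexity, impact) pair from the table.
for c, i in [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (1, 5), (5, 1)]:
    print(f"C={c} I={i} core={c * i:>2} normalized={normalized_core(c, i):.2f}")
```

Subtracting 1 before dividing by 24 is what pins the minimum product (1 × 1) to 0.0 and the maximum (5 × 5) to 7.0.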

Effort Bonus

A logarithmic function of total lines changed, preventing large diffs from dominating the score:

$$
E = \min\left(\frac{\log_2(\text{lines\_added} + \text{lines\_removed} + 1)}{10},\ W_e\right)
$$

Where $W_e$ is the effort weight (default: 0.5).

| Total Lines | Raw log₂(lines + 1) / 10 | Effort (capped at 0.5) |
|---|---|---|
| 10 | 0.35 | 0.35 |
| 50 | 0.57 | 0.50 |
| 100 | 0.67 | 0.50 |
| 500 | 0.90 | 0.50 |
| 1000 | 1.00 | 0.50 |

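The logarithmic curve means the first 50 lines earn a larger raw bonus (0.57) than the next 950 combined (0.43). With the default cap of 0.5, the bonus stops growing at 31 total lines, since log₂(32) / 10 = 0.5. This intentionally de-emphasizes raw volume.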
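A short Python sketch of the effort formula makes the cap easy to experiment with. The function name `effort_bonus` and its parameter names are assumptions chosen for illustration; only the formula and the default weight come from this page.

```python
import math


def effort_bonus(lines_added: int, lines_removed: int, effort_weight: float = 0.5) -> float:
    """Logarithmic effort bonus, capped at effort_weight (W_e).

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    total = lines_added + lines_removed
    if total < 0:
        raise ValueError("line counts must be non-negative")
    raw = math.log2(total + 1) / 10   # +1 keeps a zero-line commit at 0.0
    return min(raw, effort_weight)


# Compare a few diff sizes, including the point where the default cap is reached.
for total in (2, 10, 30, 31, 50, 1000):
    print(total, round(effort_bonus(total, 0), 2))
```

Raising `effort_weight` above 0.5 lets larger diffs keep earning credit; for example, a weight of 1.0 would not cap out until roughly 1,000 lines.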

Quality Bonus

$$
Q = \min\left(|\text{quality\_signals}| \times W_q,\ C_q\right)
$$

Default: $W_q = 0.4$ per signal, $C_q = 1.5$ cap.

| Signals Present | Bonus |
|---|---|
| 0 | 0.0 |
| 1 | 0.4 |
| 2 | 0.8 |
| 3 | 1.2 |
| 4+ | 1.5 (capped) |

Quality signals: has_tests, good_error_handling, clean_patterns, reduces_tech_debt, good_documentation.

Risk Bonus

$$
R = \min\left(|\text{risk\_signals}| \times W_r,\ C_r\right)
$$

Default: $W_r = 0.25$ per signal, $C_r = 0.5$ cap.

| Signals Present | Bonus |
|---|---|
| 0 | 0.0 |
| 1 | 0.25 |
| 2+ | 0.50 (capped) |

Risk signals: touches_auth, touches_payments, modifies_data_model, cross_module_change, production_hotfix.
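The quality and risk bonuses share the same shape: count the signals, multiply by a per-signal weight, then apply a cap. A minimal Python sketch of that pattern follows. The function names are hypothetical, and counting only distinct signals is an assumption; this page does not say how duplicates are handled.

```python
from typing import Iterable


def signal_bonus(signals: Iterable[str], per_signal: float, cap: float) -> float:
    """Count distinct signals, multiply by the per-signal weight, then cap.

    Hypothetical helper; de-duplicating signals is an assumption.
    """
    return min(len(set(signals)) * per_signal, cap)


def quality_bonus(signals: Iterable[str], per_signal: float = 0.4, cap: float = 1.5) -> float:
    """Quality bonus Q with the default weights listed on this page."""
    return signal_bonus(signals, per_signal, cap)


def risk_bonus(signals: Iterable[str], per_signal: float = 0.25, cap: float = 0.5) -> float:
    """Risk bonus R with the default weights listed on this page."""
    return signal_bonus(signals, per_signal, cap)


print(quality_bonus(["has_tests", "clean_patterns"]))
print(risk_bonus(["touches_auth", "touches_payments", "production_hotfix"]))
```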

Why does risk give a bonus, not a penalty?

Risk signals indicate that a commit touches sensitive, important areas. Doing so successfully is harder and more valuable. The bonus recognizes the additional care required, not the risk itself.

Score Examples

Example 1: Simple bug fix

A one-line fix to a typo in a non-critical utility function.

| Component | Value | Contribution |
|---|---|---|
| Complexity | 1 | |
| Impact | 1 | |
| Normalized core | | 0.00 |
| Lines changed | 2 | Effort: 0.16 |
| Quality signals | 0 | Quality: 0.00 |
| Risk signals | 0 | Risk: 0.00 |
| Final score | | 0.16 |

Example 2: Feature with tests

A medium-complexity feature adding a new API endpoint with test coverage.

| Component | Value | Contribution |
|---|---|---|
| Complexity | 3 | |
| Impact | 3 | |
| Normalized core | | 2.33 |
| Lines changed | 150 | Effort: 0.50 |
| Quality signals | 2 (has_tests, clean_patterns) | Quality: 0.80 |
| Risk signals | 1 (modifies_data_model) | Risk: 0.25 |
| Final score | | 3.88 |

Example 3: Critical security refactor

A deep, complex refactor of the authentication system with full test coverage and clean patterns.

| Component | Value | Contribution |
|---|---|---|
| Complexity | 5 | |
| Impact | 5 | |
| Normalized core | | 7.00 |
| Lines changed | 400 | Effort: 0.50 |
| Quality signals | 4 (has_tests, good_error_handling, clean_patterns, reduces_tech_debt) | Quality: 1.50 |
| Risk signals | 2 (touches_auth, cross_module_change) | Risk: 0.50 |
| Final score | | 9.50 |
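The three examples can be checked end to end with a small Python sketch that applies the formula with the default weights and returns every component alongside the final score. The function name `score_commit` and the shape of its return value are assumptions for illustration, not the engine's actual interface.

```python
import math


def score_commit(complexity, impact, lines_changed, quality_signals, risk_signals):
    """Return the V2 score and its components using the default weights.

    Hypothetical helper: name, arguments, and dict layout are assumptions.
    """
    core = (complexity * impact - 1) / 24 * 7.0
    effort = min(math.log2(lines_changed + 1) / 10, 0.5)
    quality = min(len(quality_signals) * 0.4, 1.5)
    risk = min(len(risk_signals) * 0.25, 0.5)
    total = min(core + effort + quality + risk, 10.0)
    return {"core": core, "effort": effort, "quality": quality, "risk": risk, "score": total}


# Inputs copied from the three example tables.
EXAMPLES = {
    "simple bug fix": (1, 1, 2, [], []),
    "feature with tests": (3, 3, 150, ["has_tests", "clean_patterns"], ["modifies_data_model"]),
    "security refactor": (
        5, 5, 400,
        ["has_tests", "good_error_handling", "clean_patterns", "reduces_tech_debt"],
        ["touches_auth", "cross_module_change"],
    ),
}

for name, args in EXAMPLES.items():
    breakdown = score_commit(*args)
    print(name, {key: round(value, 2) for key, value in breakdown.items()})
```

Returning the components rather than only the total mirrors the transparency principle: each contribution stays visible next to the final number.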

Fallback Heuristics

When complexity or impact values are missing from the LLM analysis (e.g., for shallow commits), the scoring engine estimates them from available metadata:

Complexity estimation (capped at 4 — never assigns maximum without LLM confirmation):

| Condition | Bonus |
|---|---|
| Base | 1 |
| Lines changed > 20 | +1 |
| Files changed > 3 | +1 |
| Introduces new pattern | +1 |
| Has migration | +1 |

Impact estimation (capped at 4):

| Condition | Value |
|---|---|
| Domain criticality: low | 1 |
| Domain criticality: medium | 2 |
| Domain criticality: high | 3 |
| Domain criticality: critical | 4 |
| Commit type is feat/fix/perf | +1 |
| Touches core system | +1 |
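One way to read the two tables above is sketched below in Python. The function and parameter names are hypothetical. The sketch assumes the domain criticality sets the starting impact value and the two `+1` rows are added on top before the cap; criticality labels other than the four listed are not handled.

```python
# Starting impact value per domain criticality, taken from the table.
CRITICALITY_BASE = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def estimate_complexity(lines_changed: int, files_changed: int,
                        introduces_new_pattern: bool, has_migration: bool) -> int:
    """Fallback complexity: base 1, +1 per matching condition, capped at 4."""
    value = 1
    if lines_changed > 20:
        value += 1
    if files_changed > 3:
        value += 1
    if introduces_new_pattern:
        value += 1
    if has_migration:
        value += 1
    return min(value, 4)   # never assign 5 without LLM confirmation


def estimate_impact(domain_criticality: str, commit_type: str,
                    touches_core_system: bool) -> int:
    """Fallback impact: criticality base plus bonuses, capped at 4."""
    value = CRITICALITY_BASE[domain_criticality]   # KeyError on unlisted labels
    if commit_type in {"feat", "fix", "perf"}:
        value += 1
    if touches_core_system:
        value += 1
    return min(value, 4)


print(estimate_complexity(120, 5, False, True))
print(estimate_impact("medium", "fix", False))
```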

Scoring Presets

Default

Balanced weights for general-purpose scoring.

effort_weight:      0.5
quality_per_signal: 0.4
quality_cap:        1.5
risk_per_signal:    0.25
risk_cap:           0.5

Quality-Focused

Rewards engineering best practices more heavily.

effort_weight:      0.5
quality_per_signal: 0.6    ← +50% per signal
quality_cap:        2.0    ← higher cap
risk_per_signal:    0.25
risk_cap:           0.5

Risk-Aware

Gives more credit for working in sensitive areas.

effort_weight:      0.5
quality_per_signal: 0.4
quality_cap:        1.5
risk_per_signal:    0.4    ← +60% per signal
risk_cap:           1.0    ← doubled cap
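Because scoring only needs the stored analysis values, switching presets is a pure recomputation. The Python sketch below illustrates this with the three presets above. The `ScoringConfig` class, the preset keys, and the layout of the stored record are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringConfig:
    """Hypothetical config object; field names follow the preset keys above."""
    effort_weight: float = 0.5
    quality_per_signal: float = 0.4
    quality_cap: float = 1.5
    risk_per_signal: float = 0.25
    risk_cap: float = 0.5


# Preset key names are assumptions; the values come from this page.
PRESETS = {
    "default": ScoringConfig(),
    "quality_focused": ScoringConfig(quality_per_signal=0.6, quality_cap=2.0),
    "risk_aware": ScoringConfig(risk_per_signal=0.4, risk_cap=1.0),
}


def rescore(analysis: dict, config: ScoringConfig) -> float:
    """Re-score a stored analysis record without re-running the analysis."""
    core = (analysis["complexity"] * analysis["impact"] - 1) / 24 * 7.0
    effort = min(math.log2(analysis["lines_changed"] + 1) / 10, config.effort_weight)
    quality = min(analysis["quality_signals"] * config.quality_per_signal, config.quality_cap)
    risk = min(analysis["risk_signals"] * config.risk_per_signal, config.risk_cap)
    return min(core + effort + quality + risk, 10.0)


# Stored record with signal counts (the inputs of Example 2).
stored = {"complexity": 3, "impact": 3, "lines_changed": 150,
          "quality_signals": 2, "risk_signals": 1}

for name, config in PRESETS.items():
    print(name, round(rescore(stored, config), 2))
```

A custom configuration is just another `ScoringConfig` instance, for example `ScoringConfig(effort_weight=0.8)`.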

V1 Scoring (Legacy)

The original scoring engine used a weighted formula based on commit type and indicator bonuses. It remains available for backward compatibility, but V2 is recommended for all new deployments.

V1 type weights:

| Type | Weight |
|---|---|
| feat | 1.0 |
| fix | 0.9 |
| refactor | 0.85 |
| perf | 0.9 |
| test | 0.6 |
| docs | 0.3 |
| chore | 0.3 |
| style | 0.1 |

V1 indicator bonuses:

| Indicator | Bonus |
|---|---|
| touches_core_system | 3.0 |
| introduces_new_pattern | 2.0 |
| has_migration | 1.5 |
| dependencies_changed | 1.0 |
| new_modules_created | 1.0 per module |
| tests_added | 0.5 per test |

V1 domain multipliers:

| Domain Criticality | Multiplier |
|---|---|
| critical | 1.5× |
| high | 1.2× |
| medium | 1.0× |
| low | 0.7× |
| trivial | 0.3× |
