Commit Scoring (V2)

The scoring engine transforms commit analysis reports into a single numerical score (0–10). It's designed to be transparent, configurable, and independent of the LLM analysis step.

Design Principles

Multiplicative core: A commit must be both complex and impactful to score high. Simple but impactful changes (like a one-line security fix) get modest scores, as do complex but low-impact changes (like a large refactor in a trivial area).
Additive bonuses: Effort, quality signals, and risk signals add bonuses on top of the core score, each capped to prevent runaway inflation.
Full transparency: Every score stores its individual components, so you can always see exactly why a commit scored the way it did.
Re-scorable: Change the weights, switch presets, or create custom configs, then re-score your entire history without re-running the LLM analysis.

The Formula

$$\text{score} = \min\left(\underbrace{\frac{(C \times I) - 1}{24} \times 7}_{\text{normalized core}} + \underbrace{E}_{\text{effort}} + \underbrace{Q}_{\text{quality}} + \underbrace{R}_{\text{risk}},\ 10.0\right)$$

Where:

$C$ = complexity (1–5 scale, from LLM analysis)
$I$ = impact (1–5 scale, from LLM analysis)
$E$ = effort bonus (logarithmic, based on lines changed)
$Q$ = quality bonus (count of quality signals × weight, capped)
$R$ = risk bonus (count of risk signals × weight, capped)
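Here is a minimal sketch that transcribes the formula almost term for term. It is an illustrative Python function, not the engine's actual code. The name `v2_score` is hypothetical, the signals are passed as plain counts, and `lines_changed` stands for lines added plus lines removed. The keyword defaults mirror the Default preset listed under Scoring Presets.

```python
import math


def v2_score(
    complexity: int,
    impact: int,
    lines_changed: int,
    n_quality: int,
    n_risk: int,
    effort_weight: float = 0.5,
    quality_per_signal: float = 0.4,
    quality_cap: float = 1.5,
    risk_per_signal: float = 0.25,
    risk_cap: float = 0.5,
) -> float:
    """Hypothetical transcription of the V2 formula: core + E + Q + R, capped at 10."""
    if not (1 <= complexity <= 5 and 1 <= impact <= 5):
        raise ValueError("complexity and impact must be on the 1-5 scale")
    core = (complexity * impact - 1) / 24 * 7.0                     # 0.0 .. 7.0
    effort = min(math.log2(lines_changed + 1) / 10, effort_weight)  # E
    quality = min(n_quality * quality_per_signal, quality_cap)      # Q
    risk = min(n_risk * risk_per_signal, risk_cap)                  # R
    return min(core + effort + quality + risk, 10.0)


print(round(v2_score(3, 3, 150, 2, 1), 2))  # 3.88
```

Each bonus is clamped on its own before the sum is clamped at 10.0. This matches the nesting of the `min` calls in the formula.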

Normalized Core

The core score uses a multiplicative relationship between complexity and impact:

$$\text{core} = C \times I \qquad (\text{range: } 1 \text{ to } 25)$$

$$\text{normalized} = \frac{\text{core} - 1}{24} \times 7.0 \qquad (\text{range: } 0.0 \text{ to } 7.0)$$

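Why multiplicative? With addition, one strong dimension could compensate for a missing one. Under $C + I$, a 1/5 commit totals 6, the same as a balanced 3/3 commit. Under $C \times I$, the 1/5 commit gets 5, well below the 9 of the 3/3 commit. Multiplication ensures both dimensions must be present.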

Complexity	Impact	Core	Normalized
1	1	1	0.00
2	2	4	0.88
3	3	9	2.33
4	4	16	4.38
5	5	25	7.00
1	5	5	1.17
5	1	5	1.17

Notice that a max-complexity/min-impact commit scores the same as a min-complexity/max-impact commit (1.17). That is exactly half of a balanced 3/3 commit (2.33). You need both to score high.
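You can check the table and the argument for multiplication numerically. The hypothetical helper `normalized_core` below rebuilds the rows above. It then compares a lopsided 1/5 commit with a balanced 3/3 commit, first under a sum and then under a product.

```python
def normalized_core(c: int, i: int) -> float:
    """Map the product C x I (1..25) linearly onto 0.0..7.0."""
    return (c * i - 1) / 24 * 7.0


# Rebuild the Complexity / Impact / Core / Normalized table.
for c, i in [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (1, 5), (5, 1)]:
    print(c, i, c * i, f"{normalized_core(c, i):.2f}")

# With a sum, the lopsided 1/5 commit ties the balanced 3/3 commit.
print("sum:    ", 1 + 5, "vs", 3 + 3)  # 6 vs 6
# With a product, it falls well behind.
print("product:", 1 * 5, "vs", 3 * 3)  # 5 vs 9
```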

Effort Bonus

A logarithmic function of total lines changed, preventing large diffs from dominating the score:

$$E = \min\left(\frac{\log_2(\text{lines\_added} + \text{lines\_removed} + 1)}{10},\ W_e\right)$$

Where $W_{e}$ is the effort weight (default: 0.5).

Total Lines (n)	Raw log₂(n + 1) / 10	Effort (capped at 0.5)
10	0.35	0.35
50	0.57	0.50
100	0.67	0.50
500	0.90	0.50
1000	1.00	0.50

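Because of the cap, the bonus saturates at 31 lines ($\log_2 32 = 5$, and $5 / 10 = 0.5$). Lines beyond that add nothing. Even without the cap, the curve is steeply front-loaded: the first 50 lines would contribute 0.57, which is more than the 0.43 added by the next 950. This intentionally de-emphasizes raw volume.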
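Below is a sketch of the effort bonus. It assumes the engine receives added and removed line counts separately, and the function names are hypothetical. `saturation_point` inverts the formula, $n \ge 2^{10 W_e} - 1$, to find the line count at which the cap starts to bind.

```python
import math


def effort_bonus(lines_added: int, lines_removed: int, effort_weight: float = 0.5) -> float:
    """E = min(log2(added + removed + 1) / 10, W_e)."""
    return min(math.log2(lines_added + lines_removed + 1) / 10, effort_weight)


def saturation_point(effort_weight: float = 0.5) -> int:
    """Smallest total line count at which E reaches the cap.

    Solves log2(n + 1) / 10 >= W_e, i.e. n >= 2**(10 * W_e) - 1.
    """
    return math.ceil(2 ** (10 * effort_weight) - 1)


# Reproduce the effort table (all lines counted as added for simplicity).
for total in (10, 50, 100, 500, 1000):
    print(total, f"{effort_bonus(total, 0):.2f}")

print("cap reached at", saturation_point(), "lines")  # 31
```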

Quality Bonus

$$Q = \min\left(\lvert \text{quality\_signals} \rvert \times W_q,\ C_q\right)$$

Default: $W_{q} = 0.4$ per signal, $C_{q} = 1.5$ cap.

Signals Present	Bonus
0	0.0
1	0.4
2	0.8
3	1.2
4+	1.5 (capped)

Quality signals: has_tests, good_error_handling, clean_patterns, reduces_tech_debt, good_documentation.
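The quality bonus is a capped linear count. In this sketch, the signal names come from the list above. This page does not say how the engine treats unrecognized names or repeats, so ignoring unknown names and counting each signal once are assumptions. `quality_bonus` is a hypothetical name.

```python
from collections.abc import Iterable

QUALITY_SIGNALS = frozenset({
    "has_tests", "good_error_handling", "clean_patterns",
    "reduces_tech_debt", "good_documentation",
})


def quality_bonus(signals: Iterable[str], per_signal: float = 0.4, cap: float = 1.5) -> float:
    """Q = min(|quality_signals| * W_q, C_q), counting distinct known signals only."""
    count = len(set(signals) & QUALITY_SIGNALS)
    return min(count * per_signal, cap)


# Reproduce the "Signals Present / Bonus" table.
names = sorted(QUALITY_SIGNALS)
for n in range(len(names) + 1):
    print(n, round(quality_bonus(names[:n]), 2))
```

The printout uses `round` to hide binary floating-point noise. In Python, `3 * 0.4` evaluates to `1.2000000000000002`, so compare against the table only after rounding.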

Risk Bonus

$$R = \min\left(\lvert \text{risk\_signals} \rvert \times W_r,\ C_r\right)$$

Default: $W_{r} = 0.25$ per signal, $C_{r} = 0.5$ cap.

Signals Present	Bonus
0	0.0
1	0.25
2+	0.50 (capped)

Risk signals: touches_auth, touches_payments, modifies_data_model, cross_module_change, production_hotfix.
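The risk bonus has the same shape as the quality bonus, with smaller defaults. The sketch below uses hypothetical names and makes the same assumption that duplicate or unknown signals are not counted. It also illustrates the point made under "Why does risk give a bonus, not a penalty?": the bonus is never negative, so a risk signal can raise a score but never lower it.

```python
from collections.abc import Iterable

RISK_SIGNALS = frozenset({
    "touches_auth", "touches_payments", "modifies_data_model",
    "cross_module_change", "production_hotfix",
})


def risk_bonus(signals: Iterable[str], per_signal: float = 0.25, cap: float = 0.5) -> float:
    """R = min(|risk_signals| * W_r, C_r), counting distinct known signals only."""
    distinct = set(signals) & RISK_SIGNALS
    return min(len(distinct) * per_signal, cap)


print(risk_bonus([]))                                # 0.0
print(risk_bonus(["touches_auth"]))                  # 0.25
print(risk_bonus(["touches_auth", "touches_auth"]))  # 0.25 (duplicate counted once)
print(risk_bonus(["touches_auth", "cross_module_change", "production_hotfix"]))  # 0.5 (capped)
```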

Why does risk give a bonus, not a penalty?

Risk signals indicate that a commit touches sensitive, important areas. Doing so successfully is harder and more valuable. The bonus recognizes the additional care required, not the risk itself.

Score Examples

Example 1: Simple bug fix

A one-line fix to a typo in a non-critical utility function.

Component	Value	Contribution
Complexity	1	—
Impact	1	—
Normalized core		0.00
Lines changed	2	Effort: 0.16
Quality signals	0	Quality: 0.00
Risk signals	0	Risk: 0.00
Final score		0.16

Example 2: Feature with tests

A medium-complexity feature adding a new API endpoint with test coverage.

Component	Value	Contribution
Complexity	3	—
Impact	3	—
Normalized core		2.33
Lines changed	150	Effort: 0.50
Quality signals	2 (has_tests, clean_patterns)	Quality: 0.80
Risk signals	1 (modifies_data_model)	Risk: 0.25
Final score		3.88

Example 3: Critical security refactor

A deep, complex refactor of the authentication system with full test coverage and clean patterns.

Component	Value	Contribution
Complexity	5	—
Impact	5	—
Normalized core		7.00
Lines changed	400	Effort: 0.50
Quality signals	4 (has_tests, good_error_handling, clean_patterns, reduces_tech_debt)	Quality: 1.50
Risk signals	2 (touches_auth, cross_module_change)	Risk: 0.50
Final score		9.50
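To illustrate the full-transparency principle, the sketch below stores each component in a record, derives the final score from that record, and recomputes the three examples. `ScoreBreakdown` and `breakdown` are hypothetical names, and the Default preset weights are hard-coded for brevity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreBreakdown:
    """Hypothetical record of every component, mirroring the example tables."""
    normalized_core: float
    effort: float
    quality: float
    risk: float

    @property
    def final(self) -> float:
        return min(self.normalized_core + self.effort + self.quality + self.risk, 10.0)


def breakdown(c: int, i: int, lines: int, n_quality: int, n_risk: int) -> ScoreBreakdown:
    # Default preset: W_e = 0.5, W_q = 0.4, C_q = 1.5, W_r = 0.25, C_r = 0.5.
    return ScoreBreakdown(
        normalized_core=(c * i - 1) / 24 * 7.0,
        effort=min(math.log2(lines + 1) / 10, 0.5),
        quality=min(n_quality * 0.4, 1.5),
        risk=min(n_risk * 0.25, 0.5),
    )


examples = {
    "Simple bug fix": breakdown(1, 1, 2, 0, 0),
    "Feature with tests": breakdown(3, 3, 150, 2, 1),
    "Critical security refactor": breakdown(5, 5, 400, 4, 2),
}
for name, b in examples.items():
    print(f"{name}: {b.final:.2f}  {b}")
```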

Fallback Heuristics

When complexity or impact values are missing from the LLM analysis (e.g., for shallow commits), the scoring engine estimates them from available metadata:

Complexity estimation (capped at 4, so the heuristic never assigns the maximum of 5 without LLM confirmation):

Condition	Points
Base	1
Lines changed > 20	+1
Files changed > 3	+1
Introduces new pattern	+1
Has migration	+1
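Here is a sketch of the complexity fallback. It assumes each condition is available on the commit metadata as a count or a boolean. The function and parameter names are hypothetical.

```python
def estimate_complexity(
    lines_changed: int,
    files_changed: int,
    introduces_new_pattern: bool,
    has_migration: bool,
) -> int:
    """Fallback complexity: base 1, plus one point per condition, capped at 4."""
    points = 1
    points += int(lines_changed > 20)
    points += int(files_changed > 3)
    points += int(introduces_new_pattern)
    points += int(has_migration)
    return min(points, 4)
```

The four conditions can add up to 5. The final `min` is what enforces the "no maximum without LLM confirmation" rule.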

Impact estimation (domain criticality sets the base value, each remaining condition adds to it, and the result is capped at 4):

Condition	Value
Domain criticality: low	1
Domain criticality: medium	2
Domain criticality: high	3
Domain criticality: critical	4
Commit type is feat/fix/perf	+1
Touches core system	+1
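The impact fallback can be sketched the same way. The criticality labels and commit types come from the table, but the mapping and function names are hypothetical. This version raises `KeyError` for any criticality outside the four listed levels, because this page does not say how other values (such as the `trivial` level used by V1) are handled.

```python
CRITICALITY_BASE = {"low": 1, "medium": 2, "high": 3, "critical": 4}
BOOSTED_TYPES = frozenset({"feat", "fix", "perf"})


def estimate_impact(domain_criticality: str, commit_type: str, touches_core_system: bool) -> int:
    """Fallback impact: criticality base, +1 for feat/fix/perf, +1 for core, capped at 4."""
    value = CRITICALITY_BASE[domain_criticality]  # KeyError for unlisted levels
    value += int(commit_type in BOOSTED_TYPES)
    value += int(touches_core_system)
    return min(value, 4)
```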

Scoring Presets

Default

Balanced weights for general-purpose scoring.

effort_weight:      0.5
quality_per_signal: 0.4
quality_cap:        1.5
risk_per_signal:    0.25
risk_cap:           0.5

Quality-Focused

Rewards engineering best practices more heavily.

effort_weight:      0.5
quality_per_signal: 0.6    ← +50% per signal
quality_cap:        2.0    ← higher cap
risk_per_signal:    0.25
risk_cap:           0.5

Risk-Aware

Gives more credit for working in sensitive areas.

effort_weight:      0.5
quality_per_signal: 0.4
quality_cap:        1.5
risk_per_signal:    0.4    ← +60% per signal
risk_cap:           1.0    ← doubled cap
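The presets differ in only five numbers, which is what makes re-scoring cheap. The stored complexity, impact, line count, and signal counts are reused, and only the weights change. In the sketch below, `ScoringConfig`, `PRESETS`, and `rescore` are hypothetical names, and the field names follow the preset listings above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringConfig:
    effort_weight: float = 0.5
    quality_per_signal: float = 0.4
    quality_cap: float = 1.5
    risk_per_signal: float = 0.25
    risk_cap: float = 0.5


PRESETS = {
    "default": ScoringConfig(),
    "quality-focused": ScoringConfig(quality_per_signal=0.6, quality_cap=2.0),
    "risk-aware": ScoringConfig(risk_per_signal=0.4, risk_cap=1.0),
}


def rescore(c: int, i: int, lines: int, n_quality: int, n_risk: int, cfg: ScoringConfig) -> float:
    core = (c * i - 1) / 24 * 7.0
    effort = min(math.log2(lines + 1) / 10, cfg.effort_weight)
    quality = min(n_quality * cfg.quality_per_signal, cfg.quality_cap)
    risk = min(n_risk * cfg.risk_per_signal, cfg.risk_cap)
    return min(core + effort + quality + risk, 10.0)


# The same stored analysis (Example 2) under each preset; nothing is re-analyzed.
for name, cfg in PRESETS.items():
    print(name, f"{rescore(3, 3, 150, 2, 1, cfg):.2f}")
```

One side effect is worth knowing. Under the Default preset, the components top out at 7.0 + 0.5 + 1.5 + 0.5 = 9.5, so the 10.0 cap never binds. Both alternative presets raise the ceiling to exactly 10.0.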

V1 Scoring (Legacy)

The original scoring engine used a weighted formula built from commit-type weights, indicator bonuses, and domain multipliers. It is still available for backward compatibility, but V2 is recommended for all new deployments.

V1 type weights:

Type	Weight
feat	1.0
fix	0.9
refactor	0.85
perf	0.9
test	0.6
docs	0.3
chore	0.3
style	0.1

V1 indicator bonuses:

Indicator	Bonus
touches_core_system	3.0
introduces_new_pattern	2.0
has_migration	1.5
dependencies_changed	1.0
new_modules_created	1.0 per module
tests_added	0.5 per test

V1 domain multipliers:

Domain Criticality	Multiplier
critical	1.5×
high	1.2×
medium	1.0×
low	0.7×
trivial	0.3×
