Developer Composite Score

While commit scores measure individual contributions, the Developer Composite Score provides a holistic view of a contributor's overall performance over time. It combines six weighted dimensions into a single 0–10 score.

Why a Composite Score?

Individual commit scores are useful but noisy. A developer might have a few high-scoring commits surrounded by low-scoring maintenance work. The composite score smooths out this variance and rewards consistency, growth, and good practices — not just big-bang commits.

The Formula

$$\text{composite} = \left(\sum_i w_i \times d_i\right) \times 10$$

Where each dimension $d_i$ is normalized to the range [0, 1] and the weights $w_i$ sum to 1.0:

| Dimension | Weight | What It Measures |
|-----------|--------|------------------|
| Average Score | 0.30 | Mean commit score across the period |
| Weighted Impact | 0.25 | Recent commits count more (recency-weighted) |
| Consistency | 0.15 | How stable are scores week-to-week? |
| Quality Ratio | 0.15 | Percentage of commits with test coverage |
| Complexity Trend | 0.10 | Is the developer tackling increasingly complex work? |
| Volume | 0.05 | Commit count (minimal weight, intentionally low) |
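The formula can be sketched in a few lines of Python. This is a minimal illustration, assuming the six dimension values are already normalized to [0, 1]; the dictionary keys, the function name `composite_score`, and the sample values are hypothetical and not taken from the actual implementation.

```python
# Weights as listed in the table above; the key names are illustrative assumptions.
WEIGHTS = {
    "average_score": 0.30,
    "weighted_impact": 0.25,
    "consistency": 0.15,
    "quality_ratio": 0.15,
    "complexity_trend": 0.10,
    "volume": 0.05,
}


def composite_score(dimensions: dict[str, float]) -> float:
    """Return the weighted sum of normalized dimensions, scaled to 0-10."""
    for name in WEIGHTS:
        value = dimensions[name]  # raises KeyError if a dimension is missing
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 10.0 * sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)


# Hypothetical contributor, used only to show the arithmetic.
example = {
    "average_score": 0.6,
    "weighted_impact": 0.8,
    "consistency": 0.5,
    "quality_ratio": 1.0,
    "complexity_trend": 0.5,
    "volume": 0.2,
}
score = composite_score(example)  # 10 * (0.18 + 0.20 + 0.075 + 0.15 + 0.05 + 0.01)
```

Because the weights sum to 1.0, a contributor with every dimension at 1.0 scores exactly 10, and one with every dimension at 0.5 scores 5.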

Why is volume only 5%?

Volume (commit count) is intentionally the lowest-weighted dimension. We want to avoid incentivizing commit inflation. A developer who ships 5 high-quality, well-tested commits per week should score higher than one who ships 50 trivial changes.

Dimension Details

Average Score (30%)

The arithmetic mean of all commit scores in the period. Normalized against a theoretical max of 10.0.

$$d_{\text{avg}} = \left(\frac{1}{n}\sum_{j=1}^{n} s_j\right) \div 10.0$$
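As a small worked sketch of this dimension (the function name is hypothetical, and returning 0.0 for a period with no commits is an assumption the source does not specify):

```python
def average_score_dimension(commit_scores: list[float]) -> float:
    """Mean commit score for the period, normalized by the maximum of 10.0."""
    if not commit_scores:
        return 0.0  # assumption: an empty period contributes nothing
    return (sum(commit_scores) / len(commit_scores)) / 10.0


# Three commits scoring 4.0, 6.0, and 8.0 have a mean of 6.0.
d_avg = average_score_dimension([4.0, 6.0, 8.0])
```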

Weighted Impact (25%)

A recency-weighted average that gives more weight to recent work. Commits from today count at full weight; the weight then falls linearly with age until it reaches a floor of 10%, so commits from 90 days ago count at 10% weight.

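$$\text{recency}_j = \max\left(1.0 - \frac{\text{age\_days}_j}{90},\ 0.1\right)$$

$$d_{\text{impact}} = \frac{\sum_j s_j \times \text{recency}_j}{\sum_j \text{recency}_j} \div 10.0$$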

The minimum recency weight is 0.1 — even 90+ day old commits contribute slightly, preventing discontinuities.
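The recency weighting can be sketched as follows. This is an illustration under stated assumptions: commits are represented as hypothetical `(score, age_days)` pairs, the function names are invented for the example, and an empty period is assumed to yield 0.0.

```python
def recency_weight(age_days: float) -> float:
    """Linear decay from 1.0 at day 0, floored at 0.1."""
    return max(1.0 - age_days / 90.0, 0.1)


def weighted_impact(commits: list[tuple[float, float]]) -> float:
    """Recency-weighted mean of commit scores, normalized by 10.0.

    Each commit is a (score, age_days) pair; this shape is an assumption.
    """
    if not commits:
        return 0.0  # assumption: no commits means no impact
    total_weight = sum(recency_weight(age) for _, age in commits)
    weighted_sum = sum(score * recency_weight(age) for score, age in commits)
    return (weighted_sum / total_weight) / 10.0


# A fresh commit scoring 8.0 and a 45-day-old commit scoring 4.0:
# (8.0 * 1.0 + 4.0 * 0.5) / (1.0 + 0.5) = 6.67, normalized to about 0.667.
d_impact = weighted_impact([(8.0, 0), (4.0, 45)])
```

With this formula the 0.1 floor is already reached at 81 days, since 1.0 - 81/90 = 0.1, so every commit older than that carries the same weight.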

Consistency (15%)

Measures how stable a developer's scores are across weeks using the coefficient of variation (CV):

$$\text{CV} = \frac{\sigma_{\text{weekly}}}{\mu_{\text{weekly}}}$$

$$d_{\text{consistency}} = \max(1.0 - \text{CV},\ 0.0)$$

A developer who scores 6.0 every week (CV ≈ 0) gets a perfect consistency score. A developer who alternates between 2.0 and 9.0 (high CV) gets a low consistency score.
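The two cases above can be checked with a short sketch. It assumes one aggregate score per week as input and uses the population standard deviation; the source does not say whether the population or sample form is used, nor how a zero mean is handled, so both choices here are assumptions, as is the function name.

```python
import statistics


def consistency(weekly_scores: list[float]) -> float:
    """1 - CV of the weekly scores, floored at 0.0."""
    if not weekly_scores:
        return 0.0  # assumption: no data means no consistency credit
    mean = statistics.fmean(weekly_scores)
    if mean == 0:
        return 0.0  # assumption: avoid dividing by a zero mean
    cv = statistics.pstdev(weekly_scores) / mean  # population std dev (assumed)
    return max(1.0 - cv, 0.0)


steady = consistency([6.0, 6.0, 6.0, 6.0])   # CV = 0, so the score is 1.0
erratic = consistency([2.0, 9.0, 2.0, 9.0])  # CV = 3.5 / 5.5, score about 0.36
```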

Why reward consistency?

Consistent output indicates reliable engineering. Erratic scores often signal context-switching, unclear requirements, or work that's being split ineffectively. Consistency rewards sustainable productivity.

Quality Ratio (15%)

The fraction of commits that include test coverage (the has_tests quality signal):

$$d_{\text{quality}} = \frac{|\{c : \text{has\_tests}(c)\}|}{n}$$

This is intentionally simple — it measures the habit of writing tests, not the quality of those tests.
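A minimal sketch of the ratio, assuming each commit's `has_tests` signal is available as a boolean; the function name and the handling of an empty period are hypothetical:

```python
def quality_ratio(has_tests_flags: list[bool]) -> float:
    """Fraction of commits whose has_tests signal is true."""
    if not has_tests_flags:
        return 0.0  # assumption: an empty period scores 0
    return sum(has_tests_flags) / len(has_tests_flags)


# Three of four commits include tests.
d_quality = quality_ratio([True, False, True, True])
```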

Complexity Trend (10%)

Uses linear regression on complexity values over time to determine if a developer is tackling increasingly complex work:

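$$\text{slope} = \text{linear\_regression}(\text{complexity values over commits})$$

$$d_{\text{trend}} = \text{clamp}(0.5 + \text{slope} \times 2.0,\ 0.0,\ 1.0)$$

| Slope | Trend Score | Interpretation |
|-------|-------------|----------------|
| -0.25 | 0.0 | Declining complexity (clamped) |
| 0.0 | 0.5 | Flat (stable complexity) |
| +0.25 | 1.0 | Growing complexity (capped) |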

A flat trend (slope = 0) gives 0.5 — the midpoint. Growing complexity trends upward, declining trends downward.
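The mapping from slope to trend score can be sketched as below. The regression here is an ordinary least-squares fit of complexity against the commit's position in chronological order (0, 1, 2, ...); using the commit index as the x-axis is an assumption, as are the function names and the choice to treat fewer than two commits as a flat trend.

```python
def regression_slope(complexities: list[float]) -> float:
    """Least-squares slope of complexity against commit index (assumed x-axis)."""
    n = len(complexities)
    if n < 2:
        return 0.0  # assumption: too few points is treated as flat
    x_mean = (n - 1) / 2
    y_mean = sum(complexities) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(complexities))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def complexity_trend(complexities: list[float]) -> float:
    """Map the slope onto [0, 1], with a flat trend at the 0.5 midpoint."""
    slope = regression_slope(complexities)
    return min(max(0.5 + slope * 2.0, 0.0), 1.0)


flat = complexity_trend([3.0, 3.0, 3.0, 3.0])         # slope 0.0
growing = complexity_trend([1.0, 1.25, 1.5, 1.75])    # slope +0.25
declining = complexity_trend([5.0, 4.0, 3.0, 2.0])    # slope -1.0, clamped
```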

Volume (5%)

Commit count, normalized against a reasonable maximum. This dimension has the lowest weight by design.
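The exact maximum is not stated here, so the following is only a sketch of one plausible normalization: a linear scale capped at 1.0. The `max_commits` parameter, the value 50 used in the example, and the function name are all placeholders, not values from the actual scoring system.

```python
def volume_dimension(commit_count: int, max_commits: int) -> float:
    """Commit count scaled linearly against a cap, limited to [0, 1].

    Both the linear scaling and the cap are assumptions for illustration.
    """
    if max_commits <= 0:
        raise ValueError("max_commits must be positive")
    return min(max(commit_count, 0) / max_commits, 1.0)


# With a hypothetical cap of 50 commits per period:
d_volume = volume_dimension(20, 50)
d_volume_capped = volume_dimension(80, 50)
```

Whatever cap is used, the 0.05 weight means this dimension can add at most 0.5 points to the composite.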

Score Distribution

With these weights, here's what different profiles look like:

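| Profile | Avg Score (0–10) | Impact (0–10) | Consistency | Quality | Trend | Volume | Composite |
|---------|------------------|---------------|-------------|---------|-------|--------|-----------|
| Senior IC | 7.0 | 7.5 | 0.85 | 0.80 | 0.60 | 0.40 | 7.25 |
| Productive junior | 4.0 | 3.5 | 0.70 | 0.50 | 0.80 | 0.90 | 5.13 |
| Consistent mid-level | 5.5 | 5.0 | 0.95 | 0.90 | 0.50 | 0.60 | 6.48 |
| Erratic senior | 8.0 | 7.0 | 0.30 | 0.40 | 0.40 | 0.30 | 5.75 |

Avg Score and Impact are shown on the raw 0–10 scale and are divided by 10 before weighting. The composite values are computed from the formula and weights given above.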

Notice the erratic senior: despite high raw scores, the low consistency and quality ratio pull the composite down significantly.

Slop Metrics

AI slop metrics are tracked alongside the composite score but are not included in the formula. They are informational:

| Metric | Description |
|--------|-------------|
| avg_slop | Mean slop index across commits |
| slop_rate | Percentage of commits with slop index > 0.4 |
| slop_trend | Directional trend of slop index over time |

These are displayed separately in the contributor profile for manager awareness, but don't affect scoring.
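The first two metrics follow directly from their descriptions and can be sketched as below, assuming each commit carries a slop index in [0, 1]. The function names and the empty-input behavior are hypothetical. `slop_trend` is left out because its exact calculation is not described here.

```python
SLOP_THRESHOLD = 0.4  # threshold stated in the slop_rate description


def avg_slop(slop_indices: list[float]) -> float:
    """Mean slop index across commits."""
    if not slop_indices:
        return 0.0  # assumption: no commits means no slop
    return sum(slop_indices) / len(slop_indices)


def slop_rate(slop_indices: list[float]) -> float:
    """Percentage (0-100) of commits whose slop index exceeds the threshold."""
    if not slop_indices:
        return 0.0  # assumption: no commits means a 0% rate
    flagged = sum(1 for value in slop_indices if value > SLOP_THRESHOLD)
    return 100.0 * flagged / len(slop_indices)


# Four hypothetical commits; two exceed the 0.4 threshold.
indices = [0.1, 0.5, 0.3, 0.7]
mean_slop = avg_slop(indices)
rate = slop_rate(indices)
```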
