Before/After Analysis
Every process change, team restructure, or tooling investment is a bet. Before/After analysis tells you whether the bet paid off — with data, not opinions.
Purpose
CTOs regularly make interventions: adopting a new branching strategy, restructuring squads, introducing code review requirements, switching CI providers. These changes are expensive in terms of team disruption, and their impact is usually assessed through gut feeling or anecdote.
Before/After analysis selects two time periods and compares engineering metrics across them, giving you a quantified answer to: "Did this change actually improve things?"
How It Works
- Select the intervention date — The point in time when the change took effect.
- Define the "before" period — A window before the intervention (default: 28 days).
- Define the "after" period — A window after the intervention (default: 28 days).
- Compare — ShipLens computes each metric for both periods and shows the delta.
```
|<-- Before (28 days) -->|<-- Intervention -->|<-- After (28 days) -->|
```
**Tip:** Choosing equal-length periods is important for a fair comparison. ShipLens defaults to 28 days (4 weeks) to capture a full sprint cycle and smooth out weekly patterns. Shorter periods increase noise; longer periods may include confounding factors.
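The window selection described above can be sketched as a small helper. This is a minimal illustration with hypothetical names, not ShipLens's actual implementation; it assumes the "after" window begins on the intervention date itself.

```python
from datetime import date, timedelta

def comparison_windows(intervention: date, days: int = 28):
    """Return (before_start, before_end, after_start, after_end) as
    equal-length windows around an intervention date (hypothetical helper)."""
    before_start = intervention - timedelta(days=days)
    # "before" runs up to the day preceding the intervention;
    # "after" begins on the intervention date itself.
    before_end = intervention - timedelta(days=1)
    after_start = intervention
    after_end = intervention + timedelta(days=days - 1)
    return before_start, before_end, after_start, after_end

windows = comparison_windows(date(2024, 6, 1))
# before: 2024-05-04 .. 2024-05-31, after: 2024-06-01 .. 2024-06-28
```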
Compared Metrics
Five metrics are compared across the two periods:
1. Velocity
| Change | Interpretation |
|---|---|
| > +15% | Significant increase in throughput |
| -15% to +15% | No meaningful change |
| < -15% | Significant decrease — expected during adjustment periods |
2. Average Score
Compared as an absolute difference (not percentage) since scores are on a fixed 0-10 scale.
| Change | Interpretation |
|---|---|
| > +0.5 | Meaningful quality improvement |
| -0.5 to +0.5 | No meaningful change |
| < -0.5 | Quality regression |
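The velocity and score thresholds from the two tables above can be expressed as simple classifiers. This is a sketch with hypothetical function names, showing why velocity uses a relative threshold while scores use an absolute one:

```python
def classify_velocity(before: float, after: float) -> str:
    """Apply the +/-15% relative thresholds from the velocity table."""
    pct = (after - before) / before * 100
    if pct > 15:
        return "significant increase"
    if pct < -15:
        return "significant decrease"
    return "no meaningful change"

def classify_score(before: float, after: float) -> str:
    """Scores use an absolute +/-0.5 threshold on the fixed 0-10 scale."""
    delta = after - before
    if delta > 0.5:
        return "quality improvement"
    if delta < -0.5:
        return "quality regression"
    return "no meaningful change"

classify_velocity(200, 240)  # +20% -> "significant increase"
classify_score(3.5, 4.2)     # +0.7 -> "quality improvement"
```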
3. Commit Frequency
Measured as commits per contributor per day. Normalized by active contributor count to account for team size changes.
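The normalization above is a straightforward division; a sketch (hypothetical helper, not ShipLens's API):

```python
def commit_frequency(total_commits: int, active_contributors: int, period_days: int) -> float:
    """Commits per contributor per day, normalized by active contributor count."""
    return total_commits / (active_contributors * period_days)

# 280 commits from 5 active contributors over a 28-day window:
rate = commit_frequency(280, 5, 28)  # -> 2.0
```

Because the rate is per contributor, a team that doubles in size and doubles its raw commit count shows no change, which is the point of the normalization.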
4. Type Distribution
The proportion of each commit type (`feat`, `fix`, `refactor`, `test`, `docs`, `chore`, `style`, `perf`) in each period:
What to look for:
- Increase in `feat` share after removing process bottlenecks
- Decrease in `fix` share after improving test practices
- Increase in `refactor` share after dedicating tech debt sprints
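Computing the per-type shares for a period is a simple proportion; a minimal sketch (hypothetical helper names):

```python
from collections import Counter

COMMIT_TYPES = ["feat", "fix", "refactor", "test", "docs", "chore", "style", "perf"]

def type_distribution(commit_types: list[str]) -> dict[str, float]:
    """Share of each commit type within one period (0.0 if absent)."""
    counts = Counter(commit_types)
    total = len(commit_types)
    return {t: counts[t] / total for t in COMMIT_TYPES}

before = type_distribution(["feat", "fix", "fix", "feat", "chore"])
after = type_distribution(["feat", "feat", "feat", "fix", "test"])
# Per-type deltas between periods:
delta = {t: after[t] - before[t] for t in COMMIT_TYPES}
```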
5. Slop Index
Tracks whether AI-generated code quality changed after the intervention.
Statistical Comparison
ShipLens goes beyond comparing averages. For each metric, the comparison includes:
| Statistical Measure | Purpose |
|---|---|
| Mean | Central tendency — the "typical" value |
| Median | Robust central tendency — unaffected by outliers |
| Standard deviation | Spread — how consistent the metric is |
| Distribution chart | Visual comparison of the full distribution, not just the center |
Why distributions matter: An average score increase from 3.5 to 4.0 could mean everyone improved slightly — or it could mean one person started scoring 9s while everyone else stayed the same. The distribution chart reveals which scenario is happening.
Effect Size
For each metric, ShipLens computes Cohen's d to measure effect size:

d = (mean_after - mean_before) / s_pooled

Where:
- mean_before and mean_after are the metric's means in the two periods
- s_pooled is the pooled standard deviation of both periods
| Cohen's d | Interpretation |
|---|---|
| < 0.2 | Negligible effect |
| 0.2 - 0.5 | Small effect |
| 0.5 - 0.8 | Medium effect |
| > 0.8 | Large effect |
This matters because a statistically "significant" change in velocity might be practically meaningless (e.g., +2 commits/week on a base of 200). Cohen's d tells you whether the change is large enough to matter.
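The computation can be sketched with the standard library alone. This is an illustrative implementation of the standard Cohen's d formula with a pooled sample standard deviation, not ShipLens's internal code; the function names are hypothetical:

```python
import statistics

def cohens_d(before: list[float], after: list[float]) -> float:
    """Cohen's d using the pooled standard deviation of the two samples."""
    n1, n2 = len(before), len(after)
    s1, s2 = statistics.stdev(before), statistics.stdev(after)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(after) - statistics.mean(before)) / pooled

def effect_label(d: float) -> str:
    """Map |d| onto the interpretation table above."""
    d = abs(d)
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"

d = cohens_d([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
label = effect_label(d)  # means differ by 2, pooled std ~1.58 -> "large"
```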
Use Cases
Sprint Retrospective
Compare the current sprint against the previous sprint:
- Did velocity hold steady?
- Did the fix ratio decrease (indicating fewer bugs)?
- Did the new code review process improve average scores?
Process Change
Adopted trunk-based development? Compare the 4 weeks before and after:
- Expected: higher commit frequency, lower cycle time
- Watch for: score regression (speed vs quality tradeoff)
Team Reorganization
Restructured squads? Compare performance before and after:
- Allow a 2-week adjustment period before starting the "after" window
- Compare at both squad and individual contributor level
Tooling Investment
Introduced a new testing framework or CI pipeline? Measure the impact:
- Expected: increase in `test`-type commits, decrease in `fix` ratio
- Timeline: may take 4-8 weeks to show measurable impact
Route
`/c/:slug/before-after`

The before/after page shows:
- Date picker for intervention point and period lengths
- Side-by-side metric comparison with deltas and effect sizes
- Distribution charts for each metric (before vs after overlay)
- Type distribution stacked bar comparison
- Summary card with overall assessment (improved / no change / declined)
