Forecasting & Trending

ShipLens doesn't just tell you what happened — it tells you what's about to happen. Forecasting uses historical commit data to predict future engineering metrics, detect trends, and flag anomalies before they become problems.

Why Forecasting?

A CTO who only looks at last week's numbers is always reacting. Forecasting shifts the conversation from "what went wrong" to "what's changing" — giving you time to intervene before a trend becomes a crisis.

Forecasted Metrics

Seven metrics are tracked, trended, and predicted:

| # | Metric | Unit | What It Measures |
|---|--------|------|------------------|
| 1 | Velocity | commits/week | Raw throughput of the team |
| 2 | Cycle time | hours | Time from first commit on a branch to PR merge |
| 3 | Average score | 0-10 | Mean commit score (V2 scoring) |
| 4 | Commit frequency | commits/day/contributor | How consistently contributors are shipping |
| 5 | PR cycle time | hours | Time from PR open to PR merge |
| 6 | Slop index | 0.0-1.0 | Proportion of AI-generated code without human refinement |
| 7 | Deploy frequency | deploys/week | How often code reaches production |

Methods

Moving Averages

For each metric, ShipLens computes a 7-day and a 28-day simple moving average (SMA) over the daily values $x_t$:

$$\mathrm{SMA}_n = \frac{1}{n} \sum_{i=0}^{n-1} x_{t-i}$$

The 7-day SMA captures short-term momentum. The 28-day SMA captures the underlying trend. When the 7-day crosses above or below the 28-day, it signals a trend change.
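The moving averages and the crossover signal can be sketched in a few lines of Python. This is an illustration only, not ShipLens's own code: the function names `sma` and `crossover` are hypothetical, and it assumes one value per day with the newest value last.

```python
from typing import Optional, Sequence


def sma(values: Sequence[float], n: int) -> Optional[float]:
    """Simple moving average of the last n values, or None if there are fewer than n."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(values) < n:
        return None
    return sum(values[-n:]) / n


def crossover(values: Sequence[float], short: int = 7, long: int = 28) -> Optional[str]:
    """Report whether the short SMA crossed the long SMA at the latest point.

    Returns "above", "below", or None (no crossover, or not enough history).
    """
    # One extra point is needed so yesterday's SMAs can be compared with today's.
    if len(values) < long + 1:
        return None
    prev_diff = sma(values[:-1], short) - sma(values[:-1], long)
    curr_diff = sma(values, short) - sma(values, long)
    if prev_diff <= 0 < curr_diff:
        return "above"
    if prev_diff >= 0 > curr_diff:
        return "below"
    return None
```

Comparing yesterday's gap between the two averages with today's gap is one simple way to detect the moment of crossing, rather than just reporting which average is currently higher.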

Linear Regression

For forward predictions, ShipLens fits a simple linear regression over the last 28 data points:

$$\hat{y} = \beta_0 + \beta_1 t$$

Where:

  • $\hat{y}$ = predicted metric value
  • $t$ = time (days from start of window)
  • $\beta_0$ = intercept (baseline value)
  • $\beta_1$ = slope (rate of change per day)

The sign of the slope $\beta_1$ shows whether the metric is rising or falling, and its magnitude shows how fast. Predictions are generated for 7 days ahead and 14 days ahead.
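As a rough sketch of the regression step, the following Python fits an ordinary least squares line to a window of daily values and extrapolates it. The helper names `fit_line` and `predict` are hypothetical, the sample series is made up, and indexing time as t = 0 for the first day of the window is an assumption.

```python
from typing import Sequence, Tuple


def fit_line(y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y against t = 0, 1, ..., len(y) - 1.

    Returns (beta0, beta1): the intercept and the slope per day.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two data points")
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * t_mean
    return beta0, beta1


def predict(y: Sequence[float], days_ahead: int) -> float:
    """Extrapolate the fitted line days_ahead days past the last observation."""
    beta0, beta1 = fit_line(y)
    t_future = (len(y) - 1) + days_ahead
    return beta0 + beta1 * t_future


# A made-up, perfectly linear 28-day velocity series: 40 commits/week, rising 0.5 per day.
window = [40 + 0.5 * t for t in range(28)]
forecast_7 = predict(window, 7)    # approximately 57.0
forecast_14 = predict(window, 14)  # approximately 60.5
```

Real metric series are noisy, so the fitted slope will rarely match the true trend this cleanly; the synthetic series is only there to make the arithmetic easy to check.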

Anomaly Detection

An anomaly is flagged when a metric deviates more than 2 standard deviations from its 28-day moving average:

$$\text{anomaly if } \left| x_t - \mathrm{SMA}_{28} \right| > 2\sigma_{28}$$

Where $\sigma_{28}$ is the standard deviation over the same 28-day window.

Examples of anomalies:

  • Velocity suddenly drops by 40% mid-sprint
  • Slop index spikes from 0.1 to 0.5 in one week
  • Cycle time doubles without an obvious cause

Anomalies are surfaced in the UI with a visual flag and included in weekly digests.
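The two-sigma rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `is_anomaly` is hypothetical, the 28-day window includes the latest value, and the population standard deviation is used (the documentation does not say whether ShipLens uses the population or sample form).

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomaly(history: Sequence[float], window: int = 28, threshold: float = 2.0) -> bool:
    """Flag the latest value if it lies more than `threshold` standard
    deviations from the mean of the last `window` values."""
    if len(history) < window:
        return False  # not enough history to judge
    recent = history[-window:]
    center = mean(recent)   # the 28-day SMA at the latest point
    sigma = pstdev(recent)  # assumption: population standard deviation
    if sigma == 0:
        return False  # a perfectly flat series has no outliers
    return abs(recent[-1] - center) > threshold * sigma
```

For example, a series that sits at 50 for 27 days and then drops to 30 (a 40% fall) is flagged, while a series that alternates between 49 and 51 is not.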

Confidence Intervals

Every prediction includes a 90% confidence interval:

$$\hat{y} \pm 1.645 \times \sigma_{28} \times \sqrt{1 + \frac{1}{n}}$$

Where $n$ is the number of data points in the regression window.

Wider intervals indicate less predictable metrics — which is itself useful information. A metric with very wide confidence intervals may be too volatile to forecast meaningfully, suggesting underlying process instability.
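A small Python sketch of the interval calculation follows. It assumes the 1 + 1/n factor sits under a square root, as in the usual prediction-interval form, and the name `interval_90` is hypothetical.

```python
import math
from typing import Tuple

Z_90 = 1.645  # two-sided 90% quantile of the standard normal distribution


def interval_90(prediction: float, sigma: float, n: int = 28) -> Tuple[float, float]:
    """Return (lower, upper) bounds of the 90% interval around a prediction.

    sigma is the standard deviation over the 28-day window and n is the
    number of points in the regression window.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    half_width = Z_90 * sigma * math.sqrt(1 + 1 / n)
    return prediction - half_width, prediction + half_width
```

Because the half-width scales linearly with sigma, doubling a metric's day-to-day volatility doubles the width of its interval.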

Trend Classification

Each metric is classified into one of three trend states based on the regression slope $\beta_1$ and its statistical significance (the p-value $p$ of the slope):

| Trend | Condition | Interpretation |
|-------|-----------|----------------|
| Improving | $\beta_1 > 0$ and $p < 0.1$ (for metrics where higher is better) | Metric is getting better with statistical confidence |
| Declining | $\beta_1 < 0$ and $p < 0.1$ (for metrics where higher is better) | Metric is getting worse with statistical confidence |
| Stable | $p \geq 0.1$ (slope not statistically significant) | No meaningful change detected |

For metrics where lower is better (cycle time, PR cycle time, slop index), the direction is inverted: a negative slope is "improving."
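The classification rules, including the inversion for lower-is-better metrics, might look like this in Python. The metric keys and the function name `classify_trend` are hypothetical, and the slope and p-value are taken as inputs rather than computed here.

```python
# Hypothetical metric keys for the three lower-is-better metrics.
LOWER_IS_BETTER = {"cycle_time", "pr_cycle_time", "slop_index"}


def classify_trend(metric: str, slope: float, p_value: float, alpha: float = 0.1) -> str:
    """Map a regression slope and its p-value to a trend state."""
    # A slope that is not statistically significant counts as no change.
    if p_value >= alpha or slope == 0:
        return "stable"
    rising = slope > 0
    higher_is_better = metric not in LOWER_IS_BETTER
    # Rising is good only when higher is better, and falling only when lower is better.
    return "improving" if rising == higher_is_better else "declining"
```

With these rules, a significantly falling cycle time is "improving", while a significantly rising slop index is "declining".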

TIP

A "stable" classification is often good news. It means the team is operating consistently. Not every metric needs to be improving all the time — sustainability matters more than constant acceleration.

Background Processing

Forecasts are generated by the ForecastWorker, an Oban background job that runs daily:

| Job | Schedule | What It Does |
|-----|----------|--------------|
| ForecastWorker | Daily (early morning) | Computes SMAs, fits regressions, classifies trends, flags anomalies |

The worker processes all metrics for all projects and squads. Results are stored as forecast snapshots, allowing historical comparison of predictions vs actuals.
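To illustrate what comparing predictions with actuals could involve, here is a hedged Python sketch. The `ForecastSnapshot` fields and the `score_snapshot` helper are hypothetical and do not describe ShipLens's actual storage schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ForecastSnapshot:
    """Hypothetical shape of one stored prediction; field names are illustrative."""
    metric: str
    target_date: date
    predicted: float
    lower: float  # lower bound of the 90% interval
    upper: float  # upper bound of the 90% interval


def score_snapshot(snapshot: ForecastSnapshot, actual: float) -> dict:
    """Compare a past prediction with the value that was actually observed."""
    return {
        "error": actual - snapshot.predicted,
        "within_interval": snapshot.lower <= actual <= snapshot.upper,
    }
```

Tracking how often actual values land inside the stored interval is one way to check whether the 90% intervals are well calibrated over time.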

Route

/c/:slug/forecasts

The forecasts page shows:

  • Current trend classification for each metric (improving / stable / declining)
  • 7-day and 14-day predictions with confidence intervals
  • Historical trend lines with SMA overlays
  • Anomaly flags with timestamps and severity
  • Per-squad and per-contributor breakdowns
