Observability for alpha pipelines: three dashboards, one rule

Observability for trading models doesn’t have to be exotic; it has to be boring. The rule I work toward: any model in production must answer three questions without anyone writing a query.

Q1 — Is the model receiving the data it expects?

The freshness board. One panel per active model, each showing:

Time since last feature update
Schema hash vs the model’s training schema (green / red)
Per-feature staleness — anything more than N bars behind goes red

This board pages. If a feature stops arriving or the schema drifts, nothing else matters yet — the model is hallucinating off stale or malformed input.

Owned by the platform team, not the research team. The research team shouldn’t be paged for a Kafka hiccup.
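
The two hard signals on this board, schema hash and per-feature staleness, are cheap to compute. Here is a minimal sketch of how they might be derived, assuming features are tracked by the bar index of their last update. The names `schema_hash`, `check_freshness`, `FreshnessStatus`, and `max_bars_behind` are hypothetical and do not come from any particular platform.

```python
import hashlib
from dataclasses import dataclass


def schema_hash(schema: dict[str, str]) -> str:
    """Hash feature names and dtypes in sorted order, so column order is irrelevant."""
    canonical = "|".join(f"{name}:{dtype}" for name, dtype in sorted(schema.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class FreshnessStatus:
    schema_ok: bool
    stale_features: list[str]

    @property
    def should_page(self) -> bool:
        # Either failure means the model is reading input it was not trained on.
        return (not self.schema_ok) or bool(self.stale_features)


def check_freshness(
    training_schema: dict[str, str],
    live_schema: dict[str, str],
    last_update_bar: dict[str, int],
    current_bar: int,
    max_bars_behind: int,
) -> FreshnessStatus:
    """Compare the live feed against the training schema and flag stale features."""
    schema_ok = schema_hash(training_schema) == schema_hash(live_schema)
    stale = sorted(
        name
        for name in training_schema
        # A feature that has never arrived is treated as stale (an assumption).
        if name not in last_update_bar
        or current_bar - last_update_bar[name] > max_bars_behind
    )
    return FreshnessStatus(schema_ok=schema_ok, stale_features=stale)
```

Hashing a canonical, sorted form of the schema means a reordered feed stays green while a renamed column or a changed dtype goes red.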

Q2 — Is the model producing what it produced in training?

The calibration board. One panel per active model, each showing:

Rolling distribution of the model’s output signal vs its training distribution, with an alarm on a Kolmogorov-Smirnov (KS) test or a 4-sigma deviation
Rolling distribution of each input feature vs training — same alarm
Per-feature contribution to today’s signal (for linear models; SHAP digests for non-linear ones), sparkline over the last N bars

This board doesn’t page. It surfaces. It’s where you go when you’re asked “why did the model do that?” and you don’t have time for a forensic notebook.

Owned by the team that registered the model.
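
For the distribution alarms, here is a rough sketch of one way to compare a rolling window against the training sample using only the standard library. Two things here are my assumptions rather than a fixed definition: "4-sigma" is read as the live mean sitting more than four standard errors from the training mean, and the KS threshold uses the usual large-sample approximation. The function names and the `alpha` default are illustrative.

```python
import math
from bisect import bisect_right
from statistics import mean, stdev


def ks_statistic(reference, live):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(live)
    gap = 0.0
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def ks_critical_value(n, m, alpha=0.01):
    """Large-sample rejection threshold for the two-sample KS statistic."""
    return math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))


def mean_shift_sigmas(reference, live):
    """Shift of the live mean, in standard errors implied by the training sample."""
    standard_error = stdev(reference) / math.sqrt(len(live))
    return (mean(live) - mean(reference)) / standard_error


def drift_alarm(reference, live, alpha=0.01, sigma_limit=4.0):
    """True when either the KS test or the mean-shift check trips."""
    ks_hit = ks_statistic(reference, live) > ks_critical_value(
        len(reference), len(live), alpha
    )
    sigma_hit = abs(mean_shift_sigmas(reference, live)) > sigma_limit
    return ks_hit or sigma_hit
```

Rolling windows overlap and market data is autocorrelated, so the threshold is best treated as a heuristic, not an exact significance level. That fits a board that surfaces rather than pages.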

Q3 — Is the model still earning?

The PnL board. One panel per active model, each showing:

Live PnL, paper PnL (shadow), and the divergence between them
Cumulative excess return vs the benchmark the model registered against
Decay tracker: a rolling Sharpe over 30-, 60-, and 120-day windows, flagged when the shorter windows deteriorate relative to the longer ones

This board also doesn’t page. Sharpe decline is research news, not infra news. The board has an “open triage ticket” button so the move from “noticing” to “deciding” is one click, and the ticket lands in the owning team’s queue, not in a pager.
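
The decay tracker can be sketched in a few lines. This version assumes daily returns that are already in excess of the benchmark, annualises with 252 trading days, and reads "deteriorate" as each shorter window's Sharpe sitting below the next longer one by more than a tolerance. That flagging rule and the names `rolling_sharpe`, `decay_flag`, and `tolerance` are my assumptions, not a standard.

```python
import math
from statistics import mean, stdev


def rolling_sharpe(daily_returns, window, periods_per_year=252):
    """Annualised Sharpe over the most recent `window` returns, or None if undefined."""
    if len(daily_returns) < window:
        return None
    tail = daily_returns[-window:]
    sigma = stdev(tail)
    if sigma == 0:
        return None
    return mean(tail) / sigma * math.sqrt(periods_per_year)


def decay_flag(daily_returns, windows=(30, 60, 120), tolerance=0.0):
    """Flag when every shorter window underperforms the next longer one."""
    sharpes = [rolling_sharpe(daily_returns, w) for w in sorted(windows)]
    if any(s is None for s in sharpes):
        return False  # not enough history to say anything
    return all(
        shorter < longer - tolerance
        for shorter, longer in zip(sharpes, sharpes[1:])
    )
```

A small positive `tolerance` is worth having in practice: sample noise alone will often put the 30-day Sharpe slightly under the 60-day one, and a flag that fires on noise trains people to ignore it.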

The rule

If you can’t answer one of these three questions about a model without writing a query, the model isn’t ready for production. Boards beat queries because boards have owners, and questions with owners get answered.

Two anti-patterns I’ve watched up close

One dashboard for everything. Looks comprehensive; nobody reads it. A trader on the floor wants Q3. The on-call engineer wants Q1. The model’s author wants Q2. Mash them and each ignores the parts that aren’t theirs.
Per-model bespoke dashboards. The model author builds a beautiful custom view, leaves the team, and a year later nobody remembers what the panels mean. Boards should be generated by the platform from the model’s registered manifest: same panels, same units per panel.
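
To make "generated from the manifest" concrete, here is a hypothetical sketch. The `ModelManifest` fields, the `Board` type, and the panel identifiers are invented for illustration; a real registry will have its own shape. The point is only that the manifest supplies the names while the platform fixes the panels, the paging policy, and the owners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelManifest:
    # Hypothetical manifest fields; a real registry will carry more.
    name: str
    owner_team: str
    features: tuple[str, ...]
    benchmark: str


@dataclass(frozen=True)
class Board:
    question: str
    owner: str
    pages: bool
    panels: tuple[str, ...]


def build_boards(manifest: ModelManifest, platform_team: str = "platform") -> dict[str, Board]:
    """Derive the three standard boards for one model; nothing here is bespoke."""
    return {
        "freshness": Board(
            question="Is the model receiving the data it expects?",
            owner=platform_team,
            pages=True,  # the only board that pages
            panels=(
                "time_since_last_feature_update",
                "schema_hash_vs_training",
                *(f"staleness:{feature}" for feature in manifest.features),
            ),
        ),
        "calibration": Board(
            question="Is the model producing what it produced in training?",
            owner=manifest.owner_team,
            pages=False,
            panels=(
                "output_distribution_vs_training",
                *(f"input_distribution:{feature}" for feature in manifest.features),
                "per_feature_contribution",
            ),
        ),
        "pnl": Board(
            question="Is the model still earning?",
            owner=manifest.owner_team,
            pages=False,
            panels=(
                "live_vs_paper_pnl",
                f"excess_return_vs:{manifest.benchmark}",
                "rolling_sharpe_30_60_120",
            ),
        ),
    }
```

Two models with the same feature set get identical panel lists, which is what makes a board readable by someone who never met the model's author.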

Three dashboards, one rule, two owners. Most days the boards are boring, which is the goal.