Observability for alpha pipelines: three dashboards, one rule
If a model can be in production, you should be able to answer three questions about it without writing a query. Each question gets its own dashboard, and each dashboard has exactly one owner.
Observability for trading models doesn’t have to be exotic; it has to be boring. The rule I work toward: any model in production must answer three questions without anyone writing a query.
Q1 — Is the model receiving the data it expects?
The freshness board. One panel per active model, each showing:
- Time since last feature update
- Schema hash vs the model’s training schema (green / red)
- Per-feature staleness: anything more than N bars behind goes red
This board pages. If a feature stops arriving or the schema drifts, nothing else matters yet — the model is hallucinating off stale or malformed input.
Owned by the platform team, not the research team. The research team shouldn’t be paged for a Kafka hiccup.
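A minimal sketch of what one freshness panel might compute, assuming the platform can see a last-seen timestamp per feature and a name-to-dtype schema for the live feed. The names here (`schema_hash`, `FreshnessPanel`, `freshness_panel`, `bar_seconds`, `max_bars_behind`) are hypothetical, and hashing sorted `name:dtype` pairs with SHA-256 is just one reasonable way to get a comparable schema hash.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime


def schema_hash(columns: dict[str, str]) -> str:
    """Hash a {feature_name: dtype} mapping, independent of key order."""
    canonical = ",".join(f"{name}:{dtype}" for name, dtype in sorted(columns.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class FreshnessPanel:
    seconds_since_update: float  # age of the most recently updated feature
    schema_ok: bool              # live schema hash matches the training schema hash
    stale_features: list[str]    # features more than N bars behind

    @property
    def should_page(self) -> bool:
        # Assumption: any schema drift or any stale feature is page-worthy.
        return (not self.schema_ok) or bool(self.stale_features)


def freshness_panel(
    now: datetime,
    last_seen: dict[str, datetime],
    live_schema: dict[str, str],
    training_schema_hash: str,
    bar_seconds: float,
    max_bars_behind: int,
) -> FreshnessPanel:
    # Whole bars elapsed since each feature last updated.
    bars_behind = {
        name: int((now - ts).total_seconds() // bar_seconds)
        for name, ts in last_seen.items()
    }
    stale = sorted(name for name, bars in bars_behind.items() if bars > max_bars_behind)
    newest = max(last_seen.values())
    return FreshnessPanel(
        seconds_since_update=(now - newest).total_seconds(),
        schema_ok=schema_hash(live_schema) == training_schema_hash,
        stale_features=stale,
    )
```

The paging decision lives on the panel itself, so the board and the pager cannot disagree about what counts as red.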
Q2 — Is the model producing what it produced in training?
The calibration board. One panel per active model, each showing:
- Rolling distribution of the model’s output signal vs its training distribution — KS / 4-sigma alarm
- Rolling distribution of each input feature vs training — same alarm
- Per-feature contribution to today’s signal (for linear models; SHAP digests for non-linear ones), sparkline over the last N bars
This board doesn’t page. It surfaces. It’s where you go when you’re asked “why did the model do that?” and you don’t have time for a forensic notebook.
Owned by the team that registered the model.
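The drift alarm can be sketched with nothing but the standard library. Two assumptions are doing the work here, and neither is spelled out above: the KS alarm uses the usual large-sample critical value at a hypothetical `alpha` of 0.001, and "4-sigma" is read as the rolling mean sitting more than four standard errors from the training mean. The function names are illustrative; in practice a library routine such as SciPy's `ks_2samp` would likely replace the hand-rolled statistic.

```python
import bisect
import math
import statistics


def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample), sorted(reference)
    gap = 0.0
    # Both CDFs are step functions, so the largest gap occurs at an observed value.
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def drift_alarm(live, training, alpha=0.001, sigmas=4.0):
    """Compare a rolling window of live values against the training sample."""
    n, m = len(live), len(training)
    ks = ks_statistic(live, training)
    # Large-sample approximation of the two-sample KS critical value.
    ks_critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))

    # Assumed reading of "4-sigma": mean shift measured in standard errors.
    diff = abs(statistics.fmean(live) - statistics.fmean(training))
    std_err = statistics.stdev(training) / math.sqrt(n)
    if std_err == 0:
        z = math.inf if diff > 0 else 0.0
    else:
        z = diff / std_err

    return {"ks": ks, "ks_alarm": ks > ks_critical, "z": z, "sigma_alarm": z > sigmas}
```

The same function serves both the output-signal panel and the per-feature panels, which is what "same alarm" implies.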
Q3 — Is the model still earning?
The PnL board. One panel per active model, each showing:
- Live PnL, paper PnL (shadow), and the divergence between them
- Cumulative excess return vs the benchmark the model registered against
- Decay tracker: a rolling Sharpe over 30/60/120-day windows, flagged when the shorter windows deteriorate relative to the longer ones
This board also doesn’t page. Sharpe decline is research news, not infra news. The board has an “open triage ticket” button so the move from “noticing” to “deciding” is one click, and the ticket lands in the owning team’s queue, not in a pager.
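One way the decay tracker could work, as a sketch rather than a spec. It assumes daily returns, a zero risk-free rate, 252 trading days per year, and a flag rule that I am choosing for illustration: the panel is flagged when each shorter window's Sharpe sits below the next longer one's. `sharpe` and `decay_tracker` are hypothetical names.

```python
import math
import statistics


def sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = statistics.stdev(daily_returns)
    if sd == 0:
        return 0.0  # flat series: report zero rather than divide by zero
    return statistics.fmean(daily_returns) / sd * math.sqrt(periods_per_year)


def decay_tracker(daily_returns, windows=(30, 60, 120)):
    """Return ({window: sharpe}, flagged) over the trailing windows."""
    # Skip windows the model has not been live long enough to fill.
    sharpes = {
        w: sharpe(daily_returns[-w:])
        for w in windows
        if len(daily_returns) >= w
    }
    ordered = [sharpes[w] for w in sorted(sharpes)]
    # Assumed flag rule: every shorter window is worse than the next longer one.
    flagged = len(ordered) >= 2 and all(
        short < longer for short, longer in zip(ordered, ordered[1:])
    )
    return sharpes, flagged
```

A flag here only lights the panel and enables the triage button; it never reaches a pager.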
The rule
If you can’t answer one of these three questions about a model without writing a query, the model isn’t ready for production. Boards beat queries because boards have owners, and questions with owners get answered.
Two anti-patterns I’ve watched up close
- One dashboard for everything. Looks comprehensive; nobody reads it. A trader on the floor wants Q3. The on-call engineer wants Q1. The model’s author wants Q2. Mash them and each ignores the parts that aren’t theirs.
- Per-model bespoke dashboards. The model author builds a beautiful custom view, leaves the team, and a year later nobody remembers what the panels mean. Boards should be generated by the platform from the model’s registered manifest: same panels, same units per panel.
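To make "generated from the manifest" concrete, here is a rough sketch. The manifest fields (`model_id`, `owner`), the `Panel` type, and the panel titles and units are all hypothetical; the point is only that every model gets the same template and nothing is hand-built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Panel:
    model_id: str
    board: str   # "freshness", "calibration", or "pnl"
    title: str
    unit: str
    owner: str
    pages: bool


# One fixed template: (board, title, unit). Only the freshness board pages.
TEMPLATE = [
    ("freshness", "Time since last feature update", "seconds"),
    ("freshness", "Schema hash vs training schema", "match / mismatch"),
    ("freshness", "Per-feature staleness", "bars behind"),
    ("calibration", "Output signal vs training distribution", "KS statistic"),
    ("calibration", "Input features vs training distribution", "KS statistic"),
    ("calibration", "Per-feature contribution to signal", "signal units"),
    ("pnl", "Live vs paper PnL divergence", "currency"),
    ("pnl", "Cumulative excess return vs benchmark", "percent"),
    ("pnl", "Rolling Sharpe, 30/60/120-day", "annualized ratio"),
]


def panels_for(manifest: dict) -> list:
    """Expand the shared template into panels for one registered model."""
    panels = []
    for board, title, unit in TEMPLATE:
        is_freshness = board == "freshness"
        panels.append(
            Panel(
                model_id=manifest["model_id"],
                board=board,
                title=title,
                unit=unit,
                # The platform team owns freshness; the registering team owns the rest.
                owner="platform" if is_freshness else manifest["owner"],
                pages=is_freshness,
            )
        )
    return panels
```

Because the template is the only source of panels, a new model cannot ship with a view nobody else knows how to read.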
Three dashboards, one rule, two owners. Most days the boards are boring, which is the goal.