in-flight · mlops · infra

signal-stream — online feature serving for live alpha models

The production half of the loop: the same feature graph that ran in research, now consuming a market-data stream and emitting signals into the trading stack — with point-in-time guarantees, lineage, and a kill switch.

Role

Research Engineer

Stack

Python · Rust · Kafka · Redis · gRPC · Prometheus

p99 latency: 2.4 ms
feature freshness: ≤ 1 bar
schema drift checks: every emit

What it does

Models registered in alpha-bench come with a deployment manifest: the feature graph they consume, their refit cadence, their schema, and the signal they emit.
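For concreteness, here is roughly the shape of that manifest as a Python sketch. The class and field names are illustrative, not the actual alpha-bench schema:

    # Hypothetical shape of a deployment manifest; field names are
    # illustrative, the real schema lives in alpha-bench.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class DeploymentManifest:
        model_id: str            # registered model identifier
        feature_graph: str       # feature-forge graph the model consumes
        refit_cadence: str       # e.g. "weekly"
        input_schema_hash: str   # byte-exact hash of the training schema
        signal: str              # name of the signal the model emits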

signal-stream takes that manifest and runs the same graph live. No rewrites — the feature definitions are imported directly from feature-forge. The only difference is the executor: a streaming runtime that materialises each node as new bars arrive, instead of a batch runtime that reads parquet.
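A toy version of that split, with graph and executor names of my own invention (the real feature-forge API differs): one shared evaluation function, wrapped by a batch runtime for research and a streaming one for production.

    # Toy feature graph: each node names its inputs and a pure function.
    # Node names and both executors are illustrative, not feature-forge.
    graph = {
        "mid":    (("bid", "ask"), lambda bid, ask: (bid + ask) / 2),
        "spread": (("bid", "ask"), lambda bid, ask: ask - bid),
        "rel":    (("mid", "spread"), lambda mid, spread: spread / mid),
    }

    def materialise(graph, inputs):
        """Evaluate every node in dependency order (shared by both runtimes)."""
        values, pending = dict(inputs), dict(graph)
        while pending:
            for name, (deps, fn) in list(pending.items()):
                if all(d in values for d in deps):
                    values[name] = fn(*(values[d] for d in deps))
                    del pending[name]
        return values

    def run_batch(graph, rows):
        # Research executor: map the evaluation over historical rows
        # (parquet in the real system).
        return [materialise(graph, row) for row in rows]

    def run_streaming(graph, bars):
        # Production executor: evaluate once per arriving bar, emit as we go.
        for bar in bars:
            yield materialise(graph, bar)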

Why this matters

The fastest way to lose money is to deploy a research model whose production features have drifted from training. So every emit is guarded (a sketch of the first two checks follows the list):

  • schema hash check — the live feature schema must match the model’s training schema, byte-exact
  • distribution canary — a rolling window of feature values is compared against the training distribution; a 4-sigma shift trips a soft kill
  • PnL canary — paper-traded shadow PnL is logged alongside live PnL; divergence over N days routes the model to triage
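
A sketch of the schema and distribution guards, under assumed names; the byte-exact hash and the 4-sigma threshold come from the list above, everything else is illustrative:

    import hashlib
    import statistics
    from collections import deque

    def schema_hash(schema_bytes: bytes) -> str:
        # Byte-exact: any change to the serialised schema changes the hash.
        return hashlib.sha256(schema_bytes).hexdigest()

    class DistributionCanary:
        """Rolling window of a feature vs. its training distribution."""

        def __init__(self, train_mean, train_std, window=1024, sigmas=4.0):
            self.train_mean, self.train_std = train_mean, train_std
            self.values = deque(maxlen=window)
            self.sigmas = sigmas

        def tripped(self, value: float) -> bool:
            self.values.append(value)
            rolling_mean = statistics.fmean(self.values)
            # Soft kill once the rolling mean drifts past the sigma budget.
            return abs(rolling_mean - self.train_mean) > self.sigmas * self.train_std

    def guard_emit(live_schema: bytes, manifest_hash: str, canary, value):
        if schema_hash(live_schema) != manifest_hash:
            raise RuntimeError("schema drift: hard block")   # never emit
        if canary.tripped(value):
            return "soft_kill"   # stop emitting, route to triage
        return "emit"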

Architecture (sketch)

        ┌────────────┐    ┌──────────────┐    ┌────────────┐
md ──►  │  ingestor  │──► │ feature-forge│──► │  models    │──► signal
        │  (rust)    │    │  (online)    │    │  (python)  │      bus
        └────────────┘    └──────┬───────┘    └────────────┘
                                 │
                                 ▼
                           point-in-time
                           feature store
                          (Redis + Arrow)

The Rust ingest tier handles fan-in and timestamp normalisation; the Python online runtime owns feature compute and model inference. The boundary is a typed Arrow channel — same schema as the research lake, zero copy on the hot path.
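The Python side of that boundary, sketched with pyarrow; the endpoint and the handoff are assumptions, but ipc.open_stream is the standard way to consume an Arrow IPC stream:

    # Consume the ingestor's Arrow IPC stream. Host/port are illustrative;
    # pyarrow maps the incoming buffers rather than deserialising field by field.
    import socket
    import pyarrow as pa
    import pyarrow.ipc as ipc

    sock = socket.create_connection(("ingestor", 9090))   # hypothetical endpoint
    reader = ipc.open_stream(sock.makefile("rb"))

    # reader.schema is the same Arrow schema as the research lake, so the
    # schema-hash guard can run against it directly.
    for batch in reader:                        # one RecordBatch per window
        table = pa.Table.from_batches([batch])  # zero-copy view over the batch
        # hand the Arrow columns to feature compute / model inference here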

Operability

  • one Grafana board per model, owned by the team that registered it
  • pager only routes on freshness or schema failures — alpha drift goes to a dashboard, not a phone, because that’s research triage, not infra triage
  • every signal carries a model_id, a feature_graph_hash, and a data_window_end, so a downstream trade can be replayed against the exact feature snapshot that produced it (sketched below)
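
The envelope is small; something like this, where the three lineage fields are the real ones and the rest is illustrative:

    # Illustrative signal envelope; model_id, feature_graph_hash and
    # data_window_end are the real lineage fields, the rest is assumed.
    from dataclasses import dataclass
    import datetime as dt

    @dataclass(frozen=True)
    class Signal:
        model_id: str
        feature_graph_hash: str
        data_window_end: dt.datetime   # last bar the features saw
        value: float                   # the signal itself

    def replay_key(sig: Signal) -> str:
        """Enough to rebuild the exact feature snapshot behind a trade."""
        return f"{sig.feature_graph_hash}@{sig.data_window_end.isoformat()}"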

What I’d ship next

  • A shadow-to-canary harness that automates the “5% capital under monitoring” rollout pattern and surfaces “promote / hold / rollback” as a single decision
  • WASM-compiled feature kernels so the same code runs in the pre-trade strategy host as well as the streaming tier