notebook-as-code — quants stay in Jupyter; the platform owns the run
A small library and a sidecar that let a researcher prototype freely in a notebook, then promote the cell that worked into a versioned, reproducible pipeline — without rewriting it. The notebook is the IDE; the platform is the runtime.
Research Engineer
Python · JupyterLab · papermill · Kubernetes · Argo Workflows
The user complaint that started this
“Every time something works in my notebook I have to spend a day porting it to your framework. I just want to keep going.”
Fair. The platform’s job is to give researchers leverage, not homework.
What nbcode does
You write:
# %% [meta]
# id: xsmom_v3
# kind: model
# horizon: 10d
# owner: team-mid-frequency-equities
# %% [features]
ret_20 = features.ret(window=20)
ret_60 = features.ret(window=60)
turn = features.turnover(20)
# %% [model]
m = Ridge().fit(X=[ret_20, ret_60, turn], y=fwd_ret_10d)
# %% [eval]
report = walk_forward(m, embargo="5d", folds=12)
The sidecar reads those cell tags, builds the same model object alpha-bench would have built, and registers it. When you nbcode promote, it:
- snapshots the notebook to git
- serialises the cells into a build spec
- submits a containerised Argo job that re-runs the spec from scratch with the inputs the cell declared — not whatever the kernel happened to have in scope
- on success, registers the artefact in model-registry
Reproducibility comes from the build spec, not from the notebook state. The notebook is the editor; the spec is the truth.
What it’s not
It’s not a “notebook scheduler.” Running notebooks in production unmodified is how research teams accumulate technical debt. nbcode extracts the intent of the notebook (cells tagged as features, model, eval) and runs that — leaving the exploratory cells behind.
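The “with the inputs the cell declared, not whatever the kernel had in scope” property can be sketched like this (a toy runner, not nbcode's; `run_spec` and the flat input dict are assumptions — the real re-run happens inside a container):

```python
def run_spec(steps, declared_inputs):
    """Execute tagged cells in order, in a fresh namespace seeded only
    with the declared inputs -- never the interactive kernel's state.
    A name the quant defined in a scratch cell simply doesn't exist here,
    so undeclared dependencies fail loudly at promote time."""
    ns = dict(declared_inputs)   # clean slate, not the notebook kernel
    for tag, source in steps:
        exec(source, ns)         # each cell sees only ns, built up step by step
    return ns

# Usage: two tagged cells and their declared inputs
steps = [
    ("features", "x2 = base * 2"),
    ("model", "result = x2 + offset"),
]
out = run_spec(steps, {"base": 10, "offset": 1})
# out["result"] == 21; a scratch-cell variable like `tmp` would raise NameError
```

The design point is that promotion failures surface hidden kernel state immediately, instead of at 3 a.m. in production.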
How I introduced it
- A two-week pairing program with the two most prolific quants. They didn’t read docs; they pushed code. I watched. The bits they fought with became the API.
- For the first month, a “promote” button that registered their current notebook as a candidate (not production) — so the friction was identical to what they’d already do, minus the porting.
- After uptake hit ~70%, I made the framework-only path the slower one — quants self-organised the rest.
What I’d ship next
- VS Code parity — a non-trivial slice of researchers prefer it
- A “diff this notebook against its last promoted version” view that highlights cells that changed — most regression bugs hide in cells the quant didn’t think they touched
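The diff view could lean on the same cell-tag convention: compare each tagged cell's body against the last promoted snapshot and flag every tag whose body changed, appeared, or disappeared. A hedged sketch (function names and the tag-keyed comparison are assumptions, not the shipped design):

```python
import re

# Same assumed "# %% [tag]" cell convention as the notebook source
MARKER = re.compile(r"^# %% \[(\w+)\]\s*$")

def tagged_cells(text: str) -> dict[str, str]:
    """Map tag -> cell body for a notebook's '# %% [tag]' cells."""
    out, tag = {}, None
    for line in text.splitlines():
        m = MARKER.match(line)
        if m:
            tag = m.group(1)
            out[tag] = []
        elif tag is not None:
            out[tag].append(line)
    return {t: "\n".join(ls).strip() for t, ls in out.items()}

def changed_cells(current: str, last_promoted: str) -> list[str]:
    """Tags whose body differs from the last promoted snapshot,
    including cells added or removed since then."""
    a, b = tagged_cells(current), tagged_cells(last_promoted)
    return sorted(t for t in a.keys() | b.keys() if a.get(t) != b.get(t))
```

Surfacing changes by tag rather than by raw line diff is the point: it answers “which promoted cells moved?” even when the quant believes they only touched eval.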