Beyond Chatbots: How I turned Python Notebooks into AI-Accessible Systems
Notes from an OSS Dev on how I built --mcp in marimo
When we talk about AI-native developer tools, we often focus on chat assistants or code completion, but true AI integration happens when the tools we use every day, like notebooks, can speak the same language as AI systems. That’s what I just built for marimo.io, an AI-native reactive notebook (16k+ ★). The newly released --mcp flag turns any notebook into a Model Context Protocol (MCP) server, exposing structured tools that let AI systems inspect, diagnose, and reason about notebooks in a standard way. This isn’t a chatbot bolted on top of marimo; it’s the foundation for making notebooks part of a shared AI-driven ecosystem. In this post, I’ll share how I designed and built --mcp, lessons learned for other devs working with AI, and my personal tips on getting the most out of the new feature.
If you find this post helpful, type your email and hit Subscribe. I’ll send the next installment straight to your inbox.
From isolated notebooks to interoperable systems
Marimo notebooks are already reactive and app-like, automatically updating when data or code changes. But until now, they were still isolated. The --mcp flag bridges that gap: run marimo edit notebook.py --mcp and your notebook becomes an MCP server that can communicate with any compatible client, like an IDE or a local LLM agent.
The value is interoperability:
Visibility: AI clients can see what’s happening inside notebooks in real time.
Diagnostics: They can query errors, variables and data structures directly.
Assistance: They can reason about what’s wrong or missing without relying on screen scraping or copy-pasting. AI agents can now use well-defined APIs to gather the exact context they need (a minimal client sketch follows below).
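Here’s roughly what that looks like from the client side, as a minimal sketch using the official MCP Python SDK. The endpoint URL and the empty argument dict are assumptions on my part (2718 is marimo’s default port); check the marimo docs for the exact path and port your version serves MCP on.

```python
import asyncio
import json

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main() -> None:
    # Assumed endpoint; consult the marimo docs for the real MCP URL.
    async with streamablehttp_client("http://localhost:2718/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("get_active_notebooks", {})
            # Assumption: the payload comes back as a single JSON text block.
            print(json.loads(result.content[0].text))

asyncio.run(main())
```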
Designing for effortless context
My design philosophy was simple: no configuration, no boilerplate, just context. When --mcp is enabled, marimo automatically exposes a small, powerful set of read-only tools that reflect the current notebook session. These tools are grouped around how people naturally debug or audit notebooks:
Inspection: get_active_notebooks, get_lightweight_cell_map, get_cell_runtime_data
Data: get_tables_and_variables, get_database_tables
Debugging: get_notebook_errors
Reference: get_marimo_rules
My guiding principle was discoverability through structure. Every tool publishes a clear schema that an AI system can understand, which lets a client, or even a chain of AI tools, combine them intelligently to solve real problems. The sketch below shows what that discovery step looks like from the client side.
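This is a hedged sketch, not a prescribed workflow: discover is a hypothetical helper name, and it assumes you already hold an initialized ClientSession (see the connection sketch earlier). The attributes it reads (name, description, inputSchema) are the standard MCP tool fields.

```python
from mcp import ClientSession

async def discover(session: ClientSession) -> None:
    """Print every tool's name, description, and argument schema."""
    tools = await session.list_tools()
    for tool in tools.tools:
        print(tool.name)
        print("  ", tool.description)
        print("  ", tool.inputSchema)  # the machine-readable contract agents plan against
```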
Main challenges in building --mcp
The first challenge was synchronization. Marimo’s runtime reacts to code changes while MCP expects stable responses to incoming requests. I had to design a concurrency bridge that ensures each MCP call gets a consistent snapshot of the notebook without freezing the reactive engine.
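Here’s a minimal sketch of the pattern, not marimo’s actual implementation: the reactive engine publishes immutable snapshots, and each MCP request only ever reads a frozen one. All names (SnapshotBridge, NotebookSnapshot) are hypothetical.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass(frozen=True)
class NotebookSnapshot:
    # Hypothetical fields; marimo's real session state is much richer.
    cell_ids: tuple[str, ...] = ()
    errors: dict[str, str] = field(default_factory=dict)

class SnapshotBridge:
    """Serve MCP reads from immutable snapshots of a reactive runtime."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._latest = NotebookSnapshot()

    async def publish(self, snapshot: NotebookSnapshot) -> None:
        # Called by the reactive engine after each run; never blocks long.
        async with self._lock:
            self._latest = snapshot

    async def read(self) -> NotebookSnapshot:
        # Each MCP call gets one frozen, consistent snapshot, so a re-run
        # mid-request can't produce a torn response.
        async with self._lock:
            return self._latest
```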
The second challenge was tool schema design. Each MCP tool needed to express marimo’s dynamic notebook state (cell IDs, tracebacks, tables, variable types) in a way that was both machine-readable and safe. The goal was to avoid ambiguity so that an AI system could build higher-order reasoning on top.
Finally, I wanted tool chaining to feel natural: the tools shouldn’t be isolated RPCs, but composable building blocks. This meant ensuring the tools consistently used the existing standardized fields, like notebook session IDs, cell IDs, and data signatures, so one tool’s output could become another’s input.
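To make that concrete, here are hypothetical shapes, not marimo’s actual wire format; the point is that every tool speaks the same identifier vocabulary.

```python
from typing import TypedDict

class CellError(TypedDict):
    session_id: str   # shared key: which notebook session
    cell_id: str      # shared key: which cell failed
    error_type: str
    traceback: str

class CellRuntimeRequest(TypedDict):
    # Same keys, so a get_notebook_errors result can be passed straight
    # into get_cell_runtime_data without translation.
    session_id: str
    cell_id: str
```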
Valuable lessons learned
Make observability a first-class citizen. The hardest part of debugging AI-generated notebooks is getting reliable state, not fixing code.
Design for composition, not isolation. Small, well-typed tools combine into powerful workflows.
MCP enables structured collaboration, not automation. By exposing read-only tools with clear schemas, AI systems gain reliable context while humans maintain control.
How to use --mcp: Practical workflows for marimo users
With --mcp enabled, you can start building workflows that combine multiple tools to solve notebook problems end-to-end, such as:
1. Multi-notebook error auditing
Use get_active_notebooks to list all open notebooks.
For each one, call get_notebook_errors to find failing cells.
For problematic cells, call get_cell_runtime_data to extract full tracebacks and variable states.
Combine with get_marimo_rules to generate AI-guided suggestions for fixing the pattern of errors.
You get a workspace-wide diagnostic report that’s machine-generated, explainable, and traceable.
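Here’s a hedged sketch of that chain using the MCP Python SDK. It assumes an already-initialized ClientSession, that each tool returns a single JSON text block, and argument names like session_id and cell_id; the tools’ published inputSchema is the source of truth.

```python
import json
from typing import Any

from mcp import ClientSession

def _payload(result: Any) -> Any:
    # Assumption: each tool returns a single JSON text block.
    return json.loads(result.content[0].text)

async def audit_workspace(session: ClientSession) -> list[dict]:
    """Chain the read-only tools into a workspace-wide error audit."""
    findings: list[dict] = []
    rules = _payload(await session.call_tool("get_marimo_rules", {}))
    notebooks = _payload(await session.call_tool("get_active_notebooks", {}))
    for nb in notebooks:  # assumed: a list of {"session_id": ...} dicts
        errors = _payload(await session.call_tool(
            "get_notebook_errors", {"session_id": nb["session_id"]}))
        for err in errors:  # assumed: a list of {"cell_id": ...} dicts
            runtime = _payload(await session.call_tool(
                "get_cell_runtime_data",
                {"session_id": nb["session_id"], "cell_id": err["cell_id"]}))
            findings.append({"notebook": nb, "error": err,
                             "runtime": runtime, "rules": rules})
    return findings
```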
2. Data integrity and schema drift checking (built by my awesome colleague Shahmir Varqha)
get_tables_and_variables retrieves current in-memory data structures.
get_database_tables pulls authoritative schema information.
The difference between them can highlight drift, missing columns, renamed fields or unexpected nulls.
An LLM can summarize this into a human-readable “schema mismatch” report.
This helps prevent subtle data bugs before they cascade into broken visualizations or reports.
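A sketch of the comparison step, under the assumption that both tools return a {table_name: [column, ...]} mapping; the real payloads may differ, so treat the shapes below as illustrative.

```python
import json

from mcp import ClientSession

async def schema_drift(session: ClientSession) -> dict[str, dict[str, set[str]]]:
    """Flag columns that differ between in-memory tables and the database."""
    # Assumed payload shape for both tools: {table_name: [column, ...]}.
    live = json.loads((await session.call_tool(
        "get_tables_and_variables", {})).content[0].text)
    db = json.loads((await session.call_tool(
        "get_database_tables", {})).content[0].text)
    drift: dict[str, dict[str, set[str]]] = {}
    for table, columns in db.items():
        live_cols = set(live.get(table, []))
        missing = set(columns) - live_cols   # in the DB, gone from memory
        extra = live_cols - set(columns)     # in memory, unknown to the DB
        if missing or extra:
            drift[table] = {"missing": missing, "extra": extra}
    return drift
```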
3. Structural refactoring and documentation
get_lightweight_cell_map gives an outline of all code and markdown cells.
Combine that with get_cell_runtime_data for runtime characteristics (execution time, errors, variables).
Feed that data into a summarizer that produces structured documentation: cell purposes, dependencies and data lineage.
The result: a living README that stays synchronized with the notebook itself.
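A minimal sketch of the skeleton-building step, assuming field names like cell_id, kind, and status that I’ve inferred from the tool names; check each tool’s real schema before relying on them.

```python
import json

from mcp import ClientSession

async def draft_readme(session: ClientSession) -> str:
    """Build a skeleton doc from the cell map plus runtime data."""
    # Assumed field names ("cell_id", "kind", "status"); verify against the schema.
    cells = json.loads((await session.call_tool(
        "get_lightweight_cell_map", {})).content[0].text)
    lines = ["Notebook overview", ""]
    for cell in cells:
        runtime = json.loads((await session.call_tool(
            "get_cell_runtime_data", {"cell_id": cell["cell_id"]}
        )).content[0].text)
        lines.append(f"- {cell['cell_id']} ({cell.get('kind', 'code')}): "
                     f"status={runtime.get('status', 'unknown')}")
    # Feed the joined skeleton plus the raw payloads to an LLM for final prose.
    return "\n".join(lines)
```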
Still got questions? Drop me a line in the comments and I’ll try my best to get back to you. Meanwhile, if you found this post useful, share it with a friend and consider subscribing. I will be sharing more lessons from the trenches of open-source, Gen AI, and MCP every week.

