New M3's Tool-Agnostic Highly-Typed Architecture: Chaining-Method API Style #51

simonprovost · 2025-07-18T18:33:17Z

Hiya guys!

I hope everything is going great for everyone. I'm very happy to finally release this PR; it's been a tough week of learning, designing, developing, and reworking to propose this new architecture for the M3 library. I know it hasn't been discussed, but hear me out: it turns out it's not too bad-looking, hopefully! 🤞

Important

Please note that I am worlds away from wanting anyone to take it personally that I reworked the majority of the architecture; the sole purpose of this PR is to improve the library's long-term trajectory and, as Leo mentioned in the email, to move towards this "toolbox". The prior codebase was enjoyable to stroll through and investigate (thanks for such a cool tool), but I believe what is presented below might be an excellent next step towards that potential toolbox.

❝ In a nutshell

The new M3 architecture (1) is tool-agnostic, (2) follows a chaining-method API style, (3) is strongly type-checked to avoid side effects when data flows in and out, (4) is ready to scale to more than one MCP tool and to extend MIMIC to more datasets from PhysioNet, and (5) introduces presets (see below). Plus, a lil bonus: the UI is evolving too, see below.

On top of that, it should function similarly to 0.2.0, hopefully. Speaking of which, I would much appreciate it if someone (maybe @rafiattrach?) could kindly jump on the branch and try setting up the MIMIC tool via the BigQuery / authentication route, as I do not have such access. Mistakes happen easily in such a large refactoring, so a second pair of eyes and a trial run would be safer, please 🙏

❝ The New Architecture

Below are the ground lines of the new architecture. While the code says it all, I believe it won't hurt to discuss it here for context, in case it is needed.

Before anything else, two things to note. (1) The new design is not only tool-agnostic, it also follows the Scikit-learn pipeline philosophy, allowing users to construct an M3 pipeline in the same way Sklearn pipelines are built. That is, we can stack (compose) any of the M3 tools offered and connect the dots between those we want to play with. Once composed, instead of running fit(.), we build(.) it, and instead of calling predict(.), we run(.) it. (2) As previously stated, the library uses a chaining-based, API-style approach. The rationale is to avoid ending up with constructors that take 50 parameters in the long run; chaining methods keep the API more resilient and user-friendly, e.g., M3().with_config(<...>).with_tool(<tool_1>).with_tool(<tool_2>) or M3().with_preset(<preset_of_interest>), etc.
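To make the chaining idea concrete, here is a minimal, self-contained sketch of the pattern. It is not the PR's actual code: `Pipeline`, `EchoTool`, and the `log_level` option are hypothetical stand-ins, and only the method names (`with_config`, `with_tool`, `build`, `run`) mirror the ones mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Protocol


class Tool(Protocol):
    """Hypothetical minimal tool interface, assumed for this sketch only."""

    name: str

    def initialize(self) -> None: ...


@dataclass
class EchoTool:
    """Hypothetical stand-in tool; not part of M3."""

    name: str
    initialized: bool = False

    def initialize(self) -> None:
        self.initialized = True


@dataclass
class Pipeline:
    """Illustrative stand-in for the chaining-style M3 class."""

    config: dict[str, Any] = field(default_factory=dict)
    tools: list[Tool] = field(default_factory=list)
    built: bool = False

    def with_config(self, **options: Any) -> Pipeline:
        self.config.update(options)
        return self  # returning self is what makes chaining possible

    def with_tool(self, tool: Tool) -> Pipeline:
        self.tools.append(tool)
        return self

    def build(self) -> Pipeline:
        # Rough analogue of fit(.): validate and initialise the composed tools.
        if not self.tools:
            raise ValueError("add at least one tool before build()")
        for tool in self.tools:
            tool.initialize()
        self.built = True
        return self

    def run(self) -> list[str]:
        # Rough analogue of predict(.): here it only reports the tools served.
        if not self.built:
            raise RuntimeError("call build() before run()")
        return [tool.name for tool in self.tools]


pipeline = (
    Pipeline()
    .with_config(log_level="INFO")
    .with_tool(EchoTool("tool_1"))
    .with_tool(EchoTool("tool_2"))
    .build()
)
served = pipeline.run()
```

Each `with_*` call returns the same object, so new options can be added later as new methods rather than as extra constructor parameters.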

More specifically,

Core Components (`core/`)

This core layer manages the library's essential building blocks, broadly enforcing consistency across the library and its ecosystem. It should rarely be used on the user's side; it is aimed more at authors and contributors. As follows:

M3 Tool Framework: The abstract BaseTool enforces a uniform structure for M3-supported tools (e.g., MIMIC), defining lifecycle methods (e.g., initialize, teardown) and actions. It also includes BaseToolCLI for standardized CLI commands (e.g., init, configure), so that each tool's CLI follows a similar structure. All of these are the components for creating M3-supported tools, and MIMIC has been refactored accordingly.
MCP Config Generation: Enables generation of configs for various MCP hosts (e.g., FastMCP, ClaudeDesktop) via the MCPConfigGenerator base class. In the long term, one may want to support exports to more MCP hosts; this is where the new primitives will reside, and they will automatically be leveraged by the whole system.
Presets: Pre-configured Python M3 pipelines created via the Preset base class. They allow M3 pipelines to be run from a fixed configuration defined within the script, which is great for (1) reproducing benchmarks and (2) fast instantiation, like defaults. E.g., default_m3 creates the MIMIC tool with the SQLite backend and the default dataset, much as m3 run config claude does in 0.2.0.
M3 Configuration: The M3Config class manages log levels, env vars, paths (e.g., data dirs), validation (e.g., for tools), etc. It supports serialization (to_dict, from_dict) and applying env vars for a consistent setup.
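As a rough illustration of the configuration item above, the sketch below shows one way a config object could validate itself, round-trip through `to_dict` / `from_dict`, and apply its environment variables. `ConfigSketch`, its fields, the `m3_data` default, and the `M3_SKETCH_BACKEND` variable are all assumptions made for the example; the real M3Config may look quite different.

```python
from __future__ import annotations

import json
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


@dataclass
class ConfigSketch:
    """Hypothetical config object; field names are assumptions."""

    log_level: str = "INFO"
    data_dir: Path = Path("m3_data")
    env_vars: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Validate early so a bad value fails at construction time.
        if self.log_level not in VALID_LOG_LEVELS:
            raise ValueError(f"unknown log level: {self.log_level!r}")
        self.data_dir = Path(self.data_dir)

    def to_dict(self) -> dict[str, Any]:
        # Only JSON-friendly types, so the result can be saved as-is.
        return {
            "log_level": self.log_level,
            "data_dir": str(self.data_dir),
            "env_vars": dict(self.env_vars),
        }

    @classmethod
    def from_dict(cls, payload: dict[str, Any]) -> ConfigSketch:
        return cls(
            log_level=payload.get("log_level", "INFO"),
            data_dir=Path(payload.get("data_dir", "m3_data")),
            env_vars=dict(payload.get("env_vars", {})),
        )

    def apply_env(self) -> None:
        # Export the configured variables into the current process.
        os.environ.update(self.env_vars)


original = ConfigSketch(log_level="DEBUG", env_vars={"M3_SKETCH_BACKEND": "sqlite"})
restored = ConfigSketch.from_dict(json.loads(json.dumps(original.to_dict())))
restored.apply_env()
```

Keeping `to_dict` limited to plain JSON types is what lets the same payload be written to disk and reloaded without custom encoders.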

Tools Section (`tools/`)

Yay, finally: this is where all the M3 tools will reside. What's nice is that all tools are auto-registered and validated, checking that they follow the required architecture before they are registered and made usable throughout the library. Currently only MIMIC is supported. As follows:

Tool Registry: Auto-registers and validates M3 tools via registry.py, ensuring structural compliance (e.g., main class, CLI presence). In the long term, if the tools evolve, more validation checks could be added here alongside those already available.
MIMIC Tool: Core implementation for MIMIC-IV. Nothing new for you here, but I might have missed some things on the BigQuery/OAuth2 route, as explained in the In a nutshell section. What's actually cool in the MIMIC tool, however, is the configuration: YAML-based declarative files listing the supported datasets (datasets.yaml) that can be requested via init (more datasets, MIMIC-type of course, could simply be added here), the environment variables (env_vars.yaml) and whether they are required, and all the security checks (security.yaml) that were previously hardcoded in .py files.
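The registry idea can be sketched as follows. This is only an illustration of "validate the structure, then register": `BaseToolSketch`, `ToolRegistrySketch`, `MimicLikeTool`, and `BrokenTool` are hypothetical names, and the checks performed by the real registry.py (e.g., CLI presence) are not reproduced here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BaseToolSketch(ABC):
    """Hypothetical base class with the lifecycle methods named above."""

    @abstractmethod
    def initialize(self) -> None: ...

    @abstractmethod
    def teardown(self) -> None: ...


class ToolRegistrySketch:
    """Hypothetical registry that refuses structurally incomplete tools."""

    def __init__(self) -> None:
        self._tools: dict[str, type[BaseToolSketch]] = {}

    def register(self, name: str, tool_cls: type) -> None:
        if not (isinstance(tool_cls, type) and issubclass(tool_cls, BaseToolSketch)):
            raise TypeError(f"{name!r} must subclass BaseToolSketch")
        # Abstract methods left unimplemented are listed on the class itself.
        missing = getattr(tool_cls, "__abstractmethods__", frozenset())
        if missing:
            raise TypeError(f"{name!r} does not implement: {sorted(missing)}")
        if name in self._tools:
            raise ValueError(f"{name!r} is already registered")
        self._tools[name] = tool_cls

    def names(self) -> list[str]:
        return sorted(self._tools)


class MimicLikeTool(BaseToolSketch):
    def initialize(self) -> None:
        pass

    def teardown(self) -> None:
        pass


class BrokenTool(BaseToolSketch):
    # teardown() is deliberately missing to trigger the validation error.
    def initialize(self) -> None:
        pass


registry = ToolRegistrySketch()
registry.register("mimic_like", MimicLikeTool)
try:
    registry.register("broken", BrokenTool)
    rejection = ""
except TypeError as exc:
    rejection = str(exc)
```

Rejecting a tool at registration time means a structural mistake surfaces when the library loads, not when a pipeline is already running.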

Main M3 Orchestration (`src/m3/`)

This is the main entry point of the library, either programmatically or via the CLI. As follows:

M3 Class: Chaining-based API in m3.py (e.g., .with_config(...), .with_tool(...), .with_preset(...)), avoiding complex constructors; supports build (for MCP hosts like FastMCP/Claude), run (starts the MCP server), save/load (JSON serialization), and validation/initialization.
M3 CLI: An enhanced Typer UI in cli.py compared to the previous interface, leveraging the new M3 class as much as possible. I won't say much more about the library's main outer CLI, as the videos below recap it very well, I guess.

A great foundation always makes things easier in the long term. ❞

Side notes. The unit tests should now be more thorough too, despite possible loopholes :) More than 75% of the codebase is now tested, versus 30% in 0.2.0. Lastly, typing is enforced at runtime via Beartype (O(1) runtime checks); refer to #45 for the why. I also removed redundant top-of-file docstrings (favoring class/method docs), added the great TheFuzz for fuzzy error handling when, say, you look for a tool but make a typo when calling it (e.g., the system says Did you mean X?), and made UI enhancements for better usability.
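For readers unfamiliar with the "Did you mean X?" behaviour, here is a small sketch of the idea. Note that it uses the standard library's `difflib` as a stand-in rather than TheFuzz, so that it runs without extra dependencies; `suggest_tool`, the tool names, and the 0.6 cutoff are assumptions for illustration.

```python
from difflib import get_close_matches


def suggest_tool(requested: str, available: list[str], cutoff: float = 0.6) -> str:
    """Return the tool name if known, else raise with a fuzzy suggestion."""
    if requested in available:
        return requested
    # difflib ranks candidates by similarity ratio; keep only the best one.
    matches = get_close_matches(requested, available, n=1, cutoff=cutoff)
    hint = f" Did you mean {matches[0]!r}?" if matches else ""
    raise LookupError(f"Unknown tool {requested!r}.{hint}")


try:
    suggest_tool("mimc", ["mimic", "other_tool"])
    message = ""
except LookupError as exc:
    message = str(exc)
```

The same shape would apply with TheFuzz: score the typo against the registered names and surface the best candidate in the error message.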

Caution

While the refactoring is complete, docstrings are minimal for now (one-liners per class only), pending (1) the final PR state post-review, to avoid wasting time on such writing, and (2) the repo org migration + ReadTheDocs setup (#40). Bear with me; I'm very happy to write full docstrings once ready! :)

❝ Stop Waffling & Show Some `Before` ➠ `After`

m3 --help:

1-help.mp4

Various m3 utilities CLI commands:

2-utilities.mp4

UI enhancements when downloading datasets via m3 run mimic init

3-mimic_init.mp4

m3 run config claude:

4-mimic_claude.mp4

Leverage presets (basically doing the m3 run config claude above in one line):

5-presets.mp4

Build the first M3 Pipeline w/ two tools in play for Claude 🎉

I've removed the tool, as it was a non-useful / out-of-scope one, there just for the sake of the example :)

6-multi_tools_pipeline.mp4

Hope it helps,

Cheeeeers!


simonprovost · 2025-07-18T18:36:42Z

cc-ing @rafiattrach @rajna-fani @MoreiraP12.

So sorry for the long, long read guys; this was somewhat necessary as we do not meet online, and to justify the +6,736; −3,689. If you have any questions, fire away in the comments, even before reviewing 🫡

simonprovost · 2025-08-21T22:21:55Z

Now externalised in https://github.com/MCP-Pipeline/MCPStack via the initiative @ https://github.com/MCP-Pipeline. Thanks for the help!

simonprovost added 5 commits July 18, 2025 17:04

fix(pre-commit): call uv run w/ pytest

56e696c

refactor: improve config into class M3Config

2d8e232

refactor(core): centralise M3 exceptions, helpers & logging

0af9ef6

feat(core): add preset base w/ default_M3 preset

48ae186

refactor(core): add MCP Conf. Gen. base w/ Claude,FastMCP and Univers…

bcfd459

…al primitives

simonprovost force-pushed the refactor/toolbox_based_architecture branch 2 times, most recently from 7736b74 to ce7fe71 Compare July 18, 2025 20:44

simonprovost self-assigned this Jul 18, 2025

simonprovost added the enhancement New feature or request label Jul 18, 2025

simonprovost requested a review from rafiattrach July 18, 2025 21:36

simonprovost added 8 commits July 20, 2025 17:06

feat(core): add M3 Tool base

90f9c47

feat(core): add M3 starting MCP server script

6208ace

refactor(tools): add MIMIC M3 Tool w/ auto-tools-registry

094867f

refactor: update CLI given new architecture

6b46759

feat: add core M3 w/ chaining-API-style class

c064dd4

refactor(tests): improve unit-tests given new architecture

de56abf

core: add deps. (Beartype, TheFuzz, ...)

2ab83b2

core: improve gitignore

41a263d

simonprovost force-pushed the refactor/toolbox_based_architecture branch from 051ffd5 to 41a263d Compare July 20, 2025 16:06

simonprovost mentioned this pull request Jul 21, 2025

An M3 tool to teach how to create new M3 Tool: Early stage #52

Closed

1 task

core: update pre-commit-hooks rev

ba316c2

simonprovost force-pushed the refactor/toolbox_based_architecture branch from ee94d58 to 83e7d37 Compare August 1, 2025 22:45

fix(CI): update uv symlink path

3bab486

simonprovost force-pushed the refactor/toolbox_based_architecture branch from 83e7d37 to 3bab486 Compare August 1, 2025 22:48

simonprovost mentioned this pull request Aug 1, 2025

Remove query interface section from Explanation component #53

Merged

simonprovost closed this Aug 21, 2025
