New M3's Tool-Agnostic Highly-Typed Architecture: Chaining-Method API Style #51
Closed
cc-ing @rafiattrach @rajna-fani @MoreiraP12. So sorry for the long read, guys; this was somewhat necessary as we do not meet online, and to justify the …
Now externalised in https://github.com/MCP-Pipeline/MCPStack via the initiative at https://github.com/MCP-Pipeline. Thanks for the help!
Hiya guys!

I hope everything is great for everyone. I'm very happy to finally release this PR. It's been a tough week of learning, designing, developing, and reworking to propose this new architecture for the `M3` library (I know it hasn't been discussed, but hear me out), and it turns out it's not too bad-looking, hopefully! 🤞

Important

Please note: I am worlds away from wanting anyone to take it personally that I enhanced the majority of the architecture. The sole purpose of this PR is to improve the library's long-term trajectory and, as Leo mentioned in the email, to walk towards this "toolbox". The prior codebase was enjoyable to stroll through and investigate (thanks for such a cool tool!), but I believe what is presented below could be an excellent next step towards that toolbox.

❝ In a nutshell
The new `M3` architecture is (1) tool-agnostic, (2) shaped around a chaining-method, API-style interface, (3) strongly type-checked to avoid side effects as data flows in and out, (4) ready to scale to more than one MCP tool and to extend MIMIC to more PhysioNet datasets, and (5) introduces presets (see below). As a little bonus, the UI is evolving too (see below).

On top of that, it should behave like `0.2.0`, hopefully. Speaking of which, I would much appreciate it if someone (maybe @rafiattrach?) could jump into the branch and try setting up the MIMIC tool via the BigQuery / authentication route, as I do not have such access. Mistakes happen easily in such a large refactoring, so your eyes and testing would make it safer, please 🙏

❝ The New Architecture
Below, I outline the main lines of the new architecture. While the code speaks for itself, I believe it won't hurt to discuss it here for context, in case it is needed.
Before anything else, two things to note. (1) The new design is not only tool-agnostic, it also follows the scikit-learn pipeline philosophy, allowing users to construct an M3 pipeline the same way as a scikit-learn pipeline. That is, we can stack (`compose`) any of the M3 tools offered and connect the dots between those we want to play with. Once composed, instead of running `fit(.)` we `build(.)` it, and instead of calling `predict(.)` we `run(.)` it. (2) As previously stated, the library uses a chaining-method, API-style approach. The rationale is to avoid constructors with 50 parameters in the long run; chaining methods are more resilient and user-friendly, e.g., `M3().with_config(<...>).with_tool(<tool_1>).with_tool(<tool_2>)` or `M3().with_preset(<preset_of_interest>)`, etc.
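To make the compose, `build`, `run` flow concrete, here is a toy, self-contained sketch of the chaining-method pattern. It is not M3's code: `PipelineSketch`, `ToolSpec`, and the `PRESETS` table are hypothetical names, and `run()` only returns the tool names instead of starting an MCP server. The key trick is that each `with_*` method returns `self`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolSpec:
    """Hypothetical stand-in for an M3 tool plus its options."""
    name: str
    options: dict[str, Any] = field(default_factory=dict)


# Hypothetical preset table: a preset is just a named, fixed list of tools.
PRESETS: dict[str, list[ToolSpec]] = {
    "default_m3": [ToolSpec("mimic", {"backend": "sqlite"})],
}


class PipelineSketch:
    """Toy builder illustrating the chaining-method API style (not M3's real class)."""

    def __init__(self) -> None:
        self._config: dict[str, Any] = {}
        self._tools: list[ToolSpec] = []
        self._built = False

    def with_config(self, **config: Any) -> PipelineSketch:
        self._config.update(config)
        return self  # returning self is what enables chaining

    def with_tool(self, tool: ToolSpec) -> PipelineSketch:
        self._tools.append(tool)
        return self

    def with_preset(self, name: str) -> PipelineSketch:
        # Copy each preset tool so later edits don't mutate the shared preset table.
        for tool in PRESETS[name]:
            self.with_tool(ToolSpec(tool.name, dict(tool.options)))
        return self

    def build(self) -> PipelineSketch:
        # Analogue of sklearn's fit(): validate the composed pipeline once.
        names = [t.name for t in self._tools]
        if not names:
            raise ValueError("a pipeline needs at least one tool")
        if len(names) != len(set(names)):
            raise ValueError(f"duplicate tools in pipeline: {names}")
        self._built = True
        return self

    def run(self) -> list[str]:
        # Analogue of sklearn's predict(); a real pipeline would start an MCP server here.
        if not self._built:
            raise RuntimeError("call build() before run()")
        return [t.name for t in self._tools]


pipeline = PipelineSketch().with_config(log_level="INFO").with_preset("default_m3").build()
print(pipeline.run())  # ['mimic']
```

Because every `with_*` call returns the builder, supporting a new option later means adding one method rather than one more constructor parameter.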
Core Components (`core/`)

This core layer manages the library's essential building blocks, broadly enforcing consistency across the library and its ecosystem. It is not meant to be used much on the user's side, more on the authors' / contributors' side. As follows:
M3 Tool Framework: the abstract `BaseTool` enforces a uniform structure for M3-supported tools (e.g., `MIMIC`), defining lifecycle methods (e.g., `initialize`, `teardown`) and `actions`. It also includes `BaseToolCLI` for standardized CLI commands (e.g., `init`, `configure`), so that every tool's CLI follows a similar structure. Together, these are the components for creating M3-supported tools, and `MIMIC` has been refactored accordingly.

MCP Config Generation: enables generating configs for various MCP hosts (e.g., `FastMCP`, `ClaudeDesktop`) via the `MCPConfigGenerator` base class. In the long run, one may want to support exporting to more MCP hosts; this is where the new primitives will reside, and they will be leveraged automatically by the whole system.

Presets: pre-configured Python M3 pipelines created via the `Preset` base class. Presets let M3 pipelines run from a fixed configuration defined within the script, which is great for (1) reproducing benchmarks and (2) fast, default-like instantiation. E.g., `default_m3` creates the MIMIC tool with the SQLite backend and the default dataset, basically what `m3 run config claude` does in `0.2.0`.

M3 Configuration: the `M3Config` class manages log levels, env vars, paths (e.g., data dirs), validation (e.g., for tools), etc. It supports serialization (`to_dict`, `from_dict`) and applying env vars for a consistent setup.

Tools Section (`tools/`)

Yay, finally: this is where all the M3 tools will reside. What's nice is that every tool is auto-registered and validated: the registry checks that each tool follows the required architecture before registering it and making it usable throughout the library. Currently, only MIMIC is supported.
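A minimal sketch of how an abstract tool base class and a validating registry can fit together, assuming decorator-based registration. `BaseToolSketch`, `BaseToolCLISketch`, `register`, and `REGISTRY` are hypothetical names, not M3's actual `BaseTool` / `registry.py` API; only the lifecycle names (`initialize`, `teardown`, `actions`) and CLI command names (`init`, `configure`) come from this PR.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from inspect import isabstract
from typing import Any, Callable


class BaseToolCLISketch(ABC):
    """Hypothetical per-tool CLI contract with standardized commands."""

    @abstractmethod
    def init(self) -> None: ...

    @abstractmethod
    def configure(self) -> dict[str, Any]: ...


class BaseToolSketch(ABC):
    """Hypothetical tool contract: lifecycle methods plus exposed actions."""

    cli: type[BaseToolCLISketch]  # each concrete tool must point at its CLI class

    @abstractmethod
    def initialize(self) -> None: ...

    @abstractmethod
    def teardown(self) -> None: ...

    @abstractmethod
    def actions(self) -> list[Callable[..., Any]]: ...


REGISTRY: dict[str, type[BaseToolSketch]] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that validates a tool's structure before registering it."""

    def decorator(cls: type) -> type:
        if not issubclass(cls, BaseToolSketch):
            raise TypeError(f"{cls.__name__} must subclass BaseToolSketch")
        if isabstract(cls):
            raise TypeError(f"{cls.__name__} leaves abstract methods unimplemented")
        cli = getattr(cls, "cli", None)
        if not (isinstance(cli, type) and issubclass(cli, BaseToolCLISketch)):
            raise TypeError(f"{cls.__name__} must set `cli` to a BaseToolCLISketch subclass")
        REGISTRY[name] = cls
        return cls

    return decorator


class MimicCLISketch(BaseToolCLISketch):
    def init(self) -> None:
        print("would download the default dataset here")

    def configure(self) -> dict[str, Any]:
        return {"backend": "sqlite"}


@register("mimic")
class MimicToolSketch(BaseToolSketch):
    cli = MimicCLISketch

    def initialize(self) -> None:
        self.ready = True

    def teardown(self) -> None:
        self.ready = False

    def actions(self) -> list[Callable[..., Any]]:
        return [self.initialize, self.teardown]


print(sorted(REGISTRY))  # ['mimic']
```

Validating at registration time means a malformed tool fails on import, rather than halfway through a pipeline run.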
Tool Registry: auto-registers and validates M3 tools via `registry.py`, ensuring structural compliance (e.g., main class, CLI presence), etc. In the long run, if the tools evolve, more validation checks could be added here alongside the existing ones.

MIMIC Tool: the core implementation for MIMIC-IV. Nothing new for you here, but I might have missed something in the BigQuery/OAuth2 route, as explained in the "In a nutshell" section. What's actually cool in the MIMIC tool, however, is its configuration: YAML-based declarative files listing the supported datasets (`datasets.yaml`) that can be called via `init` (more datasets, MIMIC-type of course, can simply be added here), the environment variables (`env_vars.yaml`) and whether they are required, etc., and all the security checks (`security.yaml`) that were previously hardcoded in Python files.

Main M3 Orchestration (`src/m3/`)

Basically the main entry point of the library, used either programmatically or via the CLI. As follows:
M3 Class: chaining-based API in `m3.py` (e.g., `.with_config(...)`, `.with_tool(...)`, `.with_preset(...)`), avoiding complex constructors. It supports `build` (for MCP hosts like FastMCP/Claude), `run` (starts the MCP server), `save`/`load` (JSON serialization), and validation/initialization.

M3 CLI: an enhanced Typer UI compared to the previous interface, in `cli.py`, leveraging the new `M3` class as much as possible. I won't say much more about the library's main outer CLI, as the videos below recap it well, I think.

A great foundation always makes things easier in the long run. ❞
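The `MCPConfigGenerator` idea (one small class per MCP host, selected when `build` targets that host) can be sketched as follows. All class and function names here are hypothetical; the only external fact relied on is that Claude Desktop reads servers from an `mcpServers` mapping of `command` / `args` / `env` entries in its `claude_desktop_config.json`. The launch command and module name in the usage line are placeholders, not M3's real entry point.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class MCPConfigGeneratorSketch(ABC):
    """Hypothetical base class: turn a server launch spec into a host-specific config."""

    @abstractmethod
    def generate(
        self, name: str, command: str, args: list[str], env: dict[str, str]
    ) -> dict[str, Any]: ...


class ClaudeDesktopGeneratorSketch(MCPConfigGeneratorSketch):
    def generate(
        self, name: str, command: str, args: list[str], env: dict[str, str]
    ) -> dict[str, Any]:
        # Shape of a server entry in Claude Desktop's claude_desktop_config.json.
        return {"mcpServers": {name: {"command": command, "args": list(args), "env": dict(env)}}}


# Supporting a new host = one new subclass + one new entry here.
GENERATORS: dict[str, MCPConfigGeneratorSketch] = {"claude": ClaudeDesktopGeneratorSketch()}


def build_host_config(
    host: str,
    name: str,
    command: str,
    args: list[str],
    env: dict[str, str],
    out: Path | None = None,
) -> dict[str, Any]:
    """Generate the config for `host` and optionally write it to `out` as JSON."""
    try:
        generator = GENERATORS[host]
    except KeyError:
        raise ValueError(f"unsupported MCP host: {host!r}") from None
    config = generator.generate(name, command, args, env)
    if out is not None:
        out.write_text(json.dumps(config, indent=2))
    return config


# Placeholder launch command and module name, for illustration only.
cfg = build_host_config("claude", "m3", "python", ["-m", "example_server"], {"LOG_LEVEL": "INFO"})
print(json.dumps(cfg, indent=2))
```

The rest of the system only ever calls `build_host_config`, so adding a host does not touch the pipeline or tool code.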
Side notes. The unit tests should now be more comprehensive too, despite possibly some gaps :) More than 75% of the codebase is now tested, versus 30% in `0.2.0`. Lastly, typing is enforced via `Beartype` (`O(1)` runtime checks); refer to #45 for the why. I removed redundant top-of-file docstrings (in favor of class/method docs), added the great `TheFuzz` for fuzzy error handling (e.g., when you look for a tool but make a typo when calling it, the system says `Did you mean X?`), and made UI enhancements for better usability.

Caution

While the refactoring is complete, docstrings are bare minimum (one-liners per class only), pending (1) the final PR state after reviews, to avoid wasting time on such writing, and (2) the repo org migration + ReadTheDocs setup (#40). Bear with me; very happy to fully docstring once ready! :)
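The "Did you mean X?" behaviour mentioned in the side notes relies on `TheFuzz`. To keep this sketch dependency-free, it swaps in the standard library's `difflib.get_close_matches`, which uses a comparable but not identical similarity score. `resolve_tool` and the `0.6` cutoff are illustrative assumptions, not M3's code.

```python
from difflib import get_close_matches


def resolve_tool(name: str, known_tools: list[str], cutoff: float = 0.6) -> str:
    """Return `name` if it is a known tool; otherwise raise with a 'Did you mean ...?' hint."""
    if name in known_tools:
        return name
    # Best single candidate whose similarity ratio is at least `cutoff` (0..1).
    matches = get_close_matches(name, known_tools, n=1, cutoff=cutoff)
    hint = f" Did you mean {matches[0]!r}?" if matches else ""
    raise ValueError(f"Unknown tool {name!r}.{hint}")


try:
    resolve_tool("mimc", ["mimic"])
except ValueError as err:
    print(err)  # Unknown tool 'mimc'. Did you mean 'mimic'?
```

When nothing is similar enough, the error still names the unknown tool but omits the hint rather than suggesting a poor match.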
❝ Stop Waffling & Show Some

Before ➠ After

`m3 --help`: 1-help.mp4

Various `m3` utility CLI commands: 2-utilities.mp4

UI enhancements when downloading datasets via `m3 run mimic init`: 3-mimic_init.mp4

`m3 run config claude`: 4-mimic_claude.mp4

Leveraging presets (basically doing the `m3 run config claude` above in a single line): 5-presets.mp4

Building the first M3 pipeline with two tools in play for Claude 🎉 I've since removed the extra tool, as it was a non-useful / out-of-scope one made just for the sake of the example :)

6-multi_tools_pipeline.mp4
Hope it helps,
Cheeeeers!