
Add Local AI Studio: Complete local AI development environment#150

Draft
chuckeelord wants to merge 2 commits into 0x4m4:master from chuckeelord:claude/local-ai-studio-gAWa1

Conversation

@chuckeelord

Summary

This PR introduces Local AI Studio, a comprehensive local AI development environment with a web GUI, model management, chat interface, tool integration, and MCP server support. The system is optimized for consumer hardware (NVIDIA RTX 5060 + Snapdragon X Plus) and provides a fully self-contained AI development platform.

Key Changes

Core Architecture

  • Configuration System (config.py): Centralized configuration management with inference presets, system prompt templates, and directory layout
  • Hardware Detection (hardware.py): GPU/CPU detection, VRAM/RAM monitoring, and hardware-aware recommendations
  • Model Management (models/manager.py): Model registry, discovery, loading (GGUF via llama-cpp-python, Ollama integration), and lifecycle management
  • Chat Engine (chat/engine.py): Multi-turn conversation orchestration with streaming, system prompts, and tool invocation
  • Conversation Persistence (chat/history.py): Save/load conversations with metadata, tagging, and search capabilities
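The persistence layer described above can be sketched as plain JSON files with a metadata envelope. This is an illustrative sketch, not the PR's actual `chat/history.py` API; the function names, the record schema, and the one-file-per-conversation layout are all assumptions.

```python
import json
import time
from pathlib import Path

def save_conversation(directory, conv_id, messages, *, tags=(), model=None, system_prompt=None):
    """Persist a conversation with metadata (hypothetical schema)."""
    record = {
        "id": conv_id,
        "saved_at": time.time(),
        # Metadata enables the tagging/search features the PR describes.
        "metadata": {"tags": list(tags), "model": model, "system_prompt": system_prompt},
        "messages": messages,  # e.g. [{"role": "user", "content": "..."}]
    }
    path = Path(directory) / f"{conv_id}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path

def search_conversations(directory, *, tag=None, text=None):
    """Scan saved conversations, filtering by tag and/or message substring."""
    hits = []
    for path in sorted(Path(directory).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        if tag is not None and tag not in record["metadata"]["tags"]:
            continue
        if text is not None and not any(text in m["content"] for m in record["messages"]):
            continue
        hits.append(record["id"])
    return hits
```

One JSON file per conversation keeps the store greppable and trivially portable, at the cost of a full scan on search; an index file would be the obvious next step if history grows large.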

User Interface

  • Gradio Web GUI (gui/app.py): Full-featured browser interface with:
    • Chat tab with streaming responses and message history
    • Models tab for loading, downloading, and configuring models
    • Settings tab for inference parameters and system prompts
    • Tools tab for tool configuration and execution logs
    • Hardware monitoring dashboard
    • Conversation history management

Tool Integration Framework

  • Tool Executor (tools/executor.py): Unified tool registry and dispatcher with permission checks
  • Python Sandbox (tools/python_sandbox.py): Sandboxed code execution with timeout and memory limits
  • Filesystem Tools (tools/filesystem.py): Restricted file operations within allowed directories
  • Web Tools (tools/web.py): URL fetching and web scraping with URL filtering
  • Database Tools (tools/database.py): SQLite query execution with safety controls
  • Git Tools (tools/git_tools.py): Version control operations with destructive action prevention
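The registry/dispatcher pattern behind `tools/executor.py` might look roughly like the following. This is a minimal sketch under assumptions: the class name, the JSON-Schema-shaped parameter dicts, and the allow-list permission model are illustrative, not the PR's actual interface.

```python
class ToolExecutor:
    """Unified tool registry and dispatcher with a simple permission check."""

    def __init__(self, allowed_tools=None):
        self._tools = {}
        self._allowed = allowed_tools  # None means every registered tool is permitted

    def register(self, name, func, parameters):
        """`parameters` is a JSON-Schema-style dict describing the arguments."""
        self._tools[name] = {"func": func, "parameters": parameters}

    def describe(self):
        """Schema list in the shape an LLM tool-calling API typically expects."""
        return [{"name": n, "parameters": t["parameters"]} for n, t in self._tools.items()]

    def execute(self, name, **kwargs):
        if self._allowed is not None and name not in self._allowed:
            raise PermissionError(f"tool {name!r} is not permitted")
        tool = self._tools[name]
        # Validate required parameters against the declared schema.
        missing = [p for p in tool["parameters"].get("required", []) if p not in kwargs]
        if missing:
            raise ValueError(f"missing required parameters: {missing}")
        return tool["func"](**kwargs)
```

Registering tools with declarative schemas is what lets the chat engine hand `describe()` to the model and route its tool calls back through `execute()` without per-tool glue code.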

Model Optimization

  • Quantization Utilities (models/quantization.py): GGUF quantization format metadata and VRAM estimation
  • LoRA Management (models/lora.py): LoRA adapter discovery, indexing, and application
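The VRAM estimation mentioned for `models/quantization.py` reduces to simple arithmetic over bits-per-weight. The figures and the fixed-overhead heuristic below are ballpark assumptions for illustration (real GGUF bits-per-weight varies with the tensor mix, and KV-cache cost scales with context length), not the PR's actual tables.

```python
# Approximate bits-per-weight for common GGUF quant formats (ballpark figures).
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.9}

def estimate_vram_gb(n_params_b, quant, overhead_gb=1.0):
    """Rough VRAM need in GB: weights at the quant's bits-per-weight,
    plus a fixed overhead for KV cache and compute buffers."""
    weight_bytes = n_params_b * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return weight_bytes / 1e9 + overhead_gb
```

Under these assumptions a 7B model at Q4_K_M lands around 5.3 GB, which is why Q4_K_M is the natural default for the 8 GB RTX 5060 this PR targets.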

Protocol Support

  • MCP Server (mcp/server.py): Model Context Protocol implementation for IDE/tool integration, exposing models, conversations, and tools as MCP resources

CLI & Entry Point

  • Command-line Interface (__main__.py): Multiple entry points:
    • gui: Launch web interface
    • mcp: Run MCP server
    • chat: Interactive CLI chat
    • scan: Model discovery
    • info: Hardware information
    • config: Configuration display
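The subcommand layout above maps directly onto `argparse` subparsers. A skeleton sketch (the flags shown, like `--port` and `--model`, are plausible assumptions rather than the PR's documented options):

```python
import argparse

def build_parser():
    """CLI skeleton mirroring the entry points listed above."""
    parser = argparse.ArgumentParser(prog="local-ai-studio")
    sub = parser.add_subparsers(dest="command", required=True)

    gui = sub.add_parser("gui", help="launch the web interface")
    gui.add_argument("--port", type=int, default=7860)

    sub.add_parser("mcp", help="run the MCP server")

    chat = sub.add_parser("chat", help="interactive CLI chat")
    chat.add_argument("--model", help="model name or GGUF path")

    sub.add_parser("scan", help="discover local models")
    sub.add_parser("info", help="show hardware information")
    sub.add_parser("config", help="display the active configuration")
    return parser
```

Each subcommand gets its own argument namespace, so `python -m local_ai_studio gui --port 8080` and `... chat --model foo.gguf` parse independently.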

Notable Implementation Details

  • Hardware-Aware Defaults: System automatically detects GPU capabilities and recommends quantization levels and layer offloading based on available VRAM
  • Modular Tool System: Tools are registered dynamically with JSON Schema parameter definitions, enabling flexible AI-driven tool invocation
  • Conversation Metadata: Conversations are persisted with rich metadata (tags, model used, inference settings, system prompt) for organization and reproducibility
  • Safety-First Design: File operations, database queries, and code execution are restricted by configuration; destructive SQL operations are blocked by default
  • Streaming Support: Chat responses stream in real-time via Gradio's streaming interface
  • MCP Integration: Full Model Context Protocol support allows the studio to be used as a backend for Claude Desktop and other MCP-compatible clients
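The hardware-aware defaults described above boil down to threshold logic over detected VRAM. The thresholds and offload fractions below are invented for the sketch; the PR's actual heuristics may differ.

```python
def recommend_settings(vram_gb, n_layers=32):
    """Pick a quant level and GPU layer offload from available VRAM
    (illustrative thresholds, not the PR's real tuning)."""
    if vram_gb >= 24:
        quant, offload_frac = "Q8_0", 1.0     # plenty of headroom
    elif vram_gb >= 12:
        quant, offload_frac = "Q5_K_M", 1.0   # full offload, higher quality quant
    elif vram_gb >= 8:
        quant, offload_frac = "Q4_K_M", 0.8   # e.g. an 8GB RTX 5060
    else:
        quant, offload_frac = "Q4_K_M", max(0.0, vram_gb / 10)
    return {"quantization": quant, "n_gpu_layers": int(n_layers * offload_frac)}
```

The returned `n_gpu_layers` value is the knob llama-cpp-python uses to split layers between GPU and CPU, so this one function can drive the loader directly.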

Dependencies

  • Core: psutil, requests
  • Inference: llama-cpp-python (with optional CUDA support)
  • GUI: gradio
  • MCP: fastmcp
  • Optional: ollama, transformers, peft (for LoRA)

https://claude.ai/code/session_01E8sDdaXo9Mry8phPH8KYgY

Complete modular Python package providing:
- Model management: GGUF model discovery, HuggingFace downloads, Ollama
  integration, quantization utilities (Q4_K_M/Q5_K_M), LoRA/QLoRA adapter
  management with training config generation
- Chat engine: multi-turn conversations with streaming, context window
  management, conversation persistence with tagging/categorization,
  export to JSON/Markdown
- Tool framework: sandboxed Python execution, filesystem operations with
  root restrictions, web/API fetching, SQLite/CSV database queries,
  git operations with safety controls, pip package management
- MCP server: full Model Context Protocol support exposing chat, tools,
  model management, and hardware info as MCP resources/tools/prompts
- Gradio web GUI: 6-tab interface (Chat, Models, Settings, Tools,
  Hardware Monitor, History) with dark theme, inference presets
  (code/research/creative/roleplay), system prompt templates,
  real-time VRAM monitoring, profile import/export
- CLI: subcommands for gui, mcp, chat, scan, info, config with
  full argument parsing and slash-commands in interactive mode
- Hardware detection: NVIDIA GPU via nvidia-smi, CPU/RAM monitoring,
  automatic recommendations for quantization, context length, batch
  size, and thread count based on available VRAM

Optimized for RTX 5060 (8GB VRAM) + Snapdragon X Plus hardware.
All components pip-installable with optional dependency groups.
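The nvidia-smi-based detection mentioned above can be sketched by shelling out to the CSV query interface and parsing its output. The function names are hypothetical; the `--query-gpu`/`--format` flags are standard nvidia-smi options, and the fallback to an empty list covers machines without an NVIDIA GPU (such as the Snapdragon side of the target hardware).

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"]

def parse_nvidia_smi(output):
    """Parse one GPU per line of the CSV query output (memory in MiB)."""
    gpus = []
    for line in output.strip().splitlines():
        name, total, used = [field.strip() for field in line.split(",")]
        gpus.append({"name": name,
                     "vram_total_mib": int(total),
                     "vram_used_mib": int(used)})
    return gpus

def detect_gpus():
    """Return detected NVIDIA GPUs, or [] when nvidia-smi is unavailable."""
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_nvidia_smi(out)
```

Keeping the parser separate from the subprocess call makes the detection logic testable without a GPU present.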

https://claude.ai/code/session_01E8sDdaXo9Mry8phPH8KYgY
@chuckeelord chuckeelord marked this pull request as draft February 24, 2026 21:58
