hari7261/MultimodalCodingAgent-AI

🚀 Multimodal Coding Agent with Gemini

A powerful multimodal coding assistant that can analyze images containing code problems and generate solutions in multiple programming languages. Built with Google's Gemini AI and Gradio for an intuitive web interface.

✨ Features

🔍 Multimodal Analysis

  • Image Processing: Upload screenshots of coding problems or code snippets
  • Text Input: Describe coding problems in natural language
  • Smart Recognition: Automatically detects problem types and requirements

💻 Multi-Language Support

  • Python: Full execution with safety restrictions
  • JavaScript/TypeScript: Modern ES6+ and TypeScript support
  • Java: Complete compilation and execution
  • C/C++: GCC/G++ compiler integration
  • HTML/CSS: Syntax validation and web project generation
  • TSX: React TypeScript components

🌐 Web Project Generation

  • Complete Websites: Generate full HTML, CSS, and JavaScript projects
  • Responsive Design: Mobile-friendly, modern web standards
  • Auto-Save: Automatically organize and save multi-file projects
  • Ready-to-Deploy: Generated projects work out of the box

🛡️ Safe Code Execution

  • Sandboxed Environment: Restricted execution for security
  • Module Filtering: Only allows safe standard library modules
  • Timeout Protection: Prevents infinite loops and hanging processes
  • Error Handling: Comprehensive error reporting and suggestions
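
Module filtering can be approximated by inspecting imports before any code runs. The sketch below is one possible approach, not the project's actual implementation: the `SAFE_MODULES` allowlist and the `find_disallowed_imports` helper are hypothetical names chosen for illustration.

```python
import ast
from typing import Iterable, List

# Hypothetical allowlist; the project's real list is not shown in this README.
SAFE_MODULES = {"math", "random", "itertools", "collections", "string", "json"}


def find_disallowed_imports(source: str, allowed: Iterable[str] = SAFE_MODULES) -> List[str]:
    """Return the top-level modules imported by `source` that are not allowed."""
    allowed = set(allowed)
    blocked: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no module name; treat them as disallowed.
            names = [node.module] if node.module else ["<relative import>"]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root not in allowed and root not in blocked:
                blocked.append(root)
    return blocked
```

A static check like this only sees literal `import` statements. It does not catch dynamic imports such as `__import__("os")`, so it complements process-level restrictions rather than replacing them.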

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • Google Gemini API key
  • (Optional) Compilers for specific languages:
    • Node.js for JavaScript/TypeScript
    • JDK for Java
    • GCC/G++ for C/C++
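
To see which of the optional toolchains are already installed, a short script can look them up on `PATH`. This is a minimal sketch; the executable names (`node`, `javac`, `gcc`, `g++`) are assumptions, since the README names the toolchains but not the exact binaries the app invokes.

```python
import shutil
from typing import Dict, Optional

# Assumed executable names for each optional toolchain.
OPTIONAL_TOOLS = {
    "JavaScript/TypeScript": "node",
    "Java": "javac",
    "C": "gcc",
    "C++": "g++",
}


def find_toolchains(tools: Dict[str, str] = OPTIONAL_TOOLS) -> Dict[str, Optional[str]]:
    """Map each language to the resolved executable path, or None if not on PATH."""
    return {language: shutil.which(binary) for language, binary in tools.items()}


if __name__ == "__main__":
    for language, path in find_toolchains().items():
        print(f"{language}: {path or 'not found'}")
```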

Installation

  1. Clone the repository

    git clone <your-repo-url>
    cd MultimodalCodingAgent-AI
  2. Install dependencies

    pip install -r requirements.txt
  3. Get your Gemini API key from Google AI Studio

  4. Run the application

    python app.py
  5. Open your browser

    • Navigate to the URL shown in the terminal (typically http://localhost:7860)
    • Enter your Gemini API key
    • Start coding!

💡 Usage Examples

📸 Image Analysis

1. Upload an image containing a coding problem
2. Select your target programming language
3. Click "Generate & Execute Solution"
4. Review the generated code and its execution results

📝 Text Prompts

- "Write a function to reverse a string in Python"
- "Create a responsive login form using HTML, CSS, and JS"
- "Build a calculator class in Java"
- "Make an MVP landing page"

🌐 Web Projects

Input: "Create a modern portfolio website"
Output: Complete project with:
β”œβ”€β”€ index.html (Semantic HTML5)
β”œβ”€β”€ styles.css (Responsive CSS)
└── script.js (Interactive JavaScript)
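
Saving a multi-file project like the one above comes down to writing each generated file into an output folder. The sketch below assumes the generated files are already available as a `{filename: content}` mapping; `save_project` is a hypothetical helper, and only the `generated_project` folder name comes from this README.

```python
from pathlib import Path
from typing import Dict, List


def save_project(files: Dict[str, str], out_dir: str = "generated_project") -> List[Path]:
    """Write each {filename: content} pair under out_dir and return the paths written."""
    root = Path(out_dir).resolve()
    root.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for name, content in files.items():
        target = (root / name).resolve()
        # Refuse names such as "../x" that would land outside out_dir.
        if root not in target.parents:
            raise ValueError(f"unsafe file name: {name!r}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written
```

Checking that every target stays inside the output folder matters here because the file names originate from model output and should not be trusted blindly.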

🔧 Advanced Features

Multi-File Project Keywords

Trigger automatic web project generation with keywords:

  • landing page, website, web app
  • MVP, portfolio, dashboard
  • html css js, full website
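
Keyword-based triggering could look roughly like the following. The keywords are copied from the list above, but the matching rule (case-insensitive, whole word or phrase) and the `is_web_project_request` name are assumptions, not the app's confirmed behavior.

```python
import re

# Keywords from the list above; how the app actually matches them is not documented.
WEB_PROJECT_KEYWORDS = (
    "landing page", "website", "web app",
    "mvp", "portfolio", "dashboard",
    "html css js", "full website",
)


def is_web_project_request(prompt: str) -> bool:
    """True if the prompt contains any trigger keyword as a whole word or phrase."""
    text = " ".join(prompt.lower().split())  # normalize case and whitespace
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", text)
        for keyword in WEB_PROJECT_KEYWORDS
    )
```

Matching on word boundaries keeps a keyword from firing when it only appears inside a longer word.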

Language-Specific Execution

  • Python: Executes safely with restricted imports
  • JavaScript: Requires Node.js installation
  • Java: Compiles and runs with JDK
  • C/C++: Compiles with GCC/G++ and executes
  • HTML/CSS: Validates syntax and structure
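
The per-language behavior above maps naturally onto a table of compile and run commands. This sketch shows one way to express that table; `build_commands` is a hypothetical helper, the commands are common defaults rather than the ones `app.py` necessarily uses, and it assumes the source file sits in the current directory on a POSIX-style shell.

```python
import sys
from typing import List


def build_commands(language: str, source_file: str) -> List[List[str]]:
    """Return the argv lists to run, in order, for one source file."""
    stem = source_file.rsplit(".", 1)[0]
    plans = {
        "python": [[sys.executable, source_file]],
        "javascript": [["node", source_file]],
        # Java: compile, then run the class named after the file.
        "java": [["javac", source_file], ["java", stem]],
        # C/C++: compile to a binary named after the file, then run it.
        "c": [["gcc", source_file, "-o", stem], ["./" + stem]],
        "cpp": [["g++", source_file, "-o", stem], ["./" + stem]],
    }
    try:
        return plans[language.lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {language}") from None
```

Returning argument lists instead of shell strings lets each step be passed straight to `subprocess.run` without shell quoting.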

Error Handling

  • API Errors: Graceful handling of service unavailability
  • Compilation Errors: Clear error messages and fix suggestions
  • Runtime Errors: Detailed execution analysis and debugging help

🏗️ Project Structure

MultimodalCodingAgent-AI/
├── app.py                 # Main application file
├── requirements.txt       # Python dependencies
├── README.md              # Project documentation
├── LICENSE                # MIT License
├── .gitignore             # Git ignore patterns
├── image.png              # Sample image for testing
└── generated_project/     # Auto-generated web projects
    ├── index.html
    ├── styles.css
    └── script.js

🔒 Security Features

  • Restricted Execution: Limited to safe operations only
  • Module Filtering: Prevents dangerous imports
  • Timeout Controls: Automatic termination of long-running code
  • File System Protection: No unauthorized file access
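
Timeout control is the easiest of these protections to illustrate. The sketch below runs a snippet in a separate interpreter and stops it after a deadline; `run_python_with_timeout` and the 5-second default are illustrative assumptions, and the README does not state the app's actual limit.

```python
import subprocess
import sys
from typing import Tuple


def run_python_with_timeout(code: str, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Run `code` in a child interpreter; return (finished_ok, output_or_reason)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising on timeout.
        return False, f"timed out after {timeout_s} s"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()
```

A timeout alone is not a sandbox: the child process in this sketch can still read and write files, so file system protection needs separate measures.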

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

🐛 Troubleshooting

Common Issues

503 Service Unavailable Error

  • Gemini API is temporarily down
  • Wait a few minutes and retry
  • Use text input instead of image upload
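
If you call the API from your own script, the wait-and-retry advice can be automated with exponential backoff. This is a generic sketch: `ServiceUnavailable` is a hypothetical stand-in for whatever exception your client raises on a 503, and the delays are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for the error an API client raises on HTTP 503."""


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on ServiceUnavailable wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except ServiceUnavailable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.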

Import/Module Errors

  • Run pip install -r requirements.txt to install all dependencies
  • If you get google-genai errors, run: pip install google-genai
  • For agno installation issues, try: pip install agno --upgrade

Execution Errors

  • Ensure required compilers are installed
  • Check language-specific prerequisites
  • Review generated code for syntax errors

API Key Issues

  • Verify your Gemini API key is valid
  • Check API quotas and limits
  • Ensure vision API access is enabled

📋 Roadmap

  • Support for more programming languages (Go, Rust, etc.)
  • Advanced code optimization suggestions
  • Integration with popular IDEs
  • Collaborative coding features
  • Custom execution environments
  • API endpoint for programmatic access

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Google Gemini: For powerful multimodal AI capabilities
  • Gradio: For the intuitive web interface
  • Agno: For seamless AI agent integration
  • Community: For testing and feedback

📞 Support


Made with ❤️ by hari7261

Transform your coding workflow with AI-powered multimodal assistance!
