🚀 Multimodal Coding Agent with Gemini

A powerful multimodal coding assistant that can analyze images containing code problems and generate solutions in multiple programming languages. Built with Google's Gemini AI and Gradio for an intuitive web interface.

✨ Features

🔍 Multimodal Analysis

Image Processing: Upload screenshots of coding problems or code snippets
Text Input: Describe coding problems in natural language
Smart Recognition: Automatically detects problem types and requirements

💻 Multi-Language Support

Python: Full execution with safety restrictions
JavaScript/TypeScript: Modern ES6+ and TypeScript support
Java: Complete compilation and execution
C/C++: GCC/G++ compiler integration
HTML/CSS: Syntax validation and web project generation
TSX: React TypeScript components

🌐 Web Project Generation

Complete Websites: Generate full HTML, CSS, and JavaScript projects
Responsive Design: Mobile-friendly, modern web standards
Auto-Save: Automatically organize and save multi-file projects
Ready-to-Deploy: Generated projects work out of the box

🛡️ Safe Code Execution

Sandboxed Environment: Restricted execution for security
Module Filtering: Only allows safe standard library modules
Timeout Protection: Prevents infinite loops and hanging processes
Error Handling: Comprehensive error reporting and suggestions
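
The README does not show how app.py enforces its timeout. Here is a minimal sketch, assuming generated Python is run in a separate interpreter process. The helper name `run_python_with_timeout` and the 5-second default are hypothetical. A timeout on its own does not restrict file or network access.

```python
import subprocess
import sys


def run_python_with_timeout(code: str, timeout_s: float = 5.0) -> dict:
    """Run Python source in a child interpreter and stop it after timeout_s seconds."""
    # Hypothetical helper: the 5-second default is an assumption, not a value from app.py.
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* env vars and user site-packages
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising TimeoutExpired
        return {"ok": False, "stdout": "", "stderr": f"Timed out after {timeout_s} s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}


print(run_python_with_timeout("print(2 + 3)")["stdout"])
```

For example, `run_python_with_timeout("while True: pass", timeout_s=1)` returns `ok=False` after about one second instead of hanging.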

🚀 Quick Start

Prerequisites

Python 3.8 or higher
Google Gemini API key
(Optional) Toolchains for running code in other languages:
- Node.js for JavaScript/TypeScript
- JDK for Java
- GCC/G++ for C/C++
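
You can check which of these optional toolchains are already on your PATH using only Python's standard library. The executable names below (node, javac, java, gcc, g++) are assumptions about what the app invokes, and `check_toolchains` is a hypothetical helper. It is not part of app.py.

```python
import shutil

# Assumed executable names per language; app.py may look for different ones.
TOOLCHAINS = {
    "JavaScript/TypeScript": ["node"],
    "Java": ["javac", "java"],
    "C": ["gcc"],
    "C++": ["g++"],
}


def check_toolchains(toolchains=TOOLCHAINS) -> dict:
    """Map each language to the list of its required executables missing from PATH."""
    return {
        lang: [exe for exe in exes if shutil.which(exe) is None]
        for lang, exes in toolchains.items()
    }


if __name__ == "__main__":
    for lang, missing in check_toolchains().items():
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(f"{lang:<22} {status}")
```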

Installation

1. Clone the repository
```
git clone <your-repo-url>
cd yehh
```
2. Install dependencies
```
pip install -r requirements.txt
```
3. Get your Gemini API key
- Visit Google AI Studio
- Create a new API key
- Keep it ready for the application
4. Run the application
```
python app.py
```
5. Open your browser
- Navigate to the URL shown in the terminal (typically http://localhost:7860)
- Enter your Gemini API key
- Start coding!

💡 Usage Examples

📸 Image Analysis

1. Upload an image containing a coding problem
2. Select your target programming language
3. Click "Generate & Execute Solution"
4. Get code generation + execution results

📝 Text Prompts

- "Write a function to reverse a string in Python"
- "Create a responsive login form using HTML, CSS, and JS"
- "Build a calculator class in Java"
- "Make an MVP landing page"

🌐 Web Projects

Input: "Create a modern portfolio website"
Output: Complete project with:
```
├── index.html   (Semantic HTML5)
├── styles.css   (Responsive CSS)
└── script.js    (Interactive JavaScript)
```
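
The README does not document how Auto-Save writes these files. The sketch below assumes the model's output has already been parsed into a mapping of filename to content. `save_project` is a hypothetical helper, and its default folder name matches the `generated_project/` folder used elsewhere in this README.

```python
from __future__ import annotations

from pathlib import Path


def save_project(files: dict[str, str], out_dir: str = "generated_project") -> list[Path]:
    """Write {relative filename: content} into out_dir, refusing paths that escape it."""
    root = Path(out_dir).resolve()
    root.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in files.items():
        target = (root / name).resolve()
        # Reject names such as "../app.py" or absolute paths that land outside out_dir
        if root not in target.parents:
            raise ValueError(f"Refusing to write outside {root}: {name}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written
```

Each path is resolved and checked against the output folder. That keeps a model-suggested filename from overwriting files elsewhere on disk.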

🔧 Advanced Features

Multi-File Project Keywords

Trigger automatic web project generation with keywords:

landing page, website, web app
MVP, portfolio, dashboard
html css js, full website
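
Here is a minimal sketch of how matching against the keywords above could work. The matching rule (case-insensitive, whole words, whitespace collapsed) and the name `wants_web_project` are assumptions, and app.py may apply the list differently.

```python
import re

# Keywords copied from the list above; the matching rule is an assumption.
WEB_PROJECT_KEYWORDS = [
    "landing page", "website", "web app",
    "mvp", "portfolio", "dashboard",
    "html css js", "full website",
]


def wants_web_project(prompt: str) -> bool:
    """Return True if the prompt contains any multi-file project keyword as whole words."""
    text = " ".join(prompt.lower().split())  # lowercase and collapse runs of whitespace
    return any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in WEB_PROJECT_KEYWORDS)
```

With this rule, "Make an MVP landing page" would trigger a multi-file project, while "Build a calculator class in Java" would not.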

Language-Specific Execution

Python: Executes safely with restricted imports
JavaScript: Requires Node.js installation
Java: Compiles and runs with JDK
C/C++: Compiles with GCC/G++ and executes
HTML/CSS: Validates syntax and structure
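
One way to organise per-language execution is a table of compile and run commands. The sketch below is hypothetical, because this README does not show the exact commands or flags app.py uses. Python and HTML/CSS are omitted since they are handled differently (restricted execution and validation, respectively).

```python
from __future__ import annotations

from pathlib import Path


def build_commands(language: str, source: str) -> list[list[str]]:
    """Return the command lines that compile (if needed) and then run one source file."""
    # Hypothetical mapping: the commands app.py actually uses are not shown in the README.
    src = Path(source).resolve()       # absolute, so the compiled binary runs without "./"
    binary = str(src.with_suffix(""))  # main.c -> main
    lang = language.lower()
    if lang == "javascript":
        return [["node", str(src)]]
    if lang == "java":
        # javac writes Main.class next to Main.java; java then runs the class by name
        return [["javac", str(src)], ["java", "-cp", str(src.parent), src.stem]]
    if lang == "c":
        return [["gcc", str(src), "-o", binary], [binary]]
    if lang in ("c++", "cpp"):
        return [["g++", str(src), "-o", binary], [binary]]
    raise ValueError(f"No execution recipe for {language!r}")
```

Each inner list can be passed to `subprocess.run(cmd, capture_output=True, text=True, timeout=...)` in order. Execution should stop at the first compile step that returns a non-zero exit code.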

Error Handling

API Errors: Graceful handling of service unavailability
Compilation Errors: Clear error messages and fix suggestions
Runtime Errors: Detailed execution analysis and debugging help
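
For transient API errors such as the 503 described under Troubleshooting, retrying with exponential backoff is a common pattern. This sketch is an assumption and is not the app's code. `ServiceUnavailable` stands in for whatever exception the Gemini client actually raises, and the retry count and delays are illustrative. The `sleep` parameter exists only so the waits can be replaced in tests.

```python
import random
import time


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for the exception the Gemini client raises on HTTP 503."""


def call_with_retry(fn, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on ServiceUnavailable wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except ServiceUnavailable:
            if attempt == retries:
                raise  # out of retries: let the caller show the error in the UI
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
```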

🏗️ Project Structure

```
yehh/
├── app.py                # Main application file
├── requirements.txt      # Python dependencies
├── README.md             # Project documentation
├── LICENCE               # MIT License
├── .gitignore            # Git ignore patterns
├── image.png             # Sample image for testing
└── generated_project/    # Auto-generated web projects
    ├── index.html
    ├── styles.css
    └── script.js
```

🔒 Security Features

Restricted Execution: Limited to safe operations only
Module Filtering: Prevents dangerous imports
Timeout Controls: Automatic termination of long-running code
File System Protection: No unauthorized file access
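
Module filtering can be approximated by scanning the generated code statically before running it. This sketch assumes an allowlist of top-level module names. The `ALLOWED_MODULES` set below is only an example, because the README does not list which modules app.py permits. A static check like this is easy to bypass (for example via `__import__` or `importlib`), so it complements process isolation and timeouts rather than replacing them.

```python
import ast

# Example allowlist only; the modules app.py actually permits are not listed in the README.
ALLOWED_MODULES = {
    "math", "random", "re", "json", "collections",
    "itertools", "functools", "string", "datetime",
}


def disallowed_imports(code: str, allowed=ALLOWED_MODULES) -> set:
    """Return top-level module names imported by `code` that are not in the allowlist."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            found.add(node.module.split(".")[0])
    return found - set(allowed)
```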

🤝 Contributing

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

🐛 Troubleshooting

Common Issues

503 Service Unavailable Error

The Gemini API is temporarily unavailable
Wait a few minutes and retry
Use text input instead of image upload

Import/Module Errors

Run pip install -r requirements.txt to install all dependencies
If you get google-genai errors, run: pip install google-genai
For agno installation issues, try: pip install agno --upgrade
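
Before reinstalling, you can use a small standard-library check to see which required packages are missing. The import names below are assumptions: google-genai is imported as `google.genai`, Gradio as `gradio`, and agno as `agno`. requirements.txt remains the authoritative list.

```python
import importlib.util

# Assumed import names for the packages mentioned above; see requirements.txt for the full list.
REQUIRED = ["google.genai", "agno", "gradio"]


def missing_modules(names=REQUIRED) -> list:
    """Return the module names from `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:  # raised when a parent package (e.g. "google") is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gone = missing_modules()
    print("All required modules found" if not gone else "Missing: " + ", ".join(gone))
```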

Execution Errors

Ensure required compilers are installed
Check language-specific prerequisites
Review generated code for syntax errors

API Key Issues

Verify your Gemini API key is valid
Check API quotas and limits
Ensure vision API access is enabled

📋 Roadmap

Support for more programming languages (Go, Rust, etc.)
Advanced code optimization suggestions
Integration with popular IDEs
Collaborative coding features
Custom execution environments
API endpoint for programmatic access

📜 License

This project is licensed under the MIT License - see the LICENCE file for details.

🙏 Acknowledgments

Google Gemini: For powerful multimodal AI capabilities
Gradio: For the intuitive web interface
Agno: For seamless AI agent integration
Community: For testing and feedback

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: [email protected]

**Made with ❤️ by hari7261**

Transform your coding workflow with AI-powered multimodal assistance!

Repository: hari7261/MultimodalCodingAgent-AI
