🤖 BrowserUse AI Agent API

A production-ready FastAPI service for AI-powered web automation using browser-use with multiple LLM providers

Python 3.11+ · FastAPI · License: MIT

BrowserUse AI Agent API is a professional-grade service that combines browser automation with Large Language Models (LLMs) to execute complex web tasks. Built with enterprise features like structured logging, configurable templates, and multiple LLM provider support, it's designed to be both powerful and easy to customize.

✨ Features

  • 🔌 Multiple LLM Providers: Google (Gemini), OpenAI, Anthropic, Ollama
  • 📝 Template System: Jinja2-based task templates for flexible task definition
  • 📊 Professional Logging: JSON/text structured logging with rotation and multiple handlers
  • ⚙️ YAML Configuration: Centralized, validated configuration management
  • 🐳 Docker Ready: Production-ready Docker setup with multi-stage builds
  • 🔄 RESTful API: Clean, documented API with Pydantic validation
  • 🛠️ Extensible: Easy to add custom data processors and utilities
  • 📈 Monitoring: Health checks and metrics endpoints
  • 🎯 Type Safe: Full type hints and Pydantic models throughout

🚀 Quick Start

Prerequisites

  • Python 3.11 or higher
  • An API key for your chosen LLM provider (Gemini, OpenAI, or Anthropic); local Ollama needs no key

Installation

  1. Clone the repository
git clone https://github.com/Kaangml/browseruse_ai_agent.git
cd browseruse_ai_agent
  2. Install dependencies
pip install -r requirements.txt
  3. Configure the service
# Review the default configuration (shipped at config/config.yaml)
# and adjust it as needed

# Copy the example environment file
cp .env.example .env

# Edit .env and add your API key
nano .env
  4. Start the service
python run.py

The service will be available at http://localhost:8000. Visit http://localhost:8000/docs for the interactive API documentation.

🐳 Docker Deployment

Using Docker Compose (Recommended)

# Set your API key in .env file
echo "GEMINI_API_KEY=your_api_key_here" > .env

# Start the service
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the service
docker-compose down

Using Docker directly

# Build the image
docker build -t browseruse-ai-api .

# Run the container
docker run -d \
  -p 8000:8000 \
  -e GEMINI_API_KEY=your_api_key \
  -v $(pwd)/logs:/app/logs \
  --name browseruse-ai-api \
  browseruse-ai-api

📖 Usage

Basic Task Execution

Send a POST request to /task endpoint:

curl -X POST "http://localhost:8000/task" \
  -H "Content-Type: application/json" \
  -d '{
    "template_name": "default",
    "task_data": {
      "search_location": "New York, USA",
      "primary_name": "Empire State Building",
      "address": "350 5th Ave",
      "city": "New York",
      "district": "Manhattan"
    }
  }'

Python Example

import requests

response = requests.post(
    "http://localhost:8000/task",
    json={
        "template_name": "default",
        "task_data": {
            "search_location": "Paris, France",
            "primary_name": "Eiffel Tower",
            "city": "Paris"
        }
    }
)

result = response.json()
print(f"Success: {result['success']}")
print(f"Result: {result['result']}")

API Endpoints

Endpoint      Method  Description
/             GET     Service information
/health       GET     Health check
/config       GET     Current configuration
/templates    GET     List available templates
/task         POST    Execute a task
/docs         GET     Interactive API documentation

⚙️ Configuration

Main Configuration File

Edit config/config.yaml to customize the service:

# LLM Configuration
llm:
  provider: "google"  # google, openai, anthropic, ollama
  model_name: "gemini-2.0-flash"
  temperature: 0.7
  max_tokens: 4096

# Logging Configuration
logging:
  level: "INFO"  # DEBUG, INFO, WARNING, ERROR, CRITICAL
  format: "json"  # json, text
  console:
    enabled: true
    level: "INFO"
  file:
    enabled: true
    level: "DEBUG"
    path: "logs/app.log"
    max_bytes: 10  # MB
    backup_count: 5

# Task Configuration
tasks:
  default_template: "default"
  templates_dir: "templates"
  timeout: 300
  max_retries: 3
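
Configuration loading and validation live in src/core/config.py; a minimal sketch of the load step, assuming PyYAML (the project's actual loader adds validation on top):

import yaml

# Load the YAML configuration from the default path; src/core/config.py
# additionally validates the values before the service uses them.
with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

print(config["llm"]["provider"])  # e.g. "google"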

Environment Variables

Create a .env file in the project root:

# Google Gemini
GEMINI_API_KEY=your_gemini_api_key_here

# Or OpenAI
OPENAI_API_KEY=your_openai_api_key_here

# Or Anthropic
ANTHROPIC_API_KEY=your_anthropic_api_key_here
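
To quickly check that a key is visible to Python, here is a sketch assuming python-dotenv (the service may load .env differently):

import os

from dotenv import load_dotenv

# Read variables from .env in the project root into the environment.
load_dotenv()
print("key found" if os.environ.get("GEMINI_API_KEY") else "key missing")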

Switching LLM Providers

Google Gemini (Default)

llm:
  provider: "google"
  model_name: "gemini-2.0-flash"
  api_key_env: "GEMINI_API_KEY"

OpenAI

llm:
  provider: "openai"
  model_name: "gpt-4"
  api_key_env: "OPENAI_API_KEY"

Anthropic Claude

llm:
  provider: "anthropic"
  model_name: "claude-3-opus"
  api_key_env: "ANTHROPIC_API_KEY"

Ollama (Local)

llm:
  provider: "ollama"
  model_name: "llama2"
  # No API key needed for local Ollama
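
Conceptually, the api_key_env field names the environment variable the service should read for the selected provider; a rough sketch of that lookup (illustrative only; the real logic lives in src/services/llm_service.py):

import os

def resolve_api_key(llm_cfg: dict) -> str | None:
    """Return the key named by api_key_env, or None (e.g. for local Ollama)."""
    env_name = llm_cfg.get("api_key_env")
    return os.environ.get(env_name) if env_name else None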

📝 Creating Custom Templates

Templates are stored in the templates/ directory and use Jinja2 syntax.

Example: Custom Search Template

Create templates/web_search.txt:

Search the web for information about {{ topic }}.

Steps:
1. Go to {{ search_engine | default("https://www.google.com") }}
2. Search for: "{{ topic }}"
{% if specific_site %}
3. Focus on results from {{ specific_site }}
{% endif %}
4. Extract the top {{ num_results | default(5) }} results

Return the results as a JSON array with title, URL, and snippet for each result.

Using the template:

response = requests.post(
    "http://localhost:8000/task",
    json={
        "template_name": "web_search",
        "task_data": {
            "topic": "climate change solutions",
            "search_engine": "https://www.google.com",
            "num_results": 10
        }
    }
)
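
To preview a rendered template locally before sending it through the API, a minimal Jinja2 sketch (the service's own rendering lives in src/services/template_service.py):

from jinja2 import Environment, FileSystemLoader

# Render templates/web_search.txt with the same fields as task_data;
# variables left unset fall back to their default() filters.
env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("web_search.txt")
print(template.render(topic="climate change solutions", num_results=10))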

🛠️ Custom Data Processing

You can add custom data processors in src/utils/data_utils.py:

from src.utils import register_processor

@register_processor("my_custom_processor")
def process_my_data(data: dict) -> dict:
    """Custom data processing logic."""
    # Your processing logic here
    processed = {
        "original": data,
        "processed": data.get("field", "").upper()
    }
    return processed

Then reference it in your configuration:

data_processing:
  processors:
    - "my_custom_processor"

📊 Logging

BrowserUse AI Agent API provides comprehensive logging:

Console Logs

  • Colored output for easy reading
  • Configurable log level
  • Real-time request/response logging

File Logs

  • JSON structured logs for easy parsing (see the sample entry below)
  • Automatic rotation when size limit reached
  • Configurable retention (backup count)
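
A file log entry might look like the following (the field names are illustrative, not the service's guaranteed schema):

{"timestamp": "2024-01-01T12:00:00Z", "level": "INFO", "logger": "src.app", "message": "Task completed", "template": "default"}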

Conversation Logs

  • Full agent conversation history
  • Saved per task execution
  • Located in logs/conversations/

Log Levels

Adjust in config/config.yaml:

logging:
  level: "DEBUG"  # See everything
  # level: "INFO"   # Production default
  # level: "WARNING"  # Only warnings and errors
  # level: "ERROR"    # Only errors

🧪 Testing

Exercise the endpoints with a few example requests:

# Test health endpoint
curl http://localhost:8000/health

# List available templates
curl http://localhost:8000/templates

# Test task execution
curl -X POST http://localhost:8000/task \
  -H "Content-Type: application/json" \
  -d @example_request.json
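
Here, example_request.json follows the same shape as the /task payload shown earlier, for example:

{
  "template_name": "default",
  "task_data": {
    "search_location": "New York, USA",
    "primary_name": "Empire State Building",
    "address": "350 5th Ave",
    "city": "New York",
    "district": "Manhattan"
  }
}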

📁 Project Structure

browseruse-ai-api/
├── config/
│   └── config.yaml           # Main configuration file
├── src/
│   ├── api/
│   │   ├── __init__.py
│   │   └── models.py         # Pydantic models for API
│   ├── core/
│   │   ├── __init__.py
│   │   ├── config.py         # Configuration management
│   │   └── logging.py        # Logging setup
│   ├── services/
│   │   ├── __init__.py
│   │   ├── agent_service.py  # Agent execution logic
│   │   ├── llm_service.py    # LLM provider management
│   │   └── template_service.py # Template rendering
│   ├── utils/
│   │   ├── __init__.py
│   │   └── data_utils.py     # Utility functions (customize here!)
│   └── app.py                # FastAPI application
├── templates/
│   ├── default.txt           # Default task template
│   └── google_maps_en.txt    # Example: Google Maps search
├── logs/                     # Log files (auto-created)
├── tests/                    # Test files
├── .env                      # Environment variables (create this)
├── .env.example              # Environment variables example
├── .gitignore
├── docker-compose.yaml       # Docker Compose configuration
├── Dockerfile                # Docker image definition
├── requirements.txt          # Python dependencies
├── run.py                    # Application entry point
└── README.md                 # This file

🔒 Security Considerations

  • ✅ Never commit .env file or API keys to version control
  • ✅ Use environment variables for sensitive data
  • ✅ Run Docker containers as non-root user (configured)
  • ✅ Enable CORS only for trusted origins in production (see the sketch after this list)
  • ✅ Use HTTPS in production deployments
  • ✅ Regularly update dependencies
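
For the CORS point above, a minimal FastAPI middleware sketch (the origin is a placeholder; wire this into the app created in src/app.py):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow cross-origin requests only from explicitly trusted origins.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.example.com"],
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)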

🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please see CONTRIBUTING.md for detailed guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • browser-use for the browser automation engine
  • FastAPI for the web framework

🗺️ Roadmap

  • Add more LLM provider support
  • Implement rate limiting
  • Add authentication/authorization
  • Create web UI for task management
  • Add task queue for async processing
  • Implement caching for repeated tasks
  • Add metrics and monitoring dashboard

BrowserUse AI Agent API - AI-Powered Web Automation Service
