🤖 BrowserUse AI Agent API

A production-ready FastAPI service for AI-powered web automation using browser-use with multiple LLM providers

Python 3.11+ · FastAPI · License: MIT

BrowserUse AI Agent API is a professional-grade service that combines browser automation with Large Language Models (LLMs) to execute complex web tasks. Built with enterprise features like structured logging, configurable templates, and multiple LLM provider support, it's designed to be both powerful and easy to customize.

✨ Features

  • 🔌 Multiple LLM Providers: Google (Gemini), OpenAI, Anthropic, Ollama
  • 📝 Template System: Jinja2-based task templates for flexible task definition
  • 📊 Professional Logging: JSON/text structured logging with rotation and multiple handlers
  • ⚙️ YAML Configuration: Centralized, validated configuration management
  • 🐳 Docker Ready: Production-ready Docker setup with multi-stage builds
  • 🔄 RESTful API: Clean, documented API with Pydantic validation
  • 🛠️ Extensible: Easy to add custom data processors and utilities
  • 📈 Monitoring: Health checks and metrics endpoints
  • 🎯 Type Safe: Full type hints and Pydantic models throughout

🚀 Quick Start

Prerequisites

  • Python 3.11 or higher
  • An API key for your chosen LLM provider (Gemini, OpenAI, or Anthropic); local Ollama needs no key

Installation

  1. Clone the repository
git clone https://github.com/Kaangml/browseruse_ai_agent.git
cd browseruse_ai_agent
  2. Install dependencies
pip install -r requirements.txt
  3. Configure the service
# Review the default configuration (shipped at config/config.yaml)
# and adjust it as needed

# Copy the example environment file
cp .env.example .env

# Edit .env and add your API key
nano .env
  4. Start the service
python run.py

The service will be available at http://localhost:8000. Visit http://localhost:8000/docs for the interactive API documentation.

🐳 Docker Deployment

Using Docker Compose (Recommended)

# Set your API key in .env file
echo "GEMINI_API_KEY=your_api_key_here" > .env

# Start the service
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the service
docker-compose down

Using Docker directly

# Build the image
docker build -t browseruse-ai-api .

# Run the container
docker run -d \
  -p 8000:8000 \
  -e GEMINI_API_KEY=your_api_key \
  -v $(pwd)/logs:/app/logs \
  --name browseruse-ai-api \
  browseruse-ai-api

📖 Usage

Basic Task Execution

Send a POST request to /task endpoint:

curl -X POST "http://localhost:8000/task" \
  -H "Content-Type: application/json" \
  -d '{
    "template_name": "default",
    "task_data": {
      "search_location": "New York, USA",
      "primary_name": "Empire State Building",
      "address": "350 5th Ave",
      "city": "New York",
      "district": "Manhattan"
    }
  }'

Python Example

import requests

response = requests.post(
    "http://localhost:8000/task",
    json={
        "template_name": "default",
        "task_data": {
            "search_location": "Paris, France",
            "primary_name": "Eiffel Tower",
            "city": "Paris"
        }
    }
)

result = response.json()
print(f"Success: {result['success']}")
print(f"Result: {result['result']}")

API Endpoints

Endpoint      Method  Description
/             GET     Service information
/health       GET     Health check
/config       GET     Current configuration
/templates    GET     List available templates
/task         POST    Execute a task
/docs         GET     Interactive API documentation

⚙️ Configuration

Main Configuration File

Edit config/config.yaml to customize the service:

# LLM Configuration
llm:
  provider: "google"  # google, openai, anthropic, ollama
  model_name: "gemini-2.0-flash"
  temperature: 0.7
  max_tokens: 4096

# Logging Configuration
logging:
  level: "INFO"  # DEBUG, INFO, WARNING, ERROR, CRITICAL
  format: "json"  # json, text
  console:
    enabled: true
    level: "INFO"
  file:
    enabled: true
    level: "DEBUG"
    path: "logs/app.log"
    max_bytes: 10  # MB
    backup_count: 5

# Task Configuration
tasks:
  default_template: "default"
  templates_dir: "templates"
  timeout: 300
  max_retries: 3
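
Configuration loading and validation live in src/core/config.py; a minimal sketch of the load step, assuming PyYAML (the project's actual loader adds validation on top):

import yaml

# Load the YAML configuration from the default path; src/core/config.py
# additionally validates the values before the service uses them.
with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

print(config["llm"]["provider"])  # e.g. "google"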

Environment Variables

Create a .env file in the project root:

# Google Gemini
GEMINI_API_KEY=your_gemini_api_key_here

# Or OpenAI
OPENAI_API_KEY=your_openai_api_key_here

# Or Anthropic
ANTHROPIC_API_KEY=your_anthropic_api_key_here
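
To quickly check that a key is visible to Python, here is a sketch assuming python-dotenv (the service may load .env differently):

import os

from dotenv import load_dotenv

# Read variables from .env in the project root into the environment.
load_dotenv()
print("key found" if os.environ.get("GEMINI_API_KEY") else "key missing")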

Switching LLM Providers

Google Gemini (Default)

llm:
  provider: "google"
  model_name: "gemini-2.0-flash"
  api_key_env: "GEMINI_API_KEY"

OpenAI

llm:
  provider: "openai"
  model_name: "gpt-4"
  api_key_env: "OPENAI_API_KEY"

Anthropic Claude

llm:
  provider: "anthropic"
  model_name: "claude-3-opus"
  api_key_env: "ANTHROPIC_API_KEY"

Ollama (Local)

llm:
  provider: "ollama"
  model_name: "llama2"
  # No API key needed for local Ollama
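
Conceptually, the api_key_env field names the environment variable the service should read for the selected provider; a rough sketch of that lookup (illustrative only; the real logic lives in src/services/llm_service.py):

import os

def resolve_api_key(llm_cfg: dict) -> str | None:
    """Return the key named by api_key_env, or None (e.g. for local Ollama)."""
    env_name = llm_cfg.get("api_key_env")
    return os.environ.get(env_name) if env_name else None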

📝 Creating Custom Templates

Templates are stored in the templates/ directory and use Jinja2 syntax.

Example: Custom Search Template

Create templates/web_search.txt:

Search the web for information about {{ topic }}.

Steps:
1. Go to {{ search_engine | default("https://www.google.com") }}
2. Search for: "{{ topic }}"
{% if specific_site %}
3. Focus on results from {{ specific_site }}
{% endif %}
4. Extract the top {{ num_results | default(5) }} results

Return the results as a JSON array with title, URL, and snippet for each result.

Using the template:

response = requests.post(
    "http://localhost:8000/task",
    json={
        "template_name": "web_search",
        "task_data": {
            "topic": "climate change solutions",
            "search_engine": "https://www.google.com",
            "num_results": 10
        }
    }
)
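
To preview a rendered template locally before sending it through the API, a minimal Jinja2 sketch (the service's own rendering lives in src/services/template_service.py):

from jinja2 import Environment, FileSystemLoader

# Render templates/web_search.txt with the same fields as task_data;
# variables left unset fall back to their default() filters.
env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("web_search.txt")
print(template.render(topic="climate change solutions", num_results=10))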

🛠️ Custom Data Processing

You can add custom data processors in src/utils/data_utils.py:

from src.utils import register_processor

@register_processor("my_custom_processor")
def process_my_data(data: dict) -> dict:
    """Custom data processing logic."""
    # Your processing logic here
    processed = {
        "original": data,
        "processed": data.get("field", "").upper()
    }
    return processed

Then reference it in your configuration:

data_processing:
  processors:
    - "my_custom_processor"

📊 Logging

BrowserUse AI Agent API provides comprehensive logging:

Console Logs

  • Colored output for easy reading
  • Configurable log level
  • Real-time request/response logging

File Logs

  • JSON structured logs for easy parsing (see the sample entry below)
  • Automatic rotation when size limit reached
  • Configurable retention (backup count)
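
A file log entry might look like the following (the field names are illustrative, not the service's guaranteed schema):

{"timestamp": "2024-01-01T12:00:00Z", "level": "INFO", "logger": "src.app", "message": "Task completed", "template": "default"}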

Conversation Logs

  • Full agent conversation history
  • Saved per task execution
  • Located in logs/conversations/

Log Levels

Adjust in config/config.yaml:

logging:
  level: "DEBUG"  # See everything
  # level: "INFO"   # Production default
  # level: "WARNING"  # Only warnings and errors
  # level: "ERROR"    # Only errors

🧪 Testing

Exercise the endpoints with a few example requests:

# Test health endpoint
curl http://localhost:8000/health

# List available templates
curl http://localhost:8000/templates

# Test task execution
curl -X POST http://localhost:8000/task \
  -H "Content-Type: application/json" \
  -d @example_request.json
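
Here, example_request.json follows the same shape as the /task payload shown earlier, for example:

{
  "template_name": "default",
  "task_data": {
    "search_location": "New York, USA",
    "primary_name": "Empire State Building",
    "address": "350 5th Ave",
    "city": "New York",
    "district": "Manhattan"
  }
}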

📁 Project Structure

browseruse-ai-api/
├── config/
│   └── config.yaml           # Main configuration file
├── src/
│   ├── api/
│   │   ├── __init__.py
│   │   └── models.py         # Pydantic models for API
│   ├── core/
│   │   ├── __init__.py
│   │   ├── config.py         # Configuration management
│   │   └── logging.py        # Logging setup
│   ├── services/
│   │   ├── __init__.py
│   │   ├── agent_service.py  # Agent execution logic
│   │   ├── llm_service.py    # LLM provider management
│   │   └── template_service.py # Template rendering
│   ├── utils/
│   │   ├── __init__.py
│   │   └── data_utils.py     # Utility functions (customize here!)
│   └── app.py                # FastAPI application
├── templates/
│   ├── default.txt           # Default task template
│   └── google_maps_en.txt    # Example: Google Maps search
├── logs/                     # Log files (auto-created)
├── tests/                    # Test files
├── .env                      # Environment variables (create this)
├── .env.example              # Environment variables example
├── .gitignore
├── docker-compose.yaml       # Docker Compose configuration
├── Dockerfile                # Docker image definition
├── requirements.txt          # Python dependencies
├── run.py                    # Application entry point
└── README.md                 # This file

🔒 Security Considerations

  • ✅ Never commit .env file or API keys to version control
  • ✅ Use environment variables for sensitive data
  • ✅ Run Docker containers as non-root user (configured)
  • ✅ Enable CORS only for trusted origins in production (see the sketch after this list)
  • ✅ Use HTTPS in production deployments
  • ✅ Regularly update dependencies
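
For the CORS point above, a minimal FastAPI middleware sketch (the origin is a placeholder; wire this into the app created in src/app.py):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow cross-origin requests only from explicitly trusted origins.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.example.com"],
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)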

🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please see CONTRIBUTING.md for detailed guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • browser-use for the browser automation engine
  • FastAPI for the web framework

🗺️ Roadmap

  • Add more LLM provider support
  • Implement rate limiting
  • Add authentication/authorization
  • Create web UI for task management
  • Add task queue for async processing
  • Implement caching for repeated tasks
  • Add metrics and monitoring dashboard

BrowserUse AI Agent API - AI-Powered Web Automation Service
