Kaangml · Nov 27, 2025
diff --git a/.gitignore b/.gitignore
@@ -7,4 +7,3 @@ __pycache__/
 .vscode/
 playwright-report/
 test-results/
-docs/
diff --git a/README.md b/README.md
@@ -1,28 +1,166 @@
-# Autonomous Browser AI Agent (Bedrock)
+# 🤖 Autonomous Browser AI Agent
 
+An intelligent browser automation agent built with **Playwright** and a modular **agent-controller-browser** architecture. Plan tasks in natural language, execute them via browser actions, and collect results — all autonomously.
 
-## Proje yapısı (ASCII)
+## ✨ Features
 
-```text
+- **Browser Automation**: Full Playwright integration (goto, click, fill, extract text, screenshot, etc.)
+- **Safety Controls**: URL scheme filtering, loop detection, max-step limits
+- **Modular Architecture**: Agent → Controller → Browser layers for testability
+- **Human-like Behavior**: Configurable random delays to reduce bot detection
+- **CLI & API**: Run from command line or integrate into your Python code
+- **Extensible Planner**: Pluggable LLM interface for intelligent task planning
+
+## 🚀 Quick Start
+
+### Installation
+
+```bash
+# Clone the repository
+git clone https://github.com/Kaangml/autonomous_browser_ai_agent.git
+cd autonomous_browser_ai_agent
+
+# Install dependencies with uv (recommended)
+uv sync
+
+# Install Playwright browsers
+uv run playwright install chromium
+```
+
+### Run from CLI
+
+```bash
+# Extract text from a webpage
+uv run python -m src --url "https://example.com" --task "extract the page title"
+
+# Run with visible browser window
+uv run python -m src --url "https://example.com" --task "extract the page title" --no-headless
+
+# Output as JSON
+uv run python -m src --url "https://example.com" --task "read the main content" --json
+```
+
+### Run Examples
+
+```bash
+# Wikipedia example: extract featured article
+uv run python -m src.examples.example_wikipedia
+
+# DuckDuckGo search example
+uv run python -m src.examples.example_search
+```
+
+## 📖 Usage in Python
+
+```python
+import asyncio
+from browser.browser_config import BrowserConfigManager
+from browser.browser import BrowserManager
+from browser.actions import BrowserActions
+from controller.browser_controller import BrowserController
+
+async def main():
+    # Setup
+    config = BrowserConfigManager()
+    config.config.headless = True
+
+    browser = BrowserManager(config)
+    actions = BrowserActions(browser)
+    controller = BrowserController(actions)
+
+    try:
+        # Navigate
+        result = await controller.execute_action({
+            "type": "goto",
+            "args": {"url": "https://example.com"}
+        })
+        page = result["page"]
+
+        # Extract text
+        text = await controller.execute_action({
+            "type": "extract_text",
+            "args": {"page": page, "selector": "h1"}
+        })
+        print(text["result"])
+
+    finally:
+        await browser.close()
+
+asyncio.run(main())
+```
+
+## 🏗️ Architecture
+
+```
 src/
-├── agent/
-│   ├── agent.py
-│   ├── memory.py
-│   ├── planning.py
-│   ├── prompt_templates.py
-│   └── tools.py
-├── browser/
-│   ├── actions.py
-│   ├── browser_config.py
-│   ├── browser.py
-│   └── utils.py
-├── config/
-│   └── settings.py
-├── controller/
-│   ├── logger.py
-│   ├── task_manager.py
-│   └── workflow.py
-└── examples/
-    ├── example_login_automation.py
-    └── example_scrape_google.py
+├── agent/          # Task planning and reasoning
+│   ├── agent.py    # Main agent class (plan → execute → reflect)
+│   ├── planner.py  # LLM-based task decomposition
+│   └── memory.py   # Short-term memory
+├── browser/        # Playwright automation layer
+│   ├── browser.py  # Browser lifecycle management
+│   ├── actions.py  # High-level actions (click, fill, extract, etc.)
+│   └── utils.py    # Retry logic, human delays, URL normalization
+├── controller/     # Action orchestration
+│   └── browser_controller.py  # Maps agent actions to browser calls
+├── config/         # Configuration management
+└── examples/       # Working examples
 ```
+
+## 🧪 Testing
+
+```bash
+# Run all tests
+uv run pytest
+
+# Run with verbose output
+uv run pytest -v
+
+# Run specific test file
+uv run pytest tests/browser/test_actions.py
+```
+
+## 📋 Supported Actions
+
+| Action | Description | Args |
+|--------|-------------|------|
+| `goto` | Navigate to URL | `url` |
+| `click` | Click element | `page`, `selector` |
+| `fill` | Type into input | `page`, `selector`, `text` |
+| `extract_text` | Get element text | `page`, `selector` |
+| `links` | Get all links | `page`, `selector` (optional) |
+| `screenshot` | Capture page | `page`, `full_page` (optional) |
+
+## ⚙️ Configuration
+
+Browser behavior can be customized via `BrowserConfigManager`:
+
+```python
+config = BrowserConfigManager()
+config.config.headless = False          # Show browser window
+config.config.timeout = 30              # Timeout in seconds
+config.config.viewport_width = 1920     # Browser width
+config.config.viewport_height = 1080    # Browser height
+config.config.human_delay_min = 0.5     # Min delay between actions
+config.config.human_delay_max = 1.5     # Max delay between actions
+config.config.channel = "chrome"        # Use Chrome instead of Chromium
+```
+
+## 🗺️ Roadmap
+
+See [docs/ROADMAP.md](docs/ROADMAP.md) for the development roadmap.
+
+### Planned Features
+- [ ] Persistent memory (SQLite/vector DB)
+- [ ] Real LLM integration (OpenAI, Anthropic, Bedrock)
+- [ ] Job queue and workflow management
+- [ ] Retry policies with exponential backoff
+- [ ] Structured logging and metrics
+
+## 📄 License
+
+MIT
+
+## 🤝 Contributing
+
+Contributions welcome! Please read the roadmap first, then open a PR.
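Review note on README.md: the "Safety Controls" feature (URL scheme filtering, loop detection, max-step limits) is listed without an illustration. The following is a minimal sketch of how such checks could be combined, not the repository's `BrowserController` code. `SafetyGuard`, `SafetyError`, `ALLOWED_SCHEMES`, and the `max_steps` and `loop_window` parameters are hypothetical names chosen for this example; only the `{"type": ..., "args": {...}}` action shape is taken from the README.

```python
from collections import deque
from urllib.parse import urlparse

# Assumption: only web schemes are allowed; the real filter may differ.
ALLOWED_SCHEMES = {"http", "https"}


class SafetyError(Exception):
    """Raised when an action violates one of the safety rules."""


class SafetyGuard:
    """Hypothetical helper that vets each action before it reaches the browser."""

    def __init__(self, max_steps=20, loop_window=3):
        self.max_steps = max_steps
        self.loop_window = loop_window
        self.steps = 0
        # Keeps only the most recent `loop_window` action fingerprints.
        self.recent = deque(maxlen=loop_window)

    def check(self, action):
        """Raise SafetyError if the action should not be executed."""
        # 1) Max-step limit: count every action that is submitted.
        self.steps += 1
        if self.steps > self.max_steps:
            raise SafetyError(f"step limit of {self.max_steps} exceeded")

        # 2) URL scheme filter: reject file://, javascript:, and similar.
        args = action.get("args", {})
        url = args.get("url")
        if url is not None:
            scheme = urlparse(url).scheme.lower()
            if scheme not in ALLOWED_SCHEMES:
                raise SafetyError(f"URL scheme {scheme!r} is not allowed")

        # 3) Loop detection: the same action repeated `loop_window` times in a row.
        fingerprint = (
            action.get("type"),
            tuple(sorted((key, repr(value)) for key, value in args.items())),
        )
        self.recent.append(fingerprint)
        if len(self.recent) == self.loop_window and len(set(self.recent)) == 1:
            raise SafetyError("loop detected: identical action repeated")
```

A controller could call `guard.check(action)` before dispatching each action and stop the sequence on `SafetyError`. Comparing only consecutive identical actions is a deliberately simple loop heuristic; it would not catch an A, B, A, B cycle.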
diff --git a/docs/DEV_NOTES.md b/docs/DEV_NOTES.md
@@ -0,0 +1,94 @@
+# agent.md — in-repo agent memory / plan
+
+This file is the agent's on-disk memory and work plan for implementing the Faz 2 (Phase 2) features. Keep it short and actionable; it is updated as tasks are completed.
+
+## Current understanding (from `docs/faz1_2.md`)
+- Faz 2 focuses on completing the Browser engine, Controller, Agent reasoning and config improvements.
+- Browser (Playwright) needs robust actions: goto, click, type/fill, wait_for, extract_text, extract_all_links, screenshot, error handling/retries.
+- Controller must convert agent action JSON into browser calls and provide safety checks (URL filters, loop detection, max steps).
+- Agent needs multi-step reasoning (Plan → Execute → Reflect), tool-based reasoning, and self-correction.
+
+## Priority (what I'll implement first)
+1. Browser actions (Faz 2.1): make sure all core actions exist and are tested.
+2. Unit tests for browser actions (pytest style) — file-level tests under `tests/browser`.
+3. Controller small improvements / scaffolding (Faz 2.2) — map agent JSON to BrowserActions.
+4. Agent reasoning upgrade (Faz 2.3) — design plan + unit tests for planning flow.
+
+## First task breakdown (concrete)
+- Add `extract_all_links(page)` to `src/browser/actions.py`.
+- Add `screenshot(page)` to `src/browser/actions.py`.
+- Add tests for both functions in `tests/browser/test_actions.py`.
+
+After each file-level change:
+- Add/update a unit test in `tests/browser/` (pytest async tests where needed).
+- Run tests locally to ensure everything passes.
+
+## Testing rules (my memory)
+- Every implemented action must have a unit test that covers at least the core happy path (successful execution).
+- Use small dummy classes in tests for `Page` and `BrowserManager` to avoid needing an actual browser.
+- Tests must be pytest compatible and located under `tests/browser`.
+
+## Notes / constraints
+- Keep changes small and incremental — test after each file.
+- Use `BrowserUtils.retry` for network-sensitive operations.
+
+## Next steps after tests
+- Expand controller mapping using a lightweight JSON action format and tests.
+- Then update agent logic to use multi-step reasoning and enable tools.
+
+## Recent updates (done)
+
+- Planner: `src/agent/planner.py` was added. The agent can now use an injected LLM planner (mockable) to create multi-step plans.
+- Memory: improved in-memory workflow; tests added for planner and agent execution.
+- Playwright configs: `BrowserConfig` now supports `channel` and `human_delay_min`/`human_delay_max`; the delay settings add human-like timing jitter to reduce bot detection.
+
+## Faz 2 — status summary ✅
+
+All Faz 2 goals are implemented and tested locally:
+
+- Browser actions: goto, click, fill, wait_for, extract_text, extract_all_links, screenshot, and helper utilities (normalize_url, human_delay, retry).  
+- Controller: `BrowserController` with execute_action/execute_sequence, safety checks (URL scheme filter, max steps, loop detection).  
+- Agent: simple LLMPlanner interface and async planning pipeline; agent executes steps with injected planner + short-term memory skeleton.  
+- Tests: comprehensive unit tests added across browser, controller and agent layers; e2e Playwright tests included (fixture based), with an optional headful test guarded by RUN_HEADFUL for local debugging.
+
+Local test result (most recent run): 29 passed, 1 skipped.
+
+Notes:
+- Playwright headful tests can be flaky against public sites (Google blocks automated clients); use local fixtures or less-restrictive targets in CI.  
+- Changes are committed locally in logical groups (agent, browser, controller, e2e, ci). Pushing to the remote failed because of repository authentication; run `git push` with the correct credentials, or switch accounts if you want me to push from here.
+
+---
+
+# Faz 3 — next phase (high level goals) 🔭
+
+Faz 3 upgrades will move the project from a local POC into a more production-ready, resilient agent.
+
+Planned work (concrete, test-driven)
+
+1) Persistent/smart memory (short + long term) 💾
+	- Add a persistent store (SQLite) for short-term memory and a pluggable vector DB adapter for longer-term semantic memory (optional: FAISS / Milvus / SQLite+annlite).  
+	- Write migrations, test dataset fixtures, and unit tests for reads/writes and eviction policies.  
+	- Optionally add minimal embedding + retrieval using a mock embedding provider to allow offline tests.
+
+2) LLM planning contract and adapter 🧭
+	- Implement a strict planner contract requiring structured JSON output (schema) with steps and tool names to reduce hallucinations.  
+	- Add an LLM adapter interface to swap providers (mock tests for CI + sample provider configs).  
+	- Add unit tests that assert the planner always returns a valid plan or a clear, recoverable error response.
+
+3) Controller hardening & workflow engine 🔧
+	- Add a persistent job queue + state machine for long-running workflows (states: pending, running, succeeded, failed, retrying).  
+	- Implement retry/backoff policies (max attempts, exponential backoff) and durable input/output logs for observability.  
+	- Add safe scheduling and re-entrancy guarantees (so controllers can resume from failure).  
+
+4) Monitoring, instrumentation & CI ✅
+	- Add metrics and structured logs for each step (execution time, step result, errors) and unit tests to verify logging of key events.  
+	- Harden CI to include unit + e2e fixture tests, and avoid unreliable public-site headful tests in CI.
+
+5) Release & collaboration tasks
+	- Push local branch to remote (requires correct auth). If you'd like, I can attempt the push again after you switch accounts or provide push access.  
+	- Open a PR from the feature branch with Faz 2 changes and Faz 3 follow-up work split into smaller PRs.
+
+If you'd like, I can start with the memory persistence work (SQLite + tests) next.
+
+
+(Agent memory file — update this as steps complete.)
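Review note on docs/DEV_NOTES.md: the Faz 3 item "LLM planning contract and adapter" asks for structured JSON output with steps and tool names, and for tests asserting that the planner returns either a valid plan or a clear, recoverable error. The following is a minimal sketch of such a check using only the standard library. The `{"steps": [...]}` wrapper, `parse_plan`, `PlanError`, and `KNOWN_TOOLS` are assumptions made for this example; the action names come from the README's "Supported Actions" table, and the step shape mirrors the README's `{"type": ..., "args": {...}}` actions.

```python
import json

# Action names from the README's "Supported Actions" table.
KNOWN_TOOLS = {"goto", "click", "fill", "extract_text", "links", "screenshot"}


class PlanError(ValueError):
    """Recoverable error: the planner output did not satisfy the contract."""


def parse_plan(raw: str) -> list:
    """Parse raw planner output and return a validated list of steps.

    Assumed (hypothetical) contract:
    {"steps": [{"type": "<tool>", "args": {...}}, ...]}
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PlanError(f"planner output is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise PlanError("plan must be a JSON object")

    steps = data.get("steps")
    if not isinstance(steps, list) or not steps:
        raise PlanError("plan must contain a non-empty 'steps' list")

    validated = []
    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            raise PlanError(f"step {index} must be a JSON object")
        tool = step.get("type")
        # Reject tool names the controller does not know (likely hallucinated).
        if not isinstance(tool, str) or tool not in KNOWN_TOOLS:
            raise PlanError(f"step {index} uses unknown tool {tool!r}")
        args = step.get("args", {})
        if not isinstance(args, dict):
            raise PlanError(f"step {index} 'args' must be a JSON object")
        validated.append({"type": tool, "args": args})
    return validated
```

Raising a single exception type keeps the failure recoverable: the agent can catch `PlanError`, feed the message back to the planner, and ask for a corrected plan. Per-tool argument checks (for example, requiring `url` for `goto`) are left out to keep the sketch short.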