This project is unfinished. See the Roadmap.
Transform your webpage into an AI-powered application with a single script tag.
An in-page UI agent written in JavaScript. Control web interfaces with natural language.
🌐 English | 中文
- 🎯 Easy Integration
- 🔐 Client-Side Processing
- 🧠 DOM Extraction
- 💬 Natural Language Interface
- 🎨 UI with Human in the loop
👉 Roadmap
TODO: CDN endpoint to be determined.
<!-- CDN script tag - URL to be updated -->
<script src="TODO-CDN-URL"></script>
npm install page-agent
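import { PageAgent } from 'page-agent'

// Replace the placeholder values with your LLM provider's settings.
const agent = new PageAgent({
  modelName: 'gpt-4.1-mini',
  baseURL: 'xxxx',
  apiKey: 'xxxx'
})

// Top-level await requires an ES module context.
await agent.execute("Click the login button")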
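The single `execute` call above can be extended to a short sequence of instructions. The following is a minimal sketch, not part of page-agent: `runSteps`, `InstructionRunner`, and `StepResult` are hypothetical names, and the sketch assumes that `execute` returns a promise that rejects when an instruction fails, which this README does not confirm.

```typescript
// Minimal structural type: anything with an async execute(instruction) method.
// PageAgent's real return type is not documented here, so it is left as unknown.
interface InstructionRunner {
  execute(instruction: string): Promise<unknown>
}

interface StepResult {
  instruction: string
  ok: boolean
  error?: string
}

// Hypothetical helper (not part of page-agent): run instructions in order
// and stop at the first failure so later steps never act on an unexpected page state.
async function runSteps(
  runner: InstructionRunner,
  instructions: string[]
): Promise<StepResult[]> {
  const results: StepResult[] = []
  for (const instruction of instructions) {
    try {
      await runner.execute(instruction)
      results.push({ instruction, ok: true })
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err)
      results.push({ instruction, ok: false, error })
      break
    }
  }
  return results
}
```

With the `agent` created above, `await runSteps(agent, ["Click the login button"])` would return one entry per attempted instruction, provided the assumption about rejection on failure holds.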
PageAgent follows a clean, modular architecture:
src/
├── PageAgent.ts # Agent main loop
├── dom/ # DOM processing
├── tools/ # Agent tools
├── ui/ # UI components & panels
├── llms/ # LLM integration layer
└── utils/ # Event bus & utilities
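The layout lists an event bus under `utils/`, but its implementation is not shown here. As a rough illustration of how an agent loop and UI panels could communicate without importing each other, here is a generic typed publish/subscribe sketch. The `EventBus` class and the `AgentEvents` names and payloads are assumptions made for this example, not the project's actual API.

```typescript
// Hypothetical event names and payloads, for illustration only.
type AgentEvents = {
  "step:start": { instruction: string }
  "step:end": { instruction: string; ok: boolean }
}

type Handler<T> = (payload: T) => void

class EventBus<Events extends Record<string, unknown>> {
  private handlers = new Map<keyof Events, Set<Handler<any>>>()

  // Register a handler and return a function that unsubscribes it.
  on<K extends keyof Events>(event: K, handler: Handler<Events[K]>): () => void {
    const set = this.handlers.get(event) ?? new Set<Handler<any>>()
    this.handlers.set(event, set)
    set.add(handler)
    return () => {
      set.delete(handler)
    }
  }

  // Synchronously call every handler currently registered for the event.
  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    this.handlers.get(event)?.forEach((handler) => handler(payload))
  }
}
```

In a design like this, a UI panel would subscribe with `bus.on("step:start", ...)` while the main loop calls `bus.emit(...)`, so neither side needs a direct reference to the other.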
We welcome contributions from the community! Here's how to get started:
- Fork the repository
- Clone your fork:
git clone https://github.com/<your-username>/page-agent.git && cd page-agent
- Install dependencies:
npm install
- Start development:
npm start
Please read our Code of Conduct and Contributing Guide before contributing.
This project builds upon the excellent work of browser-use.
PageAgent is designed for client-side web enhancement, not server-side automation.
MIT License - see the LICENSE file for details.
DOM processing components and prompt are derived from browser-use (MIT License). See NOTICE for full attribution.
⭐ Star this repo if you find PageAgent helpful!