Merge pull request #9 from reworkd/ts
* 🎉 Convert mls utils to use typescript
* 🆎 Make spans have red background
* ✅ Fix typing
* 💻 Improve readme for local dev
* 💻 Improve readme for local dev
awtkns committed Nov 15, 2023
2 parents bcb3be0 + 42f9468 commit ae5a749
Showing 19 changed files with 434 additions and 228 deletions.
22 changes: 11 additions & 11 deletions .github/CODE_OF_CONDUCT.md
@@ -17,23 +17,23 @@ diverse, inclusive, and healthy community.
 Examples of behavior that contributes to a positive environment for our
 community include:

-* Demonstrating empathy and kindness toward other people
-* Being respectful of differing opinions, viewpoints, and experiences
-* Giving and gracefully accepting constructive feedback
-* Accepting responsibility and apologizing to those affected by our mistakes,
+- Demonstrating empathy and kindness toward other people
+- Being respectful of differing opinions, viewpoints, and experiences
+- Giving and gracefully accepting constructive feedback
+- Accepting responsibility and apologizing to those affected by our mistakes,
   and learning from the experience
-* Focusing on what is best not just for us as individuals, but for the
+- Focusing on what is best not just for us as individuals, but for the
   overall community

 Examples of unacceptable behavior include:

-* The use of sexualized language or imagery, and sexual attention or
+- The use of sexualized language or imagery, and sexual attention or
   advances of any kind
-* Trolling, insulting or derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or email
+- Trolling, insulting or derogatory comments, and personal or political attacks
+- Public or private harassment
+- Publishing others' private information, such as a physical or email
   address, without their explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
+- Other conduct which could reasonably be considered inappropriate in a
   professional setting

 ## Enforcement Responsibilities
@@ -106,7 +106,7 @@ Violating these terms may lead to a permanent ban.
 ### 4. Permanent Ban

 **Community Impact**: Demonstrating a pattern of violation of community
-standards, including sustained inappropriate behavior, harassment of an
+standards, including sustained inappropriate behavior, harassment of an
 individual, or aggression toward or disparagement of classes of individuals.

 **Consequence**: A permanent ban from any sort of public interaction within
3 changes: 1 addition & 2 deletions .github/CONTRIBUTING.md
@@ -48,7 +48,7 @@ We welcome ideas for improvements and new features. To suggest an enhancement, o

 ### Code Style

-AgentGPT uses [ESLint](https://eslint.org/) as its code style guide. Please ensure that your code follows these guidelines.
+AgentGPT uses [ESLint](https://eslint.org/) as its code style guide. Please ensure that your code follows these guidelines.

 ### Commit Messages

@@ -61,4 +61,3 @@ Write clear and concise commit messages that briefly describe the changes made i
 - [ESLint Style Guide](https://eslint.org/)

 Thank you once again for your interest in contributing to AgentGPT. We look forward to collaborating with you and creating an even better project together!
-
2 changes: 1 addition & 1 deletion .github/SECURITY.md
@@ -6,4 +6,4 @@ Due to the nature of the fast development that is happening in this project, onl

 ## Reporting a Vulnerability

-If you find a vulnerability, please either report a vulnerability [here](https://github.com/reworkd/AgentGPT/security) or contact us on twitter @asimdotshrestha. Please don't create a GitHub before contacting a maintainer to allow us to fix the vulnerability before others can take advantage of it.
+If you find a vulnerability, please either report a vulnerability [here](https://github.com/reworkd/AgentGPT/security) or contact us on twitter @asimdotshrestha. Please don't create a GitHub before contacting a maintainer to allow us to fix the vulnerability before others can take advantage of it.
34 changes: 22 additions & 12 deletions .github/workflows/python.yml
@@ -1,12 +1,13 @@
 name: Test and Publish
 on:
   push:
-    branches: [ "main" ]
+    branches: ["main"]
   pull_request:
-    branches: [ "main" ]
+    branches: ["main"]

 env:
   PYTHON_VERSION: "3.11"
+  NODE_VERSION: "18.x"

 jobs:
   check-version:
@@ -38,7 +39,7 @@ jobs:
       - uses: actions/setup-python@v4
         with:
           python-version: ${{ env.PYTHON_VERSION }}
-          cache: 'poetry'
+          cache: "poetry"
       - run: poetry install
       - name: Run isort check
         run: poetry run isort --check .
@@ -54,7 +55,7 @@ jobs:
       - uses: actions/setup-python@v4
         with:
           python-version: ${{ env.PYTHON_VERSION }}
-          cache: 'poetry'
+          cache: "poetry"
       - run: poetry install
       - name: Run mypy check
         run: poetry run mypy .
@@ -63,35 +64,44 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v3
+      - uses: actions/setup-node@v3
+        with:
+          node-version: ${{ env.NODE_VERSION }}
+          cache: "npm"
+          cache-dependency-path: package-lock.json
+      - name: Compile TypeScript
+        run: npm ci && npm run build
       - name: Install poetry
         run: pipx install poetry
       - uses: actions/setup-python@v4
         with:
           python-version: ${{ env.PYTHON_VERSION }}
-          cache: 'poetry'
+          cache: "poetry"
       - run: poetry install && poetry run playwright install chromium
       - name: Run pytest check
         run: poetry run pytest -vv --cov="tarsier" .
         env:
           TARSIER_GOOGLE_OCR_CREDENTIALS: ${{ secrets.TARSIER_GOOGLE_OCR_CREDENTIALS }}

   publish:
-    needs: [
-      check-version,
-      black,
-      mypy,
-      pytest
-    ]
+    needs: [check-version, black, mypy, pytest]
     runs-on: ubuntu-latest
     if: github.ref == 'refs/heads/main' && needs.check-version.outputs.should_publish == 'true'
     steps:
       - uses: actions/checkout@v3
+      - uses: actions/setup-node@v3
+        with:
+          node-version: ${{ env.NODE_VERSION }}
+          cache: "npm"
+          cache-dependency-path: package-lock.json
+      - name: Compile TypeScript
+        run: npm ci && npm run build
       - name: Install poetry
         run: pipx install poetry
       - uses: actions/setup-python@v4
         with:
           python-version: ${{ env.PYTHON_VERSION }}
-          cache: 'poetry'
+          cache: "poetry"
       - run: poetry install
       - name: Build and Publish
         run: |
6 changes: 5 additions & 1 deletion .gitignore
@@ -147,4 +147,8 @@ cython_debug/
 *.index
 *.db
 *.bin
-/screenshots/
+/screenshots/
+
+*.d.ts
+*.js
+node_modules
44 changes: 38 additions & 6 deletions README.md
@@ -19,39 +19,44 @@
</p>

# Tarsier

If you've tried using GPT-4(V) to automate web interactions, you've probably run into questions like:

- How do you map LLM responses back into web elements?
- How can you mark up a page for an LLM to better understand its action space?
- How do you feed a "screenshot" to a text-only LLM?

At Reworkd, we found ourselves reusing the same utility libraries to solve these problems across multiple projects.
Because of this we're now open-sourcing this simple utility library for multimodal web agents... Tarsier!
At Reworkd, we found ourselves reusing the same utility libraries to solve these problems across multiple projects.
Because of this we're now open-sourcing this simple utility library for multimodal web agents... Tarsier!
The video below demonstrates Tarsier usage by feeding a page snapshot into a langchain agent and letting it take actions.


https://github.com/reworkd/tarsier/assets/50181239/af12beda-89b5-4add-b888-d780b353304b


## How does it work?

Tarsier works by visually "tagging" interactable elements on a page via brackets + an id such as `[1]`.
In doing this, we provide a mapping between elements and ids for GPT-4(V) to take actions upon.
In doing this, we provide a mapping between elements and ids for GPT-4(V) to take actions upon.
We define interactable elements as buttons, links, or input fields that are visible on the page.
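
The tagging idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Tarsier's actual implementation: the `Element` class, the `tag_elements` function, and the XPath strings are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for an interactable page element."""

    xpath: str
    label: str


def tag_elements(elements: list[Element]) -> tuple[list[str], dict[int, str]]:
    """Assign sequential ids and return (tagged labels, id -> xpath mapping)."""
    tagged: list[str] = []
    tag_to_xpath: dict[int, str] = {}
    for tag_id, element in enumerate(elements):
        tagged.append(f"[{tag_id}] {element.label}")
        tag_to_xpath[tag_id] = element.xpath
    return tagged, tag_to_xpath


elements = [
    Element(xpath="//a[@id='home']", label="Home"),
    Element(xpath="//input[@name='q']", label="Search"),
    Element(xpath="//button[@type='submit']", label="Go"),
]
tagged, tag_to_xpath = tag_elements(elements)
print("\n".join(tagged))

# If the LLM answers "click [2]", the id resolves back to a concrete element.
print(tag_to_xpath[2])
```

The point of the mapping is that the model only ever has to name an id; the caller keeps the lookup table and turns that id back into a selector.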

Tarsier can also provide a textual representation of the page, which enables deeper interaction even for LLMs that are not multimodal.
This is important to note given performance issues with existing vision language models.
Tarsier also provides OCR utils to convert a page screenshot into a whitespace-structured string that an LLM without vision can understand.
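
As a rough sketch of what a "whitespace-structured string" could mean, the snippet below places OCR words on a character grid so that spacing mirrors their position on the page. It assumes OCR output is available as words with pixel coordinates; the `Word` class, `to_whitespace_text`, and the `char_width` and `line_height` values are hypothetical and are not taken from Tarsier's code.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical OCR result: a word and the top-left corner of its box, in pixels."""

    text: str
    x: int
    y: int


def to_whitespace_text(words: list[Word], char_width: int = 10, line_height: int = 20) -> str:
    """Place each word on a character grid so spacing mirrors the page layout."""
    rows: dict[int, list[Word]] = {}
    for word in words:
        rows.setdefault(word.y // line_height, []).append(word)

    lines: list[str] = []
    for row in range(max(rows, default=-1) + 1):
        line = ""
        for word in sorted(rows.get(row, []), key=lambda w: w.x):
            column = word.x // char_width
            if line and len(line) >= column:
                # Words that would overlap are separated by a single space.
                line += " "
            else:
                # Otherwise pad with spaces up to the word's column.
                line = line.ljust(column)
            line += word.text
        lines.append(line)
    return "\n".join(lines)


words = [
    Word("Login", x=0, y=0),
    Word("Help", x=200, y=0),
    Word("Email:", x=0, y=40),
    Word("[1]", x=100, y=40),
]
page_text = to_whitespace_text(words)
print(page_text)
```

Empty rows become empty lines, so vertical gaps on the page survive in the text as well as horizontal ones.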

## Installation

```shell
pip install tarsier
```

## Usage

Visit our [cookbook](https://github.com/reworkd/Tarsier/tree/main/cookbook) for agent examples using Tarsier:

- [An autonomous LangChain web agent](https://github.com/reworkd/tarsier/blob/main/cookbook/langchain-web-agent.ipynb) 🦜⛓️
- [An autonomous LlamaIndex web agent](https://github.com/reworkd/tarsier/blob/main/cookbook/llama-index-web-agent.ipynb) 🦙

Otherwise, basic Tarsier usage might look like the following:

```python
import asyncio

@@ -78,12 +83,38 @@ async def main():
if __name__ == '__main__':
    asyncio.run(main())
```

## Local Development
### Setup
We have provided a handy setup script to get you up and running with Tarsier development.
```shell
./scripts/setup.sh
```
If you modify any TypeScript files used by Tarsier, you'll need to execute the following command.
This compiles the TypeScript into JavaScript, which can then be utilized in the Python package.
```shell
npm run build
```
### Testing
We use [pytest](https://docs.pytest.org) for testing. To run the tests, simply run:
```shell
poetry run pytest .
```
### Linting
Prior to submitting a potential PR, please run the following to format your code:
```shell
./scripts/format.sh
```


## Supported OCR Services

- [x] [Google Cloud Vision](https://cloud.google.com/vision)
- [ ] [Amazon Textract](https://aws.amazon.com/textract/) (Coming Soon)
- [ ] [Microsoft Azure Computer Vision](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/) (Coming Soon)

## Roadmap

- [x] Add documentation and examples
- [x] Clean up interfaces and add unit tests
- [x] Launch
@@ -93,6 +124,7 @@ if __name__ == '__main__':
- [ ] Add support for other OCR services as necessary

## Citations

```bibtex
@misc{reworkd2023tarsier,
45 changes: 45 additions & 0 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

18 changes: 18 additions & 0 deletions package.json
@@ -0,0 +1,18 @@
+{
+  "name": "tarsier",
+  "description": "Vision utilities for web interaction agents",
+  "private": true,
+  "version": "0.4.0",
+  "author": "Reworkd AI, INC.",
+  "license": "MIT",
+  "scripts": {
+    "build": "tsc -p ./tsconfig.json",
+    "format": "prettier --write .",
+    "lint": "prettier --check ."
+  },
+  "keywords": [],
+  "devDependencies": {
+    "prettier": "^3.1.0",
+    "typescript": "^5.2.2"
+  }
+}
8 changes: 8 additions & 0 deletions pyproject.toml
@@ -4,6 +4,9 @@ version = "0.4.0"
 description = "Vision utilities for web interaction agents"
 authors = ["Rohan Pandey", "Adam Watkins", "Asim Shrestha"]
 readme = "README.md"
+include = ["tarsier/**/*.js"]
+exclude = ["tarsier/**/*.ts"]
+

 [tool.poetry.dependencies]
 python = "^3.10"
@@ -49,6 +52,11 @@ namespace_packages = true
 files = "tarsier"
 exclude = ["tests", "venv"]

+[tool.pytest.ini_options]
+filterwarnings = [
+    "ignore::DeprecationWarning",
+]
+
 [build-system]
 requires = ["poetry-core"]
 build-backend = "poetry.core.masonry.api"
5 changes: 4 additions & 1 deletion format.sh → scripts/format.sh
@@ -1,7 +1,10 @@
 #!/bin/sh
 cd "$(dirname "$0")" || exit 1

-printf "Formatting Code 🧹\n"
+printf "Formatting JS 🧹\n"
+npm run format
+
+printf "\nFormatting Python 🧹\n"
 poetry run black .

 printf "\nSorting imports 🧹\n"
7 changes: 7 additions & 0 deletions scripts/setup.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+cd "$(dirname "$0")" || exit 1
+
+npm install
+npm run build
+
+poetry install
12 changes: 12 additions & 0 deletions tarsier/_utils.py
@@ -0,0 +1,12 @@
+from os import PathLike
+from typing import Any
+
+
+def load_js(path: PathLike[Any]) -> str:
+    try:
+        with open(path, "r") as f:
+            return f.read()
+    except FileNotFoundError as e:
+        raise ValueError(
+            "Could not find tag_utils.js. Please ensure that you compiled TypeScript using `npm run build`"
+        ) from e
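
A minimal usage sketch for the helper above. The copy of `load_js` is repeated only so the snippet runs on its own, and the temporary `tag_utils.js` file stands in for the compiled output of `npm run build`. How the loaded source is then used (for example, handing it to Playwright's `page.evaluate`) is an assumption about the call site and is not shown in this diff.

```python
import tempfile
from os import PathLike
from pathlib import Path
from typing import Any


def load_js(path: PathLike[Any]) -> str:
    # Same behavior as the helper in tarsier/_utils.py, repeated for a standalone run.
    try:
        with open(path, "r") as f:
            return f.read()
    except FileNotFoundError as e:
        raise ValueError(
            "Could not find tag_utils.js. Please ensure that you compiled TypeScript using `npm run build`"
        ) from e


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical compiled script; in the repository this would come from `npm run build`.
    script = Path(tmp) / "tag_utils.js"
    script.write_text("window.tagged = true;")
    source = load_js(script)
    print(source)

    # A missing file surfaces as a ValueError that points at the build step.
    try:
        load_js(Path(tmp) / "missing.js")
        message = ""
    except ValueError as error:
        message = str(error)
        print(message)
```

Raising `ValueError` with a build hint, rather than letting `FileNotFoundError` propagate, tells a contributor what to run instead of only what went wrong.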