Improved the README.md dramatically, included pictures, included viv tutorial
GatlenCulp committed Sep 29, 2024
1 parent 39ef4c8 commit c0f1bdb
Showing 7 changed files with 259 additions and 37 deletions.
216 changes: 179 additions & 37 deletions README.md
@@ -2,54 +2,196 @@

**WARNING: This is currently a pre-production Formula that has not been thoroughly tested and which installs a currently non-official version of Vivaria**

![./assets/metr_logo.svg](./assets/metr_logo.svg)
![./assets/logo.png](./assets/logo.png)

[Vivaria](https://vivaria.metr.org/) is METR's tool for running evaluations and conducting agent elicitation research. This package contains a web app which is used for running and organizing evaluations as well as a command line interface to aid in the development of tasks. More information can be found on the website [here](https://vivaria.metr.org/).

For testing purposes and creating installation helper scripts, Gatlen has created [his own fork of Vivaria](https://github.com/GatlenCulp/vivaria/) which this installs.
For testing purposes and creating installation helper scripts, Gatlen has created [his own fork of Vivaria](https://github.com/GatlenCulp/vivaria/) which this installs. See the original [here](https://github.com/METR/vivaria).

## 00 TOC

- [homebrew-vivaria](#homebrew-vivaria)
- [00 TOC](#00-toc)
- [01 Setup](#01-setup)
- [01.01 Installation](#0101-installation)
- [02 Testing](#02-testing)
- [03 Uninstalling](#03-uninstalling)
- [04 Updating](#04-updating)
- [05 Known Issues](#05-known-issues)
- [06 Developing and Maintaining this Formula](#06-developing-and-maintaining-this-formula)
- [06.01 The Source](#0601-the-source)
- [06.02 The Python Dependencies](#0602-the-python-dependencies)
- [06.03 Copying Everything Over](#0603-copying-everything-over)
- [06.04 Debugging](#0604-debugging)
- [06.05 Random Useful Notes](#0605-random-useful-notes)
- [06.06 Roadmap](#0606-roadmap)
- [07 Contact the Maintainer](#07-contact-the-maintainer)

- [01.02 Web UI](#0102-web-ui)
- [01.03 Starting and Testing a Task via the CLI](#0103-starting-and-testing-a-task-via-the-cli)
- [01.04 Evaluating an Agent on a Task using the CLI and Web UI](#0104-evaluating-an-agent-on-a-task-using-the-cli-and-web-ui)
- [01.05 Shutting down](#0105-shutting-down)
- [01.06 Examples for Testing](#0106-examples-for-testing)
- [02 Uninstalling](#02-uninstalling)
- [03 Updating](#03-updating)
- [04 Known Issues](#04-known-issues)
- [05 Developing and Maintaining this Formula](#05-developing-and-maintaining-this-formula)
- [05.01 The Source](#0501-the-source)
- [05.02 The Python Dependencies](#0502-the-python-dependencies)
- [05.03 Copying Everything Over](#0503-copying-everything-over)
- [05.04 Debugging](#0504-debugging)
- [05.05 Random Useful Notes](#0505-random-useful-notes)
- [05.06 Roadmap](#0506-roadmap)
- [06 Contact the Maintainer](#06-contact-the-maintainer)

---
## 01 Setup

### 01.01 Installation

1. Tap this repository
`brew tap GatlenCulp/vivaria`
```bash
brew tap GatlenCulp/vivaria
```

2. Install Vivaria
`brew install vivaria`
```bash
brew install vivaria
```

3. Run the post-installation setup (This will ask you for a valid [OpenAI API Key](https://medium.com/@lorenzozar/how-to-get-your-own-openai-api-key-f4d44e60c327)).
*Be cautious running this command multiple times as it will overwrite your current configuration and will require you to follow all the instructions from here onward*
`viv setup`
1. Open docker
`open -a Docker`
1. Build and run the server images (This may take a while)
`viv docker compose up --detach --wait`
```bash
viv setup
```

4. Open docker
```bash
open -a Docker
```

5. Build and run the server images (This may take a while)
```bash
viv docker compose up --detach --wait
```

### 01.02 Web UI

### 02 Testing
6. Test that the server is up
`docker compose ps` (You should see a list of containers)
`curl http://localhost:4001/health` (You should see a JSON response `{...TODO...}`)
```bash
docker compose ps # You should see a list of containers
curl http://localhost:4001/health # You should see a JSON response {...TODO...}
```

7. Open the Vivaria web interface at https://localhost:4000/ and continue past the "insecure connection" warning.
8. The website will prompt you for your `ACCESS_TOKEN` and `ID_TOKEN` from `.env.server`

## 03 Uninstalling
8. The website will prompt you for your `ACCESS_TOKEN` and `ID_TOKEN` from `.env.server`.
![./assets/token-prompt.png](./assets/token-prompt.png)
You can get these by running
```bash
cat "$(brew --prefix vivaria)/vivaria/.env.server" | grep -E "ACCESS_TOKEN=|ID_TOKEN="
```

You should then have access to the web interface. The landing page should look like this:

TODO: INSERT IMAGE
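
The token lookup above can be wrapped in a small helper. This is a minimal sketch: `print_tokens` is a hypothetical name (not part of `viv`), and it assumes `.env.server` stores each token on its own `NAME=value` line, as the `grep` pattern above implies. Unlike the one-liner, it anchors the match at the start of the line.

```bash
# Print only the ACCESS_TOKEN and ID_TOKEN lines from an env file.
# Assumption: each token sits on its own NAME=value line.
print_tokens() {
  env_file=$1
  if [ ! -r "$env_file" ]; then
    echo "cannot read $env_file" >&2
    return 1
  fi
  grep -E '^(ACCESS_TOKEN|ID_TOKEN)=' "$env_file"
}

# Example (requires Vivaria to be installed through Homebrew):
# print_tokens "$(brew --prefix vivaria)/vivaria/.env.server"
```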

### 01.03 Starting and Testing a Task via the CLI

9. You can start your first task environment by loading up the example task "abandon" from the "reverse_hash" task family:
```bash
viv task start reverse_hash/abandon $(brew --prefix vivaria)/vivaria/task-standard/examples/reverse_hash
```

10. Now that the task is running you should be able to see it by running:
```bash
viv task list
```

11. Enter the task environment container by running:
```bash
viv task enter --user agent
```

12. Look at the instructions in `/home/agent/`:
```bash
pwd
cat instructions.txt
```
You should see something like:
>Please find the word whose SHA-256 hash is as follows: 98sdfs923432jsdsdf
> Return only the word as your answer.
13. Write up a `submission.txt` document. The solution to this task is "abandon":
```bash
echo "abandon" > submission.txt
```

14. Exit the container:
```bash
exit
```

15. Check your score from the `submission.txt` document or check other solutions:
```bash
viv task score # You should see something like "1" TODO
viv task score --submission "wrong answer" # You should see something like "0" TODO
```

16. Stop the task:
```bash
viv task kill
```
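
To see why "abandon" is the right answer, you can reproduce the hash check locally. This is a sketch under an assumption: that the task compares the SHA-256 hex digest of the submitted word (without a trailing newline) against the hash in the instructions. The function names `sha256_of` and `check_word` are made up for this example.

```bash
# Print the SHA-256 hex digest of a word (no trailing newline is hashed).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
  else
    # macOS ships shasum rather than sha256sum
    printf '%s' "$1" | shasum -a 256 | cut -d ' ' -f 1
  fi
}

# Succeed only if the candidate word hashes to the target digest.
check_word() {
  [ "$(sha256_of "$1")" = "$2" ]
}

# Example:
# target=$(sha256_of abandon)
# check_word abandon "$target" && echo "match"
```

Note that `echo "abandon" > submission.txt` writes a trailing newline, so whether the real scorer strips whitespace before hashing is also an assumption here.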

### 01.04 Evaluating an Agent on a Task using the CLI and Web UI

17. Unfortunately, Vivaria does not come included with an example agent, but we can add one easily to our installation directory. We will add the public [modular agent](https://github.com/poking-agents/modular-public), developed by METR:
```bash
mkdir -p "$(brew --prefix vivaria)/agents"
git clone https://github.com/poking-agents/modular-public \
"$(brew --prefix vivaria)/agents/modular-public"
```

18. We will now run this agent on the same `reverse_hash/abandon` task we did above.
```bash
viv run reverse_hash/abandon \
--task-family-path $(brew --prefix vivaria)/vivaria/task-standard/examples/reverse_hash \
--agent-path $(brew --prefix vivaria)/agents/modular-public
```

19. The last command prints a link to [https://localhost:4000/path/to/run](https://localhost:4000/...). Follow that link to see the run's trace and track the agent's progress on the task. The run page should update live as the agent takes actions. It should look something like this:

![./assets/run-page.png](./assets/run-page.png)

20. Play with the interface a bit to get an understanding of the tool.
21. Head back to the homepage at [https://localhost:4000/](https://localhost:4000/), check out the runs page, and run the default query. This is where you can view summaries of the tasks you have run. It should look a bit like this (with fewer items):

![./assets/runs-page.png](./assets/runs-page.png)

22. You can then kill the task (or jump into the task environment and poke around):
```bash
viv task kill
```
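
The `viv run` invocation in step 18 repeats the install prefix and the task family name. Here is a minimal sketch of building that command from a task ID, assuming the task family lives under the bundled `task-standard/examples` directory and the agent under `agents/` as set up in step 17. `task_family_path` and `print_run_command` are hypothetical helpers, and the second one only prints the command so you can inspect it before running it.

```bash
# Derive the example task family directory from a task ID such as
# "reverse_hash/abandon". The install prefix is passed in explicitly.
task_family_path() {
  prefix=$1
  task_id=$2
  family=${task_id%%/*}   # text before the first "/"
  printf '%s/vivaria/task-standard/examples/%s\n' "$prefix" "$family"
}

# Print (rather than run) the `viv run` command for a task and agent.
print_run_command() {
  prefix=$1
  task_id=$2
  agent=$3
  printf 'viv run %s --task-family-path %s --agent-path %s/agents/%s\n' \
    "$task_id" "$(task_family_path "$prefix" "$task_id")" "$prefix" "$agent"
}

# Example:
# print_run_command "$(brew --prefix vivaria)" reverse_hash/abandon modular-public
```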

### 01.05 Shutting down

23. To shut down the server, stop the containers:

```bash
docker compose down
```

24. Then confirm that no containers are still running:

```bash
docker compose ps
```

### 01.06 Examples for Testing

25. A variety of example tasks are located in `$(brew --prefix vivaria)/vivaria/task-standard/examples`. You can examine, run, and test them to understand how to create your own tasks:

```bash
ls $(brew --prefix vivaria)/vivaria/task-standard/examples
```
> agentbench crypto gaia gpu_inference humaneval machine_learning_local reverse_hash vm_test
count_odds days_since gpqa_diamond hello_world local_research pico_ctf swe_bench
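
If you want to check which of these directories look like runnable task families, a short loop is enough. This sketch assumes the convention that a task family directory contains a Python file named after the family (for example `reverse_hash/reverse_hash.py`). That convention is an assumption here, and `list_task_families` is a hypothetical helper.

```bash
# List task families under an examples directory.
# Assumption: a family is a directory containing <family>/<family>.py.
list_task_families() {
  examples_dir=$1
  for dir in "$examples_dir"/*/; do
    [ -d "$dir" ] || continue
    family=$(basename "$dir")
    if [ -f "$dir$family.py" ]; then
      echo "$family"
    fi
  done
}

# Example:
# list_task_families "$(brew --prefix vivaria)/vivaria/task-standard/examples"
```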

---
## 02 Uninstalling

To uninstall, run `brew uninstall vivaria`. This will not delete your `~/.config/viv-cli/` directory. That must be removed manually.
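
The manual cleanup can be scripted. This is a minimal sketch, not something the formula provides: `remove_viv_config` is a hypothetical helper that deletes the configuration directory if it exists and reports what it did. It uses `rm -rf`, so double-check the path before running it.

```bash
# Remove the viv CLI configuration directory left behind by `brew uninstall`.
# Defaults to ~/.config/viv-cli; pass another path to override.
remove_viv_config() {
  config_dir=${1:-"$HOME/.config/viv-cli"}
  if [ -d "$config_dir" ]; then
    rm -rf -- "$config_dir"
    echo "removed $config_dir"
  else
    echo "nothing to remove at $config_dir"
  fi
}

# Example:
# brew uninstall vivaria && remove_viv_config
```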

## 04 Updating
---
## 03 Updating

To update Vivaria to the latest version:

@@ -81,16 +223,16 @@ Note: If you encounter any issues after upgrading, you may need to remove the ol
viv docker compose down --rmi all
viv docker compose up --detach --wait --build
```
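
After a rebuild like this, the server can take a moment to answer requests again. Below is a sketch of a generic retry helper, assuming the health endpoint from the setup section (`http://localhost:4001/health`) is still the right thing to poll. `wait_until` is a hypothetical name, not a `viv` command.

```bash
# Retry a command until it succeeds or the attempts run out.
# usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Only sleep when another attempt is still coming.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: poll the health endpoint up to 30 times, 2 seconds apart.
# wait_until 30 2 curl --fail --silent http://localhost:4001/health
```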

## 05 Known Issues
---
## 04 Known Issues

**ISSUE: Install failed due to docker**
This may be fixed by running `brew link docker` and trying the installation again.
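
Before retrying, it can help to confirm which tools are actually visible on your `PATH`. This is a small sketch with a hypothetical helper, `missing_commands`. If it prints `docker`, the `brew link docker` fix above is the likely next step.

```bash
# Print each of the given commands that cannot be found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Example:
# missing_commands brew docker
```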

---
## 06 Developing and Maintaining this Formula
## 05 Developing and Maintaining this Formula

### 06.01 The Source
### 05.01 The Source

Brew downloads your source and places it in a temporary build directory. This source is commonly pulled in two different ways.

@@ -118,7 +260,7 @@ Vivaria requires the `.git` repository files and GitHub does not include those i
revision: "d67cc7894064e45f3459104c0f004fc1bd86612b"
```

### 06.02 The Python Dependencies
### 05.02 The Python Dependencies

Brew has a soft policy against using the internet during the installation process, which is a problem for pip-installing the packages Vivaria needs. Luckily, Brew has a decent interface for managing Python packages and setting up a virtual environment. More information can be found here: [https://docs.brew.sh/Python-for-Formula-Authors](https://docs.brew.sh/Python-for-Formula-Authors).

@@ -160,7 +302,7 @@ class Vivaria < Formula

Making the virtual environment and building the package automatically creates an executable called `viv` in the virtual environment's bin. We then copy this executable to `final_install_path/bin`, which contains executables that are symlinked into Brew's bin (ex: `/opt/homebrew/bin/`). Since Brew's bin is on the user's path, the `viv` script is available to the user anywhere.
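
The symlink step can be tried outside of Brew to see what it does. This sketch only mimics the idea and is not Homebrew's own linking code. `link_into_bin` is a hypothetical helper, and the paths in the comments are placeholders.

```bash
# Expose an executable through a symlink in a directory on PATH.
# This mimics the idea only; it is not Homebrew's own linking code.
link_into_bin() {
  target=$1    # e.g. <final_install_path>/bin/viv
  bin_dir=$2   # e.g. /opt/homebrew/bin
  mkdir -p "$bin_dir"
  ln -sf "$target" "$bin_dir/$(basename "$target")"
}

# Example (throwaway directories):
# link_into_bin /tmp/demo-keg/bin/viv /tmp/demo-bin
```

Because the entry in the bin directory is only a link, replacing the file it points to changes what runs without touching the user's path.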

### 06.03 Copying Everything Over
### 05.03 Copying Everything Over

At this point, the viv-cli is essentially installed, but since the web UI relies on TypeScript and Docker files, we need to keep a large chunk of the original project files in `final_install_path`. In the rest of the script, we install all the docs into the folder Brew expects to find them in (`final_install_path/share/doc`), delete everything we no longer need, and copy the rest over from the build path to the final install path under `vivaria`.

@@ -189,7 +331,7 @@ class Vivaria < Formula
src_dir.install Dir["*", ".*"].reject { |f| ['.', '..'].include?(File.basename(f)) }
```
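
The Ruby line above installs every entry in the build directory, hidden files included, while skipping `.` and `..`. For experimenting outside a formula, a rough shell counterpart is sketched below. `copy_tree` is a hypothetical helper, and it is an analogy for the copy step rather than what Brew runs.

```bash
# Copy everything in src, hidden files included, into dest.
# The trailing "/." makes cp copy the directory's contents, dotfiles and all.
copy_tree() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  cp -R "$src/." "$dest/"
}

# Example (throwaway directories):
# copy_tree /tmp/vivaria-build /tmp/vivaria-install/vivaria
```

A plain `cp -R "$src"/* "$dest"/` would skip dotfiles such as `.git`, which is why the formula globs both `"*"` and `".*"`.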

### 06.04 Debugging
### 05.04 Debugging

To install the formula in debug mode and receive more verbose errors while developing the formula, you can run:
`brew install --formula --debug --verbose ./Formula/vivaria.rb`
@@ -201,7 +343,7 @@ I was attempting to set up a Ruby debugger w/ intellisense in VSCode but it wasn
`gem install ruby-lsp`
`gem install debug`

### 06.05 Random Useful Notes
### 05.05 Random Useful Notes

`echo $(brew --prefix vivaria)` can be used to get the [opt-prefix](https://docs.brew.sh/Manpage) for Vivaria. This returns a static path to a symlinked folder pointing to the most recent version of vivaria.

@@ -245,7 +387,7 @@ gatlenculp/vivaria/vivaria
Error: 22 problems in 1 formula detected.
```

### 06.06 Roadmap
### 05.06 Roadmap

- [ ] Automatically configure an SSH key for the user to use with viv.
```ruby
@@ -261,7 +403,7 @@ File.open(prefix/".env", "a") { |f| f.puts "SSH_PUBLIC_KEY_PATH=#{ssh_key_path}.
system "viv", "register-ssh-public-key", "#{ssh_key_path}.pub"
```

## 07 Contact the Maintainer
## 06 Contact the Maintainer

Gatlen Culp, METR Contractor
Email: [email protected]
Binary file added assets/logo.png