Commit 999a9ad

Merge pull request #5 from MadeAgents/dev
Dev
2 parents f1081bc + 2eac6c3 commit 999a9ad

35 files changed, +3644 -166 lines changed

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -23,4 +23,5 @@ output
 .env
 poetry.lock
 **/proto
-**/.eggs
+**/.eggs
+*.log

.gitmodules

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+[submodule "third_party/android_world"]
+	path = third_party/android_world
+	url = https://github.com/MadeAgents/android_world.git
+[submodule "third_party/android_env"]
+	path = third_party/android_env
+	url = https://github.com/google-deepmind/android_env.git
+[submodule "third_party/Android-Lab"]
+	path = third_party/Android-Lab
+	url = https://github.com/THUDM/Android-Lab.git

.vscode/settings.json

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+{
+    "python.analysis.extraPaths": [
+        "./third_party/android_env",
+        "./third_party/android_world",
+        "./third_party/Android-Lab"
+    ],
+    "python.analysis.autoSearchPaths": true,
+    "python.analysis.useLibraryCodeForTypes": true
+}

README.md

Lines changed: 10 additions & 3 deletions
@@ -15,14 +15,21 @@ The user inputs the task description on the Web interface, and the Mobile Use au
 
 
 ## 🎉 News
-- **[2025/03/28]**: The [document](docs/AndroidWorld.md) for running Mobile Use in the AndroidWorld dynamic environment now is released!
+- **[2025/05/13]**: Mobile Use v0.3.0 now is released! AndroidLab dynamic environment now is released! Significant improvements have been achieved on the two evaluation benchmarks of [AndroidLab](https://github.com/THUDM/Android-Lab) and [AndroidWorld](https://github.com/google-research/android_world).
+- **[2025/03/28]**: The [document](benchmark/android_world/README.md) for running Mobile Use in the AndroidWorld dynamic environment now is released!
 - **[2025/03/17]**: Mobile Use now supports the [multi-agent](mobile_use/agents/multi_agent.py) framework! Equipped with planning, reflection, memorization and progress mechanisms, Mobile Use achieves impressive performance on AndroidWorld!
 - **[2025/03/04]**: Mobile Use is released! We have also released v0.1.0 of [mobile-use](https://github.com/MadeAgents/mobile-use) library, providing you an AI assistant for mobile - Any app, any task!
 
 ## 📊 Benchmark
-![](docs/assets/benchmark.png)
+![](docs/assets/androidworld_benchmark.png)
+
+In the [AndroidWorld](https://github.com/google-research/android_world) dynamic evaluation environment, we evaluated the multi-agent version of Mobile Use agent with the multimodal large language model Qwen2.5-VL-72B-Instruct and achieved a 61.2% success rate.
+
+
+![](docs/assets/androidlab_benchmark.png)
+
+In the [AndroidLab](https://github.com/THUDM/Android-Lab) dynamic evaluation environment, we evaluated the multi-agent version of Mobile Use agent with the multimodal large language model Qwen2.5-VL-72B-Instruct and achieved a 44.2% success rate.
 
-In the [AndroidWorld](https://github.com/google-research/android_world) dynamic evaluation environment, we evaluated the multi-agent version of Mobile Use agent with the multimodal large language model Qwen2.5-VL-72B-Instruct and achieved a 48% success rate.
 
 ## ✨ Key Features
 - **Auto-operating the phone**: Automatically operate the UI to complete tasks based on user input descriptions.

benchmark/android_lab/README.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# Benchmark MobileUse in AndroidLab
+
+## Step 1: Environment Setup
+**Install AndroidLab requirements**
+```
+pip install -r third_party/Android-Lab/requirements.txt
+```
+
+
+**Set up the AVD environment**
+
+For setup details, see the [Android-Lab documentation](https://github.com/THUDM/Android-Lab?tab=readme-ov-file).
+
+We recommend using Docker on Linux (x86_64).
+
+
+**Install mobile-use**
+Install mobile-use by following the guidance in [README.md](../README.md).
+
+
+## Step 2: Perform the benchmark
+1. Copy the template config file and set your api_key and base_url in the config file
+```
+cp benchmark/android_lab/configs/mobile-use-MultiAgent_template.yaml benchmark/android_lab/configs/mobile-use-MultiAgent.yaml
+```
+
+2. Start evaluation
+```
+python eval.py -n test_name -c benchmark/android_lab/configs/mobile-use-MultiAgent.yaml
+```
+
+3. Calculate the metrics
+```
+python benchmark/android_lab/generate_result.py --input_folder logs/evaluation_mobile_use
+```
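
Item 1 of Step 2 asks you to set `api_key` and `base_url` by hand after copying the template. As a rough sketch of how that edit could be scripted, the snippet below rewrites those two keys in the template text. The helper name `fill_credentials`, the sample key and URL, and the indentation of the sample template are assumptions for illustration and are not part of this commit.

```python
import re


def fill_credentials(template_text: str, api_key: str, base_url: str) -> str:
    """Return the template text with the api_key and base_url values replaced."""
    values = {"api_key": api_key, "base_url": base_url}

    def substitute(match: re.Match) -> str:
        indent, key = match.group(1), match.group(2)
        return f"{indent}{key}: {values[key]}"

    # Match whole lines such as "      api_key: xxxxx" and keep their indentation.
    pattern = re.compile(r"^([ \t]*)(api_key|base_url):.*$", flags=re.MULTILINE)
    return pattern.sub(substitute, template_text)


# Hypothetical excerpt of the template; the real file has more keys.
template = (
    "agent:\n"
    "  args:\n"
    "    vllm_config:\n"
    "      api_key: xxxxx\n"
    "      base_url: xxxxx\n"
)
filled = fill_credentials(template, "sk-example", "http://localhost:8000/v1")
print(filled)
```

Working on the text instead of parsing the YAML keeps comments and key order in the copied file unchanged, at the cost of only handling simple one-line values.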
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+name: OpenAIAgent(gpt-4o)
+
+agent:
+  name: OpenAIAgent
+  args:
+    api_key: xxxx
+    # api_base: xxxx
+    model_name: gpt-4o
+    max_new_tokens: 512
+
+task:
+  class: TextOnlyMobileTask_AutoTest
+  args:
+    save_dir: "./logs/evaluation_openai_agent"
+    max_rounds: 25
+    request_interval: 3
+    mode: "in_app"
+
+eval:
+  avd_name: Pixel_7_Pro_API_33
+  avd_log_dir: ./logs/evaluation
+  docker: True
+  docker_args:
+    image_name: android_eval:latest
+    port: 6060
+
+
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+name: MobileUseMultiAgent
+
+agent:
+  name: MobileUseAgent
+  args:
+    vllm_config:
+      model_name: qwen2.5-vl-72b-instruct
+      api_key: xxxxx
+      base_url: xxxxx
+      max_tokens: 1024
+    agent_config:
+      type: MultiAgent
+      use_note_taker: false
+      use_planner: false
+      use_reflector: true
+      use_long_reflector: true
+      evaluate_when_finish: true
+      use_processor: true
+
+task:
+  class: MobileUse_AutoTest
+  args:
+    save_dir: "./logs/evaluation_mobile_use"
+    max_rounds: 25
+    request_interval: 3
+
+eval:
+  avd_name: Pixel_7_Pro_API_33
+  avd_log_dir: ./logs/evaluation
+  docker: True
+  docker_args:
+    image_name: android_eval:latest
+    port: 6060
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+name: MobileUseReActAgent
+
+agent:
+  name: MobileUseAgent
+  args:
+    vllm_config:
+      model_name: qwen2.5-vl-72b-instruct
+      api_key: xxxxx
+      base_url: xxxxx
+      max_tokens: 1024
+    agent_config:
+      type: ReAct
+
+task:
+  class: MobileUse_AutoTest
+  args:
+    save_dir: "./logs/evaluation_mobile_use"
+    max_rounds: 25
+    request_interval: 3
+
+eval:
+  avd_name: Pixel_7_Pro_API_33
+  avd_log_dir: ./logs/evaluation
+  docker: True
+  docker_args:
+    image_name: android_eval:latest
+    port: 6060
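
The three configs above share the same top-level keys: `name`, `agent`, `task` and `eval`. As a minimal sketch of how `eval.py` in this commit reads them, the snippet below reproduces its default run name (the config `name`, falling back to the agent name, plus a timestamp) and its fallback task class. The helper names `default_run_name` and `task_class_name` are hypothetical, the dictionary stands in for the parsed YAML, and its nesting is assumed from the way `eval.py` indexes the config.

```python
import datetime


def default_run_name(config: dict, now: datetime.datetime) -> str:
    """Mirror the name eval.py builds when -n/--name is not given."""
    agent_config = config["agent"]
    prefix = config.get("name", agent_config["name"])
    return f"{prefix}_{now.strftime('%Y%m%dT%H%M%S')}"


def task_class_name(task_config: dict) -> str:
    """Mirror the fallback eval.py applies when the task has no 'class' key."""
    return task_config["class"] if "class" in task_config else "ScreenshotMobileTask_AutoTest"


# Stand-in for yaml.safe_load() applied to the ReAct config (nesting assumed).
config = {
    "name": "MobileUseReActAgent",
    "agent": {"name": "MobileUseAgent", "args": {"agent_config": {"type": "ReAct"}}},
    "task": {
        "class": "MobileUse_AutoTest",
        "args": {"save_dir": "./logs/evaluation_mobile_use", "max_rounds": 25},
    },
    "eval": {"avd_name": "Pixel_7_Pro_API_33", "docker": True},
}

now = datetime.datetime(2025, 5, 13, 9, 30, 0)
print(default_run_name(config, now))      # MobileUseReActAgent_20250513T093000
print(task_class_name(config["task"]))    # MobileUse_AutoTest
```

Passing the timestamp in as an argument is a choice made for this sketch so the result is reproducible; `eval.py` itself calls `datetime.datetime.now()` inline.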

benchmark/android_lab/eval.py

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+import os
+import sys
+import argparse
+import yaml
+
+parant_dir = os.path.dirname(__file__)
+project_home = os.path.dirname(os.path.dirname(parant_dir))
+sys.path = [
+    os.path.join(project_home, 'third_party/Android-Lab')
+] + sys.path
+
+from agent import get_agent
+from evaluation.auto_test import *
+from evaluation.parallel import parallel_worker
+from generate_result import find_all_task_files
+from evaluation.configs import AppConfig, TaskConfig
+from mobile_use_auto_test import *
+from mobile_use_executor import *
+
+
+if __name__ == '__main__':
+    android_lab_dir = os.path.join(project_home, 'third_party/Android-Lab')
+    task_yamls = os.listdir(f'{android_lab_dir}/evaluation/config')
+    task_yamls = [f"{android_lab_dir}/evaluation/config/" + i for i in task_yamls if i.endswith(".yaml")]
+
+    arg_parser = argparse.ArgumentParser()
+    arg_parser.add_argument("-n", "--name", default=None, type=str)
+    arg_parser.add_argument("-c", "--config", default=f"{parant_dir}/config.yaml", type=str)
+    arg_parser.add_argument("--task_config", nargs="+", default=task_yamls, help="All task config(s) to load")
+    arg_parser.add_argument("--task_id", nargs="+", default=None)
+    arg_parser.add_argument("--debug", action="store_true", default=False)
+    arg_parser.add_argument("--app", nargs="+", default=None)
+    arg_parser.add_argument("-p", "--parallel", default=1, type=int)
+
+    args = arg_parser.parse_args()
+    with open(args.config, "r") as file:
+        yaml_data = yaml.safe_load(file)
+
+    agent_config = yaml_data["agent"]
+    task_config = yaml_data["task"]
+    eval_config = yaml_data["eval"]
+
+    if args.name is None:
+        args.name = f"{yaml_data.get('name', agent_config['name'])}_{datetime.datetime.now().strftime('%Y%m%dT%H%M%S')}"
+
+    autotask_class = task_config["class"] if "class" in task_config else "ScreenshotMobileTask_AutoTest"
+
+    single_config = TaskConfig(**task_config["args"])
+    single_config = single_config.add_config(eval_config)
+    if "True" == agent_config.get("relative_bbox"):
+        single_config.is_relative_bbox = True
+    agent_class = globals().get(agent_config["name"])
+    if agent_class is None:
+        agent = get_agent(agent_config["name"], **agent_config["args"])
+    else:
+        agent = agent_class(**agent_config["args"])
+
+    task_files = find_all_task_files(args.task_config)
+    print(f"Evaluation saved name: {args.name}")
+    if os.path.exists(os.path.join(single_config.save_dir, args.name)):
+        already_run = os.listdir(os.path.join(single_config.save_dir, args.name))
+        already_run = [i.split("_")[0] + "_" + i.split("_")[1] for i in already_run]
+    else:
+        already_run = []
+
+    all_task_start_info = []
+    for app_task_config_path in task_files:
+        app_config = AppConfig(app_task_config_path)
+        if args.task_id is None:
+            task_ids = list(app_config.task_name.keys())
+        else:
+            task_ids = args.task_id
+        for task_id in task_ids:
+            if task_id in already_run:
+                print(f"Task {task_id} already run, skipping")
+                continue
+            if task_id not in app_config.task_name:
+                continue
+            task_instruction = app_config.task_name[task_id].strip()
+            app = app_config.APP
+            if args.app is not None:
+                print(app, args.app)
+                if app not in args.app:
+                    continue
+            package = app_config.package
+            command_per_step = app_config.command_per_step.get(task_id, None)
+
+            task_instruction = f"You should use {app} to complete the following task: {task_instruction}"
+            all_task_start_info.append({
+                "agent": agent,
+                "task_id": task_id,
+                "task_instruction": task_instruction,
+                "package": package,
+                "command_per_step": command_per_step,
+                "app": app
+            })
+
+    class_ = globals().get(autotask_class)
+    if class_ is None:
+        raise AttributeError(f"Class {autotask_class} not found. Please check the class name in the config file.")
+
+    if args.parallel == 1:
+        Auto_Test = class_(single_config.subdir_config(args.name))
+        print("Auto_Test", Auto_Test)
+        Auto_Test.run_serial(all_task_start_info)
+    else:
+        parallel_worker(class_, single_config.subdir_config(args.name), args.parallel, all_task_start_info)
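
`eval.py` resumes an interrupted run by listing the entries already present under `save_dir/<name>` and treating the first two underscore-separated fields of each entry as a finished task id. The sketch below isolates that rule so it can be tried without an emulator. The function names and the sample entry names are hypothetical; the actual naming of the per-task output directories is decided by Android-Lab and is not shown in this diff.

```python
def task_id_from_entry(entry: str) -> str:
    """Rebuild a task id from the first two '_'-separated fields, as eval.py does."""
    parts = entry.split("_")
    # Like the original expression, this raises IndexError if the entry
    # contains no underscore at all.
    return parts[0] + "_" + parts[1]


def pending_tasks(task_ids: list, existing_entries: list) -> list:
    """Return the task ids that have no matching entry yet, preserving order."""
    already_run = {task_id_from_entry(entry) for entry in existing_entries}
    return [task_id for task_id in task_ids if task_id not in already_run]


# Hypothetical directory entries left behind by an earlier, interrupted run.
entries = ["calendar_1_2025-05-13_10-00-00", "calendar_2_2025-05-13_10-05-00"]
print(pending_tasks(["calendar_1", "calendar_2", "calendar_3"], entries))
# ['calendar_3']
```

The sketch collects finished ids in a set where `eval.py` uses a list; the membership test gives the same answer either way. One consequence of the rule is that any stray file in the run directory whose name has no underscore would stop the script before it starts.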
