
how to interpret results for Auto Code Rover SWE-bench? #47

Open
ramsey-coding opened this issue May 13, 2024 · 2 comments

Comments

@ramsey-coding

I am trying to understand results for Auto Code Rover and SWE-Agent.

Can you please let me know the format of the SWE-Agent test results in:
https://github.com/nus-apr/auto-code-rover/tree/main/results/swe-agent-results

What are all these cost_2_1, cost_2_2, and cost_2_3?

How should I interpret the results in this directory?

Also, for Auto Code Rover, I see acr-run-1, acr-run-2, and acr-run-3. Which one should I take? Which result are you reporting in the paper?

@ramsey-coding
Author

what's the difference between the following fields?

        "generated": 249,
        "with_logs": 249,
        "applied": 245,
        "resolved": 48

@zhiyufan
Collaborator

cost_X_Y: X is the cost budget (in USD) for running SWE-agent in our experiment, and Y is the repetition trial.
In this case, we used a budget of 2 USD and repeated the experiment 3 times.
Inside each cost_X_Y directory, the *.traj files are the conversation logs for each task instance in SWE-bench.
all_pred.jsonl contains all the generated patches.
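To load the patches from all_pred.jsonl, you can read it as one JSON object per line. A minimal sketch, assuming the common SWE-bench prediction fields ("instance_id", "model_patch") — the exact field names are an assumption, not confirmed in this thread:

```python
import json

def load_predictions(path):
    """Read a JSON-lines predictions file into {instance_id: patch}.

    Field names ("instance_id", "model_patch") are assumed to follow
    the usual SWE-bench prediction format; adjust if your file differs.
    """
    preds = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            rec = json.loads(line)
            preds[rec["instance_id"]] = rec.get("model_patch")
    return preds
```

Each trial directory would then yield one dictionary of generated patches, which you can compare across the three repetitions.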

For AutoCodeRover, the acr-run-1, acr-run-2, and acr-run-3 results correspond to Table 3 in the paper (the "In our environment", ACR column).

  • generated: an agent-generated patch exists for this issue.
  • with_logs: a log file was produced when executing the passing/failing test cases for this issue.
  • applied: the patch applies cleanly to the original program.
  • resolved: the patch makes the passing/failing test cases for this issue pass.

The details on how the stats are generated can be found here: https://github.com/yuntongzhang/SWE-bench/blob/main/metrics/report.py#L264C5-L264C21
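The four counters form a funnel (a patch must be generated before it can be applied, and applied before it can resolve the issue). A minimal sketch of how such counts could be tallied from per-instance records — field names here are illustrative, not the actual schema used by report.py in the linked repository:

```python
def tally(records):
    """Tally the four report counters from per-instance records.

    Each record is a dict with illustrative (assumed) boolean-ish fields:
      patch          -- the generated patch text, or None
      log_exists     -- a test-execution log file was produced
      patch_applied  -- the patch applied cleanly to the original program
      tests_pass     -- the passing/failing test cases now pass
    """
    stats = {"generated": 0, "with_logs": 0, "applied": 0, "resolved": 0}
    for r in records:
        if r.get("patch"):
            stats["generated"] += 1
        if r.get("log_exists"):
            stats["with_logs"] += 1
        if r.get("patch_applied"):
            stats["applied"] += 1
        if r.get("tests_pass"):
            stats["resolved"] += 1
    return stats
```

In the numbers quoted above (249 generated, 245 applied, 48 resolved), the drop at each stage shows patches that failed to apply or failed the tests.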
