Some requests #59

godlikehhd · 2024-08-23T09:19:31Z

If possible, please add instructions for evaluating on SWE-bench Verified.
Did you use other models such as Llama 3.1 or Claude as the backbone? If so, could you please release the results?
Thanks a lot!

Marti2203 · 2024-09-12T02:41:28Z

Hi,

Yes, you can use https://github.com/nus-apr/auto-code-rover/blob/main/conf/swe_verified_tasks.txt just as you use swe_lite_tasks.txt. See also https://github.com/nus-apr/auto-code-rover#swe-bench-mode-set-up-and-run-on-swe-bench-tasks
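Before a long run, it can help to sanity-check a task list. The snippet below is only a sketch, not part of auto-code-rover: it assumes the task file holds one SWE-bench instance ID per line (for example `django__django-11133`), and the helper names `parse_task_ids` and `count_by_repo` are hypothetical.

```python
from collections import Counter


def parse_task_ids(text):
    """Return the task IDs in a task-list file's text.

    Assumption: one instance ID per line; blank lines are ignored.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def count_by_repo(task_ids):
    """Count tasks per repository prefix.

    Assumption: IDs look like '<owner>__<repo>-<number>', so the prefix is
    everything before the last '-'.
    """
    return Counter(task_id.rsplit("-", 1)[0] for task_id in task_ids)


def find_duplicates(task_ids):
    """Return IDs that appear more than once, in first-seen order."""
    counts = Counter(task_ids)
    return [task_id for task_id in counts if counts[task_id] > 1]
```

To use it, read the file first, e.g. `parse_task_ids(open("conf/swe_verified_tasks.txt").read())`, then pass the result to `count_by_repo` or `find_duplicates`.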
We support both of these models and have run preliminary tests with Llama 3.1, and many more with Claude 3 Opus and Claude 3.5 Sonnet. For Sonnet, we have submitted the results to SWE-bench.
Hope that this answers your question :)

godlikehhd closed this as completed Sep 12, 2024
