[agent] system message + SWE-Bench instruction improvements #7018

Draft. Wants to merge 15 commits into base: main

xingyaoww (Collaborator) commented Feb 28, 2025

  • This change is worth documenting at https://docs.all-hands.dev/
  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

End-user friendly description of the problem this fixes or functionality that this introduces.


Give a summary of what the PR does, explaining any non-trivial design decisions.

Follow-up of #7002 and #7010.

This PR reorganizes the system message and the user manual instructions to make them clearer.


Link of any specific issues this addresses.


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:3cced2f-nikolaik \
  --name openhands-app-3cced2f \
  docker.all-hands.dev/all-hands-ai/openhands:3cced2f

xingyaoww and others added 13 commits February 27, 2025 12:09
Traceback (most recent call last):
  File "/home/xingyaow/OpenHands-dev/evaluation/utils/shared.py", line 323, in _process_instance_wrapper
    result = process_instance_func(instance, metadata, use_mp, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xingyaow/OpenHands-dev/evaluation/benchmarks/swe_bench/run_infer.py", line 461, in process_instance
    raise EvalException('Fatal error detected: ' + state.last_error)
evaluation.utils.shared.EvalException: Fatal error detected: ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/logging/__init__.py", line 1160, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/logging/__init__.py", line 999, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/logging/__init__.py", line 719, in format
    s = s + self.formatStack(record.stack_info)
        ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: can only concatenate str (not "bool") to str
Call stack:
  File "/home/xingyaow/OpenHands-dev/evaluation/benchmarks/swe_bench/run_infer.py", line 584, in <module>
    run_evaluation(
  File "/home/xingyaow/OpenHands-dev/evaluation/utils/shared.py", line 415, in run_evaluation
    with mp.Pool(num_workers) as pool:
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/multiprocessing/pool.py", line 215, in __init__
    self._repopulate_pool()
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/multiprocessing/pool.py", line 306, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/multiprocessing/pool.py", line 329, in _repopulate_pool_static
    w.start()
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/multiprocessing/context.py", line 282, in _Popen
    return Popen(process_obj)
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/multiprocessing/popen_fork.py", line 71, in _launch
    code = process_obj._bootstrap(parent_sentinel=child_r)
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xingyaow/micromamba/envs/oh/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/xingyaow/OpenHands-dev/evaluation/utils/shared.py", line 384, in _process_instance_wrapper_mp
    return _process_instance_wrapper(*args)
  File "/home/xingyaow/OpenHands-dev/evaluation/utils/shared.py", line 378, in _process_instance_wrapper
    logger.error(msg)
Message: '----------\nError in instance [pydata__xarray-7233]: Fatal error detected: ConnectionError: (\'Connection aborted.\', RemoteDisconnected(\'Remote end closed connection without response\')). Stacktrace:\nTraceback (most recent call last):\n  File "/home/xingyaow/OpenHands-dev/evaluation/utils/shared.py", line 323, in _process_instance_wrapper\n    result = process_instance_func(instance, metadata, use_mp, **kwargs)\n             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/home/xingyaow/OpenHands-dev/evaluation/benchmarks/swe_bench/run_infer.py", line 461, in process_instance\n    raise EvalException(\'Fatal error detected: \' + state.last_error)\nevaluation.utils.shared.EvalException: Fatal error detected: ConnectionError: (\'Connection aborted.\', RemoteDisconnected(\'Remote end closed connection without response\'))\n\n----------[The above error occurred. Retrying... (attempt 1 of 5)]----------\n'
Arguments: ()
Instance pydata__xarray-7233 - 2025-02-27 23:04:06,218 - WARNING - Using o
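
The secondary `TypeError` in the log above is raised inside `logging.Formatter.format`, at the line that appends `self.formatStack(record.stack_info)` to the message. A minimal sketch of how that error can arise, assuming a log record ends up with a boolean `stack_info` instead of a formatted string (the log does not show whether this is the actual cause here; `format_with_stack_info` is a hypothetical helper used only for illustration):

```python
import logging


def format_with_stack_info(stack_info):
    """Format a hand-built record whose stack_info is set to the given value."""
    record = logging.LogRecord(
        name="demo",
        level=logging.ERROR,
        pathname="demo.py",  # placeholder path, not a real file
        lineno=1,
        msg="Error in instance",
        args=(),
        exc_info=None,
        sinfo=stack_info,  # stored on the record as record.stack_info
    )
    return logging.Formatter("%(message)s").format(record)


# A string stack_info is appended to the message on a new line.
print(format_with_stack_info("Stack (most recent call last):\n  ..."))

# A boolean stack_info reaches the str concatenation and fails.
try:
    format_with_stack_info(True)
except TypeError as exc:
    print(f"TypeError: {exc}")
```

In normal use, `logger.error(msg, stack_info=True)` is safe because the logger converts the flag to a string before building the record; the failure needs a record whose `stack_info` attribute is still a bool by the time a formatter sees it.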