Confusion about Paper Reimplementation #6

Open
zouyingcao opened this issue Oct 16, 2024 · 0 comments

Comments

@zouyingcao

First, thank you to the authors for this interesting work.
I have some questions about the reimplementation details:

  1. In Table 2, the experiments are based on the Llama2-7B model, but the ETO results are based on the Llama2-7B-chat model. Did you also use the instruction/chat version as the base model for IPR?
  2. In the source code I found two filtering thresholds, named step_threshold and traj_threshold. The paper states that the filtering threshold τ is set to 0.5 for ALFWorld, 0.01 for WebShop, and 0.1 for InterCodeSQL. Is step_threshold==traj_threshold==τ, or is only step_threshold==τ (and in that case, does traj_threshold follow the ETO settings)? I would really appreciate any help clarifying these hyperparameters.
  3. Table 2 reports the best performance across all iterations. Could you state which iteration gave the best result for each dataset? That would help me a lot in reproducing the experimental results.
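
To make question 2 concrete, here is a minimal sketch of the two readings I am asking about. Only the names step_threshold and traj_threshold and the three τ values come from the code and the paper. The helper names (reading_a, reading_b) and the ETO placeholder are hypothetical and purely illustrative; I do not know the value ETO uses, so it is left unset.

```python
# Filtering threshold tau per environment, as quoted from the paper in question 2.
TAU = {"ALFWorld": 0.5, "WebShop": 0.01, "InterCodeSQL": 0.1}

# Hypothetical placeholder: the trajectory threshold used by ETO is not stated
# in this issue, so it is deliberately left unset rather than guessed.
ETO_TRAJ_THRESHOLD = None


def reading_a(env):
    """Reading A (assumption): both thresholds are set to tau."""
    tau = TAU[env]
    return {"step_threshold": tau, "traj_threshold": tau}


def reading_b(env, eto_traj_threshold=ETO_TRAJ_THRESHOLD):
    """Reading B (assumption): only step_threshold is tau; traj_threshold follows ETO."""
    return {"step_threshold": TAU[env], "traj_threshold": eto_traj_threshold}


if __name__ == "__main__":
    # Print both candidate configurations side by side for each environment.
    for env in TAU:
        print(env, reading_a(env), reading_b(env))
```

Knowing which of these two configurations (if either) matches the reported runs would settle the question.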
    Thanks.