First, thanks to the authors for this interesting work!
Next, I have some questions about the reimplementation details:
I notice that in Table 2, the experiments are based on the Llama2-7B model, while the ETO results are based on the Llama2-7B-chat model. May I ask whether the authors also chose the instruction/chat version as the base model in IPR?
I studied the source code and found two filtering thresholds (named step_threshold and traj_threshold). In the paper, the authors state that the filtering threshold τ is set to 0.5 for ALFWorld, 0.01 for WebShop, and 0.1 for InterCodeSQL. May I ask whether step_threshold == traj_threshold == τ, or only step_threshold == τ (and in the latter case, did traj_threshold follow the settings from ETO)? I would really appreciate any help clarifying these hyperparameters.
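For context on my reading of the code: a minimal sketch of how I understand the two thresholds might gate the preference pairs. The names step_threshold and traj_threshold come from the repository, but the margin-based filtering logic below is my assumption, not the authors' confirmed implementation.

```python
# Hypothetical sketch: filter step-level and trajectory-level preference
# pairs by reward margin, each with its own threshold. The function name
# and the (chosen_reward, rejected_reward) pair format are assumptions
# for illustration only.

def filter_pairs(step_pairs, traj_pairs, step_threshold, traj_threshold):
    """Keep only pairs whose chosen-minus-rejected reward margin
    exceeds the corresponding threshold."""
    kept_steps = [(c, r) for c, r in step_pairs if c - r > step_threshold]
    kept_trajs = [(c, r) for c, r in traj_pairs if c - r > traj_threshold]
    return kept_steps, kept_trajs


# Example: with step_threshold=0.5, a step pair with margin 0.1 is
# dropped, while a trajectory pair with margin 0.8 survives.
steps, trajs = filter_pairs(
    step_pairs=[(1.0, 0.2), (0.6, 0.5)],
    traj_pairs=[(0.9, 0.1)],
    step_threshold=0.5,
    traj_threshold=0.5,
)
```

If step_threshold and traj_threshold are indeed both set to τ, the two list comprehensions above would share a single value; my question is whether that is the intended configuration.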
Table 2 reports the best performance across all iterations. Could the authors state which iteration was best for each dataset? That would help me greatly in reproducing the experimental results.
Thanks.