First, thanks to the authors for this interesting work!
Next, I have some questions about the reimplementation details:
I notice that in Table 2, the experiments are based on the Llama2-7B model, while the ETO results are based on the Llama2-7B-chat model. May I ask whether the authors also chose the instruction/chat version as the base model in IPR?
I studied the source code and found two filtering thresholds (named step_threshold and traj_threshold). In the paper, the authors state that the filtering threshold τ is set to 0.5 for ALFWorld, 0.01 for WebShop, and 0.1 for InterCodeSQL. May I ask whether step_threshold == traj_threshold == τ, or only step_threshold == τ (and in the latter case, did traj_threshold follow the settings from ETO)? I would really appreciate any help clarifying these hyperparameters.
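For context on my reading of the code: a minimal sketch of how I understand the two thresholds might gate the preference pairs. The names step_threshold and traj_threshold come from the repository, but the margin-based filtering logic below is my assumption, not the authors' confirmed implementation.

```python
# Hypothetical sketch: filter step-level and trajectory-level preference
# pairs by reward margin, each with its own threshold. The function name
# and the (chosen_reward, rejected_reward) pair format are assumptions
# for illustration only.

def filter_pairs(step_pairs, traj_pairs, step_threshold, traj_threshold):
    """Keep only pairs whose chosen-minus-rejected reward margin
    exceeds the corresponding threshold."""
    kept_steps = [(c, r) for c, r in step_pairs if c - r > step_threshold]
    kept_trajs = [(c, r) for c, r in traj_pairs if c - r > traj_threshold]
    return kept_steps, kept_trajs


# Example: with step_threshold=0.5, a step pair with margin 0.1 is
# dropped, while a trajectory pair with margin 0.8 survives.
steps, trajs = filter_pairs(
    step_pairs=[(1.0, 0.2), (0.6, 0.5)],
    traj_pairs=[(0.9, 0.1)],
    step_threshold=0.5,
    traj_threshold=0.5,
)
```

If step_threshold and traj_threshold are indeed both set to τ, the two list comprehensions above would share a single value; my question is whether that is the intended configuration.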
Table 2 reports the best performance across all iterations. Could the authors state which iteration was best for each dataset? That would help me greatly in reproducing the experimental results.
Thanks.