The implementation of dynamic batch loading code seems inconsistent with the pseudo-code in the paper #42

YWMditto · 2023-12-25T07:23:29Z

For example, truncating the loss difference at 0 does not seem to be implemented:

LLM-Shearing/llmshearing/callbacks/dynamic_loading_callback.py

Line 34 in 1386c8f

diff = torch.tensor(losses) - torch.tensor(self.target_loss)

Also, what is the purpose of this line?

LLM-Shearing/llmshearing/callbacks/dynamic_loading_callback.py

Line 41 in 1386c8f

updated_domain_weights = (1-c) * updated_alpha + c / self.n_domains

xiamengzhou · 2024-01-10T02:39:39Z

Hi, sorry for the late reply!

In the implementation, we add a small uniform proportion (c=1e-4/7) to each domain when updating the weights. It is simply a smoothing factor and, in practice, has little effect on the results.
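
To make the two points in this thread concrete, here is a minimal sketch in plain Python, not the repository's code. It assumes an exponentiated update followed by a softmax, which is not shown in the quoted lines; the function names, the `clamp` flag, and the default `c=1e-4` are hypothetical and chosen only for illustration. Only the final smoothing expression mirrors the quoted line 41.

```python
import math


def smooth_weights(alpha, c):
    """Mix normalized weights with a uniform distribution.

    Mirrors the quoted line: (1 - c) * updated_alpha + c / n_domains.
    """
    n_domains = len(alpha)
    return [(1 - c) * a + c / n_domains for a in alpha]


def update_weights(current, losses, targets, c=1e-4, clamp=True):
    """Hypothetical domain-weight update (assumed form, see lead-in)."""
    # Loss difference per domain, as in the quoted line 34.
    diff = [loss - target for loss, target in zip(losses, targets)]
    if clamp:
        # Truncation at 0, the step the issue reports as missing.
        diff = [max(d, 0.0) for d in diff]
    # Assumed exponentiated update in log space, then a stable softmax.
    logits = [math.log(w) + d for w, d in zip(current, diff)]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Smoothing keeps every domain weight strictly positive.
    return smooth_weights(alpha, c)


if __name__ == "__main__":
    current = [0.5, 0.5]
    losses = [1.0, 3.0]   # domain 0 is already below its target
    targets = [2.0, 2.0]
    print(update_weights(current, losses, targets, clamp=True))
    print(update_weights(current, losses, targets, clamp=False))
```

Under these assumptions, truncation stops a domain that is already below its target loss from being pushed down by its negative difference, so its weight falls less than it would without truncation. The smoothing term only guarantees that no domain weight reaches exactly zero.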
