Issues: dvlab-research/LongLoRA

48 Open 125 Closed

Issues list

not able to reproduce the passkey retrieval accuracy

#195 opened Sep 16, 2024 by zhuconv updated Sep 22, 2024

LongBench evaluation

#194 opened Aug 27, 2024 by Clement25 updated Aug 30, 2024

Is supervised fine-tuning supported for models such as GPT2?

#193 opened Aug 23, 2024 by CharRic updated Aug 23, 2024

How was the LongAlpaca data constructed?

#192 opened Jul 31, 2024 by S1s-Z updated Jul 31, 2024

GPU memory allocation during inference

#163 opened Dec 28, 2023 by xxcoco763 updated Jul 23, 2024

Does this code support fine-tuning qwen/baichuan into a Chinese long-context model, and what code changes would be needed?

#191 opened Jul 22, 2024 by jy-101361-1810897 updated Jul 22, 2024

Something wrong with the torch version

#185 opened May 19, 2024 by dian1414 updated Jun 26, 2024

Aren't the norm layers supposed to have no parameter matrix?

#190 opened Jun 3, 2024 by changanxunyi updated Jun 3, 2024

I am unable to reproduce the results from the paper for llama-7B-32k-longlora ppl.

#188 opened May 28, 2024 by masteryqq updated May 28, 2024

Why is the embedding resized to 32001?

#186 opened May 22, 2024 by momandai updated May 22, 2024

What training set was used to obtain “Model with context extension via improved LoRA fine-tuning” (LoRA+)?

#184 opened Apr 22, 2024 by ZackZikaiXiao updated Apr 22, 2024

How were the questions and answers for long context (LongAlpaca) made?

#183 opened Mar 4, 2024 by ddoyles updated Mar 4, 2024

When I set per_device_train_batch_size=2, the S2-Attn would not shift as expected

#182 opened Mar 1, 2024 by linhaojia13 updated Mar 4, 2024

HF models missing rope scaling in the config

#181 opened Feb 29, 2024 by hsiehjackson updated Feb 29, 2024

Machine doesn't have Flash Attention installed

#180 opened Feb 27, 2024 by huilong-chen updated Feb 28, 2024

global_step file

#179 opened Feb 21, 2024 by xxcoco763 updated Feb 21, 2024

Regarding the results in Table 8 and Table 14

#177 opened Feb 4, 2024 by Statisticss updated Feb 4, 2024

About the different datasets and corresponding models

#176 opened Feb 2, 2024 by Statisticss updated Feb 2, 2024

Memory usage "too small" for 7B Llama-2

#174 opened Jan 24, 2024 by Linohong updated Jan 24, 2024

Training an LLM w/ shifted sparse attention from scratch?

#173 opened Jan 24, 2024 by we1k updated Jan 24, 2024

merge_lora_weights_and_save_hf_model.py Error while deserializing header: HeaderTooLarge

#172 opened Jan 23, 2024 by Spongeorge updated Jan 23, 2024

Distributed inference issue

#171 opened Jan 22, 2024 by yixliu1 updated Jan 22, 2024

LongLoRA + Flash Attention 2 causing illegal memory access

#148 opened Nov 21, 2023 by ArturNiederfahrenhorst updated Jan 19, 2024

Is it possible to increase the context length of phi-2 using LongLora? If yes, what changes need to be done to support it?

#169 opened Jan 18, 2024 by dbanka updated Jan 19, 2024

For the evaluation results in the paper, is the attention used at inference shifted sparse attention or full attention?

#170 opened Jan 19, 2024 by zhangxiann updated Jan 19, 2024
