Tencent / TencentPretrain Public

Issues: Tencent/TencentPretrain

31 Open 5 Closed

Issues list

Question about open-source use of the SMP2020-EWECT dataset

#133 opened Oct 31, 2024 by pxaklbe

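Is the maximum input sequence length supported by the Chinese models here 512 tokens? Is input beyond 512 tokens truncated? Can the number of position embeddings be expanded during fine-tuning?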

#130 opened Jul 21, 2024 by chengzi-big

Pretraining LLAMA-7B on a single machine with 2 GPUs fails with TypeError: an integer is required (got type NoneType)

#112 opened Nov 29, 2023 by smallYellowCat

[Question] How can deepspeed distribute data across GPUs with different memory sizes? I have both 32G and 16G GPUs

#98 opened Sep 26, 2023 by 18liumin

LLaMA2-70B format conversion

#96 opened Sep 5, 2023 by Double-bear

size mismatch for classifier.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).

#93 opened Aug 19, 2023 by beat4ocean

KeyError: 'd'

#92 opened Aug 19, 2023 by beat4ocean

Resume from checkpoint

#81 opened Aug 15, 2023 by mohammadaminabbasi

Found several bugs: is the logic of dynamic_masking reversed, with an extra not?

#80 opened Aug 10, 2023 by boluoyu

LoRA inference

#71 opened Jun 6, 2023 by aSmallsheep

DeepSpeedZeRoOffload initialization failed (can't allocate memory)

#70 opened May 25, 2023 by treya-lin

Question about expanding the vocabulary and then doing incremental pretraining

#68 opened May 19, 2023 by qiancheng99

Question about LLAMA pretraining

#66 opened May 17, 2023 by zhanghaok

Are there plans to support torch 2.0 features such as flash attention?

#61 opened May 11, 2023 by zhuojianc

How do I run inference after training? The model saved by training is only 17M

#59 opened May 4, 2023 by jiangjingyao

The rotary_emb.inv_freq weights do not seem to be saved

#58 opened May 2, 2023 by cristianoc20

pretrain.py of llama-7b model, Exception: : Current loss scale already at minimum

#57 opened Apr 26, 2023 by liukaiyueyuo

Pretraining LLaMA-7B on a single machine with 8 A100-80G GPUs, with or without deepspeed ZERO3, does not fully utilize the GPUs

#56 opened Apr 25, 2023 by ShadowTeamCN

Is pretraining of the bloom model supported?

#53 opened Apr 18, 2023 by nishiwen1214

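During llama training, saving the best checkpoint stalls training; suggest removing the code that saves the best file, and please remember to update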

#52 opened Apr 16, 2023 by baketbek

LoRA training for llama does not seem to be supported?

#51 opened Apr 15, 2023 by wind91725

Any plan of supporting Bloom model?

#50 opened Apr 15, 2023 by wind91725

Multi-GPU llama pretraining fails: RuntimeError: CUDA error: invalid resource handle, Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

#43 opened Apr 11, 2023 by baketbek

LoRA does not seem to support DDP?

#41 opened Apr 11, 2023 by wutaiqiang

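When using my own Chinese data, what format does the data need to be in for preprocess? Is there any documentation on this? The goal is incremental pretraining of llama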

#40 opened Apr 10, 2023 by baketbek
