Step-by-step guide: low-cost fine-tuning of InternLM-20B/7B models (work in progress) #153
-
Step-by-step tutorial (1): Environment setup
Before we formally begin, first run:
cd home_workspace
export HOME_WORKSPACE=$(pwd)
echo $HOME_WORKSPACE
All of the following operations are carried out under this directory ($HOME_WORKSPACE).
Create the virtual environment:
conda create -n xtuner python=3.10
conda activate xtuner  # Note: every later step must be run inside the xtuner virtual environment
Install PyTorch inside the virtual environment:
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
Install XTuner from source:
cd $HOME_WORKSPACE
git clone https://gitee.com/internlm/xtuner.git -b v0.1.4
# git clone https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e '.[all]'
The commands above fetch the XTuner source from Gitee, a mirror hosted in China, so there are no network problems.
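Before moving on, a quick sanity check that the installation succeeded can save time later. The two commands below are not part of the original tutorial; they assume the xtuner environment is active and that the xtuner CLI is on your PATH.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  # should report a CUDA build and True
xtuner list-cfg  # lists the built-in config names; any output means XTuner imported correctly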
-
Step-by-step tutorial (2): Downloading the base model
The base model files on Hugging Face and ModelScope are identical, so pick whichever download method suits you.
Download from ModelScope
Method 1: download with git
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
# Download the internlm-7b model (about 28 GB on disk)
git clone --depth 1 https://www.modelscope.cn/Shanghai_AI_Laboratory/internlm-7b.git
# Download the internlm-20b model
git clone --depth 1 https://www.modelscope.cn/Shanghai_AI_Laboratory/internlm-20b.git
Note: run git lfs install before executing the git clone commands above, otherwise the large weight files will not be downloaded.
Method 2: download with the ModelScope Python library
pip install modelscope
from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
model_dir = snapshot_download("Shanghai_AI_Laboratory/internlm-7b", revision='v1.0.0')
Download from Hugging Face
Method 1: download with git
git clone --depth 1 https://huggingface.co/internlm/internlm-7b
git clone --depth 1 https://huggingface.co/internlm/internlm-20b
Method 2: download with the Hugging Face Python library (transformers)
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("internlm/internlm-7b", trust_remote_code=True)
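Whichever route you choose, it helps to confirm the download is complete before fine-tuning. A minimal check, not in the original, is to load the tokenizer from the local clone; the ./internlm-7b path assumes you cloned the model into the current directory.
from transformers import AutoTokenizer
# Loading the tokenizer from the local directory verifies the repo files are present and readable.
tokenizer = AutoTokenizer.from_pretrained("./internlm-7b", trust_remote_code=True)
print(tokenizer("请介绍一下 XTuner")["input_ids"])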
-
Step-by-step tutorial (3): Building the fine-tuning dataset
Create the dataset file data/internlm-assistant/single_turn_data.json (the path referenced by the training config in tutorial 4) with the following content:
[
{
"conversation": [
{
"input": "请介绍一下 XTuner",
"output": "XTuner 是上海AI实验室书生·浦语大模型开源工具链中的轻量级低成本微调工具"
}
]
},
{
"conversation": [
{
"input": "请介绍一下 XTuner",
"output": "XTuner 是上海AI实验室书生·浦语大模型开源工具链中的轻量级低成本微调工具"
}
]
}
]
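Writing many copies of the same sample by hand is tedious, so a small helper script can generate the file instead. The script below is not part of the original tutorial; the repetition count of 200 and the output path data/internlm-assistant/single_turn_data.json (chosen to match the training config in tutorial 4) are assumptions you can adjust.
import json
import os

# The single training sample from the tutorial above.
sample = {
    "conversation": [
        {
            "input": "请介绍一下 XTuner",
            "output": "XTuner 是上海AI实验室书生·浦语大模型开源工具链中的轻量级低成本微调工具",
        }
    ]
}

# Repeat the sample so the model sees it often enough during fine-tuning (the count is an assumption).
data = [sample for _ in range(200)]

os.makedirs("data/internlm-assistant", exist_ok=True)
with open("data/internlm-assistant/single_turn_data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)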
-
Step-by-step tutorial (4): Fine-tuning training
Create the training config file
Create configs/internlm_7b_qlora_alpaca_internlm_assistant.py (the path passed to xtuner train at the end of this tutorial) with the following content:
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from bitsandbytes.optim import PagedAdamW32bit
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
BitsAndBytesConfig)
from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import alpaca_zh_map_fn, template_map_fn_factory
from xtuner.engine import DatasetInfoHook, EvaluateChatHook
from xtuner.model import SupervisedFinetune
from xtuner.utils import PROMPT_TEMPLATE
#######################################################################
# PART 1 Settings #
#######################################################################
# Model
pretrained_model_name_or_path = '../internlm-7b'
# Data
alpaca_zh_path = 'data/internlm-assistant/single_turn_data.json'
prompt_template = PROMPT_TEMPLATE.alpaca
max_length = 2048
pack_to_max_length = True
# Scheduler & Optimizer
batch_size = 1 # per_device
accumulative_counts = 16
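# Note: gradients are accumulated, so the effective batch size per GPU is
# batch_size * accumulative_counts = 16.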
dataloader_num_workers = 0
max_epochs = 3
optim_type = PagedAdamW32bit
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1 # grad clip
# Evaluate the generation performance during the training
evaluation_freq = 500
evaluation_inputs = [
'请介绍一下 XTuner'
]
#######################################################################
# PART 2 Model & Tokenizer #
#######################################################################
tokenizer = dict(
type=AutoTokenizer.from_pretrained,
pretrained_model_name_or_path=pretrained_model_name_or_path,
trust_remote_code=True,
padding_side='right')
model = dict(
type=SupervisedFinetune,
llm=dict(
type=AutoModelForCausalLM.from_pretrained,
pretrained_model_name_or_path=pretrained_model_name_or_path,
trust_remote_code=True,
torch_dtype=torch.float16,
quantization_config=dict(
type=BitsAndBytesConfig,
load_in_4bit=True,
load_in_8bit=False,
llm_int8_threshold=6.0,
llm_int8_has_fp16_weight=False,
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type='nf4')),
lora=dict(
type=LoraConfig,
r=64,
lora_alpha=16,
lora_dropout=0.1,
bias='none',
task_type='CAUSAL_LM'))
#######################################################################
# PART 3 Dataset & Dataloader #
#######################################################################
alpaca_zh = dict(
type=process_hf_dataset,
dataset=dict(type=load_dataset, path='json', data_files=dict(train=alpaca_zh_path)),
tokenizer=tokenizer,
max_length=max_length,
# dataset_map_fn=alpaca_zh_map_fn,
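# The dataset JSON from tutorial (3) is already in XTuner's `conversation`
# format with `input`/`output` keys, so no dataset_map_fn is needed here;
# only the prompt template below is applied.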
template_map_fn=dict(
type=template_map_fn_factory, template=prompt_template),
remove_unused_columns=True,
shuffle_before_pack=True,
pack_to_max_length=pack_to_max_length)
train_dataloader = dict(
batch_size=batch_size,
num_workers=dataloader_num_workers,
dataset=alpaca_zh,
sampler=dict(type=DefaultSampler, shuffle=True),
collate_fn=dict(type=default_collate_fn))
#######################################################################
# PART 4 Scheduler & Optimizer #
#######################################################################
# optimizer
optim_wrapper = dict(
type=AmpOptimWrapper,
optimizer=dict(
type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
accumulative_counts=accumulative_counts,
loss_scale='dynamic',
dtype='float16')
# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md # noqa: E501
param_scheduler = dict(
type=CosineAnnealingLR,
eta_min=lr * 0.1,
by_epoch=True,
T_max=max_epochs,
convert_to_iter_based=True)
# train, val, test setting
train_cfg = dict(by_epoch=True, max_epochs=max_epochs, val_interval=1)
#######################################################################
# PART 5 Runtime #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
dict(type=DatasetInfoHook, tokenizer=tokenizer),
dict(
type=EvaluateChatHook,
tokenizer=tokenizer,
every_n_iters=evaluation_freq,
evaluation_inputs=evaluation_inputs,
instruction=prompt_template.INSTRUCTION_START)
]
# configure default hooks
default_hooks = dict(
# record the time of every iteration.
timer=dict(type=IterTimerHook),
# print log every 10 iterations.
logger=dict(type=LoggerHook, interval=10),
# enable the parameter scheduler.
param_scheduler=dict(type=ParamSchedulerHook),
# save checkpoint per epoch.
checkpoint=dict(type=CheckpointHook, interval=1),
# set sampler seed in distributed environment.
sampler_seed=dict(type=DistSamplerSeedHook),
)
# configure environment
env_cfg = dict(
# whether to enable cudnn benchmark
cudnn_benchmark=False,
# set multi process parameters
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
# set distributed parameters
dist_cfg=dict(backend='nccl'),
)
# set visualizer
visualizer = None
# set log level
log_level = 'INFO'
# load from which checkpoint
load_from = None
# whether to resume training from the loaded checkpoint
resume = False
# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)
Start training
xtuner train configs/internlm_7b_qlora_alpaca_internlm_assistant.py
-
Step-by-step tutorial (5): Model conversion
Convert the .pth checkpoint to Hugging Face format
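The original does not show the conversion command itself. In recent XTuner releases it looks like the sketch below; the epoch_3.pth checkpoint name under work_dirs is an assumption (pick whichever epoch you want to export), and if your XTuner version names the subcommand differently, check xtuner convert --help.
export CONFIG=configs/internlm_7b_qlora_alpaca_internlm_assistant.py
export PTH=work_dirs/internlm_7b_qlora_alpaca_internlm_assistant/epoch_3.pth
export NAME_OR_PATH_TO_ADAPTER=work_dirs/internlm_7b_qlora_alpaca_internlm_assistant/hf
xtuner convert pth_to_hf $CONFIG $PTH $NAME_OR_PATH_TO_ADAPTER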
Merge the weights
export NAME_OR_PATH_TO_LLM=/root/llm/internlm-7b
export NAME_OR_PATH_TO_ADAPTER=work_dirs/internlm_7b_qlora_alpaca_internlm_assistant/hf
export SAVE_PATH=work_dirs/internlm_7b_qlora_alpaca_internlm_assistant/hf_merge
xtuner convert merge \
$NAME_OR_PATH_TO_LLM \
$NAME_OR_PATH_TO_ADAPTER \
$SAVE_PATH \
--max-shard-size 2GB
-
Step-by-step tutorial (6): Direct inference with the Hugging Face model
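This section is still empty in the original (the guide is marked as work in progress). Below is a minimal inference sketch with plain transformers, assuming the merged weights from tutorial (5) at the hf_merge path and a single CUDA GPU; wrapping the question in the same alpaca-style prompt template used during training will generally give better results.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "work_dirs/internlm_7b_qlora_alpaca_internlm_assistant/hf_merge"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype=torch.float16
).cuda()
model.eval()

# Ask the same question used in the fine-tuning data.
inputs = tokenizer("请介绍一下 XTuner", return_tensors="pt").to("cuda")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))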
-
Step-by-step tutorial (7): Deploying the model with LMDeploy
Before running the steps below, install LMDeploy:
pip install lmdeploy
1. Convert the Hugging Face format to the TurboMind format
cd $HOME_WORKSPACE
python -m lmdeploy.serve.turbomind.deploy internlm $HOME_WORKSPACE/internlm-7b
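The deploy command above writes a TurboMind workspace directory (./workspace by default in the 2023-era LMDeploy releases this guide targets). The chat command below is not in the original, and its module-style name is an assumption for those older releases (newer versions expose an lmdeploy chat entry point instead); it serves the converted model interactively, which is what the "direct deployment" step refers to.
# Chat with the converted TurboMind model from the generated workspace directory.
python3 -m lmdeploy.turbomind.chat ./workspace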
2. Direct deployment
Direct deployment of the converted model is covered by the chat sketch at the end of step 1 above.
3. 4-bit weight quantization and deployment
For the installation steps, follow the LMDeploy documentation. First, generate the quantization parameters:
export HF_MODEL=$HOME_WORKSPACE/internlm-7b
cd $HOME_WORKSPACE
python3 -m lmdeploy.lite.apis.calibrate \
--model $HF_MODEL \
--calib_dataset 'c4' \
--calib_samples 128 \
--calib_seqlen 1024 \
--work_dir $HOME_WORKSPACE/w4a16_workdir
Note: the command above needs to be able to reach Hugging Face, because it downloads the allenai/c4 dataset for calibration.
After the command finishes, the following files are generated:
(xtuner) ➜ w4a16_workdir tree .
.
├── inputs_stats.pth
├── key_stats.pth
├── outputs_stats.pth
└── value_stats.pth
LMDeploy quantizes the model weights with the AWQ algorithm. When running the command below, pass in the work_dir from the calibration step above; after quantization finishes, the weight files are also stored in that directory.
export HF_MODEL=$HOME_WORKSPACE/internlm-7b
python3 -m lmdeploy.lite.apis.auto_awq \
--model $HF_MODEL \
--w_bits 4 \
--w_group_size 128 \
--work_dir $HOME_WORKSPACE/w4a16_workdir
(xtuner) ➜ w4a16_workdir tree .
.
├── config.json
├── configuration_internlm.py
├── generation_config.json
├── inputs_stats.pth
├── key_stats.pth
├── modeling_internlm.py
├── outputs_stats.pth
├── pytorch_model.bin
├── special_tokens_map.json
├── tokenization_internlm.py
├── tokenizer_config.json
├── tokenizer.model
└── value_stats.pth
-
Help community members clear the obstacles at each stage: environment setup, base model download, dataset creation, and model conversion.
Step-by-step low-cost fine-tuning tutorial (1): Environment setup
Step-by-step low-cost fine-tuning tutorial (2): Base model download
Step-by-step low-cost fine-tuning tutorial (3): Building the fine-tuning dataset
Step-by-step low-cost fine-tuning tutorial (4): Fine-tuning training
Step-by-step low-cost fine-tuning tutorial (5): Model conversion
Step-by-step low-cost fine-tuning tutorial (6): Direct inference with the Hugging Face model
Step-by-step low-cost fine-tuning tutorial (7): LMDeploy model deployment
This tutorial reproduces reliably on Linux systems such as Ubuntu and CentOS; Windows users should adapt the commands as needed.
Results
Before fine-tuning, InternLM cannot answer the question "请介绍一下 XTuner".
After fine-tuning: