
provide the docker file #7

Closed
merlintang opened this issue Aug 28, 2023 · 22 comments

@merlintang
Contributor

No description provided.

@mikecovlee
Member

I found that the runners provided by GitHub Actions do not have GPU capacity. A self-hosted runner can solve this problem if we need continuous integration.

@merlintang
Contributor Author

Can @LianxinGao look at this and provide the self-hosted runner in our dev environment?

@LianxinGao
Contributor

> Can @LianxinGao look at this and provide the self-hosted runner in our dev environment?

@merlintang The gpu_runner has been launched on the gpu01 machine. @mikecovlee There are 4 GPUs, numbered 0 to 3 (4090: 0, 2, 3; 3090: 1).

@mikecovlee
Member

The main entry point mlora.py has been finished on the main branch; please test it with llama-7b and the demo dataset.

python mlora.py --base_model <path to llama-7b> --config ./config/finetune.json --load_8bit true

@LianxinGao
Contributor


> python mlora.py --base_model <path to llama-7b> --config ./config/finetune.json --load_8bit true

@mikecovlee Where do I configure the GPU device? I could not find it in finetune.json.

@mikecovlee
Member

Use the --device argument. The default is cuda:0; ASPEN can utilize only one GPU.
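
For picking which index to pass via --device on gpu01 (where the 4090s and the 3090 sit in one box), a quick check of what CUDA actually sees may help. This is only a small sketch using plain PyTorch, separate from mlora.py:

```python
import torch

# List the CUDA devices visible to this process, so the right index
# can be passed to mlora.py, e.g. --device cuda:2.
if not torch.cuda.is_available():
    raise SystemExit("CUDA is not available in this environment")

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    mem_gib = props.total_memory / 1024 ** 3
    print(f"cuda:{i}  {props.name}  {mem_gib:.1f} GiB")
```

Keep in mind that CUDA_VISIBLE_DEVICES remaps indices, so cuda:0 inside the process is not necessarily physical GPU 0.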

@LianxinGao
Contributor

> The main entry point mlora.py has been finished on the main branch; please test it with llama-7b and the demo dataset.

> python mlora.py --base_model <path to llama-7b> --config ./config/finetune.json --load_8bit true

@mikecovlee I got an error:

[2023-08-30 14:46:10] ASPEN: NVIDIA CUDA initialized successfully.
[2023-08-30 14:46:10] ASPEN: Total 3 GPU(s) detected.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:09<00:00,  4.57s/it]
to load text data from file.
load text data from file done.
to encode text data to tokens
encode text data: 0/2
encode text data: 0/2
encode text data to tokens done.
lora_0 train data:
    epoch: 1 / 3
    step : 0 / 2
lora_1 train data:
    epoch: 1 / 3
    step : 0 / 2
batch data size: 32 * 4
/opt/conda/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
  File "/data/glx/code/multi_lora/mlora.py", line 157, in <module>
    train(config, model)
  File "/data/glx/code/multi_lora/mlora.py", line 127, in train
    :].contiguous().view(-1, llama_model.vocab_size_)
RuntimeError: shape '[-1, 32000]' is invalid for input of size 1984062

@LianxinGao
Contributor

@merlintang done #18

@yezhengmao1
Collaborator

> [error log quoted above]
> RuntimeError: shape '[-1, 32000]' is invalid for input of size 1984062

Can you check the vicuna-7b vocab size?
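
A quick sanity check on the numbers: 1984062 = 62 × 32001, so the logits appear to carry a vocab dimension of 32001 while the reshape assumes 32000, which would match Vicuna v1.1 adding an extra pad token. A minimal sketch to confirm, assuming a local HF-format checkpoint and the transformers library (the path is a placeholder):

```python
from transformers import AutoConfig, AutoTokenizer

# Compare the checkpoint's declared vocab size with the tokenizer's vocab.
path = "<path to vicuna-7b>"  # placeholder for the local checkpoint directory

config = AutoConfig.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

print("config.vocab_size :", config.vocab_size)  # llama-7b reports 32000
print("tokenizer vocab   :", len(tokenizer))     # vicuna v1.1 is expected to report 32001
```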

@yezhengmao1
Collaborator

@mikecovlee vicuna-7b and llama-7b have different lm_head / embedding sizes; we need to adapt to that.
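
For reference, in plain transformers the usual way to reconcile a tokenizer/embedding size mismatch is to resize the token embeddings; this is only an illustration of that API (placeholder path), not a claim about how ASPEN/mlora should implement the fix:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "<path to vicuna-7b>"  # placeholder for the local checkpoint directory
model = AutoModelForCausalLM.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

# Grow (or shrink) the embedding matrix and lm_head to match the tokenizer,
# so reshapes against vocab_size line up with the logits actually produced.
if model.get_input_embeddings().weight.shape[0] != len(tokenizer):
    model.resize_token_embeddings(len(tokenizer))
```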

@mikecovlee
Member

My tests pass on both vicuna-7b and llama-7b-tf on the current main branch.

@LianxinGao
Contributor

@mikecovlee Which version of vicuna-7b are you using?

@mikecovlee
Member

@LianxinGao vicuna-7b-delta-v1.1

@LianxinGao
Contributor

> @LianxinGao vicuna-7b-delta-v1.1

I'll change the version of vicuna in CI and retest it.

@mikecovlee
Member

You can commit directly to the mikecovlee_dev branch, which is on a draft pull request fixing the CI errors. @LianxinGao

@mikecovlee
Member

Now I have split the CI checks on GPU into two separate jobs. The tests on LLaMA-7B pass while they fail on Vicuna-7B. #21

@mikecovlee
Member

Please check the local llama-7b-hf model; its config file lacks the max_sequence_length field (a quick way to check is sketched below).

Referring to the CI runs, the local machine tests pass.

@LianxinGao
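
To quickly see whether a local checkpoint has the field, here is a small sketch that just reads config.json (the path is a placeholder; falling back to max_position_embeddings is my assumption about HF LLaMA configs, not ASPEN behaviour):

```python
import json
from pathlib import Path

# Inspect a local checkpoint's config.json for the field the loader expects.
config_path = Path("<path to llama-7b-hf>") / "config.json"  # placeholder
config = json.loads(config_path.read_text())

if "max_sequence_length" in config:
    print("max_sequence_length:", config["max_sequence_length"])
else:
    # HF LLaMA configs usually carry max_position_embeddings instead.
    print("max_sequence_length missing; max_position_embeddings =",
          config.get("max_position_embeddings"))
```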

@mikecovlee
Member

Btw, for later commits please create a new branch rather than using mikecovlee_dev.

@LianxinGao
Contributor

> Please check the local llama-7b-hf model; its config file lacks the max_sequence_length field.
>
> Referring to the CI runs, the local machine tests pass.
>
> @LianxinGao

The models on the machine seemed buggy 😭, now all fixed.

@yezhengmao1
Collaborator

I also found this; I will fix it.

@merlintang
Contributor Author

@LianxinGao Can you send a PR with a Dockerfile?

@LianxinGao
Contributor

> @LianxinGao Can you send a PR with a Dockerfile?

OK, I'll do it.
