settings:
actor: OPT 1.3b
critic & reward model: OPT 350m
GPU: 4 * V100 32G
running script:
ACTOR_MODEL_PATH=$1
CRITIC_MODEL_PATH=$2
ACTOR_ZERO_STAGE=$3
CRITIC_ZERO_STAGE=$4
OUTPUT=$5
if [ "$OUTPUT" == "" ]; then
OUTPUT=./output
fi
if [ "$ACTOR_ZERO_STAGE" == "" ]; then
ACTOR_ZERO_STAGE=3
fi
if [ "$CRITIC_ZERO_STAGE" == "" ]; then
CRITIC_ZERO_STAGE=3
fi
if [ "$ACTOR_MODEL_PATH" == "" ]; then
ACTOR_MODEL_PATH=AdamG012/chat-opt-1.3b-sft-deepspeed
fi
if [ "$CRITIC_MODEL_PATH" == "" ]; then
CRITIC_MODEL_PATH=AdamG012/chat-opt-350m-reward-deepspeed
fi
echo "Step3: ACTOR_MODEL_PATH=$ACTOR_MODEL_PATH CRITIC_MODEL_PATH=$CRITIC_MODEL_PATH ACTOR_ZERO_STAGE=$ACTOR_ZERO_STAGE CRITIC_ZERO_STAGE=$CRITIC_ZERO_STAGE OUTPUT=$OUTPUT"
mkdir -p $OUTPUT
Num_Padding_at_Beginning=1 # this is model related
Actor_Lr=9.65e-6
Critic_Lr=5e-6
deepspeed --master_port 12346 main.py \
   --data_path Dahoas/rm-static \
   --data_split 2,4,4 \
   --actor_model_name_or_path $ACTOR_MODEL_PATH \
   --critic_model_name_or_path $CRITIC_MODEL_PATH \
   --num_padding_at_beginning 1 \
   --per_device_generation_batch_size 1 \
   --per_device_training_batch_size 1 \
   --generation_batches 1 \
   --ppo_epochs 1 \
   --max_answer_seq_len 256 \
   --max_prompt_seq_len 256 \
   --actor_learning_rate ${Actor_Lr} \
   --critic_learning_rate ${Critic_Lr} \
   --num_train_epochs 2 \
   --lr_scheduler_type cosine \
   --gradient_accumulation_steps 1 \
   --actor_dropout 0.0 \
   --num_warmup_steps 100 \
   --deepspeed --seed 1234 \
   --actor_zero_stage $ACTOR_ZERO_STAGE \
   --critic_zero_stage $CRITIC_ZERO_STAGE \
   --enable_ema \
   --output_dir $OUTPUT \
   --enable_tensorboard \
   --tensorboard_path $OUTPUT \
   &> $OUTPUT/training.log
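As a quick sanity check on these flags, the sketch below works out how many sequences are generated per step and the total sequence length. It assumes the global generation batch is the per-device batch size times the number of GPUs times the number of generation batches, which is a simplification and not necessarily DeepSpeed-Chat's exact accounting. All variable names are illustrative.

```shell
#!/usr/bin/env bash
# Values copied from the settings and flags above.
num_gpus=4
per_device_generation_batch_size=1
generation_batches=1
max_prompt_seq_len=256
max_answer_seq_len=256

# Assumption: global batch = per-device batch * GPUs * generation batches.
global_batch=$(( per_device_generation_batch_size * num_gpus * generation_batches ))
# Total length is the prompt budget plus the answer budget.
total_seq_len=$(( max_prompt_seq_len + max_answer_seq_len ))

echo "global_batch=$global_batch total_seq_len=$total_seq_len"
```

Under that assumption the result is 4 sequences of up to 512 tokens per step, which agrees with the Batch Size and Total Seq. Length values in the training log.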
The log is below:
--- prompt --> step=272, rank=1, ['\n\nHuman: How can I train for running a marathon?\n\nAssistant: Gosh! I guess I could give you lots of very detailed advice, but I’m not sure that’s the best idea. That’s a pretty rigorous training program! If you want to check in with me every few weeks, I could share some of the ideas that might be helpful in your training. Do you have a pace in mind that you’re trying to get to?\n\nHuman: No, I just want to be prepared and get in better shape. The marathon is about 4 months from now.\n\nAssistant: Maybe focus on just running more regularly for now? If you just get in the habit of running, you’ll start feeling stronger and faster, and once you get used to it, the distance of a marathon will feel relatively easy.\n\nHuman: That makes sense. I will improve my conditioning if I just make it a habit to run every day.\n\nAssistant:']
--- prompt --> step=272, rank=0, ['\n\nHuman: WHat can I use witchhazel for?\n\nAssistant: It’s a multipurpose natural remedy that’s also a common household product. It’s used to soothe sore muscles and joints, as well as a facial wash, mouthwash, and hair rinse. Some people use it topically for inflammation and itching. And it’s also used in many natural cleaning products and body care products.\n\nHuman: How do you use it for sore muscles?\n\nAssistant:']
--- prompt --> step=272, rank=3, ['\n\nHuman: How can I stop my vomiting bout after food poisoning?\n\nAssistant: I’m sorry you’ve been feeling sick. Is there anything you think you can do to keep vomiting? Would eating small frequent meals work?\n\nHuman: Oh, that would probably work. What should I be eating?\n\nAssistant: Any food is a good choice, to keep you from getting too hungry or dehydrated. You could try eating mostly salty foods like broth, juice, soda, and bread.\n\nHuman: Are you sure soda is a good idea?\n\nAssistant:']
--- prompt --> step=272, rank=2, ['\n\nHuman: Please tell me how to make brownies.\n\nAssistant:']
--- ans --> step=272, rank=1, [' I<|endoftext|>']
--- ans --> step=272, rank=3, ['<|endoftext|>']
--- ans --> step=272, rank=0, ['<|endoftext|>']
--- ans --> step=272, rank=2, ['<|endoftext|>']
Epoch: 0 | Step: 272 | PPO Epoch: 1 | Actor Loss: -2.625 | Critic Loss: 3.69140625 | Unsupervised Loss: 0.0
End-to-End => Latency: 3.25s, TFLOPs: 2.03, Samples/sec: 1.23, Time/seq 0.81s, Batch Size: 4, Total Seq. Length: 512
Generation => Latency: 1.97s, Per-token Latency 7.71 ms, TFLOPs: 0.69, BW: 341.33 GB/sec, Answer Seq. Length: 256
Training => Latency: 1.27s, TFLOPs: 4.10
Actor Model Parameters => 1.316 B, Critic Model Parameters => 0.331 B
Average reward score: -11.5546875 | EMA reward score: -11.419149377686367
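To see how widespread the empty answers are across the whole run, a small helper along these lines could scan training.log. This is a minimal sketch: the function name `count_empty_answers` is hypothetical, and it assumes every answer is logged in the `--- ans -->` format shown above and that the literal `['<|endoftext|>']` only appears in answer records.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count logged answers that consist solely of the
# end-of-text token. Occurrences are counted (grep -o) rather than lines,
# because records from different ranks can end up fused on one line.
count_empty_answers() {
    local log_file=$1
    local total empty
    # Every answer record starts with this marker.
    total=$(grep -o -- '--- ans -->' "$log_file" | wc -l)
    # An empty answer is logged as a list holding only the EOS token.
    empty=$(grep -oF -- "['<|endoftext|>']" "$log_file" | wc -l)
    # Arithmetic expansion strips the padding some wc implementations add.
    echo "empty=$((empty)) total=$((total))"
}
```

It can be called as `count_empty_answers "$OUTPUT/training.log"`. On the four answers logged for step 272 it would report `empty=3 total=4`, since the rank 1 answer still contains the single token ` I`.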