Reward modification in PPO #7

Ynjxsjmh · 2021-02-21T04:26:11Z

Lines 151 to 154 in 876266d

    
    state_batch.append(state)
    action_batch.append(action)
    reward_batch.append(reward * 0.01)
    old_policy_batch.append(probs)

DeepRL-TensorFlow2/PPO/PPO_Continuous.py

Lines 167 to 170 in 876266d

    
    state_batch.append(state)
    action_batch.append(action)
    reward_batch.append((reward+8)/8)
    old_policy_batch.append(log_old_policy)

In PPO_Discrete each reward is multiplied by 0.01, and in PPO_Continuous the reward is also modified, to (reward+8)/8. I don't understand why these modifications are made. What do they do?


ghost · 2021-02-25T03:28:04Z

same question

huojitiaotiao · 2023-03-08T08:43:54Z

Multiplying by 0.01 is probably meant to shrink the reward so that it stays between 0 and 1 (this is my guess).
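To make the guess above concrete, here is a minimal sketch of what the two transformations do numerically. The reward bounds are assumptions chosen for illustration (a per-step reward of 1 for the discrete case, and a per-step reward between about -16 and 0 for the continuous case); the thread does not say which environments the scripts are run on, so substitute the real bounds of your environment.

```python
def scale_discrete(reward):
    """Scaling taken from the PPO_Discrete snippet: reward * 0.01."""
    return reward * 0.01


def scale_continuous(reward):
    """Shift-and-scale taken from the PPO_Continuous snippet: (reward + 8) / 8."""
    return (reward + 8) / 8


# Hypothetical reward bounds, assumed for illustration only.
ASSUMED_DISCRETE_REWARD = 1.0
ASSUMED_CONTINUOUS_RANGE = (-16.0, 0.0)

low, high = ASSUMED_CONTINUOUS_RANGE
continuous_scaled = (scale_continuous(low), scale_continuous(high))
discrete_scaled = scale_discrete(ASSUMED_DISCRETE_REWARD)

print(continuous_scaled)  # (-1.0, 1.0)
print(discrete_scaled)    # 0.01
```

Under these assumptions, (reward+8)/8 recenters the reward around zero and maps it into roughly [-1, 1], while multiplying by 0.01 only shrinks the magnitude. Neither changes which actions are better, but smaller rewards keep the returns and value targets in a small range, which often makes critic training more stable.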
