Deep RL Trader + PPO Agent Implemented using Tensorforce

This repo contains:

1. Trading environment (OpenAI Gym) + wrapper for the Tensorforce Env
2. PPO (Proximal Policy Optimization) agent (https://arxiv.org/abs/1707.06347), implemented using tensorforce (https://github.com/reinforceio/tensorforce)

The agent is expected to learn useful action sequences that maximize profit in a given environment.
The environment limits the agent to one of three actions at each step: buy, sell, or hold the stock (coin).
If the agent decides to take a

1. LONG position, it initiates a sequence of actions such as buy - hold - hold - sell;
2. SHORT position, it does the reverse, e.g. sell - hold - hold - buy.

Only a single position can be opened per trade.

Thus an invalid action sequence such as buy - buy is treated as buy - hold.
The default transaction fee is 0.0005.
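The single-position rule above can be illustrated with a small state machine. This is a minimal sketch, not the environment's actual code: the function name `normalize_actions` and the string action labels are assumptions made for illustration.

```python
BUY, SELL, HOLD = "buy", "sell", "hold"


def normalize_actions(actions):
    """Map raw actions to effective actions when only one position may be open.

    Hypothetical helper: a repeated entry action (buy while long, sell while
    short) is downgraded to hold, as described in the text above.
    """
    position = None  # None means flat; otherwise "long" or "short"
    effective = []
    for action in actions:
        if position is None:
            # Flat: buy opens a long, sell opens a short, hold stays flat.
            if action == BUY:
                position = "long"
            elif action == SELL:
                position = "short"
            effective.append(action)
        elif position == "long":
            if action == SELL:
                position = None  # sell closes the long
                effective.append(SELL)
            else:
                effective.append(HOLD)  # buy while long counts as hold
        else:  # position == "short"
            if action == BUY:
                position = None  # buy closes the short
                effective.append(BUY)
            else:
                effective.append(HOLD)  # sell while short counts as hold
    return effective


print(normalize_actions([BUY, BUY, HOLD, SELL]))
```

Here the second buy is ignored because a long position is already open, so the effective sequence is buy - hold - hold - sell.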

A reward is given

1. when a position is closed, or
2. when an episode finishes.

This sparse reward scheme takes longer to train but is the most successful at learning long-term dependencies.
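As a rough illustration of this sparse scheme, the sketch below returns zero on every step except the one that closes a position or ends the episode. The function names, and the assumption that the fee is charged once on entry and once on exit, are illustrative and are not taken from the environment's source.

```python
FEE = 0.0005  # default transaction fee quoted in this README


def trade_return(entry_price, exit_price, side, fee=FEE):
    """Fractional return of one round trip.

    Assumption: the fee is charged once on entry and once on exit.
    """
    if side == "long":
        gross = exit_price / entry_price - 1.0
    elif side == "short":
        gross = 1.0 - exit_price / entry_price
    else:
        raise ValueError("side must be 'long' or 'short'")
    return gross - 2.0 * fee


def step_reward(position_closed, episode_done, entry_price, price, side):
    """Sparse reward: non-zero only when a position closes or the episode ends."""
    if side is not None and (position_closed or episode_done):
        return trade_return(entry_price, price, side)
    return 0.0


# While a long position is merely held, the reward stays at zero.
print(step_reward(False, False, 100.0, 105.0, "long"))
# Closing the long at 110 after entering at 100 yields roughly 0.099.
print(step_reward(True, False, 100.0, 110.0, "long"))
```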

The agent chooses its action by observing the environment.

The trading environment emits features derived from OHLCV candles (the window size is configurable).
Thus, the input given to the agent has the shape (window_size, n_features).
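To make the observation shape concrete, here is a minimal sketch of slicing a window out of a feature matrix. The helper name `make_observation` and the random feature matrix are assumptions for illustration; the real features come from the repo's data processing.

```python
import numpy as np


def make_observation(features, t, window_size):
    """Return the `window_size` most recent feature rows ending at step t (inclusive)."""
    if t + 1 < window_size:
        raise ValueError("not enough history for a full window")
    return features[t - window_size + 1 : t + 1]


# Stand-in data: 100 candles with 5 derived features each (values are random).
features = np.random.default_rng(0).normal(size=(100, 5))
obs = make_observation(features, t=29, window_size=30)
print(obs.shape)  # (30, 5)
```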

With some modification it can also be applied to stocks, futures, or foreign exchange.

Visualization / Main / Environment

The sample data provided consists of 5-minute OHLCV candles fetched from BitMEX.

train : './data/train/' 70000
test : './data/test/' 16000

Prerequisites

keras-rl, numpy, tensorflow ... etc

pip install -r requirements.txt

Getting Started

Create Environment & Agent

from tensorforce.agents import PPOAgent

# create environments for training and testing
# (create_btc_env is a helper defined in this repo)
PATH_TRAIN = "./data/train/"
PATH_TEST = "./data/test/"
TIMESTEP = 30  # window size
environment = create_btc_env(window_size=TIMESTEP, path=PATH_TRAIN, train=True)
test_environment = create_btc_env(window_size=TIMESTEP, path=PATH_TEST, train=False)

# create spec for network and baseline
network_spec = create_network_spec() # json format
baseline_spec = create_baseline_spec()

# create agent
agent = PPOAgent(
    discount=0.9999,
    states=environment.states,
    actions=environment.actions,
    network=network_spec,
    # Agent
    states_preprocessing=None,
    actions_exploration=None,
    reward_preprocessing=None,
    # MemoryModel
    update_mode=dict(
        unit='timesteps',  # alternative: 'episodes'
        # 32 timesteps per update batch
        batch_size=32,
        # update every 10 timesteps
        frequency=10
    ),
    memory=dict(
        type='latest',
        include_next_states=False,
        capacity=50000
    ),
    # DistributionModel
    distributions=None,
    entropy_regularization=0.0,  # None
    # PGModel

    baseline_mode='states',
    baseline=dict(type='custom', network=baseline_spec),
    baseline_optimizer=dict(
        type='multi_step',
        optimizer=dict(
            type='adam',
            learning_rate=(1e-4)  # 3e-4
        ),
        num_steps=5
    ),
    gae_lambda=0,  # 0
    # PGLRModel
    likelihood_ratio_clipping=0.2,
    # PPOAgent
    step_optimizer=dict(
        type='adam',
        learning_rate=(1e-4)  # 1e-4
    ),
    subsampling_fraction=0.2,  # 0.1
    optimization_steps=10,
    execution=dict(
        type='single',
        session_config=None,
        distributed_spec=None
    )
)

Train and Validate

    import numpy as np
    from tensorforce.execution import Runner

    train_runner = Runner(agent=agent, environment=environment)
    test_runner = Runner(
        agent=agent,
        environment=test_environment,
    )

    train_runner.run(num_episodes=100, max_episode_timesteps=16000, episode_finished=episode_finished)
    print("Learning finished. Total episodes: {ep}. Average reward of last 100 episodes: {ar}.".format(
        ep=train_runner.episode,
        ar=np.mean(train_runner.episode_rewards[-100:]))
    )

    test_runner.run(num_episodes=1, deterministic=True, testing=True, episode_finished=print_simple_log)

Configuring Agent

# You can stack layers using the blocks provided by tensorforce or define your own.
def create_network_spec():
    network_spec = [
        {
            "type": "flatten"
        },
        dict(type='dense', size=32, activation='relu'),
        dict(type='dense', size=32, activation='relu'),
        dict(type='internal_lstm', size=32),
    ]
    return network_spec

def create_baseline_spec():
    baseline_spec = [
        {
            "type": "lstm",
            "size": 32,
        },
        dict(type='dense', size=32, activation='relu'),
        dict(type='dense', size=32, activation='relu'),
    ]
    return baseline_spec

Running

[Verbose] While training or testing,

the environment prints (current_tick, # Long, # Short, Portfolio).

[Portfolio]

The initial portfolio value is 100*10000 (KRW, won).
It reflects the change in portfolio value as if the agent had invested 100% of its balance every time it opened a position.

[Reward]

Simply the percentage earning per trade.
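Read together, the two definitions above imply that the portfolio compounds the per-trade returns. The sketch below shows that arithmetic under the stated 100% reinvestment rule. The function name `compound` and the example returns are made up for illustration and are not results from the agent.

```python
def compound(initial_value, trade_returns):
    """Portfolio value after each trade when the whole balance is reinvested."""
    value = float(initial_value)
    history = [value]
    for r in trade_returns:
        value *= 1.0 + r  # each trade scales the entire balance
        history.append(value)
    return history


# Illustrative returns only: +10% on the first trade, -5% on the second.
history = compound(100 * 10000, [0.10, -0.05])
print(history[-1])  # about 1045000
```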

Initial Result

Portfolio Value Change, Max DrawDown period in Red

Portfolio value: 1000000 -> 1586872.1775 in 56 days
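The plotting code behind the figure is not shown here, so the following is only a sketch of one common way to locate a maximum drawdown period in a series of portfolio values. The function name `max_drawdown` and the sample values are assumptions for illustration.

```python
import numpy as np


def max_drawdown(values):
    """Return (max drawdown fraction, peak index, trough index) of a value series."""
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)  # highest value seen so far
    drawdowns = (running_peak - values) / running_peak
    trough = int(np.argmax(drawdowns))  # step with the deepest drop
    peak = int(np.argmax(values[: trough + 1]))  # preceding high
    return float(drawdowns[trough]), peak, trough


# Illustrative series: the deepest drop is from 120 down to 90.
mdd, peak, trough = max_drawdown([100, 120, 90, 110, 130])
print(mdd, peak, trough)  # 0.25 1 2
```

The span between the peak and trough indices is the kind of period that could be shaded on a portfolio value plot.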

Not bad, but the agent definitely needs more

1. training data and
2. degrees of freedom (a larger network).

Beware of overfitting!

Authors

Lee Hankyol - Initial work - tf_deep_rl_trader

License

This project is licensed under the MIT License - see the LICENSE.md file for details
