Adversarial auxiliary signals #13

@deontologician

Description

Create two separate networks that compete to explore the environment (together they form one agent).
The idea is a reinforcement learning setup where:

  • The prediction network learns an unsupervised representation of the environment, and predicts what will happen next
    • We could use adversarial techniques for unsupervised learning, or we could use something less fancy like denoising autoencoders
  • The exploration network controls the actions of the agent, and gets a reward proportional to the MSE between the prediction network's prediction and reality
    • This is an artificial reward signal, not tied to the true environment reward

The exploration network's reward does not backpropagate into the weights of the prediction network, so it can't push the predictor toward degenerate representations (e.g. learning to output random noise to maximize surprise).

The exploration network influences the predictor solely through its actions causing mispredictions, i.e. reality always sits between the exploration network and the prediction network.
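
A minimal sketch of this wiring in PyTorch, to make the gradient isolation concrete. Everything here (`PredictionNet`, `intrinsic_reward`, the shapes and learning rate) is an illustrative assumption, not something specified in this issue:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictionNet(nn.Module):
    """Learns to predict the next observation from the current one."""
    def __init__(self, obs_dim, hidden_dim=128):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, obs_dim),
        )

    def forward(self, obs):
        return self.model(obs)

def intrinsic_reward(pred_net, obs, next_obs):
    """One transition: returns (reward for the explorer, loss for the predictor)."""
    error = F.mse_loss(pred_net(obs), next_obs)
    # detach() blocks backprop from the exploration objective into the
    # prediction network's weights: surprise is a reward signal only.
    return error.detach(), error

# Toy usage with dummy tensors standing in for a real transition.
pred_net = PredictionNet(obs_dim=8)
pred_opt = torch.optim.Adam(pred_net.parameters(), lr=1e-3)
obs, next_obs = torch.randn(1, 8), torch.randn(1, 8)

reward, pred_loss = intrinsic_reward(pred_net, obs, next_obs)
pred_opt.zero_grad()
pred_loss.backward()   # the predictor minimizes its error...
pred_opt.step()        # ...while `reward` feeds the explorer's RL update
```

The key line is `error.detach()`: the same scalar acts as a loss for the predictor and a reward for the explorer, but gradients only ever flow into the prediction network from its own objective.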

Considerations:

  • The exploration network needs to adapt quickly to changing dynamics (model this like a multi-armed bandit that periodically changes the payout probabilities of its arms; see the first sketch after this list). Things like RL^2 are probably a good idea here.
  • The inputs to the exploration network might need to be the raw input, plus some memory such as an LSTM (see the second sketch below)
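
A hypothetical test harness for the bandit analogy in the first consideration: a k-armed Bernoulli bandit whose payout probabilities reshuffle on a fixed period. All names and parameters are illustrative:

```python
import numpy as np

class NonStationaryBandit:
    """k-armed Bernoulli bandit whose payouts reshuffle every `period` pulls."""
    def __init__(self, k=10, period=1000, seed=0):
        self.k, self.period = k, period
        self.rng = np.random.default_rng(seed)
        self.pulls = 0
        self._reshuffle()

    def _reshuffle(self):
        self.probs = self.rng.uniform(size=self.k)

    def pull(self, arm):
        self.pulls += 1
        if self.pulls % self.period == 0:
            self._reshuffle()  # the dynamics change under the agent
        return float(self.rng.random() < self.probs[arm])

bandit = NonStationaryBandit()
print(bandit.pull(arm=3))
```

An explorer that adapts quickly should recover its reward rate shortly after each reshuffle; how long that recovery takes becomes the metric.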
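And a sketch of an exploration network that takes the raw input and carries LSTM memory across steps, per the second consideration. Again an assumption-laden outline, not a spec:

```python
import torch
import torch.nn as nn

class ExplorationNet(nn.Module):
    """Recurrent policy: raw observation in, action logits out, LSTM state carried across steps."""
    def __init__(self, obs_dim, n_actions, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTMCell(obs_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs, state=None):
        h, c = self.lstm(obs, state)  # memory lets it track changing dynamics
        return self.policy_head(h), (h, c)

net = ExplorationNet(obs_dim=8, n_actions=4)
logits, state = net(torch.randn(1, 8))          # first step: zero-initialized state
logits, state = net(torch.randn(1, 8), state)   # later steps reuse the carried state
```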
