2021 Feedback and Suggestions #14

Open · 8 of 12 tasks
ukoethe opened this issue May 16, 2021 · 3 comments

Comments

@ukoethe (Owner) commented May 16, 2021

General

  • Is it fair to have the grade depend on just a (small) part of the lecture?
  • Force the students to engage with the project earlier, not at the last minute
  • Add a preparatory homework on RL, move the RL lectures earlier in the semester, and perhaps add more of them.
  • Provide agents from past years as test competitors (shortly before submission?) or another form of benchmarking, so that students can decide more easily whether an approach is worth pursuing.
  • Allow teams to submit two agents (one safe bet using straightforward ML methods, one fancier one using deep RL).

Environment

  • Automate acceptance testing via GitHub CI
  • Environment could provide:
    • a feature telling which agent won the round
    • generally more information about the opponents (e.g. their current score)
    • possibility to switch between training and evaluation more easily, e.g. by making the event callbacks callable without training enabled
    • initialization in states other than the standard starting state -> add a custom --scenario option and modify build_arena
    • adjustable crate density, board size, and starting corner for training via command-line options
    • possibility to pause an episode for inspection and resume it later (instead of restarting) -> use step debugging
    • or: mention in the instructions that such things can be implemented during method development and should be undone for the final training/testing
  • Potential bugs:
    • Environment should call the function game_events_occurred() also for the last step before the game ends.
    • "We observed, that the crate distribution is not completely random and that in general
      fewer crates are placed in the bottom right corner. Thereby, more free tiles can be found
      on the bottom right, giving the agent, that starts in this corner, an advantage because the
      probability to kill itself is considerably smaller. Our agent, for example, can play the game
      very well, when starting on the bottom right, but is rather bad when starting elsewhere."
  • Modify environment for training
    • shorter thinking intervals or other speed-ups
    • switch off multi-threading for easier debugging and profiling
    • provide a passive environment that the agent can call (instead of the other way around); see the gym-style sketch after this list
      • allows easy parallel execution of several environments for faster training
      • GUI and logging are not needed during training
      • makes the training procedure compatible with TFPyEnvironment in https://github.com/tensorflow/agents, or use gym for compatibility with keras_rl
      • example implementation (files items_fast.py, environment_fast.py, agents_fast.py, bomberman_adapter.py) in https://gitlab.com/koetherminator/fml-project
  • Use IntEnums instead of strings to speed up comparisons (?); see the IntEnum sketch after this list
  • Be closer to the original version of the game (e.g. allow dropping several bombs simultaneously)
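
A minimal sketch of the IntEnum suggestion above. The action names mirror the strings understood by the framework, but the Action class itself is hypothetical and not part of the current code base:

```python
from enum import IntEnum

class Action(IntEnum):
    """Hypothetical integer actions mirroring the framework's action strings."""
    UP = 0
    RIGHT = 1
    DOWN = 2
    LEFT = 3
    WAIT = 4
    BOMB = 5

# Integer comparisons are cheaper than string comparisons, the members double
# as array indices, and .name converts back to the original string when needed:
a = Action.BOMB
assert a == 5 and a.name == 'BOMB'
q_values = [0.0] * len(Action)   # one Q-value slot per action
q_values[Action.WAIT] = 1.0
```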

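To illustrate the "passive environment" point above, here is a rough gym-style sketch. The GameCore class, its reset()/step() methods, and the observation shape are assumptions made for illustration only; the linked fml-project repository contains one concrete realization.

```python
import gym
import numpy as np

class BombermanEnv(gym.Env):
    """Gym wrapper around a hypothetical passive game core.

    GameCore stands in for a refactored environment that exposes reset()/step()
    instead of calling the agent's callbacks itself; it is not part of the
    current code base. Uses the classic gym API (4-tuple step return).
    """

    def __init__(self, game_core):
        self.game = game_core
        self.action_space = gym.spaces.Discrete(6)   # UP, RIGHT, DOWN, LEFT, WAIT, BOMB
        self.observation_space = gym.spaces.Box(     # assumed feature-plane encoding
            low=-1.0, high=1.0, shape=(17, 17, 4), dtype=np.float32)

    def reset(self):
        return self.game.reset()                     # initial observation, no GUI, no threads

    def step(self, action):
        obs, reward, done, info = self.game.step(action)
        return obs, reward, done, info
```

With such a wrapper, several environments can be stepped in parallel processes, and off-the-shelf training loops (keras_rl, or TFPyEnvironment via a gym adapter) can drive the game instead of the other way around.
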
Project instructions

  • Remind students that the University logo must not be used in the report.
  • Specify the grading criteria and requirements in more detail.
  • Most articles on RL are about neural networks => point out the pre-NN literature (this would also put the project more in line with the rest of the lecture) and other recommended reading (e.g. about reward shaping; a sketch of potential-based shaping follows after this list).
  • Split assignment into more fine-grained subtasks, e.g. task 1a: free coins under fixed crates
  • Add more documentation about the game environment
    • Describe bomb behavior accurately (bombs are only dangerous for one step!).
    • Collect crucial information (e.g. adjustable parameters, required Python version, number of coins created) in a table
    • Explain that self-play is best realized by multiple copies of the same agent (possibly plus some randomization to make behavior more diverse).
  • Clarify that the environment can (and should!) be changed for debugging, profiling, and training -- just don't forget to undo the changes later on
    • timeout may be set to "infinity" to avoid interference with the debugger
    • board size, crate density etc. can be changed to create additional intermediate tasks
    • implement stop-and-resume for inspection
  • Explain the symmetries of the game (see the canonicalization sketch after this list)
    • reduce search space by exploiting symmetries
    • implement reward asymmetries to avoid undecided agents in symmetric situations
  • Generally, give a few more tips on promising approaches.
  • Provide a LaTeX template for the report
  • Make it clearer that a mentoring tutor is available for questions
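
The symmetry point could be backed by a short snippet in the instructions. A sketch only, assuming the board arrives as a square numpy array and that the chosen action is mapped back through the inverse transformation before being sent to the environment:

```python
import numpy as np

def symmetry_variants(board):
    """Return the 8 boards generated by rotations and mirroring (dihedral group D4)."""
    variants = []
    for k in range(4):
        rotated = np.rot90(board, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants

def canonical(board):
    """Deterministic representative, e.g. the lexicographically smallest variant.

    Learning Q-values only for canonical states shrinks the state space by up
    to a factor of 8.
    """
    return min(symmetry_variants(board), key=lambda b: b.tobytes())
```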

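As a concrete pointer for the reward-shaping reading, potential-based shaping (Ng, Harada & Russell, 1999) is the standard policy-invariant form. The potential used below (negative distance to the nearest coin) and the state layout are example assumptions only:

```python
def shaped_reward(reward, old_state, new_state, gamma=0.99):
    """Potential-based shaping: add F(s, s') = gamma * phi(s') - phi(s) to the reward.

    This additive term does not change the optimal policy (Ng et al., 1999).
    """
    def phi(state):
        # Assumed state layout, for illustration only.
        ax, ay = state['agent_pos']
        coins = state['coins']
        if not coins:
            return 0.0
        return -min(abs(ax - cx) + abs(ay - cy) for cx, cy in coins)

    return reward + gamma * phi(new_state) - phi(old_state)
```
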
Hardware

  • Google Colab difficulties:
    • the default Python version is only 3.7 => lots of extra work to install everything from scratch in every session
    • only 2 hours of consecutive computing time
@fdraxler (Collaborator)

I updated the main routines of the program. Step debugging and error raising are now transparent (no more threading; errors are simply raised unless suppressed).

@fdraxler (Collaborator)

The crate distribution was already homogeneous.

@fdraxler (Collaborator)

I added the missing call to game_events_occurred().
