Commit bf96d4c

Drop az_quiz._winner, use az_quiz._outcome instead.
1 parent 6ba7ee7 commit bf96d4c

3 files changed

Lines changed: 37 additions & 9 deletions

labs/npfl139/board_games/az_quiz.py

Lines changed: 7 additions & 9 deletions
@@ -18,7 +18,7 @@ def __init__(self, randomized=False):
         self._board = np.tri(self.N, dtype=np.int8) - 1
         self._randomized = randomized
         self._to_play = 0
-        self._winner = None
+        self._outcome = None
         self._screen = None
         self._last_action, self._winning_stones = None, None
 
@@ -27,11 +27,11 @@ def clone(self, swap_players=False) -> "AZQuiz":
         if swap_players:
             clone._board = self._SWAP_PLAYERS[self._board + 1]
             clone._to_play = 1 - self._to_play
-            clone._winner = 1 - self._winner if self._winner is not None else None
+            clone._outcome = self._outcome.reverse() if self._outcome is not None else None
         else:
             clone._board[:, :] = self._board
             clone._to_play = self._to_play
-            clone._winner = self._winner
+            clone._outcome = self._outcome
         clone._last_action, clone._winning_stones = self._last_action, self._winning_stones
         return clone
 
@@ -48,16 +48,14 @@ def to_play(self) -> int:
         return self._to_play
 
     def outcome(self, player: int) -> BoardGame.Outcome | None:
-        if self._winner is None:
-            return None
-        return self.Outcome.WIN if self._winner == player else self.Outcome.LOSS
+        return self._outcome if self._outcome is None or player == self._to_play else self._outcome.reverse()
 
     def valid(self, action: int) -> bool:
-        return self._winner is None and action >= 0 and action < self.ACTIONS \
+        return self._outcome is None and action >= 0 and action < self.ACTIONS \
             and self._board[self._ACTION_Y[action], self._ACTION_X[action]] < 2
 
     def valid_actions(self) -> list[int]:
-        return np.nonzero(self._board[self._ACTION_Y, self._ACTION_X] < 2)[0] if self._winner is None else []
+        return np.nonzero(self._board[self._ACTION_Y, self._ACTION_X] < 2)[0] if self._outcome is None else []
 
     def move(self, action: int):
         self._last_action = action
@@ -99,7 +97,7 @@ def _move(self, action, random_value):
             if field >= 2:
                 self._traverse(j, 0, field, edges, visited)
                 if edges.all():
-                    self._winner = field - 2
+                    self._outcome = self.Outcome.WIN if field - 2 == self._to_play else self.Outcome.LOSS
                     self._winning_stones = visited == 1
                 visited += visited > 0
 
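The new `outcome` accessor returns `_outcome` unchanged when asked about the player in `_to_play` and reversed for the other player, which suggests the result is now stored relative to the player to move rather than as an absolute winner index. The following is a minimal sketch of that reading only: `Outcome` and `TinyGame` are hypothetical stand-ins written for illustration (the commit itself shows just `WIN`, `LOSS`, `reverse()`, `_to_play`, and `_outcome`), not the library's classes.

```python
from __future__ import annotations

from enum import Enum


class Outcome(Enum):
    """Hypothetical stand-in for BoardGame.Outcome (assumption, not the real class)."""
    WIN = 1
    LOSS = -1

    def reverse(self) -> Outcome:
        # Swap the point of view: a win for one player is a loss for the other.
        return Outcome.LOSS if self is Outcome.WIN else Outcome.WIN


class TinyGame:
    """Toy state with only the two fields the accessor depends on."""

    def __init__(self, to_play: int = 0, outcome: Outcome | None = None):
        self._to_play = to_play
        self._outcome = outcome  # assumed to be relative to `_to_play`

    def outcome(self, player: int) -> Outcome | None:
        # Same expression as the added line in the diff.
        return self._outcome if self._outcome is None or player == self._to_play else self._outcome.reverse()


finished = TinyGame(to_play=1, outcome=Outcome.WIN)
print(finished.outcome(1), finished.outcome(0))
print(TinyGame().outcome(0))
```

Under this reading, a game still in progress reports `None` for both players, and a finished game reports opposite results for the two players.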

lectures/lecture12.md

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 #### Reading: https://ufal.mff.cuni.cz/~straka/courses/npfl139/2425/slides.pdf/npfl139-2425-12.pdf,PDF Slides
 #### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl139/2425/npfl139-2425-12.mp4, Lecture
 #### Questions: #lecture_12_questions
+#### Lecture assignment: az_quiz_randomized
 
 - MuZero [[Julian Schrittwieser et al.: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model](https://arxiv.org/abs/1911.08265)]
 - AlphaZero as regularized policy optimization [[Jean-Bastien Grill et al.: Monte-Carlo Tree Search as Regularized Policy Optimization](https://arxiv.org/abs/2007.12509)]

tasks/az_quiz_randomized.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+### Assignment: az_quiz_randomized
+#### Date: Deadline: Jun 30, 22:00
+#### Points: 5 points; either this or `pisqorky` is required for automatically passing the exam
+
+Extend the `az_quiz` assignment to handle the possibility of wrong
+answers. When choosing a field (an action), you might therefore fail to
+claim it; in such a case, the state of the field becomes “failed”. When
+a “failed” field is chosen as an action by a player, then either
+- it is successfully claimed by the player (they “answer correctly”); or
+- if the player “answers incorrectly”, the field is claimed by the opposing
+  player; however, in this case, the original player continues playing
+  (i.e., the players do not alternate in this case).
+
+To instantiate this randomized game variant, either pass `randomized=True`
+to `npfl139.board_games.AZQuiz`, or use `az_quiz_randomized` as the board
+game (e.g., as the argument to `npfl139.board_games.evaluate` or to
+`npfl139.board_games.BoardGame.from_name`).
+
+Your goal is to propose how to modify the Monte Carlo Tree Search to properly
+handle stochastic MDPs. Information about the distribution of possible next
+states is provided by the `AZQuiz.all_moves` method, which returns a list of
+`(probability, az_quiz_instance)` next states (in our environment, there are
+always two possible next states).
+
+Your implementation must be capable of training and must achieve at least a 90%
+win rate against the simple heuristic. Additionally, part of this assignment is
+to send us on Piazza (once you pass in ReCodEx) a description of how
+you handle the stochasticity in MCTS; you will get points only after we finish
+the discussion.
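One ingredient that such a modification might use is an expectation over the `(probability, next_state)` pairs, taken from the point of view of the player who chose the action. The sketch below is only an illustration under stated assumptions, not the intended solution: `ToyState` is a hypothetical stand-in for a game state, its `all_moves(action)` signature is assumed to mirror `AZQuiz.all_moves`, and the probabilities and values are made-up toy numbers. Because the players do not always alternate in this variant, the sign of a child's value is flipped only when the player to move actually changes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyState:
    """Hypothetical stand-in for a game state; not the real AZQuiz class."""
    to_play: int
    value: float  # toy value estimate from the point of view of `to_play`
    outcomes: dict[int, list[tuple[float, ToyState]]] | None = None

    def all_moves(self, action: int) -> list[tuple[float, ToyState]]:
        # Assumed to mirror AZQuiz.all_moves: (probability, next_state) pairs.
        return self.outcomes[action]


def expected_action_value(state: ToyState, action: int,
                          evaluate: Callable[[ToyState], float]) -> float:
    """Expected next-state value, as seen by the player acting in `state`."""
    total = 0.0
    for probability, next_state in state.all_moves(action):
        value = evaluate(next_state)  # relative to next_state.to_play
        # Players need not alternate, so negate only if the mover changed.
        if next_state.to_play != state.to_play:
            value = -value
        total += probability * value
    return total


# Toy numbers only: two possible next states for action 0.
root = ToyState(to_play=0, value=0.0, outcomes={
    0: [(0.75, ToyState(to_play=1, value=-0.5)),
        (0.25, ToyState(to_play=1, value=0.25))],
})
q = expected_action_value(root, 0, lambda s: s.value)
print(q)
```

In a full search, `evaluate` would be replaced by whatever value estimate the tree keeps for each child; how visit counts and exploration are handled at such chance nodes is left open here, as it is in the assignment.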
