legal-moves in play.py might be better if sucide check added #12

guohl · 2018-01-06T05:48:53Z

No description provided.

Narsil · 2018-01-06T08:45:39Z

It would definitely help for sgf inspection, as most sgf readers I checked pop warning for sucides. However, checking for sucide placement seems a bit expensive (I probably need to copy the board, add the stone, and check for self-capture, for every otherwise allowable move.) I would also like to check that the algorithm can learn on its own to avoid these bad moves.

guohl · 2018-01-06T09:10:58Z

if add self fill eyes check to mask in play.py(line:12) it might be much better

Narsil · 2018-01-06T09:41:53Z

Yes, you can add it after the KO situation.

TBH, I think a simple west, north, east, south check might be enough (and not do the the more general add stone, check capture).

In any case, we need to check was it the performance impact before adding it into master.

guohl · 2018-01-06T10:15:43Z

I noticed that if no self eyes check, it takes more than 700 moves to finish a game and even 722 moves, especially, the win/lose judgement is incorrect and it will affect the learn process.
Is it possible to just modify mask by change 0 to nonzero in mask to avoid self fill eyes? thank you!

Narsil · 2018-01-06T17:10:51Z

Playing into self eyes is a legal move so I won't add it into the legal_moves as the paper mentions it simply implemented the Tromp-Taylor rules (which do allow sucides as well). But feel free to do it if you want.

You simply need a simple check North, west, east and south. I can check your code if you want.

guohl · 2018-01-07T00:04:53Z

I have tried the following, but I am not sure if it is right. need your help , thank you!
just follow the KO section:

real_board = get_real_board(board)
color = board[0,0,0,-1]
y=0
while (y<SIZE):
    index0=y*SIZE
    x=0
    while (x<SIZE):
        index=index0+x
        if real_board[x,y]==0:
            l = x-1
            r = x+1
            d = y-1
            u = y+1

            if l<0: l=r
            if r>=SIZE: r=l
            if d<0: d=u
            if u>=SIZE: u=d

            if ((real_board[l,y]==color)+(real_board[r,y]==color)+(real_board[x,u]==color)+(real_board[x,d]==color)+
                (real_board[l,u]==color)+(real_board[r,u]==color)+(real_board[l,d]==color)+(real_board[r,d]==color))>6: 
                mask[index]=True
        x+=1
    y+=1

Narsil · 2018-01-07T11:27:08Z

Ok. It has a few bugs, but it is the general idea.

You need only to check 4 places, not 8.
You should count the liberties, which would make your code less error prone.
if l<0, then it's not a liberty.
if real_board[l, y] == color, then it's not a liberty.

Advices:

Don't use while unless necessary, it's error prone, and very painful to debug. In python, use for almost all the time
Try to avoid the list of + (or a list of and), insert it in a loop, it's much more readable. Like so:
- for x0, y0 in [ [x, u], [x, d], [l, y], [r, y] ]:
  ......check(x0, y0)
Try to put that in a function, it clearly can be wrapped, it would help readability and testability.

guohl · 2018-01-07T12:34:28Z

Many thanks! I'll try.

guohl · 2018-01-09T10:59:00Z

another change, and it seems working fine for now:

every game of self_play seems more reasonable
if needed, the self fill check can be canceled
it might be more efficient at the beginning of model training
ps. engine.py: 88, float32 can be changed to int32? thank you.

def legal_moves(board):
#Occupied places
mask1 = board[0,:,:,0] != 0
mask2 = board[0,:,:,1] != 0

#Self fill 
if True:
#if board[0,:,:,0].sum()+>158:

    #Four corners: 
    if board[0,0,0,0]==0 and board[0,0:2,0:2,0].sum()==3:
        mask1[0,0]=True

    if board[0,SIZE-1,0,0]==0 and board[0,SIZE-2:SIZE,0:2,0].sum()==3:
        mask1[SIZE-1,0]=True

    if board[0,SIZE-1,SIZE-1,0]==0 and board[0,SIZE-2:SIZE,SIZE-2:SIZE,0].sum()==3:
        mask1[SIZE-1,SIZE-1]=True

    if board[0,0,SIZE-1,0]==0 and board[0,0:2,SIZE-2:SIZE,0].sum()==3:
        mask1[0,SIZE-1]=True

    #Four sides
    for x in range(1,SIZE-1):
        if board[0,x,0,0]==0 and board[0,x-1:x+2,0:2,0].sum()==5:
            mask1[x,0]=True

        if board[0,SIZE-1,x,0]==0 and board[0,SIZE-2:SIZE,x-1:x+2,0].sum()==5:
            mask1[SIZE-1,x]=True

        if board[0,x,SIZE-1,0]==0 and board[0,x-1:x+2,SIZE-2:SIZE,0].sum()==5:
            mask1[x,SIZE-1]=True

        if board[0,0,x,0]==0 and board[0,0:2,x-1:x+2,0].sum()==5:
            mask1[0,x]=True

    #Inner 
    for x in range(1,SIZE-1):
        for y in range(1,SIZE-1):
            if board[0,x,y,0]==0 and board[0,x-1:x+2,y-1:y+2,0].sum()>=7:
                mask1[x,y]=True

mask = (mask1 + mask2).reshape(-1)
# Ko situations
ko_mask = ((board[0,:,:,2] - board[0,:,:,0]))
if (ko_mask == 1).sum() == 1:
    mask += (ko_mask == 1).reshape(-1)

# Pass is always legal  ????????
mask = np.append(mask, False)
return mask

Narsil · 2018-01-12T04:58:40Z

I gave a little more thought about this code. It seems it would work for self fill. Although a few tests would'nt hurt to make sure.

Your implementation works for self fill, but it won't work for sucide, because in order to assess if a sucide is playable or not, you NEED to check if the sucide does not capture first.

For instance, consider:

. X O O O X . .
. X O P O X . .
. X O O O X . .
. X X X X X . . .

Then position P is allowed even though your code forbids it.
So to add self sucide checks we do need the whole copy, add stone check capture mechanics.

I would have been happy to add self sucide flag. For self fill, I'll start an experiment to see how fast it improves learning, and I'll get back to you, it really feels a little cheat (the algorithm should somehow learn that on its own).

By the way, what settings are you using and on what kind of hardware ?
I'm using a GTX970, with 9x9 board (1k epochs, 250 games of self_play, and 640 mcts simulations)

guohl · 2018-01-12T13:29:25Z

For efficiency consideration, the suicide check might not be added to mask. It will take much more time than the simple mask check because it must be assessed if a sucide is a playable move, while, it might improve the performance in general. We can put a "IF ..." before the Self Fill check section to improve check speed, for example, "IF moves>100", when total moves more than 100 or 200,etc, but we need a parameter "moves" added to legal_moves.
Yesterday, when I check the SGF with sabaki, I found the win/loss function may have problem when the game not get to the final closure(almost only one liberties left), so I set the True for PASS to MASK, it seems works better.
I am not sure if the function: color_board should be changed to as follow:
def color_board(real_board, color):
board = np.copy(real_board)
for i, row in enumerate(board):
for j, v in enumerate(row):
if v == color:
#_color_adjoint(i, j, color, board) #?????????????????????????????
board = _color_adjoint(i, j, color, board)
return board

Also, I met a running time problem:
Traceback (most recent call last):
File "main.py", line 34, in
main()
File "main.py", line 30, in main
evaluate(best_model, model)
File "/home/Apps/alphagozero/evaluator.py", line 46, in evaluate
elect_model_as_best_model(tested_model)
File "/home/Apps/alphagozero/evaluator.py", line 13, in elect_model_as_best_model
self_play(model, n_games=conf['N_GAMES'], mcts_simulations=conf['MCTS_SIMULATIONS'])
File "/home/Apps/alphagozero/self_play.py", line 155, in self_play
game_data = play_game(model, model, mcts_simulations, conf['STOP_EXPLORATION'], self_play=True, num_moves=None, resign_model1=resign, resign_model2=resign)
File "/home/Apps/alphagozero/self_play.py", line 60, in play_game
x, y, policy_target, value, _, _ = engine1.genmove("B")
File "/home/Apps/alphagozero/engine.py", line 252, in genmove
index = select_play(policy, self.board, self.mcts_simulations, self.tree.tree, self.temperature, self.model)
File "/home/Apps/alphagozero/engine.py", line 177, in select_play
index = mcts_decision(policy, board, mcts_simulations, mcts_tree, temperature, model)
File "/home/Apps/alphagozero/engine.py", line 156, in mcts_decision
simulate(mcts_tree, test_board, model, MCTS_BATCH_SIZE, original_player)
File "/home/Apps/alphagozero/engine.py", line 82, in simulate
max_a = max_actions[0]['action']
IndexError: list index out of range

My setting: GTX1080, 19*19 100 epochs, 5 games of self_play, 64 mcts simulations, just for testing!

Narsil · 2018-01-12T17:30:43Z

1/ If you see a bug, can you provide a test for it?

2/ You can add the assignment into the color_board function but it does not change anything as the board is mutable, and _color_adjoint already modifies the board.

3/ This means you don't have any actions available while simulating. This is rather odd, because passing should always be an action at the very least. Did you change the MCTS_BATCH_SIZE ? MCTS_SIMULATIONS should be at least larger than the batch SIZE (so it can do at least one batch).

Can you open new separate issues for these ?

guohl · 2018-01-15T01:42:08Z

I have made a few "fixes" as following, and the result seems more reasonable! and get_winner function works fine, though we may not need it anymore! ^=^ (still working on it, but takes time ^=^).

new file for external vars: scoreboard.py
total_moves=0 #total moves of one game
suicide_flag=False #suicide flag set to True when black or white commit
eyes=[0,0,0] #true eyes of black(eyes[0]) and white(eyes[2])
playable=[361,0,361] #intersections available for black(playable[0]) and white(playable[2])
engine.py: line: 88 needed from float32 to int32
boards = np.zeros((mcts_batch_size, SIZE, SIZE, 17), dtype=np.int32)
self_play.py:
(a) after line 90, add the next lines: (add "import scoreboard" at the beginning)
if scoreboard.playable[0]==scoreboard.eyes[2] and scoreboard.playable[2]==scoreboard.eyes[0]:
end_reason = "GAME_OVER"
break
(b) following line 60, add: scoreboard.total_moves=move_n

play.py: (add "import scoreboard" at the beginning), and the legal_moves function:
def legal_moves(board):

#Occupied places
mask1 = board[0,:,:,0] != 0
mask2 = board[0,:,:,1] != 0
idx = board[0,0,0,-1]+1
scoreboard.eyes[idx]=0

#Self fill
#if scoreboard.moves>200: ??
if scoreboard.suicide_flag:
#Four corners:
if board[0,0,0,0]==0 and board[0,0:2,0:2,0].sum()==3:
mask1[0,0]=True
scoreboard.eyes[idx]+=1

 if board[0,SIZE-1,0,0]==0 and board[0,SIZE-2:SIZE,0:2,0].sum()==3:
     mask1[SIZE-1,0]=True
     scoreboard.eyes[idx]+=1

 if board[0,SIZE-1,SIZE-1,0]==0 and board[0,SIZE-2:SIZE,SIZE-2:SIZE,0].sum()==3:
     mask1[SIZE-1,SIZE-1]=True
     scoreboard.eyes[idx]+=1

 if board[0,0,SIZE-1,0]==0 and board[0,0:2,SIZE-2:SIZE,0].sum()==3:
     mask1[0,SIZE-1]=True
     scoreboard.eyes[idx]+=1

 #Four sides
 for x in range(1,SIZE-1):
     if board[0,x,0,0]==0 and board[0,x-1:x+2,0:2,0].sum()==5:
         mask1[x,0]=True
         scoreboard.eyes[idx]+=1

     if board[0,SIZE-1,x,0]==0 and board[0,SIZE-2:SIZE,x-1:x+2,0].sum()==5:
         mask1[SIZE-1,x]=True
         scoreboard.eyes[idx]+=1

     if board[0,x,SIZE-1,0]==0 and board[0,x-1:x+2,SIZE-2:SIZE,0].sum()==5:
         mask1[x,SIZE-1]=True
         scoreboard.eyes[idx]+=1

     if board[0,0,x,0]==0 and board[0,0:2,x-1:x+2,0].sum()==5:
         mask1[0,x]=True
         scoreboard.eyes[idx]+=1

 #Inner 
 for x in range(1,SIZE-1):
     for y in range(1,SIZE-1):
         if board[0,x,y,0]!=0:
             continue
         surround=board[0,x-1:x+2,y-1:y+2,0].sum()
         if surround==8: 
             mask1[x,y]=True
             scoreboard.eyes[idx]+=1
             continue
         if surround==7 and board[0,x,y-1,0]==1 and board[0,x,y+1,0]==1 and board[0,x-1,y,0]==1 and board[0,x+1,y,0]==1:
             mask1[x,y]=True
             scoreboard.eyes[idx]+=1

mask = (mask1 + mask2).reshape(-1)
scoreboard.playable[idx] = SIZE*SIZE - mask.sum()

#Ko situations
ko_mask = ((board[0,:,:,2] - board[0,:,:,0]))
if (ko_mask == 1).sum() == 1:
mask += (ko_mask == 1).reshape(-1)

#Pass is always legal
if scoreboard.total_moves>SIZE*SIZE:
mask = np.append(mask, False)
else:
mask = np.append(mask, True)
return mask

This was referenced Jan 6, 2018

legal-moves in play.py might be better if sucide check added #11

Closed

self-play: 155 parameters not match with the called function #9

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

legal-moves in play.py might be better if sucide check added #12

legal-moves in play.py might be better if sucide check added #12

guohl commented Jan 6, 2018

Narsil commented Jan 6, 2018

guohl commented Jan 6, 2018

Narsil commented Jan 6, 2018

guohl commented Jan 6, 2018

Narsil commented Jan 6, 2018

guohl commented Jan 7, 2018 •

edited

Loading

Narsil commented Jan 7, 2018 •

edited

Loading

guohl commented Jan 7, 2018

guohl commented Jan 9, 2018 •

edited

Loading

Narsil commented Jan 12, 2018

guohl commented Jan 12, 2018

Narsil commented Jan 12, 2018

guohl commented Jan 15, 2018 •

edited

Loading

legal-moves in play.py might be better if sucide check added #12

legal-moves in play.py might be better if sucide check added #12

Comments

guohl commented Jan 6, 2018

Narsil commented Jan 6, 2018

guohl commented Jan 6, 2018

Narsil commented Jan 6, 2018

guohl commented Jan 6, 2018

Narsil commented Jan 6, 2018

guohl commented Jan 7, 2018 • edited Loading

Narsil commented Jan 7, 2018 • edited Loading

guohl commented Jan 7, 2018

guohl commented Jan 9, 2018 • edited Loading

Narsil commented Jan 12, 2018

guohl commented Jan 12, 2018

Narsil commented Jan 12, 2018

guohl commented Jan 15, 2018 • edited Loading

guohl commented Jan 7, 2018 •

edited

Loading

Narsil commented Jan 7, 2018 •

edited

Loading

guohl commented Jan 9, 2018 •

edited

Loading

guohl commented Jan 15, 2018 •

edited

Loading