Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

legal-moves in play.py might be better if sucide check added #12

Open
guohl opened this issue Jan 6, 2018 · 13 comments
Open

legal-moves in play.py might be better if sucide check added #12

guohl opened this issue Jan 6, 2018 · 13 comments

Comments

@guohl
Copy link

guohl commented Jan 6, 2018

No description provided.

@Narsil
Copy link
Owner

Narsil commented Jan 6, 2018

It would definitely help for sgf inspection, as most sgf readers I checked pop warning for sucides. However, checking for sucide placement seems a bit expensive (I probably need to copy the board, add the stone, and check for self-capture, for every otherwise allowable move.) I would also like to check that the algorithm can learn on its own to avoid these bad moves.

@guohl
Copy link
Author

guohl commented Jan 6, 2018

if add self fill eyes check to mask in play.py(line:12) it might be much better

@Narsil
Copy link
Owner

Narsil commented Jan 6, 2018

Yes, you can add it after the KO situation.

TBH, I think a simple west, north, east, south check might be enough (and not do the the more general add stone, check capture).

In any case, we need to check was it the performance impact before adding it into master.

@guohl
Copy link
Author

guohl commented Jan 6, 2018

I noticed that if no self eyes check, it takes more than 700 moves to finish a game and even 722 moves, especially, the win/lose judgement is incorrect and it will affect the learn process.
Is it possible to just modify mask by change 0 to nonzero in mask to avoid self fill eyes? thank you!

@Narsil
Copy link
Owner

Narsil commented Jan 6, 2018

Playing into self eyes is a legal move so I won't add it into the legal_moves as the paper mentions it simply implemented the Tromp-Taylor rules (which do allow sucides as well). But feel free to do it if you want.

You simply need a simple check North, west, east and south. I can check your code if you want.

@guohl
Copy link
Author

guohl commented Jan 7, 2018

I have tried the following, but I am not sure if it is right. need your help , thank you!
just follow the KO section:

real_board = get_real_board(board)
color = board[0,0,0,-1]
y=0
while (y<SIZE):
    index0=y*SIZE
    x=0
    while (x<SIZE):
        index=index0+x
        if real_board[x,y]==0:
            l = x-1
            r = x+1
            d = y-1
            u = y+1

            if l<0: l=r
            if r>=SIZE: r=l
            if d<0: d=u
            if u>=SIZE: u=d

            if ((real_board[l,y]==color)+(real_board[r,y]==color)+(real_board[x,u]==color)+(real_board[x,d]==color)+
                (real_board[l,u]==color)+(real_board[r,u]==color)+(real_board[l,d]==color)+(real_board[r,d]==color))>6: 
                mask[index]=True
        x+=1
    y+=1

@Narsil
Copy link
Owner

Narsil commented Jan 7, 2018

Ok. It has a few bugs, but it is the general idea.

You need only to check 4 places, not 8.
You should count the liberties, which would make your code less error prone.
if l<0, then it's not a liberty.
if real_board[l, y] == color, then it's not a liberty.

Advices:

  • Don't use while unless necessary, it's error prone, and very painful to debug. In python, use for almost all the time
  • Try to avoid the list of + (or a list of and), insert it in a loop, it's much more readable. Like so:
    • for x0, y0 in [ [x, u], [x, d], [l, y], [r, y] ]:
      ......check(x0, y0)
  • Try to put that in a function, it clearly can be wrapped, it would help readability and testability.

@guohl
Copy link
Author

guohl commented Jan 7, 2018

Many thanks! I'll try.

@guohl
Copy link
Author

guohl commented Jan 9, 2018

another change, and it seems working fine for now:

  1. every game of self_play seems more reasonable
  2. if needed, the self fill check can be canceled
  3. it might be more efficient at the beginning of model training
    ps. engine.py: 88, float32 can be changed to int32? thank you.

def legal_moves(board):
#Occupied places
mask1 = board[0,:,:,0] != 0
mask2 = board[0,:,:,1] != 0

#Self fill 
if True:
#if board[0,:,:,0].sum()+>158:

    #Four corners: 
    if board[0,0,0,0]==0 and board[0,0:2,0:2,0].sum()==3:
        mask1[0,0]=True

    if board[0,SIZE-1,0,0]==0 and board[0,SIZE-2:SIZE,0:2,0].sum()==3:
        mask1[SIZE-1,0]=True

    if board[0,SIZE-1,SIZE-1,0]==0 and board[0,SIZE-2:SIZE,SIZE-2:SIZE,0].sum()==3:
        mask1[SIZE-1,SIZE-1]=True

    if board[0,0,SIZE-1,0]==0 and board[0,0:2,SIZE-2:SIZE,0].sum()==3:
        mask1[0,SIZE-1]=True

    #Four sides
    for x in range(1,SIZE-1):
        if board[0,x,0,0]==0 and board[0,x-1:x+2,0:2,0].sum()==5:
            mask1[x,0]=True

        if board[0,SIZE-1,x,0]==0 and board[0,SIZE-2:SIZE,x-1:x+2,0].sum()==5:
            mask1[SIZE-1,x]=True

        if board[0,x,SIZE-1,0]==0 and board[0,x-1:x+2,SIZE-2:SIZE,0].sum()==5:
            mask1[x,SIZE-1]=True

        if board[0,0,x,0]==0 and board[0,0:2,x-1:x+2,0].sum()==5:
            mask1[0,x]=True

    #Inner 
    for x in range(1,SIZE-1):
        for y in range(1,SIZE-1):
            if board[0,x,y,0]==0 and board[0,x-1:x+2,y-1:y+2,0].sum()>=7:
                mask1[x,y]=True

mask = (mask1 + mask2).reshape(-1)
# Ko situations
ko_mask = ((board[0,:,:,2] - board[0,:,:,0]))
if (ko_mask == 1).sum() == 1:
    mask += (ko_mask == 1).reshape(-1)

# Pass is always legal  ????????
mask = np.append(mask, False)
return mask

@Narsil
Copy link
Owner

Narsil commented Jan 12, 2018

I gave a little more thought about this code. It seems it would work for self fill. Although a few tests would'nt hurt to make sure.

Your implementation works for self fill, but it won't work for sucide, because in order to assess if a sucide is playable or not, you NEED to check if the sucide does not capture first.

For instance, consider:

. X O O O X . .
. X O P O X . .
. X O O O X . .
. X X X X X . . .

Then position P is allowed even though your code forbids it.
So to add self sucide checks we do need the whole copy, add stone check capture mechanics.

I would have been happy to add self sucide flag. For self fill, I'll start an experiment to see how fast it improves learning, and I'll get back to you, it really feels a little cheat (the algorithm should somehow learn that on its own).

By the way, what settings are you using and on what kind of hardware ?
I'm using a GTX970, with 9x9 board (1k epochs, 250 games of self_play, and 640 mcts simulations)

@guohl
Copy link
Author

guohl commented Jan 12, 2018

For efficiency consideration, the suicide check might not be added to mask. It will take much more time than the simple mask check because it must be assessed if a sucide is a playable move, while, it might improve the performance in general. We can put a "IF ..." before the Self Fill check section to improve check speed, for example, "IF moves>100", when total moves more than 100 or 200,etc, but we need a parameter "moves" added to legal_moves.
Yesterday, when I check the SGF with sabaki, I found the win/loss function may have problem when the game not get to the final closure(almost only one liberties left), so I set the True for PASS to MASK, it seems works better.
I am not sure if the function: color_board should be changed to as follow:
def color_board(real_board, color):
board = np.copy(real_board)
for i, row in enumerate(board):
for j, v in enumerate(row):
if v == color:
#_color_adjoint(i, j, color, board) #?????????????????????????????
board = _color_adjoint(i, j, color, board)
return board

Also, I met a running time problem:
Traceback (most recent call last):
File "main.py", line 34, in
main()
File "main.py", line 30, in main
evaluate(best_model, model)
File "/home/Apps/alphagozero/evaluator.py", line 46, in evaluate
elect_model_as_best_model(tested_model)
File "/home/Apps/alphagozero/evaluator.py", line 13, in elect_model_as_best_model
self_play(model, n_games=conf['N_GAMES'], mcts_simulations=conf['MCTS_SIMULATIONS'])
File "/home/Apps/alphagozero/self_play.py", line 155, in self_play
game_data = play_game(model, model, mcts_simulations, conf['STOP_EXPLORATION'], self_play=True, num_moves=None, resign_model1=resign, resign_model2=resign)
File "/home/Apps/alphagozero/self_play.py", line 60, in play_game
x, y, policy_target, value, _, _ = engine1.genmove("B")
File "/home/Apps/alphagozero/engine.py", line 252, in genmove
index = select_play(policy, self.board, self.mcts_simulations, self.tree.tree, self.temperature, self.model)
File "/home/Apps/alphagozero/engine.py", line 177, in select_play
index = mcts_decision(policy, board, mcts_simulations, mcts_tree, temperature, model)
File "/home/Apps/alphagozero/engine.py", line 156, in mcts_decision
simulate(mcts_tree, test_board, model, MCTS_BATCH_SIZE, original_player)
File "/home/Apps/alphagozero/engine.py", line 82, in simulate
max_a = max_actions[0]['action']
IndexError: list index out of range

My setting: GTX1080, 19*19 100 epochs, 5 games of self_play, 64 mcts simulations, just for testing!

@Narsil
Copy link
Owner

Narsil commented Jan 12, 2018

1/ If you see a bug, can you provide a test for it?

2/ You can add the assignment into the color_board function but it does not change anything as the board is mutable, and _color_adjoint already modifies the board.

3/ This means you don't have any actions available while simulating. This is rather odd, because passing should always be an action at the very least. Did you change the MCTS_BATCH_SIZE ? MCTS_SIMULATIONS should be at least larger than the batch SIZE (so it can do at least one batch).

Can you open new separate issues for these ?

@guohl
Copy link
Author

guohl commented Jan 15, 2018

I have made a few "fixes" as following, and the result seems more reasonable! and get_winner function works fine, though we may not need it anymore! ^=^ (still working on it, but takes time ^=^).

  1. new file for external vars: scoreboard.py
    total_moves=0 #total moves of one game
    suicide_flag=False #suicide flag set to True when black or white commit
    eyes=[0,0,0] #true eyes of black(eyes[0]) and white(eyes[2])
    playable=[361,0,361] #intersections available for black(playable[0]) and white(playable[2])

  2. engine.py: line: 88 needed from float32 to int32
    boards = np.zeros((mcts_batch_size, SIZE, SIZE, 17), dtype=np.int32)

  3. self_play.py:
    (a) after line 90, add the next lines: (add "import scoreboard" at the beginning)
    if scoreboard.playable[0]==scoreboard.eyes[2] and scoreboard.playable[2]==scoreboard.eyes[0]:
    end_reason = "GAME_OVER"
    break
    (b) following line 60, add: scoreboard.total_moves=move_n

  4. play.py: (add "import scoreboard" at the beginning), and the legal_moves function:
    def legal_moves(board):

    #Occupied places
    mask1 = board[0,:,:,0] != 0
    mask2 = board[0,:,:,1] != 0
    idx = board[0,0,0,-1]+1
    scoreboard.eyes[idx]=0

    #Self fill
    #if scoreboard.moves>200: ??
    if scoreboard.suicide_flag:
    #Four corners:
    if board[0,0,0,0]==0 and board[0,0:2,0:2,0].sum()==3:
    mask1[0,0]=True
    scoreboard.eyes[idx]+=1

     if board[0,SIZE-1,0,0]==0 and board[0,SIZE-2:SIZE,0:2,0].sum()==3:
         mask1[SIZE-1,0]=True
         scoreboard.eyes[idx]+=1
    
     if board[0,SIZE-1,SIZE-1,0]==0 and board[0,SIZE-2:SIZE,SIZE-2:SIZE,0].sum()==3:
         mask1[SIZE-1,SIZE-1]=True
         scoreboard.eyes[idx]+=1
    
     if board[0,0,SIZE-1,0]==0 and board[0,0:2,SIZE-2:SIZE,0].sum()==3:
         mask1[0,SIZE-1]=True
         scoreboard.eyes[idx]+=1
    
     #Four sides
     for x in range(1,SIZE-1):
         if board[0,x,0,0]==0 and board[0,x-1:x+2,0:2,0].sum()==5:
             mask1[x,0]=True
             scoreboard.eyes[idx]+=1
    
         if board[0,SIZE-1,x,0]==0 and board[0,SIZE-2:SIZE,x-1:x+2,0].sum()==5:
             mask1[SIZE-1,x]=True
             scoreboard.eyes[idx]+=1
    
         if board[0,x,SIZE-1,0]==0 and board[0,x-1:x+2,SIZE-2:SIZE,0].sum()==5:
             mask1[x,SIZE-1]=True
             scoreboard.eyes[idx]+=1
    
         if board[0,0,x,0]==0 and board[0,0:2,x-1:x+2,0].sum()==5:
             mask1[0,x]=True
             scoreboard.eyes[idx]+=1
    
     #Inner 
     for x in range(1,SIZE-1):
         for y in range(1,SIZE-1):
             if board[0,x,y,0]!=0:
                 continue
             surround=board[0,x-1:x+2,y-1:y+2,0].sum()
             if surround==8: 
                 mask1[x,y]=True
                 scoreboard.eyes[idx]+=1
                 continue
             if surround==7 and board[0,x,y-1,0]==1 and board[0,x,y+1,0]==1 and board[0,x-1,y,0]==1 and board[0,x+1,y,0]==1:
                 mask1[x,y]=True
                 scoreboard.eyes[idx]+=1
    

    mask = (mask1 + mask2).reshape(-1)
    scoreboard.playable[idx] = SIZE*SIZE - mask.sum()

    #Ko situations
    ko_mask = ((board[0,:,:,2] - board[0,:,:,0]))
    if (ko_mask == 1).sum() == 1:
    mask += (ko_mask == 1).reshape(-1)

    #Pass is always legal
    if scoreboard.total_moves>SIZE*SIZE:
    mask = np.append(mask, False)
    else:
    mask = np.append(mask, True)
    return mask

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants