No output when training script is run #62

Open
mkillah opened this issue Feb 22, 2024 · 9 comments

Comments

mkillah commented Feb 22, 2024

Hi, thank you for sharing this amazing project with the world. I have had the honour of using Maia for my experiments, but I have run into a problem that you may be able to help me with.

Problem:

I have successfully followed the steps for creating my own network up to point 4. My data is split into training and validation PGNs, and I have double-checked the paths to the data and the config.

But I am stuck at point 4 (Run the training script move_prediction/train_maia.py PATH_TO_CONFIG).

When I run the script, I get the following output, after which nothing happens and no data is generated:

2024-02-22 21:38:59.828812: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
WARNING:tensorflow:From C:\Users\marko\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

2024-02-22 16:39:17 dataset:
  input_test: virtualPlayerNet/validation//
  input_train: virtualPlayerNet/training//
gpu: 0
model:
  filters: 64
  residual_blocks: 6
  se_ratio: 8
training:
  batch_size: 1024
  checkpoint_steps: 10000
  lr_boundaries:
  - 80000
  - 200000
  - 360000
  lr_values:
  - 0.1
  - 0.01
  - 0.001
  - 0.0001
  num_batch_splits: 1
  policy_loss_weight: 1.0
  precision: half
  shuffle_size: 250000
  test_steps: 2000
  total_steps: 400000
  train_avg_report_steps: 50
  value_loss_weight: 1.0

2024-02-22 16:39:17 found [] chunk dirs
2024-02-22 16:39:17 found 0 chunks total
Not enough chunks 0
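For reference, the lr_boundaries and lr_values entries in the config above describe a step-wise learning-rate schedule. Below is a minimal Python sketch of the usual piecewise-constant reading of those two lists. The helper `learning_rate` is hypothetical, not taken from the Maia code, and whether a new rate takes effect exactly at the boundary step or one step later is an assumption that may differ from the actual training code.

```python
import bisect

# Values copied from the training section of the config printed above.
LR_BOUNDARIES = [80000, 200000, 360000]
LR_VALUES = [0.1, 0.01, 0.001, 0.0001]

# A schedule with n boundaries needs n + 1 values.
assert len(LR_VALUES) == len(LR_BOUNDARIES) + 1


def learning_rate(step: int) -> float:
    """Piecewise-constant schedule (hypothetical helper).

    Assumes LR_VALUES[i] applies from LR_BOUNDARIES[i - 1] (inclusive)
    up to LR_BOUNDARIES[i] (exclusive), starting from step 0.
    """
    # bisect_right counts how many boundaries are <= step.
    return LR_VALUES[bisect.bisect_right(LR_BOUNDARIES, step)]


if __name__ == "__main__":
    for step in (0, 80000, 250000, 399999):
        print(step, learning_rate(step))
```

With these values the rate drops tenfold at each boundary, so the last 40,000 of the 400,000 total_steps run at 0.0001.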

I think the problem is that there is no trainingdata-tool.exe, so the script cannot convert the PGNs to .gz files. Does anyone have trainingdata-tool.exe?

It would be great if someone could point me in the right direction :)

Cheers

reidmcy (Member) commented Feb 26, 2024

You don't have any chunk files in virtualPlayerNet/training//; the double slash makes it look like there's an issue with the path. Are you sure pgn_to_trainingdata.sh ran correctly? It should take some time to run and create a bunch of new files.
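To check what the training script will see before launching it, something like the following Python sketch counts the directories matched by an input_train-style pattern and the .gz chunk files inside them, roughly mirroring the "found ... chunk dirs" / "found ... chunks total" lines in the log. It is not the code from train_maia.py: the helper `count_chunks` is hypothetical, and it assumes the chunks are *.gz files sitting directly inside each matched directory.

```python
from __future__ import annotations

import glob
import os


def count_chunks(pattern: str) -> tuple[list[str], int]:
    """Return the directories matched by `pattern` and the number of .gz files in them.

    Hypothetical helper. Assumption: chunks are *.gz files written directly
    inside each matched directory.
    """
    # Keep only directories (a pattern ending in a separator already implies this).
    chunk_dirs = sorted(d for d in glob.glob(pattern) if os.path.isdir(d))
    # Count the .gz files in each matched directory.
    total = sum(len(glob.glob(os.path.join(d, "*.gz"))) for d in chunk_dirs)
    return chunk_dirs, total


if __name__ == "__main__":
    # Same layout as input_train in the config; the relative location is an assumption.
    dirs, n = count_chunks(os.path.join("virtualPlayerNet", "training", "*", ""))
    print(f"found {dirs} chunk dirs")
    print(f"found {n} chunks total")
```

If this reports zero chunk directories, the problem lies with the path or with the conversion step, not with training itself.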

mkillah (Author) commented Feb 26, 2024

For some reason, the asterisks in the paths were not rendered in the issue text, so they show up as //.

Anyway, I think pgn_to_trainingdata.sh did not run correctly because trainingdata-tool.exe is missing. Consequently, the script did not convert the PGNs to .gz files, which results in "Not enough chunks 0".

When I run move_prediction/pgn_to_trainingdata.sh PGN_FILE_PATH OUTPUT_PATH, I get:

10000 games matched out of 10000.
removed directory '/c/Users/marko/PycharmProjects/maia-chess-virtualchessplayer/data/blocks'
Starting on training 10
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 4
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 5
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 6
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 7
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 8
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on training 9
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on validation 1
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on validation 2
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Starting on validation 3
move_prediction/pgn_to_trainingdata.sh: line 43: trainingdata-tool: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: pgrep: command not found
move_prediction/pgn_to_trainingdata.sh: line 44: [: -gt: unary operator expected
Almost done
Done

I cannot find trainingdata-tool.exe in the repository, only trainingdata-tool.cpp.

Does anyone have trainingdata-tool.exe?

reidmcy (Member) commented Feb 26, 2024

You need to compile it yourself and add it to your PATH.
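A quick way to confirm that a compiled binary is actually visible on PATH is Python's shutil.which, which also honours PATHEXT on Windows, so trainingdata-tool.exe is found when looking up trainingdata-tool. The sketch below also checks pgrep, since the log above shows it as "command not found" too. The helper `missing_tools` is hypothetical, and it is worth running from the same shell (e.g. Git Bash) that runs pgn_to_trainingdata.sh, because different shells can see different PATHs.

```python
import shutil

# Tools that pgn_to_trainingdata.sh reported as "command not found" in the log.
REQUIRED_TOOLS = ("trainingdata-tool", "pgrep")


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Return the names in `tools` that cannot be found on PATH (hypothetical helper)."""
    # shutil.which returns None when no executable with that name is on PATH.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All required tools found")
```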

mkillah (Author) commented Feb 27, 2024

I am having real trouble compiling it (total noob in C++). Could you cut me some slack and provide me with the file?

reidmcy (Member) commented Feb 28, 2024

Did you run the cmake commands listed in the README? That worked for me on a fresh Ubuntu server.

C++ doesn't produce portable binaries by default, so you need to compile it yourself.

Training the model will require even more technical knowledge. This code release has not been tested on other systems and is meant to document what we did, not to provide a general training method, so it will almost certainly require changes to run on your system.

CallOn84 commented Mar 2, 2024

Thankfully, I have a copy of trainingdata-tool from before the links became deprecated. You can find it in my repository for my Leela and Leela-derived nets.

CallOn84 commented Mar 2, 2024

Are you running on a Windows or Linux machine @mkillah?

mkillah (Author) commented Mar 2, 2024

@reidmcy thank you for the explanation! I just have to wrap my head around C++, since I do not have any experience with it.

Anyway, @CallOn84 thank you for providing me with the information and files!! I will give it a try now. I am running everything on a Windows machine.

CallOn84 commented Mar 2, 2024

The good news for you is that I'm currently doing a training run of a Maia 2200 net on my PC, which is also a Windows machine. The issue you're facing comes from how Windows handles directory paths in CMD. Instead of /path/to/file/*/, you need to use \path\to\file\*\. This should fix the no-chunk issue you're getting.
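One way to avoid hard-coding either separator is to build the glob pattern with os.path.join, which uses the current OS's separator. A minimal sketch, assuming the virtualPlayerNet/training layout from the config earlier in this thread; `chunk_dir_pattern` is a hypothetical helper, not part of the Maia code.

```python
import glob
import os


def chunk_dir_pattern(base: str) -> str:
    """Build the '<base>/*/' glob pattern with the current OS's separator (hypothetical helper)."""
    # os.path.join inserts os.sep between parts; the trailing "" adds a final
    # separator, so the pattern only matches directories.
    return os.path.join(base, "*", "")


if __name__ == "__main__":
    pattern = chunk_dir_pattern(os.path.join("virtualPlayerNet", "training"))
    print(pattern)  # virtualPlayerNet\training\*\ on Windows, virtualPlayerNet/training/*/ elsewhere
    print(glob.glob(pattern))  # an empty list means no chunk directories were found
```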

You also need to change two things before you start training a Maia net. Reid and his team used a TensorFlow version with experimental Keras code that was later replaced by a more stable and improved version of Keras in TensorFlow 2.4+; I made a pull request about this. The good news is that Reid and his team are revising the training model, which should solve this issue, and are moving to more up-to-date libraries.

Until then, if you want to train a Maia net now, you'll need to change two lines of code in tfprocess.py, as shown in my pull request: https://github.com/CSSLab/maia-chess/pull/57

If you have any questions or issues, let me know here or through Discord by sending a message to callon84.

Good luck!
