This library converts chess PGN files into tabular CSV data sets.
A PGN file can contain one or more chess games. The library parses the PGN file and creates two CSV files:
- Games file: contains high-level information (e.g. date, site, event, score, players)
- Moves file: contains the moves for each game (e.g. notation, squares, FEN position, check status)
The two files can be joined on a GUID (the game_id column), which the process writes to both files.
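The join on the shared ID can be sketched with the standard library alone. This is only an illustration: the sample rows and player names are made up, the files are heavily trimmed, and the `game_id` column name is taken from the column tables in this document.

```python
import csv
import io

# Hypothetical, heavily trimmed stand-ins for the two exported CSV files.
GAMES_CSV = """game_id,white,black,result
a1,Player A,Player B,1-0
b2,Player C,Player D,1/2-1/2
"""

MOVES_CSV = """game_id,move_no,notation
a1,1,e4
a1,2,c5
b2,1,d4
"""


def join_moves_to_games(games_text, moves_text):
    """Attach the game-level fields to every move row via game_id."""
    games = {row["game_id"]: row for row in csv.DictReader(io.StringIO(games_text))}
    joined = []
    for move in csv.DictReader(io.StringIO(moves_text)):
        game = games[move["game_id"]]  # look up the parent game
        joined.append({**game, **move})
    return joined


rows = join_moves_to_games(GAMES_CSV, MOVES_CSV)
for row in rows:
    print(row["white"], row["black"], row["move_no"], row["notation"])
```

Each joined row carries both the move fields and the fields of the game it belongs to, which is the shape you would expect from a combined data set.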
The library requires Python 3.7 or later.
To install, run the following command in a terminal:
pip install pgn2data
Here is a basic example of how to convert a PGN file:
from converter.pgn_data import PGNData
pgn_data = PGNData("tal_bronstein_1982.pgn")
pgn_data.export()
The following example groups multiple files into the same output file ("output.csv"):
pgn_data = PGNData(["file1.pgn", "file2.pgn"], "output")
pgn_data.export()
The export function returns an object that lets you quickly check the size and location of the created files:
pgn_data = PGNData("tal_bronstein_1982.pgn")
result = pgn_data.export()
result.print_summary()
To check that the files were created before doing further processing, you can do the following:
pgn_data = PGNData("tal_bronstein_1982.pgn")
result = pgn_data.export()
if result.is_complete:
    print("Files created!")
else:
    print("Files not created!")
The result object also provides methods to import the created files into pandas dataframes:
pgn_data = PGNData("tal_bronstein_1982.pgn")
result = pgn_data.export()
if result.is_complete:
    # read the games file
    games_df = result.get_games_df()
    print(games_df.head())
    # read the moves file
    moves_df = result.get_moves_df()
    print(moves_df.head())
    # read both files joined together
    combined_df = result.get_combined_df()
    print(combined_df.head())
To output the game information only, you can do the following:
from converter.pgn_data import PGNData
pgn_data = PGNData("tal_bronstein_1982.pgn")
pgn_data.export(moves_required=False)
The 'samples' folder in this repository contains some examples of the library's output.
A Kaggle project also used the library to convert all of Magnus Carlsen's online Bullet games into CSV format.
The tables below list every column in each output file. Games file columns:
Field | Description |
---|---|
game_id | ID of game generated by process |
game_order | Order of game in PGN file |
event | Event |
site | Site |
date_played | Date played |
round | Round |
white | White player |
black | Black player |
result | Result |
white_elo | White player rating |
white_rating_diff | White rating difference from Black |
black_elo | Black player rating |
black_rating_diff | Black rating difference from White |
white_title | White player title |
black_title | Black player title |
winner | Winning player |
winner_elo | Winning player rating |
loser | Losing player |
loser_elo | Losing player rating |
winner_loser_elo_diff | Rating difference between winner and loser |
eco | Opening (ECO code) |
termination | How game ended |
time_control | Time control |
utc_date | Date played (UTC) |
utc_time | Time played (UTC) |
variant | Game type |
ply_count | Ply count (number of half-moves) |
date_created | Extract date |
file_name | PGN source file |
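The derived columns `winner`, `loser` and `winner_loser_elo_diff` can be understood through a small sketch. This is not the library's code: the function name `derive_outcome` is hypothetical, and both the handling of draws and the sign of the rating difference (winner minus loser) are assumptions made for illustration.

```python
def derive_outcome(white, black, result, white_elo, black_elo):
    """Return winner, loser and rating gap from a PGN result string."""
    if result == "1-0":
        winner, loser = white, black
        winner_elo, loser_elo = white_elo, black_elo
    elif result == "0-1":
        winner, loser = black, white
        winner_elo, loser_elo = black_elo, white_elo
    else:
        # Draws ("1/2-1/2") and unfinished games ("*") have no winner.
        # Returning None here is an assumption, not the library's behaviour.
        return {"winner": None, "loser": None, "winner_loser_elo_diff": None}
    return {
        "winner": winner,
        "loser": loser,
        # Assumed sign convention: winner's rating minus loser's rating.
        "winner_loser_elo_diff": winner_elo - loser_elo,
    }


# Made-up players and ratings, for illustration only.
outcome = derive_outcome("Player A", "Player B", "0-1", 2500, 2650)
print(outcome)
```

Check the exported games file for the values the library actually writes for drawn games.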
Moves file columns:

Field | Description |
---|---|
game_id | ID of game that maps to games file |
move_no | Order of moves |
move_no_pair | Chess move number |
player | Player name |
notation | Standard notation of move |
move | Before and after piece location |
from_square | Piece location before |
to_square | Piece location after |
piece | Initial of piece name |
color | Piece color |
fen | FEN position |
is_check | Is check on board |
is_check_mate | Is checkmate on board |
is_fifty_moves | Is the fifty-move rule reached |
is_fivefold_repetition | Is fivefold repetition on board |
is_game_over | Is game over |
is_insufficient_material | Is game over from lack of mating material |
white_count | Count of white pieces |
black_count | Count of black pieces |
white_{piece}_count | Count of white specified piece |
black_{piece}_count | Count of black specified piece |
captured_score_for_white | Total of black pieces captured |
captured_score_for_black | Total of white pieces captured |
fen_row{number}_{colour}_count | Number of pieces for the specified colour on this row of the board |
fen_row{number}_{colour}_value | Total value of pieces for the specified colour on this row of the board |
move_sequence | Sequence of moves up to current position |
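The per-row columns (`fen_row{number}_{colour}_count` and `fen_row{number}_{colour}_value`) can be illustrated by reading the piece-placement field of a FEN string. The sketch below is an approximation rather than the library's implementation: the function name `fen_row_stats` is hypothetical, the piece values are the conventional ones (pawn 1, knight 3, bishop 3, rook 5, queen 9, king 0), and the rows are listed in FEN order (rank 8 first), which may not match the library's row numbering.

```python
# Conventional piece values; whether the library uses these is an assumption.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def fen_row_stats(fen):
    """Count pieces and sum their values per board row and colour."""
    placement = fen.split()[0]  # first FEN field: piece placement
    stats = []
    for row in placement.split("/"):  # FEN lists rank 8 first, rank 1 last
        counts = {"white_count": 0, "black_count": 0,
                  "white_value": 0, "black_value": 0}
        for symbol in row:
            if symbol.isdigit():
                continue  # digits encode runs of empty squares
            # Upper-case letters are white pieces, lower-case are black.
            colour = "white" if symbol.isupper() else "black"
            counts[f"{colour}_count"] += 1
            counts[f"{colour}_value"] += PIECE_VALUES[symbol.lower()]
        stats.append(counts)
    return stats


# Standard starting position.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
stats = fen_row_stats(START_FEN)
print(stats[0])  # Black's back rank
print(stats[6])  # White's pawn rank
```

Compare the result against a few rows of an exported moves file before relying on the row numbering or the values.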
Contributions are welcome. All modifications should come with appropriate tests demonstrating that an issue has been resolved or that new functionality works as intended. Pull requests without tests will not be merged.
The library can be tested by doing the following:
from testing.tests import run_all_tests
run_all_tests()
New tests should be added to the above method.
This project makes use of the python-chess library.