changelog.txt

v0.18.1 (2018-10-02)
~~~~~~~

* Fix for falling into threefold repetition in a winning endgame tablebase position.


v0.18.0 (2018-09-30)
~~~~~~~

* No changes from rc2 except the version.


v0.18.0-rc2 (2018-09-26)
~~~~~~~~~~~

* Severe bug fixed: Race condition when out-of-order-eval was enabled (and it
  was enabled by default)

* Windows 32-bit builds are now possible (CPU only for now)


v0.18.0-rc1 (2018-09-24)
~~~~~~~~~~~

KNOWN BUG!

* We have credible reports that in some rare cases Lc0 crashes!
  However, we were not able to reproduce it reliably. If you see the crash,
  please report to devs! What seems to increase crash probability:
  - Very short move time (milliseconds)
  - Proximity to a checkmate (happens 1-3 moves before the checkmate)


New features:

* Endgame tablebases support! Both WDL and DTZ now.

* Added MultiPv support.


Time management changes:

* Introduced --immediate-time-use flag. Yes, yet another time management
  flag. Posible values are between 0.0 and 1.0. Setting it closer to 
  1.0 makes Leela use time saved from futile search aversion earlier.

* Some time management parameters were changed:
  - Slowmover is 1.0 now (was 2.4)
  - Immediate-time-use is 0.6 now (didn't exist before, so was 0.0)

* Fixed a bug, because of which futile search aversion tolerance was incorrectly
  applied, which resulted in instamoves.

* Now search stops immediately when it runs out of budgeted time.
  Should help against timeouts, especially on slow backends (e.g. BLAS).

* Move overhead now is a fixed time, doesn't depend on number of remaining
  moves.


Other:

* Out of order eval is on by default. That brings slight nps improvement.

* Default FPU reduction is 1.2 now (was 0.9)

* Cudnn backend now has max_batch parameter.
  (can be set for example like this --backend-opts=max_batch=100).
  This is needed for lower end GPUs that didn't have enough VRAM for a buffer
  of size 1024. Make sure that this setting is not lower than --minibatch-size.

* Small memory usage optimizations.

* Engine name in UCI response is shorter now. Fritz chess UI should be able
  to work with Leela now

* Added flag --temp-visit-offset, will allow to offset temperature during
  training.

* Command line and UCI parameter values are now checked for validity.

* You can now build for older processors that don't support the popcnt
  instruction by passing -Dpopcnt=false to meson when building.

* 32-bit build is possible now. CPU only and we were only able to build it 
  in Linux for now, including Raspberry Pi.

* Threading issue which caused crash in heavily multithreaded environment
  with slow backends was fixed.


v0.17.0 (2018-08-27)
~~~~~~~

No changes from rc2 except the version.


v0.17.0-rc2 (2018-08-21)
~~~~~~~~~~~

* Fixed a bug, that rule50 value was located in wrong place in a training data.
* OpenCL uses much less VRAM now.
* Default OpenCL batch size is 16 now (was 1).
* Default time management related configuration was tweaked:
  --futile-move-aversion is 1.33 now (was 1.47)
  --slowmover is 2.4 now (was 2.6)


v0.17.0-rc1 (2018-08-19)
~~~~~~~~~~~

New visible features:
* Implemented ponder support.
* Tablebases are supported now (only WDL probe for now).
  Command line parameter is
  --syzygy-paths=/path/to/syzygy/
* Old smart pruning flag is gone. Instead there is
  --futile-search-aversion flag.
  --futile-search-aversion=0 is equivalent to old --no-smart-pruning.
  --futile-search-aversion=1 is equivalent to old --smart-pruning.
  Now default is 1.47, which means that engine will sometimes decide to
  stop search earlier even when there is theoretical chance (but not very
  probable) that best move decision could be changed if allowed to think more.
* Lc0 now supports configuration files. Options can be listed there instead of
  command line flags / uci params.
  Config should be named lc0.config and located in the same directory as lc0.
  Should list one command line option per line, with '--' in the beginning
  being optional, for example:

     syzygy-paths=/path/to/syzygy/

* In uci info, "depth" is now average depth rather than full depth
  (which was 4 all the time).
  Also, depth values do not include reused tree, only nodes visited during the
  current search session.
* --sticky-checkmates experimental flag (default off), supposed to find shorter
  checkmate sequences.
* More features in backend "check".


Performance optimizations:
* Release windows executables are built with "whole program optimization".
* Added --out-of-order-eval flag (default is off).
  Switching it on makes cached/terminal nodes higher priority, which increases
  nps.
* OpenCL backend now supports batches (up to 5x speedup!)
* Performance optimizations for BLAS backend.
* Total visited policy (for FPU reduction) is now cached.
* Values of priors (P) are stored now as 16-bit float rather than 32-bit float,
  that saves considerable amount of RAM.


Bugfixes:
* Fixed en passant detection bug which caused the position after pawn moving by
  two squares not counted towards threefold repetition even if en passant was
  not possible.
* Fixed the bug which caused --cache-history-length for values 2..7 work the
  same as --cache-history-length=1.
  This is fixed, but default is temporarily changed to --cache-history-length=1
  during play. (For training games, it's 7)


Removed features:
* Backpropagation beta / backpropagation gamma parameters have been removed.


Other changes:
* Release lc0-windows-cuda.zip package now contains NVdia CUDA and cuDNN .dlls.


v0.16.0 (2018-07-20)
~~~~~~~

* Fully switched to official releases! No more https://crem.xyz/lc0/
* Fixed a bug when pv display and smart pruning didn't sometimes work properly
  after tree reuse.
* Format of protobuf network files was changed.
* Autodiscovery of protobuf based network files works now.


lc0-win-20180715-cuda92-cudnn714
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Support of new format of network files (needed for lc0 launch on main
  training server)
* Fixed hang/poor performance in the beginning of search when there are many
  threads. (Happened on linux only though).
* Memory footprint is reduced a bit. (~-60 bytes per node)

lc0-win-20180711-cuda92-cudnn714
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Edge-node separation introduced a bug that smart pruning didn't work. That's
  fixed.
* Changed options parsing so that --backend-opts=cudnn-fp16 is now possible.
* Performance fixes (mostly for slowness introduced by edge-node separation).

lc0-win-20180708-cuda92-cudnn714
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Mutex contention has been reduced (but locking mutex more rarely).
  Helps a lot with many threads running. Especially recommended to check with
  multi-GPU configuration.
* Memory usage reduced at least 2x (probably more).
* cudnn backend crashed on large batches (>800) that's fixed.
  There is still a limit of batch size 1024 though.
* (not in cudnn build, but for completeness)
  Fixed NN computation with BLAS backend, it had up to 5% error before that.
* Default time budgeting params have been changed again! (not by mach this time)
  --slowmover=1.95
  --time-curve-peak=26.2
  --time-curve-left-width=82
  --time-curve-right-width=74

lc0-win-20180701-cuda92-cudnn714
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* fp16-based computation for very modern NVidia GPUs!
  May reduce precision a bit, but should be compensated by nps boost.
  Enable with --backend=cudnn-fp16 flag
* V is now not stored in nodes (a bit less RAM used while thinking)
* (not in cudnn build, but listing for completeness) blas batching support.

lc0-win-20180629-cuda92-cudnn714
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Default time budgeting parameters have been changed (again!):
  --slowmover=1.93
  --time-curve-peak=26
  --time-curve-left-width=67
  --time-curve-right-width=76
* When generating training games, the engine could confuse client by sending
  corrupted output. That's fixed.

lc0-win-20180624-cuda92-cudnn714
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Default time budgeting parameters have been changed:
  --slowmover=2.13  (was 1.8)
  --time-curve-peak=22.0  (was 41.0)
  --time-curve-left-width=450.0  (was 1000.0)
  --time-curve-right-width=30.0  (was 39.5)
* During training game generation, the engine is able to send resign statistics.


lc0-win-20180622-cuda92-cudnn714
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Time budged allocation has been changed, it allocates more time to early
  stages of the game.
  Graphs are here: https://github.com/LeelaChessZero/lc0/pull/59
  Slowmover value has so be recalibrated, and default value was changed from 2.2 to 1.8.
* Fixed a race condition in cache prefetch code. Realistically it hardly every
  occured before though.

lc0-win-20180619-cuda92-cudnn714-00
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Fix a bug instroduced in version 20180609 which caused the engine to miss checkmates sometimes.

lc0-win-20180614-cuda92-cudnn714
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* "go searchmoves" uci command is now supported
* It's possible now to disable tree reuse in training games
* Few improvements for random backend
* Lc0 now shows version in uci response
* Analyzer mode has been removed
* extra-virtual-loss has been removed
* Implemented resign (for training games)

lc0-win-20180609-cuda92-cudnn714-01
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* In addition to --backpropagate-gamma, there is also --backpropagate-beta!
  Default is 1.0.

lc0-win-20180609-cuda92-cudnn714-00
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Visible changes:
* Experimental changes from 20180604 are now default.
* Memory footprint is reduced by 8 bytes per visible node (+ ~240 bytes in
  invisible nodes per visible)
* Introduced --backpropagate-gamma flag.
  Default is 1.0. There are rumours that reducing it to 0.75 improves play.
* Extra-virtual-loss parameter has been removed.
* Quotes in backend-opts parameter were not parsed properly, that's fixed.


lc0-win-20180604-cuda92-cudnn714-experimental
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Visible changes:
* Experimental default settings:
  cPUCT: 3.4
  FPU reduction: 0.9
  policy Softmax: 2.2

* Fix memory leak when GUI doesn't ever issue `isready` uci command.


lc0-win-20180602-cuda92-cudnn714-00
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Visible changes:
* cPUCT is now 3.1 by default instead of 1.2 (or what it was before)
* Fixed Batch normalization epsilon in tensorflow backend (but noone uses tensorflow anyway)
* Periodically (every 5 seconds) output "uci info" even if bestmove/depth doesn't change.
* Memory management is redone so that node release happens after "bestmove" and "isready", rather than after "position" uci command.
  That garbage collection could take tens of milliseconds and chess GUI already started timer at that point.
  Memory management is always fragile, so fresh crashes and memory leaks are possible.

Invisible changes:
* Store castlings again as e1g1 and not e1h1. Fixes a bug that tree was not reused after castling.