Toolkit to train a net without gensfen or selfplay

moonstonelight · Post by **moonstonelight** » 25/01/2021, 14:49

Hi,
I have several questions. First, can I edit the text file (sample_plain in my case) without breaking anything? I see some wrong evaluations at the selected depth, and I'm trying to fix many of them in my plain text so that I can create an NNUE net from better data. I'm asking because I don't know what the MD5 files are for.
Second, I didn't understand how to use nodchip's releases. Can you explain every step that comes after creating the data from PGNs?

deeds · Post by **deeds** » 25/01/2021, 17:34

Sorry if I don't understand well enough...

I think that if your PLAIN TEXT files don't follow the default hardcoded naming scheme "xxx_plain.txt", this collection of tools won't find them.
But you can rename them when you want to store them.

Each evaluated/extracted EPD is hashed with the MD5 algorithm to avoid storing the same EPD several times in the PLAIN TEXT files.
The four MD5 files contain the hashes of all the EPDs that were previously evaluated, extracted or cleaned.
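
The duplicate check described above can be sketched roughly as follows. This is only an illustration of the idea, not the toolkit's actual code: the names `epd_hash` and `filter_new_epds` are hypothetical, and normalising whitespace before hashing is an assumption.

```python
import hashlib


def epd_hash(epd: str) -> str:
    """Return the hex MD5 digest of an EPD string.

    Whitespace is collapsed first (an assumption) so that two copies of
    the same position that differ only in spacing hash identically.
    """
    normalised = " ".join(epd.split())
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def filter_new_epds(epds, seen_hashes):
    """Yield only the EPDs whose hash is not yet in seen_hashes.

    seen_hashes plays the role of an MD5 file: it is updated in place,
    so later batches are checked against everything stored so far.
    """
    for epd in epds:
        digest = epd_hash(epd)
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            yield epd


seen = set()
batch = [
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -",
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -",
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -",  # duplicate of the first
]
unique = list(filter_new_epds(batch, seen))
print(len(unique), "unique EPDs kept out of", len(batch))
```

Keeping only the hashes, rather than the positions themselves, is what makes it cheap to check a new batch against everything already stored.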

This collection of tools only deals with the PLAIN TEXT format. This way, we can read the data directly and we aren't locked into a proprietary trainable format (nnue bin, pytorch pt, nnue binpack, etc.). So if you want to train an NNUE-compliant net, you have to convert your PLAIN TEXT files into an NNUE BIN / BINPACK file using the nodchip releases.

Maybe these links will help you:
https://github.com/nodchip/Stockfish#training-a-network
https://github.com/nodchip/Stockfish/tree/master/docs

moonstonelight · Post by **moonstonelight** » 25/01/2021, 18:14

Thanks a lot. I don't trust my English, so let me try to explain.
Here is what I have done: set up my PGN for analysis, run the analysis, and finished creating xxx_plain.txt and the MD5 files.
Here is what I am planning to do: change by hand some scores and moves in xxx_plain.txt that I think are misevaluated, then complete the rest of the NNUE training.
This is exactly where I am right now.
So should I understand that I can safely delete the MD5 files before I train the network? Are the MD5 files used anywhere in the remaining steps?
What I was asking is this: if I find a position in xxx_plain.txt by searching for its FEN, can I change its score and best move by hand? That was my plan, but the unexpected MD5 files made me think there might be a problem. Would this cause any technical problems when I train my net?

deeds · Post by **deeds** » 25/01/2021, 18:31

If you don't plan to add more data to your dataset, you can delete the MD5 files. And if you change your mind, you can still rebuild them with nnue_clean. To train a net you only need some NNUE BIN files (nodchip-like).

The four MD5 files are useful when you append several PLAIN TEXT files that may contain duplicated EPDs.

Yes, you can also change the data in the PLAIN TEXT file manually.

I think there are also ways to avoid duplicated EPDs during the training phase. Maybe some options in the LEARN command...
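
For the manual edits mentioned above, a small script can be less error-prone than a text editor on a large file. The sketch below assumes each record in the PLAIN TEXT file is a block of `fen`, `move`, `score`, `ply` and `result` lines closed by a line containing only `e`; check this against your own xxx_plain.txt before relying on it. The function name `patch_record` and all sample values are made up for illustration.

```python
def patch_record(plain_text: str, fen: str, new_move: str, new_score: int) -> str:
    """Return plain_text with the move and score replaced for one FEN.

    Assumed record layout (verify against your own file):
        fen <fen>
        move <uci move>
        score <integer>
        ply <integer>
        result <integer>
        e
    """
    out = []
    in_target = False
    for line in plain_text.splitlines():
        if line.startswith("fen "):
            # A new record starts: is it the position we want to edit?
            in_target = line[4:].strip() == fen
        elif line.strip() == "e":
            # End of the current record.
            in_target = False
        elif in_target and line.startswith("move "):
            line = f"move {new_move}"
        elif in_target and line.startswith("score "):
            line = f"score {new_score}"
        out.append(line)
    return "\n".join(out) + "\n"


# Two made-up records; the scores are illustrative, not real evaluations.
sample = """fen rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
move e2e4
score 30
ply 0
result 0
e
fen 8/8/8/4k3/8/4K3/4P3/8 w - - 0 1
move e3d3
score 0
ply 80
result 1
e
"""

patched = patch_record(sample, "8/8/8/4k3/8/4K3/4P3/8 w - - 0 1", "e3f3", 250)
print(patched)
```

Only the matching record is touched; every other line is passed through unchanged, so the file can be converted to BIN afterwards as usual.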

moonstonelight · Post by **moonstonelight** » 25/01/2021, 19:44

I managed to convert the text file to a bin file for training. So how can I complete the training?
Please share the complete training command that I can copy and paste into nodchip SF in one go. I have already checked everything.

deeds · Post by **deeds** » 25/01/2021, 19:59

The complete command is here:
https://github.com/nodchip/Stockfish#training-a-network

hazsan88 · Post by **hazsan88** » 25/01/2021, 23:55

Does anyone here have a generator to build an NNUE?

hazsan88 · Post by **hazsan88** » 26/01/2021, 0:14

deeds wrote: Second tool => nnue_extract : https://mega.nz/folder/2ohFXYqY#JTLEKhVjypvyTRI5zFPenQ

ENJOY !

Hi deeds,
what type of PGN does this tool use?

thx

deeds · Post by **deeds** » 26/01/2021, 5:17

nnue_extract works with commented PGNs (with score, depth, etc.). The tool will ask you whether the scores are from White's point of view or not.
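
To show why the point-of-view question matters, here is a minimal sketch of the conversion involved. It assumes the training data should store scores from the side to move's point of view, which is worth verifying against the nodchip documentation; the function name `to_side_to_move` is hypothetical and is not part of nnue_extract.

```python
def to_side_to_move(score: int, fen: str, white_pov: bool) -> int:
    """Convert a PGN comment score to the side-to-move's point of view.

    white_pov=True means the PGN scores are always given from White's
    side, so they must be negated when Black is to move. If the scores
    are already relative to the side to move, they are left unchanged.
    """
    side = fen.split()[1]  # second FEN field: "w" or "b"
    if white_pov and side == "b":
        return -score
    return score


black_to_move = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1"
white_to_move = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

print(to_side_to_move(35, black_to_move, white_pov=True))
print(to_side_to_move(35, white_to_move, white_pov=True))
print(to_side_to_move(35, black_to_move, white_pov=False))
```

Answering the question wrongly would flip the sign of roughly half of the scores, which is why the tool asks before extracting.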

Post by **IbaiBuR** » 26/01/2021, 6:59

For those who want to train nets, I suggest compiling the latest build from the nodchip repository for your PC's architecture. After that, the nodchip Stockfish GitHub page lists the commands to train a net. You don't have to use them exactly as given; you can change some values, such as eval_save_interval. If you have any doubts, ask and I will try to help.

deeds · Post by **deeds** » 26/01/2021, 19:04

Optional tool => plain_stats : https://mega.nz/folder/nwY2VTBA#amivGNttwxqFnvrtVRGYXg

ENJOY !

deeds · Post by **deeds** » 28/01/2021, 18:58

Here are some statistics concerning the PLAIN TEXT data extracted from different sources.
In parentheses, the minimum/maximum values encountered previously.

TCEC S01 to S18 + CUP 1 to CUP 6 (180 015 KB)

Games = 14 391
EPD = 1 777 012
epd/game = min. 1 (1), avg. 118 (124), max. 207 (398)

EPDStringLength = min. 33 (32), max. 81 (83)
PlainTextBlocSize = min. 67 (66), max. 116 (120)

score = min. -304,22, max. +234,60
Plies = min. 3 (1), max. 400 (400)
max50 = 99 moves (99)

PieceCount = min. 3 (2), avg. 19 (22), max. 32 (32)
MaterialImbalance = min. -19 (-34), max. 26 (33)

ICCF until 2020 (5 043 123 KB)

Games = 510 936
EPD = 51 369 036
epd/game = min. 1 (1), avg. 101 (124), max. 383 (398)

EPDStringLength = min. 33 (32), max. 83 (83)
PlainTextBlocSize = min. 66 (66), max. 119 (120)

score = min. -318,57, max. +262,08
Plies = min. 3 (1), max. 400 (400)
max50 = 99 moves (99)

PieceCount = min. 3 (2), avg. 16 (22), max. 32 (32)
MaterialImbalance = min. -28 (-34), max. 30 (33)

FCP until 2016 (7 228 901 KB)

Games = 535 253
EPD = 74 333 305
epd/game = min. 1 (1), avg. 139 (124), max. 218 (398)

EPDStringLength = min. 32 (32), max. 82 (83)
PlainTextBlocSize = min. 66 (66), max. 118 (120)

score = min. -318,51, max. +248,06
Plies = min. 3 (1), max. 400 (400)
max50 = 99 moves (99)

PieceCount = min. 2 (2), avg. 15 (22), max. 32 (32)
MaterialImbalance = min. -26 (-34), max. 25 (33)

CCRL 40/40 until 2011 (13 005 269 KB)

Games = 1 164 329
EPD = 130 154 911
epd/game = min. 1 (1), avg. 112 (124), max. 398 (398)

EPDStringLength = min. 32 (32), max. 83 (83)
PlainTextBlocSize = min. 66 (66), max. 118 (120)

score = min. -306,82, max. +251,33
Plies = min. 1 (1), max. 400 (400)
max50 = 99 moves (99)

PieceCount = min. 2 (2), avg. 18 (22), max. 32 (32)
MaterialImbalance = min. -25 (-34), max. 24 (33)

KINGBASE (18 789 837 KB)

Games = 1 866 024
EPD = 190 031 618
epd/game = min. 1 (1), avg. 102 (124), max. 392 (398)

EPDStringLength = min. 33 (32), max. 82 (83)
PlainTextBlocSize = min. 66 (66), max. 118 (120)

score = min. -318,74, max. +300,85
Plies = min. 2 (1), max. 400 (400)
max50 = 99 moves (99)

PieceCount = min. 3 (2), avg. 17 (22), max. 32 (32)
MaterialImbalance = min. -30 (-34), max. 29 (33)
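
For readers who want to reproduce figures like PieceCount and MaterialImbalance on their own data, here is a rough sketch of how they could be computed from a FEN. It is not the plain_stats code: the piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) and the sign convention (White minus Black) are assumptions, so the numbers may not match the tool's exactly.

```python
# Assumed piece values; plain_stats may use different ones.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def piece_count(fen: str) -> int:
    """Count all pieces (kings included) in the board field of a FEN."""
    board = fen.split()[0]
    return sum(1 for ch in board if ch.isalpha())


def material_imbalance(fen: str) -> int:
    """Return White's material minus Black's, using PIECE_VALUES."""
    board = fen.split()[0]
    total = 0
    for ch in board:
        if ch.isalpha():
            value = PIECE_VALUES[ch.lower()]
            # Uppercase letters are White pieces, lowercase are Black.
            total += value if ch.isupper() else -value
    return total


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
pawn_ending = "8/8/8/4k3/8/4K3/4P3/8 w - - 0 1"

for fen in (start, pawn_ending):
    print(piece_count(fen), material_imbalance(fen))
```

Under these assumptions the starting position gives 32 pieces with no imbalance, and a king and pawn versus king ending gives 3 pieces with an imbalance of 1.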

deeds · Post by **deeds** » 29/01/2021, 12:56

Tip for the nnue_eval / nnue_extract tools

If the adjudication rules were too aggressive (human or fishtest PGNs), the games may contain few plies and the percentage of duplicated EPDs will increase, so your net may lack some endgame knowledge.

In this case, I often used cutechess to complete the short games.

The principle is to use each PGN file as an opening book and have the engines play each game to the end.

Example with a PGN file which contains 3000 short games :

cutechess-cli -engine conf="engine1" -engine conf="engine2" -each option.Hash=128 depth=8 tc=inf -games 3000 -openings file="sample.pgn" -pgnout "sample_finished.pgn" min fi -recover -concurrency 12 -maxmoves 200 -draw movenumber=40 movecount=5 score=10 -tb "C:\Syzygy" -tbpieces 6 -ratinginterval 100

In the end, sample_finished.pgn will contain the same plies as the original games plus the new plies up to the end of the game.
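
Since the -games value in the example matches the number of games in the PGN used as an opening book, it can help to count the games first. A minimal sketch, assuming every game starts with an `[Event ` tag line (true for standard PGN exports, but not guaranteed for hand-edited files); `count_games` is a hypothetical helper, not part of cutechess.

```python
def count_games(pgn_text: str) -> int:
    """Count games in PGN text by counting lines that open an Event tag."""
    return sum(1 for line in pgn_text.splitlines() if line.startswith("[Event "))


# Two tiny made-up games, only to exercise the function.
sample_pgn = """[Event "Test"]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 Nc6 1/2-1/2

[Event "Test"]
[Result "1-0"]

1. d4 d5 2. c4 e6 1-0
"""

print(count_games(sample_pgn))
```

For a real file, read it with `open("sample.pgn", encoding="utf-8", errors="replace").read()` and pass the result to `count_games`.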

moonstonelight · Post by **moonstonelight** » 29/01/2021, 17:06

Well, I can always give you a text file full of King & Pawn endgames analysed at depth 20 by Eman. (I chose Eman because its experience feature makes the engine's evals more stable, and I thought that might be useful.)

deeds · Post by **deeds** » 29/01/2021, 18:53

moonstonelight wrote: Well, I can always give you a text file full of King & Pawn endgames analysed at depth 20 by Eman. (I chose Eman because its experience feature makes the engine's evals more stable, and I thought that might be useful.)

Thank you, but I already have several databases with endgame data:
7men

8men

and thanks to the cutechess tip, I get an average of 150 positions/game

Outskirts CheSS ForuM
