Toolkit to train a net without gensfen nor selfplay
Moderators: Elijah, Igbo, timetraveller
-
- Forum Contributions
- Points: 7 463,00
- Posts: 92
- Joined: 04/11/2019, 13:44
- Status: Offline (Active 1 Month, 16 Hours, 26 Minutes ago)
- Topics: 10
- Reputation: 12
- Location: Turkey
- Has thanked: 4 times
- Been thanked: 41 times
Hi,
I have several questions. Can we edit the plain text file (sample_plain in my case) without breaking anything? I see some wrong evaluations at the selected depth, and I am trying to fix many of them in my plain text so that I can create an NNUE from better data. I am asking because I don't know what the MD5 files are for.
Second, I didn't understand how to use nodchip's releases. Can you explain every step that follows once the data has been created from the PGNs?
-
- I've been banned!
- Points: 6 000,00
- Posts: 246
- Joined: 08/11/2019, 7:32
- Status: Offline (Active 1 Year, 9 Months, 2 Weeks, 14 Hours, 32 Minutes ago)
- Topics: 12
- Reputation: 218
- Location: France
- Been thanked: 288 times
Sorry if I don't understand well enough...
I think that if your PLAIN TEXT files don't follow the default hardcoded naming scheme "xxx_plain.txt", this collection of tools won't find them.
But you can rename them when you want to store them.
Each evaluated/extracted EPD is hashed with the MD5 algorithm to avoid storing the same EPD several times in the PLAIN TEXT files.
The four MD5 files contain the hashes of all the EPDs that were previously evaluated, extracted or cleaned.
This collection of tools only deals with the PLAIN TEXT format. This way, we can read the data directly and we aren't locked into a proprietary training format (nnue bin, pytorch pt, nnue binpack, etc.). So if you want to train an NNUE-compliant net, you have to convert your PLAIN TEXT files into an NNUE BIN / BINPACK file using the nodchip releases.
Maybe these links will help you:
https://github.com/nodchip/Stockfish#training-a-network
https://github.com/nodchip/Stockfish/tree/master/docs
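The role of the MD5 files can be pictured with a small Python sketch. It assumes that each EPD string is hashed with MD5 and that a set of already-seen hashes decides whether a position gets stored again. The function names and the in-memory set are hypothetical illustrations, not the toolkit's actual code or file layout.

```python
import hashlib


def epd_hash(epd: str) -> str:
    """Return the MD5 hex digest of an EPD string (surrounding whitespace trimmed)."""
    return hashlib.md5(epd.strip().encode("utf-8")).hexdigest()


def keep_new_epds(epds, seen_hashes):
    """Return only the EPDs whose hash is not yet in seen_hashes, updating the set."""
    fresh = []
    for epd in epds:
        digest = epd_hash(epd)
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            fresh.append(epd)
    return fresh


seen = set()  # stands in for the hashes stored in the MD5 files (assumption)
batch = [
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -",
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -",
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -",  # duplicate of the first
]
unique = keep_new_epds(batch, seen)
print(len(unique), "distinct positions kept")
```

Under this reading, deleting the hashes only matters if more data is appended later, because the set of seen positions would then start empty again.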
-
Thanks a lot. I don't trust my English, so let me try to explain.
Here is what I have done so far: prepared my PGN for analysis, ran the analysis, and finished creating xxx_plain.txt and the MD5 files.
Here is what I plan to do: change by hand some scores and moves in xxx_plain.txt that I think were misevaluated, then complete the rest of the NNUE training.
That is exactly where I am right now.
So can I safely delete the MD5 files before I train the network? Are the MD5 files used anywhere in the remaining steps?
My question was this: if I find a position in xxx_plain.txt by searching for its FEN, can I change its score and best move by hand? That was my plan, but the unexpected MD5 files made me think there might be a problem. Would this cause any technical problems when I train my net?
-
If you don't plan to add more data to your dataset, you can delete the MD5 files. And if you change your mind, you can still rebuild them with nnue_clean. To train a net you only need some NNUE BIN files (nodchip-like).
The four MD5 files are useful when you append several PLAIN TEXT files that may contain duplicated EPDs.
Yes, you can also change data manually in the PLAIN TEXT file.
I think there are ways to avoid duplicated EPDs during the training phase too. Maybe some options of the LEARN command...
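A manual correction of a score and best move can also be scripted. The Python sketch below assumes the PLAIN TEXT layout used by nodchip's converter as I understand it: one block per position made of fen / move / score / ply / result lines and closed by a line containing only e. The helper name patch_position and the sample values are hypothetical.

```python
def patch_position(plain_text: str, fen: str, new_move: str, new_score: int) -> str:
    """Rewrite the move and score lines of every block whose fen line matches."""
    lines = plain_text.splitlines()
    in_target = False
    for i, line in enumerate(lines):
        if line.startswith("fen "):
            # A new block starts: check whether it is the position to patch.
            in_target = line[4:].strip() == fen
        elif in_target and line.startswith("move "):
            lines[i] = f"move {new_move}"
        elif in_target and line.startswith("score "):
            lines[i] = f"score {new_score}"
        elif line.strip() == "e":
            # End-of-block marker (assumed layout).
            in_target = False
    return "\n".join(lines) + "\n"


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
AFTER_E4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1"
sample = (
    f"fen {START}\nmove e2e4\nscore 30\nply 0\nresult 0\ne\n"
    f"fen {AFTER_E4}\nmove e7e5\nscore -25\nply 1\nresult 0\ne\n"
)
patched = patch_position(sample, AFTER_E4, "c7c5", -20)
print(patched)
```

Only the move and score lines of the matching position change; the fen line is left untouched, so a hash computed from the position itself would stay the same.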
-
I managed to convert the text file to a bin file for training. So how can I complete the training?
Please share the whole training command that I can copy and paste into Nodchip SF in one go. I have already checked everything else.
-
The complete command is here:
https://github.com/nodchip/Stockfish#training-a-network
-
- Forum Contributions
- Points: 6 000,00
- Posts: 130
- Joined: 23/11/2019, 19:48
- Status: Offline (Active 1 Year, 7 Months, 1 Week, 6 Days, 8 Hours, 41 Minutes ago)
- Topics: 2
- Reputation: 196
- Has thanked: 16 times
- Been thanked: 204 times
Does anyone here have a generator to build an NNUE?
-
deeds wrote:
Second tool => nnue_extract : https://mega.nz/folder/2ohFXYqY#JTLEKhVjypvyTRI5zFPenQ
ENJOY !

Hi deeds,
What type of PGN does this tool use?
Thanks
-
nnue_extract works with commented PGNs (with score, depth, etc.). The tool will ask you whether the scores are from White's point of view or not.
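To make the point-of-view question concrete, here is a hedged Python sketch. It assumes move comments in the cutechess style, for example {+0.35/20 0.5s} (score/depth, then time), and shows one way a white-POV score could be turned into a score for the side that moved. The regular expression and both function names are hypothetical and do not describe how nnue_extract is implemented.

```python
import re

# Matches the start of a comment body such as "+0.35/20 0.5s" (assumed format).
COMMENT_RE = re.compile(r"^\s*([+-]?\d+(?:\.\d+)?)/(\d+)")


def parse_comment(comment: str):
    """Return (score_in_pawns, depth), or None when the comment has no numeric score."""
    match = COMMENT_RE.match(comment)
    if match is None:
        return None  # e.g. "book" or a mate score such as "+M5/12"
    return float(match.group(1)), int(match.group(2))


def score_for_mover(score: float, white_moved: bool, white_pov: bool) -> float:
    """Convert a score to the mover's point of view.

    If the PGN stores every score from White's point of view, the scores
    attached to Black's moves must be negated.
    """
    if white_pov and not white_moved:
        return -score
    return score


print(parse_comment("+0.35/20 0.5s"))
print(score_for_mover(0.35, white_moved=False, white_pov=True))
```

Answering the question wrongly would flip the sign of every score attached to Black's moves, which is why the tool needs to know.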
-
- Administrators
- Points: 7 707,00
- Forum Contributions
- Posts: 149
- Joined: 05/01/2021, 15:29
- Status: Offline (Active 8 Hours, 11 Minutes ago)
- Medals: 1
- Topics: 6
- Reputation: 252
- Location: Madrid, ES
- Has thanked: 64 times
- Been thanked: 319 times
For those who want to train nets, I suggest compiling the latest build from the nodchip repository for your PC's architecture. After that, the nodchip Stockfish GitHub page lists the commands to train a net, but you don't have to use them unchanged: you can change some values, eval_save_interval for example. If you have any doubts, ask and I will try to help.
-
Here are some statistics concerning the PLAIN TEXT data extracted from different sources.
In parentheses, the minimum/maximum values encountered previously.

TCEC S01 to S18 + CUP 1 to CUP 6 (180 015 KB)
Games = 14 391
EPD = 1 777 012
epd/game = min. 1 (1), avg. 118 (124), max. 207 (398)
EPDStringLength = min. 33 (32), max. 81 (83)
PlainTextBlocSize = min. 67 (66), max. 116 (120)
score = min. -304.22, max. +234.60
Plies = min. 3 (1), max. 400 (400)
max50 = 99 moves (99)
PieceCount = min. 3 (2), avg. 19 (22), max. 32 (32)
MaterialImbalance = min. -19 (-34), max. 26 (33)

ICCF until 2020 (5 043 123 KB)
Games = 510 936
EPD = 51 369 036
epd/game = min. 1 (1), avg. 101 (124), max. 383 (398)
EPDStringLength = min. 33 (32), max. 83 (83)
PlainTextBlocSize = min. 66 (66), max. 119 (120)
score = min. -318.57, max. +262.08
Plies = min. 3 (1), max. 400 (400)
max50 = 99 moves (99)
PieceCount = min. 3 (2), avg. 16 (22), max. 32 (32)
MaterialImbalance = min. -28 (-34), max. 30 (33)

FCP until 2016 (7 228 901 KB)
Games = 535 253
EPD = 74 333 305
epd/game = min. 1 (1), avg. 139 (124), max. 218 (398)
EPDStringLength = min. 32 (32), max. 82 (83)
PlainTextBlocSize = min. 66 (66), max. 118 (120)
score = min. -318.51, max. +248.06
Plies = min. 3 (1), max. 400 (400)
max50 = 99 moves (99)
PieceCount = min. 2 (2), avg. 15 (22), max. 32 (32)
MaterialImbalance = min. -26 (-34), max. 25 (33)

CCRL 40/40 until 2011 (13 005 269 KB)
Games = 1 164 329
EPD = 130 154 911
epd/game = min. 1 (1), avg. 112 (124), max. 398 (398)
EPDStringLength = min. 32 (32), max. 83 (83)
PlainTextBlocSize = min. 66 (66), max. 118 (120)
score = min. -306.82, max. +251.33
Plies = min. 1 (1), max. 400 (400)
max50 = 99 moves (99)
PieceCount = min. 2 (2), avg. 18 (22), max. 32 (32)
MaterialImbalance = min. -25 (-34), max. 24 (33)

KINGBASE (18 789 837 KB)
Games = 1 866 024
EPD = 190 031 618
epd/game = min. 1 (1), avg. 102 (124), max. 392 (398)
EPDStringLength = min. 33 (32), max. 82 (83)
PlainTextBlocSize = min. 66 (66), max. 118 (120)
score = min. -318.74, max. +300.85
Plies = min. 2 (1), max. 400 (400)
max50 = 99 moves (99)
PieceCount = min. 3 (2), avg. 17 (22), max. 32 (32)
MaterialImbalance = min. -30 (-34), max. 29 (33)
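Two of the per-position statistics above, PieceCount and MaterialImbalance, can be derived from the piece-placement field of an EPD/FEN string. The Python sketch below is only an illustration: the post does not say how MaterialImbalance is computed, so the conventional piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) and the white-minus-black sign are assumptions, and the function names are hypothetical.

```python
# Conventional piece values; an assumption, not taken from the toolkit.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def piece_count(epd: str) -> int:
    """Count all pieces (kings included) in the placement field of an EPD/FEN."""
    placement = epd.split()[0]
    return sum(1 for ch in placement if ch.isalpha())


def material_imbalance(epd: str) -> int:
    """Return White's material minus Black's material (assumed definition)."""
    placement = epd.split()[0]
    total = 0
    for ch in placement:
        if ch.isalpha():
            value = PIECE_VALUES[ch.lower()]
            # Uppercase letters are White's pieces, lowercase are Black's.
            total += value if ch.isupper() else -value
    return total


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -"
bare_kings = "8/8/8/4k3/8/8/4K3/8 w - -"
print(piece_count(start), material_imbalance(start))
print(piece_count(bare_kings), material_imbalance(bare_kings))
```

A position with only the two kings gives the minimum PieceCount of 2 reported in the statistics, and the starting position gives the maximum of 32.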
-
Tip for the nnue_eval / nnue_extract tools
If the adjudication rules were too aggressive (human or fishtest PGNs), the games may contain few plies and the percentage of duplicated EPDs will increase, so your net may lack some endgame knowledge.
In this case, I often used cutechess to complete the short games.
The principle is to use each PGN file as an opening book and ask engines to play until the end.
Example with a PGN file that contains 3000 short games:
cutechess-cli -engine conf="engine1" -engine conf="engine2" -each option.Hash=128 depth=8 tc=inf -games 3000 -openings file="sample.pgn" -pgnout "sample_finished.pgn" min fi -recover -concurrency 12 -maxmoves 200 -draw movenumber=40 movecount=5 score=10 -tb "C:\Syzygy" -tbpieces 6 -ratinginterval 100
At the end, sample_finished.pgn will contain the same plies as the original games plus the new plies up to the end of the game.
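When several short-game PGN files have to be completed, the same command can be assembled from a script. The Python sketch below only rebuilds the argument list from the example in this post, with the input file, output file, number of games and concurrency as parameters. The function name is hypothetical, the engine configuration names engine1 and engine2 and the Syzygy path are the ones from the example, and the shell quotes are dropped because each list item is passed as one argument.

```python
def cutechess_args(pgn_in: str, pgn_out: str, games: int, concurrency: int = 12):
    """Build the cutechess-cli argument list used in the example above."""
    return [
        "cutechess-cli",
        "-engine", "conf=engine1",
        "-engine", "conf=engine2",
        "-each", "option.Hash=128", "depth=8", "tc=inf",
        "-games", str(games),
        "-openings", f"file={pgn_in}",   # the short games act as the opening book
        "-pgnout", pgn_out, "min", "fi",
        "-recover",
        "-concurrency", str(concurrency),
        "-maxmoves", "200",
        "-draw", "movenumber=40", "movecount=5", "score=10",
        "-tb", r"C:\Syzygy",
        "-tbpieces", "6",
        "-ratinginterval", "100",
    ]


args = cutechess_args("sample.pgn", "sample_finished.pgn", games=3000)
print(" ".join(args))
# To actually launch it (requires cutechess-cli and configured engines):
# import subprocess; subprocess.run(args, check=True)
```

Setting games to the number of games in the input file, as in the example, lets each short game serve as an opening once.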
-
Well, I can always give you a text file full of King & Pawn endgames analysed at depth 20 by Eman. (I chose Eman because experience makes the engine's evaluations more stable, and I thought that might be useful.)
-
Thank you, but I already have several databases with endgame data:
7men
8men
and thanks to the cutechess tip, I get an average of 150 positions/game.