Stockfish x2 NNUE for Android
https://tests.stockfishchess.org/tests/view/656e8a9d6980e15f69c76f2a
Stockfish can now support two NNUE networks; I hope that will be the maximum number of networks Stockfish ever supports. Anyway, I have made builds for armv8 with two big nets. There is only one reason for this: I did not manage to find the small network that the code refers to, so I took the easiest route. I want to give great credit to Archimedes's tool cecsa (and his SF makefile); otherwise it would have taken much longer to compile (because the code had no good optimization for armv8). One last, very important note: Archimedes's makefile doesn't use -funroll-loops, which, in my opinion, results in slightly weaker builds than those posted on the Stockfish GitHub page (besides speed, the flag also gives Elo). So I have added this flag as well. Source code included.
https://pixeldrain.com/u/4Nyp4cBz
Stockfish x2 NNUE for Android
AlexanderSanteramo wrote: ↑07/12/2023, 9:41
"One last, very important note: Archimedes's makefile doesn't use -funroll-loops, which, in my opinion, results in slightly weaker builds than those posted on the Stockfish GitHub page (besides speed, the flag also gives Elo). So I have added this flag as well."
I always try to keep the makefiles as clean as possible (no unnecessary parameters). I have not been able to identify any advantages for GCC. Rather, I had the feeling that the performance was worse with this parameter. For this reason, I have not used the parameter. But I have not yet tested this parameter for Clang.
Stockfish x2 NNUE for Android
Just tested. I have carried out 5 benchmark tests in succession here. The average values are as follows:
Without -funroll-loops: 90120 Nodes/second
With -funroll-loops: 90011 Nodes/second
Original Stockfish from GitHub: 89376 Nodes/second
Even with Clang, it doesn't get any faster (rather slower). I can't see any advantages if you define -funroll-loops in advance. It may be different under Windows, but I don't see any benefit under Android (I would even say it's slower there with -funroll-loops).
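To put the averages above in perspective, here is a minimal Python sketch that expresses each build's speed relative to the build without -funroll-loops. Only the three averages come from the benchmark above; the dictionary and function names are made up for illustration.

```python
# Average nodes/second from the five benchmark runs quoted above.
AVERAGES = {
    "without -funroll-loops": 90120,
    "with -funroll-loops": 90011,
    "original Stockfish from GitHub": 89376,
}


def relative_difference(value, baseline):
    """Return the difference between value and baseline in percent of baseline."""
    return (value - baseline) / baseline * 100.0


# Use the build without -funroll-loops as the reference point.
baseline = AVERAGES["without -funroll-loops"]
for name, nps in AVERAGES.items():
    diff = relative_difference(nps, baseline)
    print(f"{name}: {nps} nodes/s ({diff:+.2f} %)")
```

With these numbers, all three builds lie within about 1 % of each other.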
Stockfish x2 NNUE for Android
And because I'm having such a good time, here are the values for GCC. As you can see, Clang has a slight advantage here.
With GCC 10.3.0: 88399 Nodes/second
Version 11 produces slightly better results. It should be at about the same level as Clang.
Stockfish x2 NNUE for Android
https://tests.stockfishchess.org/tests/live_elo/64d944815b17f7c21c0e92e1
Maybe the -funroll-loops version is stronger than the one without it? The tests give +3 Elo. Maybe on Android there is not much difference?
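For a sense of scale, a +3 Elo difference can be turned into an expected score with the standard logistic Elo formula. This is only a sketch of that general formula, not necessarily how the linked test computes its numbers; the function name is made up for illustration.

```python
def expected_score(elo_diff):
    """Expected score (between 0 and 1) for a side rated elo_diff points higher,
    using the standard logistic Elo model with a scale of 400."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# +3 Elo corresponds to an expected score only slightly above 50 %.
print(f"{expected_score(3.0):.4f}")  # prints 0.5043
```

So an advantage of this size only shows up over a very large number of games.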
Stockfish x2 NNUE for Android
I now know why using -funroll-loops doesn't produce better results. I have compared two completely identical files.
By adding -funroll-loops I get the same executable file as without.
Normally, when I test a (supposedly) new parameter, I first check whether the generated file is a different one. I don't know why I didn't do that here.
Bottom line is, -funroll-loops is already enabled by default, with the parameters I use to compile Stockfish. So it does not have to be specified in the makefile.
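The check described above (is the generated file actually different?) can be sketched in a few lines of Python by comparing SHA-256 hashes of the two executables. The function names and the file names in the comment are hypothetical; any byte-for-byte comparison would serve the same purpose.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a, path_b):
    """Return True if both files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)


# Hypothetical usage with two builds of the same source:
# same_binary("stockfish_default", "stockfish_funroll")
```

Identical hashes mean the extra parameter changed nothing in the output. Different hashes only show that something changed, not whether the result is faster or slower.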
Stockfish x2 NNUE for Android
Since we're talking about engine compilation, I just want to point out one thing. There is a flag placed next to -funroll-loops: the -O3 flag. (You can google it if you don't know about it, but here is the answer: it "directs the compiler to be aggressive about the optimization techniques used and to use as much memory as necessary for maximum optimization".) It is not the best option. I tested this flag against the more aggressive -Ofast flag and got an Elo gain over -O3. I tested at a time control of 10+0.1s; my assumption is that the difference will be smaller if you increase the time control. But the key point is that the build with -Ofast is faster than the others, and the compilation time is also shorter.
Stockfish x2 NNUE for Android
AlexanderSanteramo wrote: ↑09/12/2023, 17:43
"Since we're talking about engine compilation, I just want to point out one thing. There is a flag placed next to -funroll-loops: the -O3 flag. (You can google it if you don't know about it, but here is the answer: it 'directs the compiler to be aggressive about the optimization techniques used and to use as much memory as necessary for maximum optimization'.) It is not the best option. I tested this flag against the more aggressive -Ofast flag and got an Elo gain over -O3. I tested at a time control of 10+0.1s; my assumption is that the difference will be smaller if you increase the time control. But the key point is that the build with -Ofast is faster than the others, and the compilation time is also shorter."
-Ofast also includes the optimizations made with -O3 plus other optimizations (which may not always be good).
-O3 is already an aggressive switch, but works perfectly for chess engines, as we know. It does a lot more than -O2 (the default in Linux distributions, as far as I know). -Ofast seems a bit too aggressive to me. I would say it's not worth it. It needs to be tested. I'd rather wait until -Ofast makes its way into the Stockfish makefile (after extensive testing).
There are a number of other compiler switches that I have already tried, but which have led to unstable results (crashes in some cases). Since I distribute the executable files, I am interested in stable files that are as compatible as possible. And of course as fast as possible. It is quite possible that the last bit of speed is missing. However, I am always interested in compiler switches that can be used without hesitation.
Stockfish x2 NNUE for Android
While adding -funroll-loops under Clang does nothing (the executable file does not change), adding -funroll-loops under GCC (tested with GCC 13.2.0) changes the executable file. So I did another test with and without -funroll-loops. It remains the same: -funroll-loops reduces the number of nodes per second. Not much, but still measurable. Unrolling loops does not seem to help here.
Stockfish x2 NNUE for Android
Archimedes wrote: ↑12/12/2023, 8:25
"So I did another test with and without -funroll-loops. It remains the same"
So, no difference at all and a much reduced speed
Stockfish x2 NNUE for Android
Finally, I also ran a quick test for the questionable parameter -Ofast (although I have done this before). I ran a total of 3 benchmarks and calculated the average.
With -O3: 94305 Nodes/second
With -Ofast: 94221 Nodes/second
The parameter doesn't seem to help anyway.
Stockfish x2 NNUE for Android
AlexanderSanteramo wrote: ↑07/12/2023, 9:41
"https://tests.stockfishchess.org/tests/view/656e8a9d6980e15f69c76f2a
Stockfish can now support two NNUE networks; I hope that will be the maximum number of networks Stockfish ever supports. Anyway, I have made builds for armv8 with two big nets. There is only one reason for this: I did not manage to find the small network that the code refers to, so I took the easiest route. I want to give great credit to Archimedes's tool cecsa (and his SF makefile); otherwise it would have taken much longer to compile (because the code had no good optimization for armv8). One last, very important note: Archimedes's makefile doesn't use -funroll-loops, which, in my opinion, results in slightly weaker builds than those posted on the Stockfish GitHub page (besides speed, the flag also gives Elo). So I have added this flag as well. Source code included.
https://pixeldrain.com/u/4Nyp4cBz"
Hi, a question: on which patch was this engine compiled? I mean the date and the author. Thanks.
"Work smart - not hard."
Stockfish x2 NNUE for Android
I am fairly certain Linrock is the author of that commit (December 4, 2023).
https://github.com/linrock/Stockfish/commit/64237596a0eaf37630a71b3e4a636a9a42be5058
https://github.com/linrock/Stockfish/tree/64237596a0eaf37630a71b3e4a636a9a42be5058
He has multiple dual-NNUE branches on his GitHub.
https://github.com/linrock/Stockfish
Edit: examined the source. The only significant change was to use a dimension 2560 net (nn-ac1dbea57aa3.nnue 23-09-13) in place of Linrock's 256 net (nn-ecb35f70ff2a.nnue 23-07-18). The constexpr was modified in nnue_architecture.h for the change.