Stockfish x2 NNUE for Android

Moderators: Elijah, Igbo, timetraveller

AlexanderSanteramo

Top contribute Forum
Forum Contributions
Points: 3 834,00 
Posts: 75
Joined: 16/04/2023, 13:36
Status: Offline (Active 5 Days, 16 Hours, 30 Minutes ago)
Medals: 1
Topics: 18
Reputation: 108
Has thanked: 46 times
Been thanked: 201 times


Post by AlexanderSanteramo »

https://tests.stockfishchess.org/tests/view/656e8a9d6980e15f69c76f2a
Stockfish can now support two NNUE networks; I hope that will be the maximum number of networks Stockfish ever supports. Anyway, I made compiles for armv8 with two big nets. There is only one reason for this: I couldn't find the small network that the code refers to, so I chose the easiest way forward. I want to give great credit to Archimedes's tool cecsa (and his SF makefile); otherwise it would have taken much longer to compile (because the code had no good optimization for armv8). One last, very important note: Archimedes's makefile doesn't use -funroll-loops, which in my opinion results in slightly weaker builds than those posted on the Stockfish GitHub page (besides speed, it also gives Elo). So I added this flag as well. Source code included.
https://pixeldrain.com/u/4Nyp4cBz
Archimedes

Android Engines Top Active Users
Forum Contributions
Points: 42 582,00 
Posts: 2059
Joined: 04/11/2019, 21:13
Status: Offline (Active 4 Hours, 21 Minutes ago)
Medals: 2
Topics: 158
Reputation: 7111
Been thanked: 6477 times


Post by Archimedes »

AlexanderSanteramo wrote: 07/12/2023, 9:41 One last, very important note: Archimedes's makefile doesn't use -funroll-loops, which in my opinion results in slightly weaker builds than those posted on the Stockfish GitHub page (besides speed, it also gives Elo). So I added this flag as well.
I always try to keep the makefiles as clean as possible (no unnecessary parameters). I have not been able to identify any advantages for GCC. Rather, I had the feeling that the performance was worse with this parameter. For this reason, I have not used the parameter. But I have not yet tested this parameter for Clang.

Post by Archimedes »

Just tested. I have carried out 5 benchmark tests in succession here. The average values are as follows:

Without -funroll-loops: 90120 Nodes/second
With -funroll-loops: 90011 Nodes/second
Original Stockfish from GitHub: 89376 Nodes/second

Even with Clang, it doesn't get any faster (rather slower). I can't see any advantages if you define -funroll-loops in advance. It may be different under Windows, but I don't see any benefit under Android (I would even say it's slower there with -funroll-loops).
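The averaging described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in this thread: it assumes the engine binary can be started directly from wherever the script runs (for example in a terminal app on the device), and the function names are made up for this example. It relies on the "Nodes/second" summary line that Stockfish prints at the end of its bench command.

```python
import re
import statistics
import subprocess

# Matches the summary line printed at the end of a Stockfish "bench" run,
# e.g. "Nodes/second    : 90120".
NPS_PATTERN = re.compile(r"Nodes/second\s*:\s*(\d+)")


def parse_nps(bench_output: str) -> int:
    """Extract the nodes-per-second figure from bench output text."""
    match = NPS_PATTERN.search(bench_output)
    if match is None:
        raise ValueError("no 'Nodes/second' line found in bench output")
    return int(match.group(1))


def run_bench(engine_path: str) -> int:
    """Run one bench pass and return its nodes/second value."""
    result = subprocess.run(
        [engine_path, "bench"],
        capture_output=True,
        text=True,
        check=True,
    )
    # The summary may arrive on stderr, so search both streams.
    return parse_nps(result.stdout + result.stderr)


def average_nps(engine_path: str, runs: int = 5) -> float:
    """Average nodes/second over several consecutive bench runs."""
    return statistics.mean(run_bench(engine_path) for _ in range(runs))


# Usage (hypothetical file name, replace with your own build):
#   print(round(average_nps("./stockfish-armv8", runs=5)))
```

Running the builds one after another on the same device, with nothing else going on in the background, keeps the comparison as fair as a quick test like this can be.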

Post by Archimedes »

And because I'm having such a good time, here are the values for GCC. As you can see, Clang has a slight advantage here.

With GCC 10.3.0: 88399 Nodes/second

GCC 11 produces slightly better results. It should be at about the same level as Clang.

Post by AlexanderSanteramo »

https://tests.stockfishchess.org/tests/live_elo/64d944815b17f7c21c0e92e1
Maybe the -funroll-loops version is stronger than the one without it? The tests give +3 Elo. Maybe on Android there isn't much of a difference?
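To put a figure like +3 Elo into perspective, here is a small sketch using the conventional logistic Elo formula. It is only the textbook conversion from a rating difference to an expected score; it does not reproduce the statistics the Stockfish testing framework itself uses, and the function name is made up for this example.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for a player rated elo_diff points higher."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# A +3 Elo advantage corresponds to scoring only slightly above 50%.
print(f"{expected_score(3.0):.4f}")  # prints 0.5043
```

A gain that small needs a very large number of games to be measured reliably, which is why a quick speed benchmark alone cannot confirm or refute it.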

Post by Archimedes »

I now know why using -funroll-loops doesn't produce better results: I was comparing two completely identical files.

By adding -funroll-loops I get the same executable file as without.

Normally, when I test a (supposedly) new parameter, I first check whether the generated file is a different one. I don't know why I didn't do that here.

The bottom line: -funroll-loops is already enabled by default with the parameters I use to compile Stockfish, so it does not have to be specified in the makefile.
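The check described above (is the generated file actually a different one?) can be sketched as a hash comparison. This is a minimal illustration with made-up function names, and it assumes that nothing besides the compiler flag differs between the two builds (an embedded build date, for instance, would also change the file).

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a: str, path_b: str) -> bool:
    """True if both files have byte-identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)


# Usage (hypothetical file names):
#   same_binary("stockfish-default", "stockfish-funroll")
```

If this returns True, the extra flag changed nothing in the output, and any difference between the benchmark numbers is just measurement noise.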

Post by AlexanderSanteramo »

Since we're talking about engine compilation, I just want to point out one thing. There is a flag that sits right next to -funroll-loops: the -O3 flag. (You can google it if you don't know about it, but here is the answer: it "directs the compiler to be aggressive about the optimization techniques used and to use as much memory as necessary for maximum optimization".) It is not the best choice. I tested it against the more aggressive -Ofast flag and got an Elo gain over -O3. I tested at a time control of 10+0.1s; my assumption is that the difference will be smaller at longer time controls. But the key point is that the -Ofast build is faster than the others, and it also compiles faster.

Post by Archimedes »

AlexanderSanteramo wrote: 09/12/2023, 17:43 Since we're talking about engine compilation, I just want to point out one thing. There is a flag that sits right next to -funroll-loops: the -O3 flag. (You can google it if you don't know about it, but here is the answer: it "directs the compiler to be aggressive about the optimization techniques used and to use as much memory as necessary for maximum optimization".) It is not the best choice. I tested it against the more aggressive -Ofast flag and got an Elo gain over -O3. I tested at a time control of 10+0.1s; my assumption is that the difference will be smaller at longer time controls. But the key point is that the -Ofast build is faster than the others, and it also compiles faster.
-Ofast also includes the optimizations made with -O3 plus other optimizations (which may not always be good).

-O3 is already an aggressive switch, but works perfectly for chess engines, as we know. It does a lot more than -O2 (the default in Linux distributions, as far as I know). -Ofast seems a bit too aggressive to me. I would say it's not worth it. Needs to be tested. I'd rather wait until -Ofast makes its way into the Stockfish makefile (after extensive testing).
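One of the "other optimizations" is that -Ofast turns on -ffast-math in GCC and Clang, which allows the compiler to reorder floating-point operations. The sketch below only illustrates, in Python, why such reordering is not always harmless: floating-point addition is not associative. It says nothing about whether Stockfish itself is affected.

```python
# The same three numbers, summed in two different orders.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # a and b cancel first, then 1.0 is added
right = a + (b + c)  # the 1.0 is lost when added to -1e16 first

print(left)   # prints 1.0
print(right)  # prints 0.0
```

A compiler that is allowed to regroup such expressions may therefore produce different results than the code as written, which is one reason to be careful with this switch.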

There are a number of other compiler switches that I have already tried, but which have led to unstable results (partial crashes). Since I distribute the executable files, I am interested in stable files that are as compatible as possible. And of course as fast as possible. It is quite possible that the last bit of speed is missing. However, I am always interested in compiler switches that can be used without hesitation.

Post by Archimedes »

While adding -funroll-loops under Clang does nothing (the executable file does not change), adding -funroll-loops under GCC (tested with GCC 13.2.0) does change the executable file. So I did another test with and without -funroll-loops. The result remains the same: -funroll-loops reduces the number of nodes per second. Not by much, but still measurably. Unrolling loops does not seem to help here.

Post by AlexanderSanteramo »

Archimedes wrote: 12/12/2023, 8:25 So I did another test with and without -funroll-loops. It remains the same
So, no difference at all and a much reduced speed

Post by Archimedes »

Finally, I also ran a quick test for the questionable parameter -Ofast (although I have done this before). I ran a total of 3 benchmarks and calculated the average.

With -O3: 94305 Nodes/second
With -Ofast: 94221 Nodes/second

The parameter doesn't seem to help anyway.
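For scale, the averages quoted in this thread can be turned into relative differences. This is just arithmetic on the numbers posted above, with a made-up helper name.

```python
def relative_change_percent(baseline: float, candidate: float) -> float:
    """Change of candidate relative to baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# -O3 versus -Ofast (averages of 3 runs)
print(f"{relative_change_percent(94305, 94221):+.3f}%")  # prints -0.089%

# without versus with -funroll-loops (averages of 5 runs)
print(f"{relative_change_percent(90120, 90011):+.3f}%")  # prints -0.121%
```

Both differences are on the order of a tenth of a percent, so with only a handful of runs they are hard to tell apart from ordinary run-to-run variation.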
SkyNet

Top contribute Forum Engines Maker Book Maker
Forum Contributions
Points: 33 205,00 
Posts: 325
Joined: 11/11/2022, 1:55
Status: Offline (Active 1 Day, 2 Hours ago)
Medals: 3
Topics: 6
Reputation: 3125
Location: 3th dimension.
Has thanked: 5177 times
Been thanked: 2350 times


Post by SkyNet »

AlexanderSanteramo wrote: 07/12/2023, 9:41 https://tests.stockfishchess.org/tests/view/656e8a9d6980e15f69c76f2a
Stockfish can now support two NNUE networks; I hope that will be the maximum number of networks Stockfish ever supports. Anyway, I made compiles for armv8 with two big nets. There is only one reason for this: I couldn't find the small network that the code refers to, so I chose the easiest way forward. I want to give great credit to Archimedes's tool cecsa (and his SF makefile); otherwise it would have taken much longer to compile (because the code had no good optimization for armv8). One last, very important note: Archimedes's makefile doesn't use -funroll-loops, which in my opinion results in slightly weaker builds than those posted on the Stockfish GitHub page (besides speed, it also gives Elo). So I added this flag as well. Source code included.
https://pixeldrain.com/u/4Nyp4cBz
Hi, a question: which patch was this engine compiled from? I mean the date and the author. Thanks.
"Work smart - not hard."
sarona

Top contribute Forum
Chief moderators
Points: 20 167,00 
Forum Contributions
Posts: 227
Joined: 14/08/2022, 20:09
Status: Offline (Active 4 Hours, 6 Minutes ago)
Medals: 1
Topics: 8
Reputation: 1784
Has thanked: 360 times
Been thanked: 1462 times


Post by sarona »

I am fairly certain Linrock is the author of that commit (December 4, 2023).
https://github.com/linrock/Stockfish/commit/64237596a0eaf37630a71b3e4a636a9a42be5058

https://github.com/linrock/Stockfish/tree/64237596a0eaf37630a71b3e4a636a9a42be5058

He has multiple dual-NNUE branches on his GitHub.
https://github.com/linrock/Stockfish

Edit: examined the source. The only significant change was to use a dimension 2560 net (nn-ac1dbea57aa3.nnue 23-09-13) in place of Linrock's 256 net (nn-ecb35f70ff2a.nnue 23-07-18). The constexpr was modified in nnue_architecture.h for the change.