-
tevador
I'm getting about +5% with Zen3
-
tevador
Interestingly, I can squeeze up to 64 AES rounds at the end of the loop before hashrate starts decreasing. The current draft only has 16 rounds. TBD if the impact on soft AES systems is acceptable.
-
sech1
64 rounds when running a single thread, or all threads on all cores?
-
tevador
1 thread
-
sech1
Need to test 2 threads running on the same core - this will slow down AES rounds, but L3 latency will stay the same
-
tevador
I don't have enough L3 to test with all threads.
-
sech1
I don't think you can squeeze more than 32 rounds
-
tevador
I think 16 rounds are fine, this already doubles the amount of AES per hash.
-
sech1
yes
-
sech1
and it is in line with how we already use AES (4 rounds for program buffer, for example)
-
sech1
you can set thread affinity and run 2 threads on the same core to test how many rounds are possible
-
tevador
the code is available, anyone can try it
-
sech1
16 -> 32 rounds, -0.7% hashrate (2 threads on one core)
-
sech1
32 -> 64 rounds, -3.25% (2 threads on one core)
-
sech1
so even 32 rounds is noticeable
-
tevador
and xor -> 16 rounds?
-
sech1
less than 1 h/s difference
-
sech1
so < 0.1%
-
sech1
command line was "--mine --jit --largePages --threads 2 --affinity 3 --init 16"
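
(Editor's sketch: the "--affinity 3" value above reads most naturally as a bitmask. Assuming bit i of the mask selects logical CPU i, which is an assumption about the benchmark's option and not something stated in this log, 3 = 0b11 would pin the two threads to logical CPUs 0 and 1. Whether those two logical CPUs are SMT siblings of one physical core depends on how the OS enumerates them. The helper name below is hypothetical.)

```python
def cpus_in_mask(mask):
    """Return the logical CPU indices whose bits are set in an affinity bitmask.

    Assumption: bit i of the mask corresponds to logical CPU i.
    """
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Mask from the command line quoted above.
print(cpus_in_mask(3))  # [0, 1]
```

(With this reading, a mask of 5 would select logical CPUs 0 and 2 instead, which on some systems are different physical cores.)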
-
tevador
so 16 rounds seems to be a good choice
-
sech1
yes
-
sech1
0.7% drop with 32 rounds is a lot. I consider 0.1% speedup to be significant these days, when I optimize XMRig
-
sech1
but I'm testing on Ryzen 7 1700 (Zen 1)
-
sech1
this CPU has the fastest L3
-
sech1
Zen 3 / Zen 4 will have a smaller drop with 32 rounds
-
sech1
still, 16 rounds should be optimal
-
sech1
because it will be unnoticeable on all CPUs
-
sech1
*with hardware AES
-
tevador
btw, to measure the CFROUND effect, you can change line 125 in common.hpp to using JitCompiler = JitCompilerX86<RANDOMX_FLAG_DEFAULT>; and rebuild
-
tevador
I measured about 5% with 1 thread on Ryzen 5850U
-
sech1
926.4 h/s -> 1004.2 h/s on Ryzen 7 1700 (2 threads on 1 core)
-
sech1
so 8% speedup
-
sech1
and it was 1004.2 -> 1003.8 h/s when I changed xor to 16 aes rounds
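
(Editor's sketch: the relative changes behind the two hashrate pairs quoted above can be checked with a few lines of arithmetic. The function name is hypothetical; the four hashrates are the ones from the messages.)

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# CFROUND change: 926.4 h/s -> 1004.2 h/s (2 threads on 1 core, Ryzen 7 1700)
cfround = percent_change(926.4, 1004.2)

# xor -> 16 AES rounds: 1004.2 h/s -> 1003.8 h/s
aes16 = percent_change(1004.2, 1003.8)

print(f"{cfround:+.2f}%")  # +8.40%
print(f"{aes16:+.3f}%")    # -0.040%
```

(So the CFROUND change is worth about 8.4% on this CPU, while switching xor to 16 AES rounds costs about 0.04%, consistent with the "< 0.1%" figure given earlier.)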
-
tevador
pretty significant
-
sech1
yes, all Zen CPUs implement mxcsr instructions in microcode
-
sech1
Intel is much faster with it
-
sech1
but this change shouldn't make Intel CPUs worse, it will just give a smaller boost (0-1%)