09:50:57 preliminary results for software AES on Orange Pi RV2: 75.28 ns/iteration (scalar code) vs 48.82 ns/iteration (vector code)
09:51:04 1.5x speedup, less than expected but still good
09:58:04 I guess it gets bottlenecked by the random table lookups
10:23:08 correction: the number above is for 2 AES rounds per iteration, so 1 AES round takes half of this time
10:24:34 CPU speed is 1.6 GHz, so it's 60 clock cycles per round for scalar and 39 cycles per round for vector code
17:52:45 lol, AES got 2x faster in XMRig, but the other parts got slower, the end result is almost negligible :D
17:52:54 before: https://p2pool.io/u/8051b40727f9db94/Screenshot%20from%202025-12-05%2018-50-14.png
17:53:04 after: https://p2pool.io/u/bd7af294470966d2/Screenshot%20from%202025-12-05%2018-50-31.png
17:53:21 bottlenecked by memory (this CPU doesn't have enough cache for the scratchpad)
17:53:58 still ~1% faster with vectorized soft aes
17:54:45 I need to test hashrate with 512 KB scratchpad and a single thread, to see the pure performance
19:33:44 +4% in the end: before https://p2pool.io/u/1dcabcc9fbc17356/Screenshot%20from%202025-12-05%2020-23-36.png and after https://p2pool.io/u/130b5e956a020850/Screenshot%20from%202025-12-05%2020-30-57.png
19:34:02 soft aes itself is 2.2x faster
20:14:34 https://github.com/xmrig/xmrig/pull/3740
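
For anyone checking the cycle counts quoted at 10:24:34, here is a minimal sketch of the arithmetic. It only uses the figures from the log (ns per iteration, 2 AES rounds per iteration, 1.6 GHz clock). The function name `cycles_per_round` and its parameter names are made up for illustration and do not come from XMRig or the benchmark itself.

```python
# Hypothetical helper, not part of XMRig: converts the benchmark's
# ns/iteration figure into clock cycles per single AES round.
def cycles_per_round(ns_per_iteration, rounds_per_iteration=2, clock_ghz=1.6):
    # Each benchmark iteration performs 2 AES rounds (per the 10:23:08 correction).
    ns_per_round = ns_per_iteration / rounds_per_iteration
    # 1 GHz corresponds to 1 cycle per nanosecond, so cycles = ns * GHz.
    return ns_per_round * clock_ghz

SCALAR_NS = 75.28  # scalar code, ns per iteration (from the log)
VECTOR_NS = 48.82  # vector code, ns per iteration (from the log)

scalar_cycles = cycles_per_round(SCALAR_NS)
vector_cycles = cycles_per_round(VECTOR_NS)
speedup = SCALAR_NS / VECTOR_NS

print(f"scalar: {scalar_cycles:.1f} cycles/round")
print(f"vector: {vector_cycles:.1f} cycles/round")
print(f"speedup: {speedup:.2f}x")
```

This rounds to the 60 and 39 cycles per round mentioned above, and the ratio matches the roughly 1.5x speedup from the first measurement.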