-
hyc
just got a linkedin request from a sales manager at Bitmain.
-
hyc
anyone else?
-
hyc
from Etsuka Tomonaga
-
sech1
hyc nope
-
sech1
On the RandomX v2 topic: increasing program size does increase the number of +inf FP values, but the main culprit is not FMUL but the FDIV_M instruction
-
sech1
reducing FDIV_M from 4 to 3 and increasing FSQRT_R from 6 to 7 brings the amount of +inf values to only 1.5x higher than v1 levels (it was 6.5x higher without the fix)
-
sech1
I'm testing with program size 384
-
sech1
With these parameters, v2 program still has ~4.5 FDIV_M instructions per program, which is more than in v1
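The ~4.5 figure can be reproduced from the frequency table. This is a minimal sketch, assuming the RANDOMX_FREQ_* weights sum to 256 and each program slot is drawn independently, and taking the v1 defaults as program size 256 with RANDOMX_FREQ_FDIV_M = 4. The helper name `expected_count` is made up for illustration.

```python
# Expected number of one instruction type per program.
# Assumption: instruction frequencies are weights out of 256 and every
# program slot is filled independently according to those weights.
FREQ_TOTAL = 256


def expected_count(program_size: int, freq: int) -> float:
    """Average occurrences of an instruction with weight `freq`."""
    return program_size * freq / FREQ_TOTAL


v1 = expected_count(256, 4)            # v1: program size 256, FDIV_M = 4
v2 = expected_count(384, 3)            # tentative v2: size 384, FDIV_M = 3
v2_unchanged = expected_count(384, 4)  # size 384 with the v1 frequency

print(v1, v2, v2_unchanged)  # 4.0 4.5 6.0
```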
-
sech1
So program size = 384, RANDOMX_FREQ_FDIV_M = 3, RANDOMX_FREQ_FSQRT_R = 7 are the tentative values for RandomX v2
-
sech1
"About 2% of programs produce at least one infinity value."
-
sech1
For v2 with the above parameters, it's 2.8%
-
sech1
Also, changing RANDOMX_FREQ_FDIV_M from 4 to 3 and RANDOMX_FREQ_FSQRT_R from 6 to 7 is very convenient to implement - need to change just 2 neighboring values in the instruction table
-
sech1
Actually, just one value
-
sech1
Did one more test without changing any frequencies - got 6.85% of programs with at least one infinity value
-
sech1
I think it's acceptable too?
-
sech1
For v1, 85% of all hashes never have any +inf value during execution
-
sech1
For v2 without instruction frequency changes, it's 56.7%
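The per-hash numbers are consistent with the per-program ones quoted earlier (about 2% for v1, 6.85% for v2 without frequency changes). A rough check, assuming 8 chained programs per hash and treating the programs as independent; `clean_hash_fraction` is a hypothetical helper name:

```python
# Fraction of hashes that never see +inf, derived from the per-program rate.
# Assumptions: 8 programs per hash, infinities in different programs are
# independent events.
PROGRAM_COUNT = 8


def clean_hash_fraction(p_program_inf: float, programs: int = PROGRAM_COUNT) -> float:
    """Probability that none of the chained programs produces +inf."""
    return (1.0 - p_program_inf) ** programs


print(f"v1: {clean_hash_fraction(0.02):.1%}")    # close to the quoted 85%
print(f"v2: {clean_hash_fraction(0.0685):.1%}")  # close to the quoted 56.7%
```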
-
sech1
I think it's better to have +inf values more often
-
sech1
So an ASIC must implement support for them, or have almost every second hash invalid
-
DataHoarder
in the semifloat code I saw, the inf path was indeed implemented as a slow path, since it was very unlikely to be hit
-
sech1
Even without frequency changes, 0.12% of individual group E values are +inf after a main loop iteration
-
sech1
(1-0.0012)^8=0.99
-
sech1
so 99% of program iterations don't have +inf in group E registers
-
sech1
I think it's fine
-
moneromooo
Unsure whether it's been pointed out, but if a hypothetical ASIC does not implement infinities, it can early out at the first infinity it encounters, so N% of programs yielding an infinity anywhere means less than N% hash rate loss.
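A toy model of this early-out argument, under stated assumptions: 8 chained programs per hash, independent programs, work counted in whole programs, and the program that hits the infinity charged in full (which is pessimistic for the early-out chip). The function name and the model itself are illustrative, not taken from any real design.

```python
# Relative hashrate of a chip that abandons a hash at the first program
# containing an infinity, compared with a chip that handles infinities and
# has the same program throughput.
PROGRAM_COUNT = 8  # assumed number of chained programs per hash


def relative_hashrate(p_inf: float, programs: int = PROGRAM_COUNT) -> float:
    q = 1.0 - p_inf                                  # program is "clean"
    valid = q ** programs                            # hashes that complete
    work = sum(q ** i for i in range(programs))      # expected programs started
    return valid * programs / work


p = 0.0685  # per-program rate quoted for v2 without frequency changes
invalid = 1.0 - (1.0 - p) ** PROGRAM_COUNT
loss = 1.0 - relative_hashrate(p)
print(f"invalid hashes: {invalid:.1%}, hashrate loss: {loss:.1%}")
# invalid is about 43%, while the modelled loss is about 28%
```

Early-out inside a program would shrink the loss further, so this is only an upper-end sketch.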
-
sech1
It won't hurt scratchpad entropy, because it does AES now anyway
-
sech1
moneromooo true
-
sech1
but I think RandomX v1 doesn't have enough +inf values
-
sech1
v2 has a bit more, but not too much - so it's good
-
sech1
I don't think we need to change instruction frequencies at all
-
DataHoarder
reaching inf also allows short path operations afterward, though
-
DataHoarder
inf sticks
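A small illustration of the stickiness, using plain IEEE 754 doubles in Python rather than RandomX itself. It assumes the other operand is positive and finite; cases such as inf * 0 or inf - inf give NaN instead and are not covered here.

```python
import math

inf = math.inf
x = 1.5  # hypothetical positive finite operand

# Once a value is +inf, multiply, divide, add, subtract (by a finite value)
# and square root all leave it at +inf.
results = [inf * x, inf / x, inf + x, inf - x, math.sqrt(inf)]

print(all(r == inf for r in results))  # True
```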
-
sech1
"99% of program iterations don't have +inf in group E registers"
-
sech1
It can't give more than 1% speedup
-
sech1
I need to add some more counters to check this number
-
DataHoarder
yeah, I'll add some metrics to mine, lemme see
-
sech1
also, when there is an infinity, it's usually just one of the 4 group E registers
-
sech1
-
sech1
This is randomx-benchmark binary, running in interpreter mode with added counters
-
sech1
So 0.77%, and when it does happen, it's almost always just one group E register
-
sech1
so the theoretical speedup of "sticky +inf" optimization will be something like 0.2%
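One way to arrive at a figure of that size, as a back-of-the-envelope sketch. It assumes 0.77% is the fraction of cases with any +inf in group E, that one of the 4 group E registers is affected, and (generously) that all work on the affected register could then be skipped. The function name is made up for illustration.

```python
# Upper-bound estimate for a "sticky +inf" shortcut.
# Assumptions: `p_any_inf` is the fraction of cases with +inf in group E,
# and work on an affected register can be skipped entirely.
def sticky_speedup_bound(p_any_inf: float, affected: int = 1, registers: int = 4) -> float:
    return p_any_inf * affected / registers


bound = sticky_speedup_bound(0.0077)
print(f"{bound:.2%}")  # on the order of 0.2%
```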
-
sech1
DataHoarder you can revert instruction frequencies to the old values :)
-
DataHoarder
yeah will do :)
-
DataHoarder
it's on the bench/testing branch
-
DataHoarder
you are only checking after the interpreter exits the loop, right?
-
DataHoarder
lemme check all total operations
-
sech1
Yes, after loop exit
-
sech1
Because as you said, infinity sticks
-
sech1
Of the tasks in tevador/RandomX #274, almost all are done. Only "New PowerPC intrinsics" and "Update documentation" are left
-
sech1
I don't have any big endian PPC for testing though
-
sech1
I can of course copy over the fallback intrinsic code there and call it a day :)
-
sech1
because the fallback code passes the tests on s390x which is big endian
-
sech1
Updated the documentation. That's basically it, we can start testing v2 on different systems.
-
DataHoarder
What kind of targeted testing are you looking for with v2?
-
sech1
V1 vs V2 hashrate and power at the wall on different mining rigs. I'll dig up my old PCs (3700X and 5600X) from the basement tomorrow
-
sech1
Also, v2 hashrate vs program size graph. Since it's not in xmrig yet, randomx-benchmark with 100K nonces will do (but only with large pages + MSR enabled)
-
DataHoarder
I guess we can also use the brand specific cpu monitoring