#monero-pow

17:16

hyc

just got a linkedin request from a sales manager at Bitmain.
17:16

hyc

anyone else?
17:16

hyc

from Etsuka Tomonaga
18:20

sech1

hyc nope
18:21

sech1

On the RandomX v2 topic: increasing program size does increase the number of +inf FP values, but the main culprit is not FMUL, it's the FDIV_M instruction
18:22

sech1

reducing FDIV_M from 4 to 3 and increasing FSQRT_R from 6 to 7 brings the number of +inf values to only 1.5x higher than v1 levels (it was 6.5x higher without the fix)
18:22

sech1

I'm testing with program size 384
18:23

sech1

With these parameters, a v2 program still has ~4.5 FDIV_M instructions on average, which is more than in v1
18:24

sech1

So program size = 384, RANDOMX_FREQ_FDIV_M = 3, RANDOMX_FREQ_FSQRT_R = 7 are the tentative values for RandomX v2
18:32

sech1

"About 2% of programs produce at least one infinity value."
18:32

sech1

For v2 with the above parameters, it's 2.8%
18:36

sech1

Also, changing RANDOMX_FREQ_FDIV_M from 4 to 3 and RANDOMX_FREQ_FSQRT_R from 6 to 7 is very convenient to implement - need to change just 2 neighboring values in the instruction table
18:37

sech1

Actually, just one value
18:39

sech1

Did one more test without changing any frequencies - got 6.85% of programs with at least one infinity value
18:39

sech1

I think it's acceptable too?
18:40

DataHoarder

Made preliminary changes and pushed test vectors git.gammaspectra.live/P2Pool/go-ran…3533a90e89be97344f93a8e8359bcb957e0
18:41

sech1

For v1, 85% of all hashes never have any +inf value during execution
18:42

sech1

For v2 without instruction frequency changes, it's 56.7%
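The per-program and per-hash percentages quoted in this discussion line up if each hash runs 8 programs and infinities show up independently in each one. A minimal sketch of that arithmetic, assuming 8 programs per hash (the default RANDOMX_PROGRAM_COUNT in RandomX v1; that v2 keeps it is an assumption here) and independence between programs. The function name is hypothetical:

```python
# Assumption: 8 programs per hash (RandomX v1 default RANDOMX_PROGRAM_COUNT).
PROGRAMS_PER_HASH = 8


def clean_hash_fraction(p_program_inf: float, programs: int = PROGRAMS_PER_HASH) -> float:
    """Fraction of hashes in which no program produces an infinity,
    assuming each program independently does so with probability p_program_inf."""
    return (1.0 - p_program_inf) ** programs


# Per-program rates quoted in the chat.
rates = {
    "v1 (about 2%)": 0.02,
    "v2, FDIV_M=3 / FSQRT_R=7 (2.8%)": 0.028,
    "v2, unchanged frequencies (6.85%)": 0.0685,
}

for label, p in rates.items():
    print(f"{label}: {clean_hash_fraction(p):.1%} of hashes never see +inf")
```

Under these assumptions the 2% and 6.85% per-program rates come out close to the 85% and 56.7% per-hash figures above.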
18:42

sech1

I think it's better to have +inf values more often
18:42

sech1

So an ASIC must implement support for them, or have almost every second hash come out invalid
18:43

DataHoarder

in the semifloat code I saw, the inf path was done slow as it was very unlikely to be hit, indeed
18:43

sech1

Even without frequency changes, 0.12% of individual group E values are +inf after a main loop iteration
18:44

sech1

(1-0.0012)^8=0.99
18:45

sech1

so 99% of program iterations don't have +inf in group E registers
18:45

sech1

I think it's fine
18:45

moneromooo

Unsure whether it's been pointed out, but if a hypothetical ASIC does not implement infinities, it can early out at the first infinity it encounters, so N% of programs yielding an infinity anywhere means less than N% hash rate loss.
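The early-out argument can be put into numbers with a simple throughput model. This is only a sketch: `p_inf` is the fraction of hashes that hit an infinity (43.3% is 100% minus the 56.7% figure quoted earlier), and `work_before_abort` is a made-up illustrative parameter for how much of a hash is computed, on average, before the first infinity appears. Neither the model nor the 0.5 value comes from the discussion.

```python
def early_out_hashrate_loss(p_inf: float, work_before_abort: float) -> float:
    """Relative loss of valid-hash throughput for a device that aborts
    a hash at the first infinity instead of handling it.

    p_inf: fraction of hashes that encounter an infinity.
    work_before_abort: average fraction of a full hash computed before
        aborting (1.0 means no early out, the full hash is wasted).
    """
    # Average time per attempt, in units of one full hash.
    time_per_attempt = (1.0 - p_inf) + p_inf * work_before_abort
    # Valid hashes per unit of time, relative to a device that handles infinities.
    relative_rate = (1.0 - p_inf) / time_per_attempt
    return 1.0 - relative_rate


p = 1.0 - 0.567  # hashes with at least one +inf, v2 without frequency changes

print(f"no early out:   {early_out_hashrate_loss(p, 1.0):.1%} loss")
print(f"abort halfway:  {early_out_hashrate_loss(p, 0.5):.1%} loss (illustrative)")
```

With no early out the loss equals `p_inf`; any earlier abort point pushes it below that.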
18:45

sech1

It won't hurt scratchpad entropy, because it does AES now anyway
18:46

sech1

moneromooo true
18:46

sech1

but I think RandomX v1 doesn't have enough +inf values
18:47

sech1

v2 has a bit more, but not too much - so it's good
18:47

sech1

I don't think we need to change instruction frequencies at all
18:48

DataHoarder

reaching inf also allows short path operations afterward, though
18:49

DataHoarder

inf sticks
18:49

sech1

"99% of program iterations don't have +inf in group E registers"
18:49

sech1

It can't give more than 1% speedup
18:49

sech1

I need to add some more counters to check this number
18:50

DataHoarder

yeah, I'll add some metrics to mine, lemme see
18:50

sech1

also, when there is an infinity, it's usually just one of the 4 group E registers
18:54

sech1

paste.debian.net/hidden/e6aa1a28
18:54

sech1

This is randomx-benchmark binary, running in interpreter mode with added counters
18:55

sech1

So 0.77%, and when it does happen, it's almost always just one group E register
18:55

sech1

so the theoretical speedup of "sticky +inf" optimization will be something like 0.2%
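One plausible reading of the ~0.2% estimate: only 0.77% of iterations end with any +inf, and in those iterations typically one of the four group E registers is affected, so at most about a quarter of the group E work in those iterations could take a shortcut. A rough sketch of that bound; the function name and the "one of four registers" weighting are assumptions, not something spelled out in the chat:

```python
def sticky_inf_speedup_bound(
    iterations_with_inf: float,
    affected_registers: int = 1,
    group_e_registers: int = 4,
) -> float:
    """Upper bound on the fraction of group E work that could be skipped
    when a register is stuck at +inf."""
    return iterations_with_inf * affected_registers / group_e_registers


bound = sticky_inf_speedup_bound(0.0077)
print(f"sticky +inf shortcut bound: {bound:.2%}")
```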
18:57

sech1

DataHoarder you can revert instruction frequencies to the old values :)
18:57

DataHoarder

yeah will do :)
18:57

DataHoarder

it's on the bench/testing branch
19:02

DataHoarder

you are only checking after the interpreter exits the loops, right?
19:02

DataHoarder

lemme check all total operations
19:04

sech1

Yes, after loop exit
19:04

sech1

Because as you said, infinity sticks
19:12

sech1

Of the tasks in tevador/RandomX #274, almost all are done. Only "New PowerPC intrinsics" and "Update documentation" are left
19:13

sech1

I mean, done in github.com/SChernykh/RandomX/tree/v2 branch
19:13

sech1

I don't have any big endian PPC for testing though
19:13

sech1

I can of course copy over the fallback intrinsic code there and call it a day :)
19:14

sech1

because the fallback code passes the tests on s390x which is big endian
20:12

sech1

Updated the documentation. That's basically it, we can start testing v2 on different systems.
22:12

DataHoarder

What kind of targeted testing are you looking for with v2?
23:17

sech1

V1 vs V2 hashrate and power at the wall on different mining rigs. I'll dig up my old PCs (3700X and 5600X) from the basement tomorrow
23:18

sech1

Also, v2 hashrate vs program size graph. Since it's not in xmrig yet, randomx-benchmark with 100K nonces will do (but only with large pages + MSR enabled)
23:32

DataHoarder

I guess we can also use the brand specific cpu monitoring
