-
m-relay
<elongated:matrix.org> Are we fighting RISC-V miners? Or can Bitmain update the code and still use it?
-
kico
sech1, wouldn't it be possible to take a look at the nonces and try to figure out how long these have been mining? X9 I mean
-
sech1
It looks like they use the same firmware, so nonce patterns didn't change
-
DataHoarder
You can measure the increase of nonce patterns over time
-
DataHoarder
[image]
-
DataHoarder
remade nonce pattern, from randomx fork or so to last block today
-
DataHoarder
[image]
-
DataHoarder
nonce % 2^28; remove groups (nonce / 2^28) that are 0 or > 10 (0 has a lot of contamination, and higher ones don't appear in nonces)
-
DataHoarder
then their pattern is on the bottom 1/16th of this. That is the range of the plot
-
DataHoarder
that has their sub-patterns
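roughly this filter, if you want to reproduce it (a sketch; getting the per-block nonces out of the chain is up to you):

// Sketch of the bucketing described above. Input is hypothetical: one nonce per block.
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    std::vector<uint32_t> nonces; // fill with per-block nonces
    for (const uint32_t nonce : nonces) {
        const uint32_t group = nonce >> 28;            // nonce / 2^28
        if (group == 0 || group > 10) continue;        // 0 is contaminated, >10 unused
        const uint32_t sub = nonce & ((1u << 28) - 1); // nonce % 2^28
        if (sub < (1u << 28) / 16)                     // their pattern: bottom 1/16th
            std::printf("%u\n", sub);
    }
}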
-
hyc
so they've improved efficiency 1.6x. About what we expected.
-
hyc
still a shame that other commodity risc-v boards aren't very good
-
sech1
the best CPU rig I'm aware of is a 7945HX: 19.2 kh/s at 85 W at the wall
-
sech1
225 h/J
-
sech1
X9 is 400 h/J
-
sech1
not even 2x better
-
sech1
Still, we need RandomX v2
-
hyc
yeah. 1.77x better
-
sech1
I wonder how many RAM sticks they put into X9
-
sech1
must be at least 60
-
hyc
with the price of RAM going thru the roof again, Bitmain would make more money cannibalizing the existing X9s for their RAM
-
hyc
it's silly...
-
sech1
although, that 7945HX can get 20 kh/s with more power, and it runs on a single stick of DDR5 (tuned timings)
-
hyc
DDR5 is now 4x the price it was in September...
-
sech1
btw I'm done with the RISC-V code for XMRig, the next step is to bring it to the upstream repo and then finally implement the v2 part for RISC-V. Then only small things will be left
-
sech1
I even added hardware AES support for RISC-V
-
sech1
Without actual hardware I can test on :D
-
sech1
yeah, RAM prices are insane
-
hyc
lol. maybe bitmain has already tested it :P
-
sech1
maybe :D
-
hyc
I think right now the RAM is worth more than it could ever make from mining
-
sech1
I think that RandomX program size must be bumped a lot for v2
-
sech1
like from 256 to 320 instructions (+25%)
-
sech1
because Zen4/Zen5 wait a lot for data from RAM
-
sech1
they became much faster than Zen2
-
hyc
ah, make more use of instruction cache?
-
sech1
more use of computing capacity
-
hyc
sounds good
-
sech1
they have better IPC and better clocks than the 3700X, which was the king when RandomX was released
-
sech1
Instruction frequencies will need to be adjusted to avoid getting FP registers into +- infinity territory
-
sech1
because that will hurt entropy
-
sech1
but yeah, Zen5 can do 320 instructions instead of 256, almost at the same hashrate (and with CFROUND fix)
-
hyc
25% better IPC huh
-
hyc
I wonder how that affects arm64, Apple M2
-
DataHoarder
L2 caches per thread have also grown quite a bit, while L3 has stayed ... the same
-
DataHoarder
without X3D ofc
-
sech1
FSQRT instruction is the best to keep FP registers away from overflow/underflow. It basically halves the exponent
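easy to check (a minimal sketch):

// sqrt halves the exponent: frexp() exposes it directly
#include <cmath>
#include <cstdio>

int main() {
    int e1, e2;
    std::frexp(1e300, &e1);            // exponent of x
    std::frexp(std::sqrt(1e300), &e2); // exponent of sqrt(x)
    std::printf("%d -> %d\n", e1, e2); // prints "997 -> 499"
}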
-
kico
I'm sure Bitmain bought the RAM for these before the craziness
-
kico
they usually "test" their HW for 1 year
-
kico
this miner has probably been in the making for a few years now
-
sech1
It's probably been in the making ever since they started selling (=dumping) X5
-
sech1
Oh hi tevador
-
kico
exactly :P
-
sech1
Which means they already have X11 or something in the works
-
kico
hehehe
-
kico
x5, x9 ... x13?
-
tevador
new "ASIC"?
-
sech1
tevador I plan to work on RandomX v2 in January and prepare the complete pull request when it's done
-
tevador
cool
-
sech1
btw I added RISC-V vector JIT + dataset init + vector AES + hardware AES code to XMRig
-
sech1
All that code will be added to upstream too
-
DataHoarder
if they are mining with that it's not with the same nonce pattern afaik
-
sech1
And for v2, I want to increase program size, like a lot (+25%)
-
sech1
256 -> 320
-
sech1
and increase FSQRT frequency to keep FP registers in range
-
DataHoarder
the density of the nonce pattern has decreased over time, though I now need to calculate the actual hashrate of the bands (weighted by difficulty)
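something like this for the weighting (a sketch; the Block fields are whatever your dump has):

// A band's hashrate ~ sum of difficulties of the blocks it found, divided
// by the wall-clock span (each block represents ~difficulty hashes of work).
#include <cstdint>
#include <vector>

struct Block { uint64_t timestamp, difficulty; uint32_t nonce; };

double band_hashrate(const std::vector<Block>& blocks, uint32_t lo, uint32_t hi) {
    if (blocks.size() < 2) return 0.0;
    uint64_t work = 0;
    for (const Block& b : blocks)
        if (b.nonce >= lo && b.nonce < hi)
            work += b.difficulty;
    const double span = double(blocks.back().timestamp - blocks.front().timestamp);
    return span > 0 ? double(work) / span : 0.0; // hashes per second
}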
-
sech1
btw at this point, they can just take stock XMRig (dev branch) and use it on X9 :D
-
sech1
so nonce pattern will be the regular one
-
DataHoarder
now, yes. But not, say, a couple of years ago, since they released the other one
-
sech1
yes
-
tevador
are there any existing risc-v chips with hardware AES?
-
sech1
my Orange Pi RV2 has vector extensions but not AES
-
sech1
When I asked, I got this answer: "Bunch of SiFive cores has crypto extensions. X280, X390, P470, P670, P870."
-
tevador
there are scalar and vector crypto extensions
-
sech1
QEMU supports everything so I was able to verify my code, but it can still break on the real hardware
-
sech1
I implemented scalar crypto extensions
-
sech1
zknd/zkne
-
sech1
I haven't heard about vector AES on RISC-V, and I read all the specs
-
tevador
[link]
-
sech1
That one I didn't read
-
sech1
[link]
-
sech1
so it's a newer extension
-
sech1
oh well, another version to implement?
-
sech1
luckily RandomX AES is not a lot of code
-
tevador
According to the latest RVA profile, vector crypto should be preferred:
github.com/riscv/riscv-profiles/releases/tag/rva23-rvb23-ratified
-
tevador
"The scalar crypto extensions Zkn and Zks that were options in RVA22 are not options in RVA23. The goal is for both hardware and software vendors to move to use vector crypto, as vectors are now mandatory and vector crypto is substantially faster than scalar crypto."
-
sech1
oh, they even have vror instruction for vector registers
-
sech1
I guess I'll add detection of zvkb and zvkned extensions too, before bringing it upstream
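probably just parsing the isa line for now (a sketch; a hwprobe()-based check would be cleaner on newer kernels):

// Sketch: look for an extension name in the "isa" line of /proc/cpuinfo,
// e.g. has_riscv_ext("zvkned"). Extension names are separated by '_'.
#include <fstream>
#include <string>

static bool has_riscv_ext(const std::string& ext) {
    std::ifstream f("/proc/cpuinfo");
    std::string line;
    while (std::getline(f, line)) {
        if (line.rfind("isa", 0) == 0 && line.find("_" + ext) != std::string::npos)
            return true;
    }
    return false;
}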
-
sech1
yeah, I'm not a fan of having two hardware AES implementations for RISC-V
-
sech1
I already have vectorized soft AES, so vectorized hard AES makes even more sense
-
sech1
"vectors are now mandatory" that's good
-
tevador
Btw, I'd also suggest bumping the CBRANCH jump frequency to at least 1/32 (currently 1/256).
-
tevador
HashX was broken by GPUs because of insufficient branching.
-
sech1
HashX is not RandomX, it doesn't do 2048 loop iterations
-
sech1
25/256*2048 = 200 taken branches per program on average
-
sech1
and it's just one program at a time which can be compiled for GPUs, if I read the description right
-
sech1
Then yes, only branching can save it from GPUs.
-
tevador
I forgot why we chose 1/256. Perhaps the misprediction overhead was measurable at 1/128, but it could be retested with current hardware.
-
sech1
because of misprediction stalls in the pipeline
-
sech1
these branches are essentially random and can't be predicted
-
tevador
It doesn't have to hurt with SMT, because the other thread can run.
-
DataHoarder
> To take advantage of speculative designs, the random programs should contain branches. However, if branch prediction fails, the speculatively executed instructions are thrown away, which results in a certain amount of wasted energy with each misprediction. Therefore we should aim to minimize the number of mispredictions.
-
sech1
oh yes, and this too
-
DataHoarder
> Unfortunately, we haven't found a way how to utilize branch prediction in RandomX. Because RandomX is a consensus protocol, all the rules must be set out in advance, which includes the rules for branches.
-
DataHoarder
branch prediction - isn't that specific to the CPU? nowadays the predictors for speculation can remember register values at certain branches, and whether they follow a pattern
-
sech1
so 200 taken branches per program = 200xN wasted instructions executed and rolled back
-
tevador
Still doesn't explain why 1/256 was selected rather than 1/128.
-
sech1
N = pipeline depth
-
sech1
the smallest possible value was chosen
-
sech1
because we already have a lot of CBRANCH instructions in the code
-
sech1
they needed to be frequent to limit instruction reordering optimizations for simple in-order CPUs
-
sech1
The question is, are 200 taken branches per program too few, or enough?
-
sech1
btw increasing program size will also increase the number of branches
-
tevador
Yes, it might be enough just to increase the program size.
-
sech1
and frequent branches also limit VLIW CPUs
-
DataHoarder
and number of CFROUND on avg :)
-
DataHoarder
but it also decreases the frequency at which they switch
-
sech1
CFROUND was nerfed in another way in v2
-
DataHoarder
indeed
-
DataHoarder
CBRANCH 25/256 is the second most frequent op after FMUL_R 32/256
-
sech1
Increasing program size to 320 will require increasing FSQRT_R from 6/256 to 7 or even 8, to keep FP registers in range
-
sech1
so some other frequencies will need to be reduced
-
sech1
IXOR_R can probably be a donor.
-
DataHoarder
15/256
-
sech1
it doesn't do much in terms of energy required
-
sech1
unlike FSQRT_R
-
DataHoarder
XOR is just carryless ADD in GF(2) :)
-
sech1
making RandomX burn more energy, in the places where AMD/Intel CPUs are best optimized (the FPU), is the goal
-
sech1
sounds counter-intuitive :D
-
DataHoarder
specifically float64
-
sech1
because in the end it will make AMD/Intel CPUs more efficient, relative to X9
-
DataHoarder
where the ai/accelerator stuff is f32 or less :P
-
sech1
Internally in the CPU, sqrt is implemented as a table lookup + a few FMAs, so it burns more energy than even FMUL
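i.e. something like this (a sketch of one refinement step; real hardware starts from a lookup-table estimate and pipelines the FMAs):

// Newton-Raphson for rsqrt: y' = y * (1.5 - 0.5*x*y*y), a couple of FMAs
// per step -- which is why sqrt costs more energy than a single FMUL.
#include <cmath>
#include <cstdio>

int main() {
    const double x = 2.0;
    double y = 0.7; // stand-in for the table-lookup estimate of 1/sqrt(2)
    for (int i = 0; i < 3; ++i)
        y = y * std::fma(-0.5 * x * y, y, 1.5);
    std::printf("%.17g\n", x * y); // x * rsqrt(x) = sqrt(x) = 1.4142135623...
}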
-
tevador
Zen5 misprediction penalty is ~15 cycles, so ~24000 cycles per hash are wasted currently (8 programs x 200 mispredictions x 15 cycles). It might be OK.
-
sech1
much more is wasted when it's waiting for dataset read
-
sech1
it's still keeping most of the CPU powered on in these moments
-
sech1
which is why the 256 -> 320 increase is crucial
-
sech1
if it's powered on, it only makes sense to make it keep executing instructions until dataset read is guaranteed ready on most systems
-
tevador
Btw, reducing IXOR_R would have a side effect of reducing the mixing of integer registers.
-
sech1
yes, but letting FP registers almost always overflow/underflow will hurt entropy even more. Need to do real tests with v2 and 320 program size to make sure their exponents cover the full range, but rarely reach overflow/underflow
-
tevador
It might be better to transfer from FMUL_R, which is the main cause of needing a higher FSQRT_R frequency.
-
sech1
then it will be obvious which sqrt frequency is the best
-
sech1
we don't need a lot of square roots, because they halve the exponent each time
-
sech1
so it's a logarithmic dependency
-
sech1
FMUL_R can be a donor too
-
tevador
Probably RANDOMX_FREQ_FMUL_R 32 -> 30 and RANDOMX_FREQ_FSQRT_R 6 -> 8
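i.e. in configuration.h (hypothetical v2 numbers, pending the entropy tests):

#define RANDOMX_PROGRAM_SIZE  320 // was 256 (+25%)
#define RANDOMX_FREQ_FMUL_R    30 // was 32 (the donor)
#define RANDOMX_FREQ_FSQRT_R    8 // was 6 (keeps FP exponents in range)
// instruction frequencies must still sum to 256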
-
sech1
too high frequency will reduce exponent range, so we will need tests
-
sech1
maybe 6 will still be enough, because the amount of square roots will also increase by 25%
-
tevador
[link]
-
tevador
However, I can't find the source code for the test
-
sech1
not a problem, I will just modify the interpreter to collect the statistics
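something like this (a sketch; hooked wherever the interpreter writes the f/e register groups):

// Track the biased exponent range of FP register values across programs, to
// verify exponents cover the range but rarely hit 2047 (inf) or 0 (denormal).
#include <cstdint>
#include <cstring>

static int min_exp = 2047, max_exp = 0;

static void record_exponent(double reg) {
    uint64_t bits;
    std::memcpy(&bits, &reg, sizeof bits);
    const int e = int((bits >> 52) & 0x7FF);
    if (e < min_exp) min_exp = e;
    if (e > max_exp) max_exp = e;
}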
-
sech1
hyc: the MO discord has a sensible idea: if the X9 has to pack this much RAM inside, maybe it's soldered RAM this time? It takes much less space, and they don't need to put a 16 GB memory stick per CPU. 2x2 GB memory chips would be enough
-
sech1
So double the dataset in v2? :D
-
DataHoarder
^ I tried allocating the dataset via WASM in the browser and it just worked btw
-
sech1
4 GB dataset / 512 MB light mode is okay now, it's not 2019 anymore
-
DataHoarder
they lowered from 4 GiB to 2 GiB afaik
-
sech1
btw 4 GB dataset was considered for the original RandomX
-
DataHoarder
yeah, I remember reading about that
-
DataHoarder
or maybe they brought that back up again
v8.dev/blog/4gb-wasm-memory
-
m-relay
<syntheticbird:monero.social> sech1: Exactly, we're in 2025. RAM is more expensive than ever
-
m-relay
<syntheticbird:monero.social> WE NEED 10KB DATASET NOW
-
m-relay
<syntheticbird:monero.social> I CANNOT SURVIVE WITHOUT IT
-
m-relay
<syntheticbird:monero.social> HEEEEEELLLLLLLPPPPPPPPPP
-
sech1
Even a single DDR4 stick is 8 GB, so it won't change anything in terms of what miners need to buy
-
sech1
Raspberry Pis will lose, but using them for mining is a bad idea anyway. For anything else, they can use light mode
-
tevador
Remember that the current monerod code allocates two caches, so it already uses 512 MB (2 x 256 MB) in light mode.
-
hyc
Any increases in footprint will bump up hardware requirements
-
m-relay
<elongated:matrix.org> High time we increase hw requirements
-
hyc
it may make a lot of current nodes & miners nonviable
-
m-relay
<elongated:matrix.org> Nodes? Yes, botnets will be affected
-
hyc
yes, nodes too. dataset ram will compete with blockchain cache
-
sech1
light mode will require 1 GB then, so 2 GB minimum for running monerod
-
m-relay
<syntheticbird:monero.social> Are we sure we wanna piss off a significant portion of the hashrate while operations like qubit showcased the fragility of our current miner landscape?
-
m-relay
<syntheticbird:monero.social> Yes, I believe botnets are a significant portion of the hashrate
-
m-relay
<syntheticbird:monero.social> you may now proceed to shame me
-
sech1
I'm not sure about a dataset increase just to brick the X9, because it's not guaranteed - maybe they have 8 GB per CPU, so it wouldn't stop them