00:16:11 Are we fighting RISC-V miners? Or can Bitmain update the code and still use it?
00:25:16 sech1, wouldn't it be possible to take a look at the nonces and try to figure out how long these have been mining? The X9, I mean
06:58:46 It looks like they use the same firmware, so nonce patterns didn't change
09:49:54 You can measure the increase of nonce patterns over time
10:52:25 https://irc.gammaspectra.live/00bff44cf801ed35/out.png
10:52:46 remade nonce pattern, from the RandomX fork or so to the last block today
11:14:13 zoom into their patterns https://irc.gammaspectra.live/73e2bfbf1b91b112/out.png
11:15:44 nonce % 2^28, remove groups nonce / 2^28 that are 0 or > 10 (0 has a lot of contamination, and higher ones don't appear in nonces)
11:16:12 then their pattern is on the bottom 1/16th of this. That is the range of the plot
11:16:39 that has their sub-patterns
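[A minimal C++ sketch of the bucketing described above, for anyone wanting to reproduce the filtering. The nonce list and the bottom-1/16th band threshold are illustrative placeholders, not the actual plot code.]

```cpp
// Sketch: group = nonce / 2^28 (keep only 1..10), sub = nonce % 2^28.
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    std::vector<uint32_t> nonces = {0x1234ABCDu, 0x5F00BEEFu, 0xA0000001u}; // placeholder data
    constexpr uint32_t kGroupShift = 28;                   // nonce / 2^28
    constexpr uint32_t kSubMask = (1u << kGroupShift) - 1; // nonce % 2^28
    for (uint32_t n : nonces) {
        uint32_t group = n >> kGroupShift;
        if (group == 0 || group > 10) continue; // group 0 is contaminated, >10 unused
        uint32_t sub = n & kSubMask;
        // the suspected ASIC bands live in the bottom 1/16th of the sub range
        bool in_band = sub < (1u << (kGroupShift - 4));
        std::printf("group=%u sub=%u in_band=%d\n", group, sub, in_band);
    }
}
```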
15:36:35 so they've improved efficiency 1.6x. About what we expected.
15:38:12 still a shame that other commodity RISC-V boards aren't very good
15:52:12 the best CPU rig that I'm aware of is the 7945HX: 19.2 kH/s at 85 W at the wall
15:52:15 225 H/J
15:52:18 X9 is 400 H/J
15:52:27 not even 2x better
15:52:34 Still, we need RandomX v2
15:52:42 yeah. 1.77x better
15:53:31 I wonder how many RAM sticks they put into the X9
15:53:35 must be at least 60
15:53:41 with the price of RAM going through the roof again, Bitmain would make more money cannibalizing the existing X9s for their RAM
15:53:56 it's silly...
15:54:01 although, that 7945HX can get 20 kH/s with more power, and it runs on a single stick of DDR5 (tuned timings)
15:54:59 DDR5 is now 4x its September price...
15:55:25 https://ts2.tech/en/ram-prices-are-exploding-in-december-2025-whats-driving-the-dram-crisis-and-how-long-it-could-last/
15:55:29 btw I'm done with the RISC-V code for XMRig, the next step is to bring it to the upstream repo and then finally implement the v2 part for RISC-V. Then only the small things will be left
15:55:37 I even added hardware AES support for RISC-V
15:55:44 Without actual hardware I can test on :D
15:55:54 yeah, RAM prices are insane
15:55:54 lol. maybe Bitmain has already tested it :P
15:56:25 maybe :D
15:56:59 I think right now the RAM is worth more than it could ever make from mining
15:57:01 I think that the RandomX program size must be bumped a lot for v2
15:57:08 like from 256 to 320 instructions (+25%)
15:57:24 because Zen4/Zen5 wait a lot for data from RAM
15:57:36 they became much faster than Zen2
15:57:43 ah, make more use of the instruction cache?
15:57:56 more use of computing capacity
15:58:06 sounds good
15:58:09 they have better IPC and better clocks than the 3700X, which was the king when RandomX released
15:58:28 Instruction frequencies will need to be adjusted to avoid getting FP registers into +/- infinity territory
15:58:34 because that will hurt entropy
15:59:10 but yeah, Zen5 can do 320 instructions instead of 256, almost at the same hashrate (and with the CFROUND fix)
16:02:14 25% better IPC huh
16:02:56 I wonder how that affects arm64, Apple M2
16:10:20 L2 caches per thread have also grown quite a bit, while L3 has stayed... the same
16:10:29 without X3D ofc
16:20:54 FSQRT instruction is the best to keep FP registers away from overflow/underflow. It basically halves the exponent
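[A tiny self-contained check of the exponent-halving claim. This is just IEEE-754 double behavior, not RandomX code.]

```cpp
// sqrt roughly halves the binary exponent of a double, pulling a value
// that drifted toward overflow back into the normal range.
#include <cmath>
#include <cstdio>

int main() {
    double x = 0x1p+900;            // 2^900, far toward overflow
    int ex, er;
    std::frexp(x, &ex);             // extract binary exponent
    std::frexp(std::sqrt(x), &er);
    std::printf("exponent before: %d, after sqrt: %d\n", ex, er); // 901 -> 451
}
```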
16:39:41 I'm sure Bitmain bought RAM for these inb4 the craziness
16:39:52 they usually "test" their HW for 1 year
16:40:19 this miner has probably been in the making for a few years now
16:44:47 It's probably been in the making ever since they started selling (= dumping) the X5
16:44:53 Oh hi tevador
16:44:59 exactly :P
16:45:15 Which means they already have an X11 or something in the works
16:45:24 hehehe
16:46:01 X5, X9... X13?
16:46:02 new "ASIC"?
16:46:12 tevador I plan to work on RandomX v2 in January and prepare the complete pull request when it's done
16:46:22 cool
16:46:40 btw I added RISC-V vector JIT + dataset init + vector AES + hardware AES code to XMRig
16:46:46 All that code will be added upstream too
16:47:30 if they are mining with that, it's not with the same nonce pattern afaik
16:47:31 And for v2, I want to increase the program size, like a lot (+25%)
16:47:35 256 -> 320
16:47:44 and increase the FSQRT frequency to keep FP registers in range
16:48:04 the density of the nonce pattern has decreased over time, though I now need to calculate the actual hashrate of the bands (weighted by difficulty)
16:48:25 btw at this point, they can just take stock XMRig (dev branch) and use it on the X9 :D
16:48:36 so the nonce pattern will be the regular one
16:49:09 now, yes. But not, say, a couple of years ago, since they released the other one
16:49:16 yes
16:49:52 are there any existing RISC-V chips with hardware AES?
16:50:16 my Orange Pi RV2 has vector extensions but not AES
16:50:50 When I asked, I got this answer: "Bunch of SiFive cores has crypto extensions. X280, X390, P470, P670, P870."
16:51:14 there are scalar and vector crypto extensions
16:51:20 QEMU supports everything, so I was able to verify my code, but it can still break on real hardware
16:51:35 I implemented the scalar crypto extensions
16:51:37 zknd/zkne
16:52:36 I haven't heard about vector AES on RISC-V, and I read all the specs
16:54:23 https://github.com/riscv/riscv-crypto/releases/tag/v1.0.0
16:55:29 That one I didn't read
16:56:45 It's not mentioned in https://github.com/riscvarchive/riscv-v-spec/releases/tag/v1.0
16:56:54 so it's a newer extension
16:58:14 oh well, another version to implement?
16:58:44 luckily RandomX AES is not a lot of code
17:01:45 According to the latest RVA profile, vector crypto should be preferred: https://github.com/riscv/riscv-profiles/releases/tag/rva23-rvb23-ratified
17:02:12 "The scalar crypto extensions Zkn and Zks that were options in RVA22 are not options in RVA23. The goal is for both hardware and software vendors to move to use vector crypto, as vectors are now mandatory and vector crypto is substantially faster than scalar crypto."
17:02:53 oh, they even have a vror instruction for vector registers
17:03:57 I guess I'll add detection of the zvkb and zvkned extensions too, before bringing it upstream
17:04:38 yeah, I'm not a fan of having two hardware AES implementations for RISC-V
17:04:52 I already have vectorized soft AES, so vectorized hard AES makes even more sense
17:05:39 "vectors are now mandatory" - that's good
17:12:03 Btw, I'd also suggest bumping the CBRANCH jump frequency to at least 1/32 (currently 1/256).
17:13:30 HashX was broken by GPUs because of insufficient branching.
17:19:59 HashX is not RandomX, it doesn't do 2048 loop iterations
17:20:17 25/256 * 2048 = 200 taken branches per program on average
17:24:01 and it's just one program at a time, which can be compiled for GPUs, if I read the description right
17:24:26 Then yes, only branching can save it from GPUs.
17:26:15 I forgot why we chose 1/256. Perhaps the misprediction overhead was measurable at 1/128, but it could be retested with current hardware.
17:27:33 because of misprediction stalls in the pipeline
17:27:43 these branches are essentially random and can't be predicted
17:28:12 It doesn't need to hurt with SMT, because the other thread can run.
17:28:17 > To take advantage of speculative designs, the random programs should contain branches. However, if branch prediction fails, the speculatively executed instructions are thrown away, which results in a certain amount of wasted energy with each misprediction. Therefore we should aim to minimize the number of mispredictions.
17:28:31 oh yes, and this too
17:28:57 > Unfortunately, we haven't found a way how to utilize branch prediction in RandomX. Because RandomX is a consensus protocol, all the rules must be set out in advance, which includes the rules for branches.
17:29:39 branch prediction - isn't that specific to the CPU? nowadays the predictors for speculation can remember values of registers at certain branches, and whether they follow a pattern
17:29:40 so 200 taken branches per program = 200 x N wasted instructions executed and rolled back
17:29:41 Still doesn't explain why 1/256 was selected rather than 1/128.
17:29:46 N = pipeline depth
17:30:11 the smallest possible value was chosen
17:30:23 because we already have a lot of CBRANCH instructions in the code
17:30:49 they needed to be frequent to limit instruction reordering optimizations for simple in-order CPUs
17:31:05 The question is, are 200 taken branches per program too few, or enough?
17:31:26 btw increasing the program size will also increase the number of branches
17:31:56 Yes, it might be enough just to increase the program size.
17:31:57 and frequent branches also limit VLIW CPUs
17:31:58 and the number of CFROUNDs on avg :)
17:32:07 but it also decreases the frequency at which they switch
17:32:14 CFROUND was nerfed in another way in v2
17:32:28 indeed
17:33:49 CBRANCH 25/256 is the second most frequent op after FMUL_R 32/256
17:34:54 Increasing the program size to 320 will require increasing FSQRT_R from 6/256 to 7 or even 8, to keep FP registers in range
17:35:01 so some other frequencies will need to be reduced
17:36:15 IXOR_R can probably be a donor.
17:36:31 15/256
17:36:49 it doesn't do much in terms of energy required
17:36:55 unlike FSQRT_R
17:37:12 XOR is just carryless ADD in GF(2) :)
17:37:15 making RandomX burn more energy, and in the places where AMD/Intel CPUs are best optimized (the FPU), is the goal
17:38:55 sounds counter-intuitive :D
17:39:08 specifically float64
17:39:10 because in the end it will make AMD/Intel CPUs more efficient relative to the X9
17:40:03 where the AI/accelerator stuff is f32 or less :P
17:40:04 Internally in the CPU, sqrt is implemented as a table lookup + a few FMAs, so it burns more energy than even FMUL
17:45:03 Zen5 misprediction penalty is ~15 cycles, so ~24000 cycles per hash are wasted currently. It might be OK.
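[Back-of-the-envelope for the ~24000 figure, filling in the implicit factor of 8 programs per hash (RANDOMX_PROGRAM_COUNT). The 15-cycle penalty is the assumed Zen5 number from the chat.]

```cpp
// Expected taken CBRANCHes per program and wasted cycles per hash,
// under current RandomX parameters.
#include <cstdio>

int main() {
    constexpr double cbranch_per_program = 25.0;   // RANDOMX_FREQ_CBRANCH = 25/256 of 256 instructions
    constexpr double jump_probability = 1.0 / 256; // per executed CBRANCH
    constexpr double iterations = 2048;            // loop iterations per program
    constexpr double programs_per_hash = 8;        // RANDOMX_PROGRAM_COUNT
    constexpr double penalty_cycles = 15;          // assumed Zen5 misprediction cost

    double taken = cbranch_per_program * iterations * jump_probability; // = 200
    double wasted = taken * programs_per_hash * penalty_cycles;         // = 24000
    std::printf("taken branches/program: %.0f, wasted cycles/hash: %.0f\n",
                taken, wasted);
}
```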
17:46:04 much more is wasted when it's waiting for a dataset read
17:46:19 it's still keeping most of the CPU powered on in those moments
17:46:25 which is why the 256 -> 320 increase is crucial
17:47:08 if it's powered on, it only makes sense to keep it executing instructions until the dataset read is guaranteed ready on most systems
17:48:26 Btw, reducing IXOR_R would have a side effect of reducing the mixing of integer registers.
17:50:02 yes, but letting FP registers almost always overflow/underflow will hurt entropy even more. Need to do real tests with v2 and the 320 program size to make sure their exponents cover the full range, but rarely reach overflow/underflow
17:50:13 It might be better to transfer from FMUL_R, which is the main cause of needing a higher FSQRT_R frequency.
17:50:19 then it will be obvious which sqrt frequency is the best
17:50:41 we don't need a lot of square roots, because they halve the exponent each time
17:50:50 so it's a logarithmic dependency
17:51:14 FMUL_R can be a donor too
17:51:34 Probably RANDOMX_FREQ_FMUL_R 32 -> 30 and RANDOMX_FREQ_FSQRT_R 6 -> 8
17:52:01 too high a frequency will reduce the exponent range, so we will need tests
17:52:20 maybe 6 will still be enough, because the number of square roots will also increase by 25%
17:56:43 You will need to rerun this: https://github.com/tevador/RandomX/blob/master/doc/design.md#251-floating-point-register-groups
17:57:06 However, I can't find the source code for the test
18:10:22 not a problem, I will just modify the interpreter to collect the statistics
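[A sketch of what such a statistics hook could look like, illustrative only and not the actual interpreter change: histogram the biased exponent field of each FP register value and watch the overflow/underflow buckets.]

```cpp
// Histogram the 11-bit biased exponent of doubles; buckets 0 (subnormal/zero)
// and 2047 (inf/NaN) flag underflow/overflow. Requires C++20 for std::bit_cast.
#include <array>
#include <bit>
#include <cstdint>
#include <cstdio>

std::array<uint64_t, 2048> exponent_histogram{};

void record_fp_register(double value) {
    uint64_t bits = std::bit_cast<uint64_t>(value);
    exponent_histogram[(bits >> 52) & 0x7FF]++; // IEEE-754 double exponent field
}

int main() {
    // In the real test this would be called from the interpreter's execute
    // loop for each FP register; here we just feed a couple of sample values.
    record_fp_register(1.5);
    record_fp_register(0x1p+1000);
    std::printf("underflow bucket: %llu, overflow bucket: %llu\n",
                (unsigned long long)exponent_histogram[0],
                (unsigned long long)exponent_histogram[2047]);
}
```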
19:49:12 hyc: the MO Discord has a sensible idea: if the X9 has to pack this much RAM inside, maybe it's soldered RAM this time? It takes much less space, and they don't need to put a 16 GB memory stick per CPU. 2x2 GB memory chips will be enough
19:49:16 So double the dataset in v2? :D
19:51:14 ^ I tried allocating the dataset via WASM in the browser and it just worked btw
19:51:45 a 4 GB dataset / 512 MB light mode is okay now, it's not 2019 anymore
19:51:47 they lowered it from 4 GiB to 2 GiB afaik
19:52:01 btw a 4 GB dataset was considered for the original RandomX
19:52:15 yeah, I remember reading about that
19:54:48 or maybe they brought that back up again https://v8.dev/blog/4gb-wasm-memory
19:57:14 sech1: Exactly, we're in 2025. RAM is more expensive than ever
19:57:27 WE NEED 10KB DATASET NOW
19:57:40 I CANNOT SURVIVE WITHOUT IT
19:57:43 HEEEEEELLLLLLLPPPPPPPPPP
19:59:20 Even a single DDR4 stick is 8 GB, so it won't change anything in terms of what miners need to buy
20:00:26 Raspberry Pis will lose out, but using them for mining is a bad idea anyway. For anything else, they can use light mode
20:31:24 Remember that the current monerod code allocates two caches, so it already uses 512 MB with light mode.
20:35:47 Any increase in footprint will bump up hardware requirements
20:36:37 High time we increased hw requirements
20:36:43 it may make a lot of current nodes & miners nonviable
20:37:17 Nodes? Yes, botnets will be affected
20:38:05 yes, nodes too. dataset RAM will compete with the blockchain cache
20:39:55 light mode will require 1 GB then, so 2 GB minimum for running monerod
20:41:13 Are we sure we want to piss off a significant portion of our hashrate while operations like Qubic have showcased the fragility of our current miner landscape?
20:41:50 Yes, I believe botnets are a significant portion of the hashrate
20:42:00 you may now proceed to shame me
20:54:49 I'm not sure about a dataset increase just to brick the X9. Because it's not guaranteed - maybe they have 8 GB per CPU, so it won't stop them
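[Rough footprint math for the doubling discussed above. The current sizes are the stock RandomX parameters; the doubled column is the hypothetical 2x assumed here, not a decided change.]

```cpp
// Light mode = Argon2 cache (monerod keeps two, for the current and next
// seed epoch); fast mode = full dataset.
#include <cstdio>

int main() {
    constexpr double cache_mib = 256;    // RANDOMX_ARGON_MEMORY (light mode)
    constexpr double dataset_mib = 2080; // ~2 GiB base + superscalar extra
    std::printf("light mode (2 caches): today %.0f MiB, doubled %.0f MiB\n",
                2 * cache_mib, 4 * cache_mib);
    std::printf("fast mode (dataset):   today ~%.0f MiB, doubled ~%.0f MiB\n",
                dataset_mib, 2 * dataset_mib);
}
```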