02:41:57 I don't think it's appropriate to say X9 was designed specifically for RandomX if it does end up being SG2044; the Sophon CPUs end up in many products
02:42:49 that HPC benchmark comparison linked a few days ago was enlightening as to why the X5 didn't hit back-of-envelope potential hashrates: the memory controller is significantly less efficient than what is available on mainstream x86 chips
02:43:13 and with RandomX being very sensitive to memory performance, that makes sense
02:43:26 we won't find out what X9 is actually running for another several months :(
02:43:59 if the change happens to be primarily memory controller improvements, that is a surprising performance leap
02:44:10 fewer stalled cycles on the table
02:45:28 anyway, bitmain could probably make a CPU that natively implements the RandomX ISA but that would be far riskier than the RISC-V route (no pun intended :^)) and would completely ignore any alternative markets
02:46:04 sg2042 and sg2044 are rather appealing as general purpose CPUs, I would not mind having a server with one or two of them but the retail price is too high
02:47:36 chances of X5 or X9 having a high bandwidth interconnect from the CPUs to the controller are low, otherwise I would outright purchase one and work to achieve arbitrary code execution
02:48:23 1152 cores with only SPI/I2C or maybe a UART for outside connectivity is such a shame
02:58:40 @elongated; bitmain/sophon CPUs are still commodity/general purpose even though X5/X9 makes use of them. they're just in an unusual form factor compared to a typical server. there's no point in making RandomX less efficient for RISC-V as a whole
03:02:58 frankly I doubt bitmain/sophon re-designed sg2044 specifically for RandomX as the primary use case; the sg2042 core is from T-head and not developed in house, and it's doubtful Bitmain went full custom for sg2044.
yes, it's probable they chose certain aspects of the SoC configuration for the X9 use case, but those changes improve the CPU's performance as a whole, not just for mining
03:04:44 ^ they could have chosen to disable specific features not used by randomx, for example, branch predictor units as the branching is random, and improve efficiency there
05:43:46 wish they'd just make it a general purpose computer and sell it straight.
05:44:35 that does not do much for their marketing
05:44:59 then they'd also have to support users :')
05:45:06 and they aren't even updating existing nonce offsets lol
06:42:08 even assuming a 30% performance hit, those x9 miners would still be more profitable than Ryzens
06:42:13 currently, they offer about 400 H/J; a 30% hit is still 280 H/J whereas Ryzens are 150-175 H/J at max
06:42:32 assuming current profit, it will pay for itself in about 1.5 years
07:10:16 7945HX set a record of 225 H/J. This system: https://www.minisforum.com/products/minisforum-bd795m
07:10:33 19.2 KH/s and 85 W at the wall
07:11:13 I am in contact with the owner of this system and he is willing to test all tweaks
07:12:10 SG2044 has 32 memory channels which is impressive. The question is how many memory sticks (or soldered RAM) per CPU Bitmain was able to squeeze into that case.
07:12:37 gingeropolous it is, just very expensive
07:12:58 In any case, if SG2044 is so good with memory, it means it's compute limited.
So any increase in compute in v2 -> direct hashrate hit for X9
07:14:51 yeah, modern cpu memory latencies have increased or not improved while the rest did
07:15:08 L3 stays the same-ish as a couple of years ago for baseline
07:15:23 but L1/L2 have grown a lot, mostly relying on the hot paths and data staying on chip instead of in RAM
07:16:13 the cheap EPYC you can find from clouds (not vanilla models) also tend to have worse L2 latency for example https://chipsandcheese.com/p/amds-epyc-7j13-zen-3-customized
07:17:09 even mobile CPUs have had ever-increasing caches, my laptop has 80K/96K L1 (p/e), 2M/512K L2 (p/e), 36M L3
07:17:16 it's ridiculous
07:18:08 note L1/L2 is usually given as per-core values
07:18:14 while L3 is given for the whole cpu
07:18:24 so you want to divide that per thread as well :)
07:18:25 ^^
07:18:42 cpus are also increasing cores, so L3 increases proportionally
07:18:52 SMT on P-cores, no SMT on E-cores there, but not much worth using SMT for RandomX on this CPU
07:18:53 most zen stuff has 2MB/thread L3
07:19:16 https://astr.al/u/db04e7db.txt
07:19:25 except zen 5c (or 4c?)
which is 1MB/thread L3
07:19:28 with unaffected L2/L1
07:19:42 unfortunately heavily affected by throttle on this machine, without throttle it would be fairly decent at RandomX for an intel CPU
07:20:09 the 2 MB/thread has more or less not improved (besides X3D) across vendors tbh
07:21:07 meanwhile Zen5 has 1MB L2 per core
07:21:10 :D
07:22:10 SG2044 has 2 MB L2 per cluster of 4 cores
07:22:38 which (and 64 MB L3 available to the whole CPU) makes it very fast in single thread benchmarks
07:25:00 10 years ago I used to buy xeon for home machines just to get 25-50MB L3, now it's available on consumer grade
07:25:40 ^ and some top of the line server cpus are reducing L3 in favor of higher core density (while leaving L1/L2 more or less the same)
07:25:58 or increasing L2 even, some as low as 1.375 MB L3/core
07:26:22 at least on intel
07:26:42 look https://www.techpowerup.com/cpu-specs/epyc-9965.c3904
07:26:44 seems they ran out of tricks to stay competitive and have resorted to super-sizing caches to keep up
07:27:08 192 cores, 1 MB/core L2, 2MB/core L3 (1MB/thread)
07:27:37 225 KH/s RandomX for that chip
07:27:48 well, a lot of what these cpus run are wide operations that still don't do well on GPUs
07:28:07 each core is independent and doesn't need to share across L3 as much
07:28:12 yeap
07:28:39 this is the non zen 5c top https://www.techpowerup.com/cpu-specs/epyc-9755.c3881
07:28:58 with full NUMA enabled and VMs pinned to nodes without overlap, can get some extremely high virtualisation density on zen5 epycs
07:29:09 warning eureka_
07:29:11 that benchmark is for 2 CPUs
07:29:22 9965
07:29:22 meanwhile I'm still here with sad 7402p outperformed by my laptop (:
07:29:25 aye
07:29:41 384 threads, but no SMT in that benchmark I assume
07:29:51 benchmark with 576 threads is only 195K
07:29:59 ok the 9755 benchmarks are borked
07:30:40 a lot of xmrig benchmark results are not ideal, wish we had users who knew how to tune better
07:31:13 not a lot of overclockers
submitting benchmarks either, would be fun to see how high single thread can get
07:31:25 better find the engineering sample :D
07:31:26 RandomX is a great general purpose benchmark these days imo
07:31:30 those tend to be "better"
07:31:53 much better than older synthetic benchmarks running small code that fits in cache and gives an inflated picture of core performance
07:32:10 hashrate comparison between CPUs is more realistic for real world comparisons
07:32:22 at least single threaded, with cache taken into account
07:33:39 also stuff like: https://chipsandcheese.com/i/174871357/memory-subsystem-and-numa-characteristics
07:34:51 yeap
07:41:20 anyway, getting sg204{2,4} in hands of chips&cheese would be fun for benchmarks
07:44:48 there's also Xeon Phi :D
07:45:05 new phi :V when
07:45:22 it even had avx 512
07:45:46 give me 192x gracemont with SMT4 on a PCIe card pls
07:45:52 instead of that old slow shit
07:46:23 also 4x SMT
07:46:36 i liked the knight's corner cards, very unique design
07:46:56 P54C core but forcefully expanded to 64 bit datapath, no MMX or x87 or any legacy stuff, boots up right into 64 bit no real mode
07:47:06 sort-of-AVX-512 but not really
07:47:29 not totally x86 compatible.. gcc considered architecture to be k1om instead :D
07:47:53 meanwhile I moved my go-randomx stuff to parameterized VM config https://git.gammaspectra.live/P2Pool/go-randomx/commit/596c6782e95351a7555f640a5f430a87dfca54f8
07:47:53 so can test with differing L1/L2/L3 sizes and program size, and op distribution
07:47:54 i forward ported kernel support for those cards to a newer 4.x kernel some years ago, for fun experiments with containers
07:49:23 considered doing silly budget hosting on the cards, 8 VMs with 7 cores/2GB each, SAN storage
07:49:48 but keeping gcc up to date was so much work
07:50:23 and even if you can upstream it it'd require a maintainer
07:50:42 yep..
nobody wants MIC today, and knight's landing was real x86 so no point
07:50:46 it's just a historical curiosity
07:51:07 but for a few short months, best damn cryptonight miner you could get before ASICs :D
13:56:33 hyc https://github.com/tevador/RandomX/pull/316
14:01:18 Damn, it failed CI. Everything worked in QEMU :D
14:04:28 huh, and it works in different configs (both with and without vector and aes)
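
Editor's note: the back-of-envelope figures quoted in the log (H/J at 06:42 and 07:10, L3 per thread at 07:18 to 07:27) can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, the inputs are the numbers as quoted in chat rather than measurements, and the 2 MiB figure is RandomX's default scratchpad size, which is presumably why L3 per thread keeps coming up.

```python
# Back-of-envelope helpers for the figures quoted in the log.
# All names here are illustrative; inputs are the numbers as quoted in chat.

RANDOMX_SCRATCHPAD_MB = 2.0  # RandomX default scratchpad size (2 MiB per thread)


def hashes_per_joule(hashrate_hs: float, watts: float) -> float:
    """Efficiency in H/J: hashes per second divided by joules per second."""
    return hashrate_hs / watts


def after_penalty(efficiency_hj: float, penalty: float) -> float:
    """Efficiency left after a fractional hashrate hit at unchanged power."""
    return efficiency_hj * (1.0 - penalty)


def l3_per_thread_mb(l3_per_core_mb: float, threads_per_core: int) -> float:
    """L3 share per hardware thread, given the per-core share."""
    return l3_per_core_mb / threads_per_core


def fits_scratchpad(l3_mb_per_thread: float) -> bool:
    """True if each thread's L3 share can hold a whole scratchpad."""
    return l3_mb_per_thread >= RANDOMX_SCRATCHPAD_MB


if __name__ == "__main__":
    # 7945HX system: 19.2 KH/s at 85 W at the wall
    print(f"7945HX: {hashes_per_joule(19_200, 85):.1f} H/J")
    # X9 at about 400 H/J with an assumed 30% hit
    print(f"X9 after 30% hit: {after_penalty(400, 0.30):.0f} H/J")
    # EPYC 9965: 2 MB/core L3, SMT2
    share = l3_per_thread_mb(2.0, 2)
    print(f"EPYC 9965: {share:.1f} MB/thread, fits scratchpad: {fits_scratchpad(share)}")
```

With one thread per core (SMT unused) the same EPYC 9965 share is 2 MB per thread, which matches the guess at 07:29:41 that the benchmark ran without SMT.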