-
eureka_
I don't think it's appropriate to say X9 was designed specifically for RandomX if it does end up being SG2044, the Sophon CPUs end up in many products
-
eureka_
that HPC benchmark comparison linked a few days ago was enlightening as to why the X5 didn't hit back-of-envelope potential hashrates, the memory controller is significantly less efficient than available on mainstream x86 chips
-
eureka_
and with RandomX being very sensitive to memory performance, makes sense
-
eureka_
we won't find out what X9 is actually running for another several months :(
-
eureka_
if the change happens to be primarily memory controller improvements, that is a surprising performance leap
-
eureka_
less stalled cycles on the table
-
eureka_
anyway, bitmain could probably make a CPU that natively implements the RandomX ISA but that would be far riskier than the RISC-V route (no pun intended :^)) and completely ignores any alternative markets
-
eureka_
sg2042 and sg2044 are rather appealing as general purpose CPUs, I would not mind having a server with one or two of them but the retail price is too much
-
eureka_
chances of X5 or X9 having a high bandwidth interconnect from the CPUs to the controller are low, otherwise I would outright purchase one and work to achieve arbitrary code execution
-
eureka_
1152 cores with only SPI/I2C or maybe a UART for outside connectivity is such a shame
-
eureka_
@elongated; bitmain/sophon CPUs are still commodity/general purpose even though X5/X9 makes use of them. they're just in an unusual form factor compared to a typical server. there's no point in making RandomX less efficient for RISC-V as a whole
-
eureka_
frankly I doubt bitmain/sophon re-designed sg2044 specifically for RandomX as primary use case, sg2042 core is from T-head and not developed in house and it's doubtful Bitmain went full custom for sg2044. yes, it's probable they chose certain aspects of SoC configuration for X9 use case, but those changes improve the CPU's performance as a whole, not just mining-specific
-
DataHoarder
^ they could have selected to disable specific features not used by randomx, for example, branch predictor units as the branching is random, and improve efficiency there
-
gingeropolous
wish they'd just make it a general purpose computer and sell it straight.
-
DataHoarder
that does not do much for their marketing
-
DataHoarder
then they'd also have to support users :')
-
DataHoarder
and they aren't even updating existing nonce offsets lol
-
m-relay
<neromonero1024:monero.social> even assuming a 30% performance hit, those x9 miners are still more profitable than Ryzens
-
m-relay
<neromonero1024:monero.social> currently, they offer about 400 H/J; a 30% hit is still 280 H/J whereas Ryzens are 150-175 H/J at max
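(A minimal sketch of the efficiency arithmetic in the message above, assuming the performance hit lowers hashrate while power draw stays the same. The 400 H/J, 30% and 150-175 H/J figures are the ones quoted in chat; the function and variable names are made up for illustration. Under that assumption a 30% hit on 400 H/J leaves 280 H/J.)

```python
def efficiency_after_hit(h_per_j: float, hit: float) -> float:
    """Efficiency (H/J) left after losing the fraction `hit` of the hashrate.

    Assumes power draw is unchanged, so efficiency scales with hashrate.
    """
    return h_per_j * (1.0 - hit)


X9_H_PER_J = 400.0               # "about 400 H/J" as quoted in chat
HIT = 0.30                       # the assumed 30% performance hit
RYZEN_H_PER_J = (150.0, 175.0)   # Ryzen range quoted in chat

x9_after = efficiency_after_hit(X9_H_PER_J, HIT)
ratio_vs_best_ryzen = x9_after / RYZEN_H_PER_J[1]

print(f"X9 after hit: {x9_after:.0f} H/J")
print(f"vs best quoted Ryzen figure: {ratio_vs_best_ryzen:.2f}x")
```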
-
m-relay
<neromonero1024:monero.social> assuming current profit, it will pay itself off in about 1.5 years
-
sech1
7945HX set a record of 225 H/J. This system:
minisforum.com/products/minisforum-bd795m
-
sech1
19.2 KH/s and 85 W at the wall
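(The 225 H/J record follows directly from these two numbers, since one watt is one joule per second. A quick sketch, with a hypothetical helper name:)

```python
def hashes_per_joule(hashrate_hs: float, power_w: float) -> float:
    """Energy efficiency in H/J: (hashes per second) / (joules per second)."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return hashrate_hs / power_w


# 19.2 KH/s at 85 W measured at the wall, as quoted in chat
eff_7945hx = hashes_per_joule(19_200, 85)
print(f"{eff_7945hx:.1f} H/J")
```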
-
sech1
I am in contact with the owner of this system and he is willing to test all tweaks
-
sech1
SG2044 has 32 memory channels which is impressive. The question is how many memory sticks (or soldered RAM) per CPU Bitmain was able to squeeze into that case.
-
eureka_
gingeropolous it is, just very expensive
-
sech1
In any case, if SG2044 is so good with memory, it means it's compute limited. So any increase in compute in v2 -> direct hashrate hit for X9
-
DataHoarder
yeah, modern cpu memory latencies have increased or not improved while the rest did
-
DataHoarder
L3 stays the same-ish as couple of years ago for baseline
-
DataHoarder
but L1/L2 have grown a lot, mostly depending on the hot paths and data to stay in chip instead of RAM
-
DataHoarder
the cheap EPYC you can find from clouds (not vanilla models) also tend to have worse L2 latency for example
chipsandcheese.com/p/amds-epyc-7j13-zen-3-customized
-
eureka_
even mobile CPUs have had ever-increasing caches, my laptop has 80K/96K L1 (p/e), 2M/512K L2 (p/e), 36M L3
-
eureka_
it's ridiculous
-
DataHoarder
note L1/L2 is usually given on per-core values
-
DataHoarder
while L3 is given for the whole cpu
-
DataHoarder
so you want to divide that per thread as well :)
-
eureka_
^^
-
DataHoarder
cpus are also increasing cores, so L3 increases proportionally
-
eureka_
SMT on P-cores, no SMT on E-cores there, but not much worth using SMT for RandomX on this CPU
-
DataHoarder
most zen stuff has 2MB/thread L3
-
DataHoarder
except zen 5c (or 4c?) which is 1MB/thread L3
-
DataHoarder
with unaffected L2/L1
-
eureka_
unfortunately heavily affected by throttle on this machine, without throttle it would be fairly decent at RandomX for an intel CPU
-
DataHoarder
the 2 MB/thread has more or less not improved (besides X3D) across vendors tbh
-
DataHoarder
meanwhile Zen5 has 1MB L2 per core
-
DataHoarder
:D
-
sech1
SG2044 has 2 MB L2 per cluster of 4 cores
-
sech1
which (and 64 MB L3 available to the whole CPU) makes it very fast in single thread benchmarks
-
eureka_
10 years ago I used to buy xeon for home machines just to get 25-50MB L3, now it's available on consumer grade
-
DataHoarder
^ and some top of the line server cpus are reducing L3 in favor of higher core density (while leaving L1/L2 more or less the same)
-
eureka_
or increasing L2 even, some as low as 1.375 MB L3/core
-
eureka_
at least on intel
-
eureka_
seems they ran out of tricks to stay competitive and have resorted to super-sizing caches to keep up
-
DataHoarder
192 cores, 1 MB/core L2, 2MB/core L3 (1MB/thread)
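(A rough sketch of the per-thread cache arithmetic for the figures in this message. It assumes 2-way SMT and the default RandomX scratchpad of 2 MiB per mining thread; the helper name and the dictionary layout are invented for illustration.)

```python
RANDOMX_SCRATCHPAD_MB = 2  # assumption: default RandomX scratchpad per thread


def l3_budget(cores: int, smt: int, l3_per_core_mb: float) -> dict:
    """Total L3, L3 per hardware thread, and how many scratchpads fit in L3."""
    total_l3_mb = cores * l3_per_core_mb
    hw_threads = cores * smt
    return {
        "total_l3_mb": total_l3_mb,
        "l3_per_thread_mb": total_l3_mb / hw_threads,
        "scratchpads_in_l3": int(total_l3_mb // RANDOMX_SCRATCHPAD_MB),
    }


# 192 cores, 2 MB L3 per core, SMT2 (the SMT width is an assumption)
budget = l3_budget(cores=192, smt=2, l3_per_core_mb=2)
print(budget)
```

With these inputs only one scratchpad per core fits in L3, which is one way to read the earlier remark that SMT is not worth much for RandomX.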
-
eureka_
225 KH/s RandomX for that chip
-
DataHoarder
well, a lot of what these cpus run are wide operations that still don't do well on GPUs
-
DataHoarder
each core is independent and doesn't need to share across L3 as much
-
eureka_
yeap
-
eureka_
with full NUMA enabled and VMs pinned to nodes without overlap, can get some extremely high virtualisation density on zen5 epycs
-
DataHoarder
warning eureka_
-
DataHoarder
that benchmark is for 2 CPUs
-
DataHoarder
9965
-
eureka_
meanwhile I'm still here with sad 7402p outperformed by my laptop (:
-
eureka_
aye
-
eureka_
384 threads, but no SMT in that benchmark I assume
-
eureka_
benchmark with 576 threads is only 195 KH/s
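(To put the two results side by side, a small sketch of hashrate per thread. The 225K and 195K totals and the 576 thread count are as quoted in chat; the 384 thread count for the first run is only the assumption stated above, and both runs are for a 2-CPU system. Names are illustrative.)

```python
def per_thread_hashrate(total_hs: float, threads: int) -> float:
    """Average hashrate contributed by each mining thread, in H/s."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return total_hs / threads


# (total H/s, threads); 384 threads for the first run is assumed, not confirmed
runs = {
    "384 threads": (225_000, 384),
    "576 threads": (195_000, 576),
}

per_thread = {name: per_thread_hashrate(hs, n) for name, (hs, n) in runs.items()}
for name, value in per_thread.items():
    print(f"{name}: {value:.0f} H/s per thread")
```

More threads here means both a lower total and a much lower per-thread rate, consistent with the threads competing for the same cache.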
-
DataHoarder
ok the 9755 benchmarks are borked
-
eureka_
a lot of xmrig benchmark results are not ideal, wish we had users who knew how to tune better
-
eureka_
not a lot of overclockers submitting benchmarks either, would be fun to see how high single thread can get
-
DataHoarder
better find the engineering sample :D
-
eureka_
RandomX is a great general purpose benchmark these days imo
-
DataHoarder
those tend to be "better"
-
eureka_
much better than older synthetic benchmarks running small code that fits in cache and gives inflated picture of core performance
-
eureka_
hashrate comparison between CPUs is more realistic for real world comparisons
-
eureka_
at least single threaded, with cache taken into account
-
eureka_
yeap
-
eureka_
anyway, getting sg204{2,4} in hands of chips&cheese would be fun for benchmarks
-
DataHoarder
there's also Xeon Phi :D
-
eureka_
new phi :V when
-
DataHoarder
it even had avx 512
-
eureka_
give me 192x gracemont with SMT4 on a PCIe card pls
-
eureka_
instead of that old slow shit
-
DataHoarder
also 4x SMT
-
eureka_
i liked the knight's corner cards, very unique design
-
eureka_
P54C core but forcefully expanded to 64 bit datapath, no MMX or x87 or any legacy stuff, boots up right into 64 bit no real mode
-
eureka_
sort-of-AVX-512 but not really
-
eureka_
not totally x86 compatible.. gcc considered architecture to be k1om instead :D
-
DataHoarder
so can test with differing L1/L2/L3 sizes and program size, and op distribution
-
eureka_
i forward ported kernel support for those cards to newer 4.x kernel some years ago, for fun experiments with containers
-
eureka_
considered doing silly budget hosting on the cards, 8 VMs with 7 cores/2GB each, SAN storage
-
eureka_
but keeping gcc up to date was so much work
-
DataHoarder
and even if you can upstream it it'd require a maintainer
-
eureka_
yep.. nobody wants MIC today, and knight's landing was real x86 so no point
-
eureka_
it's just historical curiosity
-
eureka_
but for a few short months, best damn cryptonight miner you could get before ASICs :D
-
sech1
Damn, it failed CI. Everything worked in QEMU :D
-
sech1
huh, and it works in different configs (both with and without vector and aes)