#monero-pow

02:41

eureka_

I don't think it's appropriate to say X9 was designed specifically for RandomX if it does end up being SG2044, the Sophon CPUs end up in many products
02:42

eureka_

that HPC benchmark comparison linked a few days ago was enlightening as to why the X5 didn't hit back-of-envelope potential hashrates: the memory controller is significantly less efficient than those available on mainstream x86 chips
02:43

eureka_

and with RandomX being very sensitive to memory performance, makes sense
02:43

eureka_

we won't find out what X9 is actually running for another several months :(
02:43

eureka_

if the change happens to be primarily memory controller improvements, that is a surprising performance leap
02:44

eureka_

fewer stalled cycles on the table
02:45

eureka_

anyway, bitmain could probably make a CPU that natively implements the RandomX ISA but that would be far riskier than the RISC-V route (no pun intended :^)) and completely ignores any alternative markets
02:46

eureka_

sg2042 and sg2044 are rather appealing as general purpose CPUs, I would not mind having a server with one or two of them but the retail price is too much
02:47

eureka_

chances of X5 or X9 having a high bandwidth interconnect from the CPUs to the controller are low, otherwise I would outright purchase one and work to achieve arbitrary code execution
02:48

eureka_

1152 cores with only SPI/I2C or maybe a UART for outside connectivity is such a shame
02:58

eureka_

@elongated; bitmain/sophon CPUs are still commodity/general purpose even though X5/X9 makes use of them. they're just in an unusual form factor compared to a typical server. there's no point in making RandomX less efficient for RISC-V as a whole
03:02

eureka_

frankly I doubt bitmain/sophon re-designed sg2044 specifically for RandomX as primary use case, sg2042 core is from T-head and not developed in house and it's doubtful Bitmain went full custom for sg2044. yes, it's probable they chose certain aspects of SoC configuration for X9 use case, but those changes improve the CPU's performance as a whole, not just mining-specific
03:04

DataHoarder

^ they could have chosen to disable specific features not used by randomx, for example, branch predictor units as the branching is random, and improve efficiency there
05:43

gingeropolous

wish they'd just make it a general purpose computer and sell it straight.
05:44

DataHoarder

that does not do much for their marketing
05:44

DataHoarder

then they'd also have to support users :')
05:45

DataHoarder

and they aren't even updating existing nonce offsets lol
06:42

m-relay

<neromonero1024:monero.social> even assuming a 30% performance hit, those x9 miners would still be more profitable than Ryzens
06:42

m-relay

<neromonero1024:monero.social> currently, they offer about 400 H/J; a 30% hit is still 280 H/J whereas Ryzens are 150-175 H/J at max
06:42

m-relay

<neromonero1024:monero.social> assuming current profit, it will pay for itself in about 1.5 years
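
Taking the roughly 400 H/J figure at face value, the effect of a 30% hashrate hit at unchanged power can be checked with a few lines. This is only a sketch of the arithmetic; the function name is made up for illustration, and the inputs are the approximate numbers quoted in the chat.

```python
def efficiency_after_hit(base_h_per_j: float, hit_fraction: float) -> float:
    """Efficiency (H/J) left after losing `hit_fraction` of hashrate at the same power draw."""
    return base_h_per_j * (1.0 - hit_fraction)


x9_now = 400.0      # H/J, approximate X9 figure quoted in the chat
ryzen_best = 175.0  # H/J, upper end of the Ryzen range quoted in the chat

x9_after = efficiency_after_hit(x9_now, 0.30)
print(f"X9 after a 30% hit: {x9_after:.0f} H/J")
print(f"ratio to the best quoted Ryzen figure: {x9_after / ryzen_best:.2f}x")
```

The payback estimate is not modelled here because the chat gives no price or electricity cost to work from.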
07:10

sech1

7945HX set a record of 225 H/J. This system: minisforum.com/products/minisforum-bd795m
07:10

sech1

19.2 KH/s and 85 W at the wall
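
Since a watt is one joule per second, H/J is simply hashrate divided by wall power. A minimal check of the quoted record, using only the two numbers given above (the helper name is hypothetical):

```python
def hashes_per_joule(hashrate_h_per_s: float, wall_power_w: float) -> float:
    """H/J = (hashes per second) / (joules per second)."""
    if wall_power_w <= 0:
        raise ValueError("wall power must be positive")
    return hashrate_h_per_s / wall_power_w


# 19.2 KH/s at 85 W at the wall, as reported for the 7945HX system.
print(f"{hashes_per_joule(19_200, 85):.1f} H/J")
```

The result lands just under 226 H/J, in line with the 225 H/J figure.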
07:11

sech1

I am in contact with the owner of this system and he is willing to test all tweaks
07:12

sech1

SG2044 has 32 memory channels which is impressive. The question is how many memory sticks (or soldered RAM) per CPU Bitmain was able to squeeze into that case.
07:12

eureka_

gingeropolous it is, just very expensive
07:12

sech1

In any case, if SG2044 is so good with memory, it means it's compute limited. So any increase in compute in v2 -> direct hashrate hit for X9
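
The compute-limited argument can be sketched with a toy bottleneck model in which hashrate is the smaller of a compute ceiling and a memory ceiling. The RandomX constants are the reference defaults (8 programs per hash, 2048 iterations per program, one 64-byte dataset read per iteration); everything else, including the function names and the example ceilings, is an assumption for illustration. The model ignores memory latency, which matters a lot in practice.

```python
# RandomX reference parameters (defaults in the reference implementation).
PROGRAMS_PER_HASH = 8
ITERATIONS_PER_PROGRAM = 2048
DATASET_ITEM_BYTES = 64  # one dataset item is read per program iteration


def dataset_bytes_per_hash() -> int:
    """DRAM traffic per hash from dataset reads alone."""
    return PROGRAMS_PER_HASH * ITERATIONS_PER_PROGRAM * DATASET_ITEM_BYTES


def memory_ceiling(bandwidth_bytes_per_s: float) -> float:
    """Upper bound on H/s if only dataset bandwidth mattered (latency ignored)."""
    return bandwidth_bytes_per_s / dataset_bytes_per_hash()


def modelled_hashrate(compute_ceiling: float, mem_ceiling: float) -> float:
    """Toy bottleneck model: the slower of the two limits wins."""
    return min(compute_ceiling, mem_ceiling)


def with_extra_compute(compute_ceiling: float, extra_work: float) -> float:
    """Compute ceiling after each hash needs `extra_work` (e.g. 0.25) more work."""
    return compute_ceiling / (1.0 + extra_work)


# Hypothetical ceilings in H/s, chosen only to illustrate the two regimes.
for name, compute, memory in [("compute-limited", 1000.0, 5000.0),
                              ("memory-limited", 5000.0, 1000.0)]:
    before = modelled_hashrate(compute, memory)
    after = modelled_hashrate(with_extra_compute(compute, 0.25), memory)
    print(f"{name}: {before:.0f} -> {after:.0f} H/s")
```

In this model a compute-limited chip loses hashrate in direct proportion to the added work, while a memory-limited chip absorbs the extra compute until its compute ceiling drops below its memory ceiling.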
07:14

DataHoarder

yeah, modern cpu memory latencies have increased or not improved while the rest did
07:15

DataHoarder

L3 stays the same-ish as couple of years ago for baseline
07:15

DataHoarder

but L1/L2 have grown a lot, mostly relying on the hot paths and data staying on chip instead of in RAM
07:16

DataHoarder

the cheap EPYC you can find from clouds (not vanilla models) also tend to have worse L2 latency, for example chipsandcheese.com/p/amds-epyc-7j13-zen-3-customized
07:17

eureka_

even mobile CPUs have had ever-increasing caches, my laptop has 80K/96K L1 (p/e), 2M/512K L2 (p/e), 36M L3
07:17

eureka_

it's ridiculous
07:18

DataHoarder

note L1/L2 is usually given as per-core values
07:18

DataHoarder

while L3 is given for the whole cpu
07:18

DataHoarder

so you want to divide that per thread as well :)
07:18

eureka_

^^
07:18

DataHoarder

cpus are also increasing cores, so L3 increases proportionally
07:18

eureka_

SMT on P-cores, no SMT on E-cores there, but not much worth using SMT for RandomX on this CPU
07:18

DataHoarder

most zen stuff has 2MB/thread L3
07:19

eureka_

astr.al/u/db04e7db.txt
07:19

DataHoarder

except zen 5c (or 4c?) which is 1MB/thread L3
07:19

DataHoarder

with unaffected L2/L1
07:19

eureka_

unfortunately heavily affected by throttle on this machine, without throttle it would be fairly decent at RandomX for an intel CPU
07:20

DataHoarder

the 2 MB/thread has more or less not improved (besides X3D) across vendors tbh
07:21

DataHoarder

meanwhile Zen5 has 1MB L2 per core
07:21

DataHoarder

:D
07:22

sech1

SG2044 has 2 MB L2 per cluster of 4 cores
07:22

sech1

which (and 64 MB L3 available to the whole CPU) makes it very fast in single thread benchmarks
07:25

eureka_

10 years ago I used to buy xeon for home machines just to get 25-50MB L3, now it's available on consumer grade
07:25

DataHoarder

^ and some top of the line server cpus are reducing L3 in favor of higher core density (while leaving L1/L2 more or less the same)
07:25

eureka_

or increasing L2 even, some as low as 1.375 MB L3/core
07:26

eureka_

at least on intel
07:26

DataHoarder

look techpowerup.com/cpu-specs/epyc-9965.c3904
07:26

eureka_

seems they ran out of tricks to stay competitive and have resorted to super-sizing caches to keep up
07:27

DataHoarder

192 cores, 1 MB/core L2, 2MB/core L3 (1MB/thread)
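
The per-thread division can be written out explicitly. This sketch uses the figures quoted for the EPYC 9965 (192 cores, 2 MB of L3 per core, so 384 MB in total, with two SMT threads per core) and the RandomX default of a 2 MiB scratchpad per mining thread. The function names are hypothetical, and MB and MiB are treated as the same unit here.

```python
SCRATCHPAD_MIB = 2  # RandomX default scratchpad size per mining thread


def l3_per_thread(total_l3_mib: float, cores: int, smt_ways: int) -> float:
    """L3 share per hardware thread when every SMT thread is in use."""
    return total_l3_mib / (cores * smt_ways)


def threads_that_fit(total_l3_mib: int) -> int:
    """Mining threads whose scratchpads fit in L3 at the same time."""
    return total_l3_mib // SCRATCHPAD_MIB


cores, smt_ways = 192, 2
total_l3 = cores * 2  # 2 MB of L3 per core, as quoted above

print(f"L3 per hardware thread: {l3_per_thread(total_l3, cores, smt_ways):.1f} MiB")
print(f"threads with a full scratchpad in L3: {threads_that_fit(total_l3)}")
```

With these numbers only one thread per core gets a full scratchpad in L3, which is one reason running every SMT thread may not help on such a part.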
07:27

eureka_

225 KH/s RandomX for that chip
07:27

DataHoarder

well, a lot of what these cpus run are wide operations that still don't do well on GPUs
07:28

DataHoarder

each core is independent and doesn't need to share across L3 as much
07:28

eureka_

yeap
07:28

DataHoarder

this is the non zen 5c top techpowerup.com/cpu-specs/epyc-9755.c3881
07:28

eureka_

with full NUMA enabled and VMs pinned to nodes without overlap, can get some extremely high virtualisation density on zen5 epycs
07:29

DataHoarder

warning eureka_
07:29

DataHoarder

that benchmark is for 2 CPUs
07:29

DataHoarder

9965
07:29

eureka_

meanwhile I'm still here with sad 7402p outperformed by my laptop (:
07:29

eureka_

aye
07:29

eureka_

384 thread, but no SMT in that benchmark I assume
07:29

eureka_

benchmark with 576 threads is only 195 KH/s
07:29

DataHoarder

ok the 9755 benchmarks are borked
07:30

eureka_

a lot of xmrig benchmark results are not ideal, wish we had users who knew how to tune better
07:31

eureka_

not a lot of overclockers submitting benchmarks either, would be fun to see how high single thread can get
07:31

DataHoarder

better find the engineering sample :D
07:31

eureka_

RandomX is a great general purpose benchmark these days imo
07:31

DataHoarder

those tend to be "better"
07:31

eureka_

much better than older synthetic benchmarks running small code that fits in cache and gives inflated picture of core performance
07:32

eureka_

hashrate comparison between CPUs is more realistic for real world comparisons
07:32

eureka_

at least single threaded, with cache taken into account
07:33

DataHoarder

also stuff like: chipsandcheese.com/i/174871357/memo…-subsystem-and-numa-characteristics
07:34

eureka_

yeap
07:41

eureka_

anyway, getting sg204{2,4} in hands of chips&cheese would be fun for benchmarks
07:44

DataHoarder

there's also Xeon Phi :D
07:45

eureka_

new phi :V when
07:45

DataHoarder

it even had avx 512
07:45

eureka_

give me 192x gracemont with SMT4 on a PCIe card pls
07:45

eureka_

instead of that old slow shit
07:46

DataHoarder

also 4x SMT
07:46

eureka_

i liked the knight's corner cards, very unique design
07:46

eureka_

P54C core but forcefully expanded to 64 bit datapath, no MMX or x87 or any legacy stuff, boots up right into 64 bit no real mode
07:47

eureka_

sort-of-AVX-512 but not really
07:47

eureka_

not totally x86 compatible.. gcc considered architecture to be k1om instead :D
07:47

DataHoarder

meanwhile I moved my go-randomx stuff to parameterized VM config git.gammaspectra.live/P2Pool/go-ran…782e95351a7555f640a5f430a87dfca54f8
07:47

DataHoarder

so can test with differing L1/L2/L3 sizes and program size, and op distribution
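
A parameterized VM configuration of this kind could look roughly like the sketch below. This is not the go-randomx API; the class and field names are hypothetical. The size defaults mirror the RandomX reference configuration (16 KiB / 256 KiB / 2 MiB scratchpad levels, 256 instructions per program, instruction frequencies summing to 256). The op weights are placeholder groupings, not the real instruction frequencies.

```python
from dataclasses import dataclass, field, replace


def _is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


@dataclass(frozen=True)
class VMConfig:
    """Hypothetical tunables for experimenting with RandomX-like parameters."""
    scratchpad_l1: int = 16 * 1024
    scratchpad_l2: int = 256 * 1024
    scratchpad_l3: int = 2 * 1024 * 1024
    program_size: int = 256
    # Placeholder op groups and weights, for illustration only.
    op_weights: dict = field(default_factory=lambda: {
        "int": 128, "float": 96, "branch": 16, "store": 16,
    })

    def __post_init__(self) -> None:
        sizes = (self.scratchpad_l1, self.scratchpad_l2, self.scratchpad_l3)
        if not all(_is_power_of_two(s) for s in sizes):
            raise ValueError("scratchpad sizes must be powers of two")
        if not (sizes[0] <= sizes[1] <= sizes[2]):
            raise ValueError("expected L1 <= L2 <= L3")
        if self.program_size <= 0:
            raise ValueError("program size must be positive")
        if sum(self.op_weights.values()) != 256:
            raise ValueError("op weights must sum to 256")


default = VMConfig()
# Variant with half the L3 scratchpad, e.g. to model 1 MB/thread parts.
small_l3 = replace(default, scratchpad_l3=1024 * 1024)
print(default.scratchpad_l3, small_l3.scratchpad_l3)
```

Validating in `__post_init__` means every variant built with `replace` is checked the same way as the default.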
07:47

eureka_

i forward ported kernel support for those cards to newer 4.x kernel some years ago, for fun experiments with containers
07:49

eureka_

considered doing silly budget hosting on the cards, 8 VMs with 7 cores/2GB each, SAN storage
07:49

eureka_

but keeping gcc up to date was so much work
07:50

DataHoarder

and even if you can upstream it it'd require a maintainer
07:50

eureka_

yep.. nobody wants MIC in today, and knight's landing was real x86 so no point
07:50

eureka_

it's just historical curiosity
07:51

eureka_

but for a few short months, best damn cryptonight miner you could get before ASICs :D
13:56

sech1

hyc tevador/RandomX #316
14:01

sech1

Damn, it failed CI. Everything worked in QEMU :D
14:04

sech1

huh, and it works in different configs (both with and without vector and aes)
