-
m-relay
<recanman:agoradesk.com> Woow, that is a very beautiful machine
-
hyc
a pretty wasteful design. 64 core chip, 64MB cache, so you can only use 32 cores per chip
-
hyc
sg2042 analysis published a couple weeks ago
arxiv.org/abs/2309.00381
-
sech1
Wait, 18 CPUs per machine? lol
-
sech1
And SODIMMs, lol
-
sech1
It's $20k+ per machine in components
-
sech1
SG2042R is not the regular SG2042 though, so it probably has AES
-
sech1
And one memory stick per CPU? 212/18 = 11.78 kh/s per CPU, so they probably reduced CPU speed and voltage as much as possible to make it hash at 11-12 kh/s
-
sech1
One memory stick can do 11-12 kh/s
-
sech1
Yes, it's DDR4-3200, it can do this
-
sech1
If SG2042R is a version of SG2042 with 32 cores/64 MB cache and AES, it makes the most sense
-
sech1
11778/32 = 368 h/s per thread, that's low. They don't need 2 GHz for this.
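The division above can be reproduced in a few lines. This is only a minimal sketch: it assumes the advertised 212 kh/s total, the 18 CPUs per machine, and the guessed 32 active cores per chip with one mining thread per core. The variable names are illustrative.

```python
# Back-of-the-envelope hashrate split, using only figures quoted in the chat.
TOTAL_HASHRATE_HS = 212_000   # advertised 212 kh/s for the whole machine
CPUS_PER_MACHINE = 18         # CPU count discussed above
THREADS_PER_CPU = 32          # assumption: 32 active cores, one thread each

per_cpu_hs = TOTAL_HASHRATE_HS / CPUS_PER_MACHINE
per_thread_hs = per_cpu_hs / THREADS_PER_CPU

print(f"{per_cpu_hs / 1000:.2f} kh/s per CPU")
print(f"{per_thread_hs:.0f} h/s per thread")
```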
-
sech1
Since each thread is so slow, the balance between RandomX program execution latency and DDR4 latency is shifted to program execution - so any increase in program size will decrease their hashrate proportionally. Our RandomX tweaks (256 -> 288 instructions + 16 AES instructions in the main loop) will slow it down 15%, from 212 kh/s to 185 kh/s
-
sech1
And Ryzen CPUs will get a small 2% hashrate increase while keeping power usage the same (I tested it yesterday)
-
sech1
So the tweaks won't brick X5, but it will reduce its efficiency quite a bit, it will be on par with Ryzen rigs after this
-
hyc
hm, sg2042r, the R could stand for "reduced" - reducing core count to 32 could make sense
-
hyc
half as many cores per NUMA region, but same number of regions
-
sech1
I'm sure they also added AES instructions
-
sech1
No way they get this efficiency with software AES
-
sech1
Hardware AES, 32 cores with 64 MB cache, clocked very low and with low voltage
-
sech1
And a single DDR4-3200 SODIMM, it's probably also very energy efficient because it's a notebook memory
-
sech1
8 GB stick = only a few memory chips, and they can run it at < 1.1V with loose timings
-
sech1
368 h/s per thread means they need memory with < 150 ns latency which is 2 times more than a regular DDR4 latency
-
sech1
DDR4 latency is 40-80 ns depending on board and CPU design
-
sech1
M4SE-8GSSOC0M-F
-
sech1
VDD=VDDQ=1.2 Volt (1.14V~1.26V)
-
sech1
They definitely set it to 1.14V or even lower
-
sech1
"Low-Power auto self-refresh (LPASR)"
-
sech1
I can't even find what timings these modules have :D
-
elucidator
they must have found a source of memories like used corporate laptops to source it cheap
-
sech1
They've been mining since December 2021, and if my math is correct, each device mined 20-30 XMR since then. Even if they sell them for $3k, they got no more than $8k from each device, and I doubt Bitmain even got their money back, lol
-
sech1
Of course if you order 3000*18 = 54k DDR4 sticks, they will be cheaper than $100 per piece
-
sech1
and I doubt that they tore down 27k laptops :D
-
sech1
It was just a bulk order
-
elucidator
I wonder if they utilize this design for data centers etc. looks very much like very "general purpose" ai board
-
tevador
They are selling them for 3k and just the memory costs 2k? lol
-
elucidator
maybe they are pulling an AMD Athlon and disabling bad cores on the 64 core SoC to utilize bad batches
-
tevador
Judging by the performance, the SG2042R chip would have a market price of at least $500 a piece.
-
tevador
If they sold the miner at the market price of the components, it should be at least $12k. So there is no real cost advantage compared to a dual Epyc setup.
-
sech1
tevador unrelated to the current discussion: did you check how many more infinities we get with 288 program size?
-
sech1
I also just realized that since the AES tweak mixes F and E registers better, infinities don't hurt scratchpad entropy anymore
-
sech1
btw I finished my RandomX v2 aarch64 implementation
-
sech1
elucidator it can be an early batch of SG2042 with added hardware AES. Yes, they can have 64 cores on chip, but only 32 cores active - putting a bad batch to use, as you say
-
tevador
I will check the effect of the 256->288 change later. I first need to integrate the VM/JIT changes.
-
sech1
RandomX v2 will not brick Bitmain X5, based on what I know so far
-
sech1
It must have hardware AES to get this level of performance
-
sech1
368 h/s per thread = 150 ns per loop iteration, so they are limited by CPU loop execution, not by memory latency. Increasing program size to 288 will automatically slow it down by 12.5%
-
sech1
And adding AES in the main loop will also slow it down by 1-2%
-
sech1
So I expect RandomX v2 will slow down X5 by ~15%
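A rough sketch of that estimate. Assumptions not stated in the chat: the RandomX reference configuration of 8 programs per hash with 2048 loop iterations each, a purely CPU-bound thread whose time per hash scales linearly with program size, and 1.5% as the midpoint of the quoted 1-2% AES overhead.

```python
# Assumed RandomX reference parameters (not stated in the chat).
PROGRAMS_PER_HASH = 8
ITERATIONS_PER_PROGRAM = 2048

per_thread_hs = 368  # h/s per thread, from the discussion above
loop_iterations = PROGRAMS_PER_HASH * ITERATIONS_PER_PROGRAM

# Upper bound: all of the per-hash time is charged to the main loop.
ns_per_iteration = 1e9 / per_thread_hs / loop_iterations

size_factor = 288 / 256        # program grows from 256 to 288 instructions
aes_factor = 1 + 0.015         # assumed midpoint of the 1-2% AES cost
new_time_factor = size_factor * aes_factor

new_hashrate_khs = 212 / new_time_factor

print(f"{ns_per_iteration:.0f} ns per loop iteration (upper bound)")
print(f"time per hash grows by {(new_time_factor - 1) * 100:.1f}%")
print(f"estimated hashrate after the tweaks: {new_hashrate_khs:.0f} kh/s")
```

Under these assumptions the per-iteration time is on the same order as the figure quoted above, and the estimated hashrate lands close to the 185 kh/s mentioned earlier.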
-
elucidator
now only the hardware AES is left to wonder about
-
tevador
perhaps for AES?
-
tevador
Ah, it's the controller board.
-
sech1
elucidator About hardware aes - the only way to find it out is to disassemble their risc-v firmware code
-
paulio_uk
they must have bought or had manufactured all of their hardware in huge bulks... or it "fell off the back of a production line" somewhere
-
sech1
X5 costs $20k if you sell individual components. In other words, Bitmain could get $20k if they sold components of X5, not X5
-
sech1
It definitely costs them less than $20k to produce
-
sech1
Since RPi5 finally has AES, RandomX v2 AES tweak is just in time :D
-
paulio_uk
RPi5 is probably going to sell at around the same price as a Lychee 4 I'd imagine, if not higher :/
-
elucidator
sech1: for the life of me, I can't find the source of the rV binary anywhere on that firmware image.
-
elucidator
xmrig gen binary does some spi operations but can't see anything related to the actual operations. just fetch run fetch run
-
elucidator
paulio_uk: Raspberry Pi 5, coming at the end of October. Priced at $60 for the 4GB variant, and $80 for its 8GB sibling
-
sech1
RPi5 looks juicy for Monero + p2pool + xmrig combo
-
sech1
Even 4GB variant can run all three, if p2pool runs with "--no-cache --no-randomx"
-
sech1
and it has AES, so it should have decent hashrate
-
m-relay
<hbs:matrix.org> \o/
-
elucidator
qu.ax/XEp.jpg m2 hat looks like this, slap a 1 TB and long term local node
-
sech1
I'll order 8GB variant. monerod will run with 2 GB dataset for fast block verification, p2pool will run with "--no-cache --no-randomx", and there will still be 6 GB left for xmrig
-
sech1
Question is, does it support 256 GB microSD cards?
-
sech1
It can also boot from external SSD connected to USB3
-
elucidator
sech1: all the raspi models can boot from usb hdd
-
elucidator
I do it with rpi4 at home rn
-
sech1
Is microSD card fast enough for Monero node?
-
sech1
Or external SSD is the way to go?
-
elucidator
never tried, I don't use big SD cards. don't trust their reliability. either only use it for redirecting the boot or don't use it at all
-
m-relay
<recanman:agoradesk.com> afaik microsds are pretty slow and degrade pretty quickly
-
paulio_uk
$80? That surprises me. Guess they'll do the "We're out of stock" thing 10 seconds after it launches
-
elucidator
definitely ^
-
sech1
5 minutes later... RPi5 for $400 on ebay :D
-
paulio_uk
why break with traditions huh? :D
-
m-relay
<recanman:agoradesk.com> I still cannot comprehend why they are so expensive
-
m-relay
<recanman:agoradesk.com> I got a 3b+ in 2018 for $30
-
elucidator
and for the first 6 months, you can't order a board but you can order a kit with extra 10 LEDs and resistors and a crappy how to blink led book for $999.99
-
elucidator
it's a "kit" so parts add value ofc
-
sech1
I guess microSD is fine for running an already synchronized node (do sync on desktop, then just copy over the blockchain)
-
elucidator
yeah probably, but still you better install log2ram and enable zram instead of swap for longer SD card life
-
elucidator
actually it's better if you just enable overlayfs on SD card if OS is running on it, then keep the blockchain and node files on external drive
-
sech1
lol, log2ram reminded me of the old days where I put Windows temp folders on the ramdrive to speed up the system (it was running on an HDD, SSDs weren't a thing yet)
-
moneromooo
Oh lol, I used to do that with DJGPP on DOS before I switched to linux :D Flood of memories...
-
moneromooo
All system includes on a ramdisk. I loved DOS, you could understand the OS back then...
-
sech1
I even remember experimenting with WinPE (Windows version that could boot from CD and run entirely on the ramdrive)
-
sech1
Theoretical hashrate 71.94 kh/s per board, 11.99 kh/s per chip
-
sech1
But it does only half that
-
sech1
36-37 kh/s per board
-
sech1
110 kh/s based on "Real Time Hashrate" numbers
-
sech1
ok, "Chain's rate" graph shows that it started at 70 kh/s, but then slowed down to 36. Overheating probably
-
sech1
Discord messages say that it only happens on Zephyr. So non-Monero coins make it blow up :D
-
sech1
Need to wait for Monero tests - actual sustained hashrates and power usage numbers
-
elucidator
hmm yeah total of "real" is more like ~110k
-
sech1
That would be fun if it can theoretically do 212 kh/s, but overheats and drops to 110k :D
-
gingeropolous
turns out managing 18 cpus is difficult
-
moneromooo
Cat Processing Unit
-
gingeropolous
based on the images of the hardware, does it look repurposable? capable of being repurposed
-
gingeropolous
i don't see any sata ports or even USB headers
-
gingeropolous
i guess if it was cracked, though, you could feed it data to crunch over the network. I mean, if it's a modded CPU at 32 threads, that's still 576 threads per unit.
-
gingeropolous
or if they are original spec, 1152 cores in 22620 cm^3
-
gingeropolous
erg, threads
-
sech1
You could at least rip out DDR4 sticks and sell them for $2k :D
-
sech1
Their control board runs some flavor of Linux, but I don't think they run any OS on SG2042 boards. They just straight up upload RandomX code there and run it
-
sech1
If someone hacks the firmware and finds a way to modify it, it could be a nice number cruncher with 576 threads
-
m-relay
<endor00:matrix.org> Compile your workload with the rx VM as target and send it as a "mining" job :D
-
sech1
On the other hand, if you need 576 threads to compute something, you probably need a GPU
-
sech1
RandomX is unique because it's not SIMD, and 99.9% of heavily threaded workloads are SIMD
-
sech1
GPUs will crush this potential "hacked X5" on these workloads
-
gingeropolous
you'd be surprised how useful pure thread count can be for scientific workloads. genetic sequence alignment, for example.
-
sech1
I'm sure it can be ported to GPU
-
sech1
X5 only has 32 (possibly 64) threads and 8 GB RAM per chip, so it's limited in what it can do
-
sech1
"Overall on Geforce GTX 1080 Ti GPU, GASAL2 is up to 21x faster than Parasail on a dual socket hyper-threaded Intel Xeon system with 28 cores"
-
sech1
I can't come up with a workload that requires CPU specifically. Maybe code compilation or databases, but 8 GB per chip is not enough
-
m-relay
<endor00:matrix.org> There are some numerical algorithms that cannot be sped up through parallelization, so they require faster single-thread performance rather than lots of threads
-
m-relay
<endor00:matrix.org> Though I'm not sure that the SG2042 could compete even in that
-
sech1
It's known that nothing is known about possible parallelization :D
en.wikipedia.org/wiki/P-complete#Motivation
-
sech1
Even if some algorithm is sequential, you can run 576 different instances of it on X5
-
sech1
Yes, it's classic "dependency chain" issue in parallelization
-
sech1
But the other wiki page says "it is not known whether there are any tractable problems that are inherently sequential"
-
sech1
so even Newton's method can be theoretically parallelized (or at least it's not disproven yet)
-
moneromooo
Sure, but "let's buy GPUs to run this because it's not proven it will always be slow" isn't a great pitch to the boss :)
-
moneromooo
What's of salience here is practical speed now (or at least in the near future) no ?
-
sech1
yes
-
sech1
I just didn't hear much about people buying i9-13900ks in bulk, LN2 cooling and overclocking them to speed up their sequential algorithms :D
-
sech1
But GPUs in datacenters are everywhere
-
paulio_uk
yup yup, our place has 4 12u racks full of the nvidia non-gpu gpu processing cards
-
paulio_uk
damn things sit idle a lot of the time
-
m-relay
<endor00:matrix.org> AWS and Azure (and I'm pretty sure GCP too) have a "compute optimized" tier of VMs, offering fewer cores but with higher clock speeds ("up to" 3.5-3.6 GHz on AWS) compared to the other tiers
-
m-relay
<endor00:matrix.org> (Although "fewer" is a relative term, given that the biggest amd ones still reach 192C/384T)
-
paulio_uk
could still host a decent network doom tournament on them
-
m-relay
<endor00:matrix.org> Though I guess if you actually need that much speed, you're probably better off looking into some dedicated fpga/asic to optimize that specific compute step?
-
paulio_uk
pretty sure my place only bought the kit outright because we started looking into AI automation, and someone high up had a friend whose neighbour's dogwalker's aunt's first cousin's son runs a hardware firm that deals in them, and they probably had shares in it, so why not spend the public money on it?
-
kico
moneromooo, cat processing unit <3
-
sech1
The more watts, the better for cat
-
sech1
yes, I think if some sequential algorithm is very important, there will be some specialized chips for it
-
kico
dayum
-
sech1
It's locked to Monero only :D
-
kico
such enterprise much wow
-
tevador
it makes sense, other coins use different RandomX configs
-
Lyza
most, but Zephyr seems to use an unmodified rx, unfortunately
-
sech1
No, it can't even mine Zephyr which uses stock RandomX
-
sech1
They are either very bad at writing software and hard-coding everything for Monero, or they deliberately locked it to Monero
-
Lyza
maybe they hard locked it to XMR so they can mine Zeph with less competition on their next system =p
-
sech1
But they advertised it as multi-coin miner. False advertising? Selling used hardware? Classic Bitmain
-
Lyza
tru
-
tevador
if zephyr is the same PoW as monero, I don't see why it would not work unless it's intentionally only working with XMR addresses
-
sech1
The theory is that their firmware can't process address formats other than XMR
-
sech1
Or they intentionally locked it
-
sech1
Mining job header has fork version and block height, so it's easy to lock it to XMR
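A hypothetical sketch of how such a lock could work; nothing here is taken from Bitmain's firmware. It assumes a stratum-style job dictionary with a hex `blob` and a `height` field, and that the hashing blob starts with the block's major (fork) version encoded as a varint. The function names and the threshold values are invented for illustration.

```python
def read_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one little-endian base-128 varint; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def job_matches_chain(job: dict, expected_major: int, min_height: int) -> bool:
    """Accept a job only if its fork version and height look like the target chain."""
    blob = bytes.fromhex(job["blob"])
    major_version, _ = read_varint(blob)
    return major_version == expected_major and job.get("height", 0) >= min_height


# Synthetic job: major version 16, minor version 16, rest of the blob zeroed.
example_job = {"blob": "1010" + "00" * 74, "height": 3_000_000}
print(job_matches_chain(example_job, expected_major=16, min_height=2_000_000))
```

A coin with a different fork version or a much lower block height would fail such a check even when the PoW itself is identical.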
-
sech1
I suggested that he try connecting to xmrig-proxy with a fake XMR address, but it "didn't work"
-
hyc
most HPC/scientific computing tasks are easily parallelizable because it's all matrix math
-
sech1
I saw N-body problem as an example of "hard to parallelize" algorithm
-
sech1
Because each body interacts with all other bodies
-
sech1
I see why
-
sech1
all parallel threads will have to access and modify the same data structure representing the N bodies
-
hyc
or do two phases, with redundant copies of all state. 2nd phase to coalesce the multiple views back into a single one.
-
hyc
but now we're actually talking rocket science, of course it's not straightforward or easy
-
hyc
even solving the original rocket equation isn't easy.
-
sech1
One phase (where you calculate next positions) is easy to parallelize, but the other phase where you have to sync all threads, is mostly sequential
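The two phases can be sketched roughly as follows. This is a toy 2D example under stated assumptions (arbitrary units, G = 1, simple Euler integration, a small softening term), not a production integrator. It only illustrates the structure: workers read an unchanged snapshot of the old state, then a single sync step builds the new state. In CPython the thread pool will not give a real speedup for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor

G = 1.0            # gravitational constant in arbitrary units (assumption)
SOFTENING = 1e-9   # avoids division by zero for coincident bodies


def accel_on(i, positions, masses):
    """Phase 1 work item: acceleration on body i, reading only the old state."""
    xi, yi = positions[i]
    ax = ay = 0.0
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue
        dx, dy = xj - xi, yj - yi
        dist_sq = dx * dx + dy * dy + SOFTENING
        scale = G * masses[j] / (dist_sq * dist_sq ** 0.5)
        ax += dx * scale
        ay += dy * scale
    return ax, ay


def step(positions, velocities, masses, dt, workers=4):
    # Phase 1 (parallel): every task reads the same unchanged snapshot.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        accels = list(pool.map(lambda i: accel_on(i, positions, masses),
                               range(len(positions))))
    # Phase 2 (sync point): combine the results into one new state.
    new_vel = [(vx + ax * dt, vy + ay * dt)
               for (vx, vy), (ax, ay) in zip(velocities, accels)]
    new_pos = [(x + vx * dt, y + vy * dt)
               for (x, y), (vx, vy) in zip(positions, new_vel)]
    return new_pos, new_vel


pos, vel = step([(-1.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)], [1.0, 1.0], dt=1.0)
print(pos)
```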
-
hyc
yes
-
hyc
There's no escaping Amdahl's Law
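For reference, a small sketch of Amdahl's Law applied to the 576 threads mentioned earlier. The parallel fractions below are made-up illustrative values, not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work scales with workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical workloads on a 576-thread machine.
for fraction in (0.5, 0.9, 0.99):
    print(f"{fraction:.0%} parallel -> {amdahl_speedup(fraction, 576):.1f}x")
```

Even a workload that is 99% parallel tops out at about 85x on 576 threads, because the sequential 1% dominates.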
-
tevador
btw, bitmain > apple, RAM not soldered to the board!
-
hyc
lol
-
sech1
At least buyers can recoup some costs after it gets useless :D
-
sech1
18 DDR4 sticks :D
-
hyc
heh but ddr4-3200 is already obsolete
-
tevador
if someone jailbreaks it, at least they can upgrade the RAM to a usable amount for some number crunching
-
sech1
That would require not only jailbreak, but also a custom BIOS. They hardcoded their firmware for these specific modules (timings, voltage, frequencies etc)
-
sech1
It's a shame Bitmain doesn't opensource their firmware
-
sech1
And a breach of GPLv3 :D
-
sech1
They use some "xmrig-mango" binary
-
sech1
"3 people now with dead hashboards" from discord
-
sech1
It takes 1 minute before it starts mining... Do they generate the dataset on control board and then upload it to risc-v boards? lol
-
tevador
They probably didn't bother implementing JIT compiler for SuperscalarHash. My quad core risc-v board takes 9 minutes to initialize the dataset.
-
sech1
oh, so they run it without JIT for dataset
-
hyc
well they'd already beat on these things for 2 years. thermal stress was bound to destroy them
-
sech1
I read in discord that people set fans to 50% and then VRMs overheated
-
hyc
so they did it to themselves? should have left the fans alone?
-
sech1
Probably. But consumer products shouldn't just die after poking around in the official web gui :D
-
paulio_uk
why is it every unboxing video/notes I've seen all complain about it only being able to mine XMR? I mean thats what it's sold as :P
-
paulio_uk
now if BitMain threw in a p2pool node and remote monero node into the X5, that would have been pretty damn nice
-
sech1
They advertised it as "multi currency miner"
-
paulio_uk
ah well thats worth a damn slap then :|
-
Inge
sech1: given the excellent ROI for Bitmain, is it even worth it to hamstring Rx on the X5?
-
jtgrassie
Todd doubling down on his weird takes "it's a significant step towards the creation of a Monero ASIC"
-
jtgrassie
s/weird/dumb/
-
jtgrassie
I have to say I'm pretty impressed how accurately sech1, hyc, and tevador guessed at what was in these things before seeing a teardown
-
paulio_uk
well the SG2042R is a decent step towards creating a processor nicely suited to RandomX, but I still think if ARM was a bit more open source, it could be a better route to take
-
paulio_uk
the X5 however, is more of a significant step towards bankruptcy than a Monero ASIC
-
m-relay
<polar9669:matrix.org> It’s just a step towards next gen cpus
-
jtgrassie
well given they (the subsidiary) are developing the chips for wider use cases, it's really a case of how much did Bitmain spend on all the other bits (e.g. RAM)
-
jtgrassie
but yeah, even 2 years mining on them then offloading at 3k per unit, they surely must have lost money on this endeavor
-
m-relay
<polar9669:matrix.org> Won’t be surprised if it’s state-funded R&D for CPUs
-
jtgrassie
that's a good point
-
m-relay
<polar9669:matrix.org> Yup x6 might already be on the network, they need to stress test their cpus
-
jtgrassie
just as rx should form part of the common CPU benchmarks, chip manufacturers should design around rx ;)
-
jtgrassie
it was always hyc's retort, to build a rx asic is basically building a better cpu
-
m-relay
<polar9669:matrix.org> Yup better cpus that is and small rx tweaks are fine