02:15:32 Woow, that is a very beautiful machine
02:15:37 Woaw, that is a very beautiful machine
05:43:03 a pretty wasteful design. 64 core chip, 64MB cache, so you can only use 32 cores per chip
05:46:00 sg2042 analysis published a couple weeks ago https://arxiv.org/abs/2309.00381
05:53:20 Wait, 18 CPUs per machine? lol
05:54:21 And SODIMMs, lol
05:54:32 It's $20k+ per machine in components
05:59:34 SG2042R is not the regular SG2042 though, so it probably has AES
06:00:28 And one memory stick per CPU? 212/18 = 11.78 kh/s per CPU, so they probably reduced CPU speed and voltage as much as possible to make it hash at 11-12 kh/s
06:00:39 One memory stick can do 11-12 kh/s
06:02:13 Yes, it's DDR-3200, it can do this
06:02:16 DDR4-3200
06:04:01 lol https://youtu.be/RkWGbkJjXa8?t=242
06:15:02 If SG2042R is a version of SG2042 with 32 cores/64 MB cache and AES, it makes the most sense
06:15:54 11778/32 = 368 h/s per thread, that's low. They don't need 2 GHz for this.
06:18:13 Since each thread is so slow, the balance between RandomX program execution latency and DDR4 latency is shifted to program execution - so any increase in program size will decrease their hashrate proportionally.
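The per-CPU and per-thread figures quoted above are plain division. A minimal Python sketch of that arithmetic, assuming the advertised 212 kh/s total, 18 chips per machine, and 32 active threads per chip (the thread count is the guess made in this log, not a confirmed spec; the function names are illustrative only):

```python
# Back-of-the-envelope hashrate breakdown for the X5.
# All inputs are figures quoted in the chat; 32 threads per chip is an assumption.

TOTAL_HASHRATE_HS = 212_000   # advertised total, in h/s
CHIPS_PER_MACHINE = 18
THREADS_PER_CHIP = 32         # assumed: SG2042R with 32 active cores


def per_chip_hashrate(total_hs: float, chips: int) -> float:
    """Hashrate each chip must deliver, in h/s."""
    return total_hs / chips


def per_thread_hashrate(total_hs: float, chips: int, threads: int) -> float:
    """Hashrate each thread must deliver, in h/s."""
    return per_chip_hashrate(total_hs, chips) / threads


chip_hs = per_chip_hashrate(TOTAL_HASHRATE_HS, CHIPS_PER_MACHINE)
thread_hs = per_thread_hashrate(TOTAL_HASHRATE_HS, CHIPS_PER_MACHINE, THREADS_PER_CHIP)
print(f"per chip:   {chip_hs / 1000:.2f} kh/s")
print(f"per thread: {thread_hs:.0f} h/s")
```

With these inputs the sketch gives about 11.78 kh/s per chip and about 368 h/s per thread, matching the numbers in the log.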
Our RandomX tweaks (256 -> 288 instructions + 16 AES instructions in the main loop) will slow it down 15%, from 212 kh/s to 185 kh/s
06:19:39 And Ryzen CPUs will get a small 2% hashrate increase while keeping power usage the same (I tested it yesterday)
06:20:27 So the tweaks won't brick X5, but it will reduce its efficiency quite a bit, it will be on par with Ryzen rigs after this
06:25:51 hm, sg2042r, the R could stand for "reduced" - reducing core count to 32 could make sense
06:28:03 half as many cores per NUMA region, but same number of regions
06:29:24 I'm sure they also added AES instructions
06:29:41 No way they get this efficiency with software AES
06:30:03 Hardware AES, 32 cores with 64 MB cache, clocked very low and with low voltage
06:30:43 And a single DDR4-3200 SODIMM, it's probably also very energy efficient because it's notebook memory
06:31:22 8 GB stick = only a few memory chips, and they can run it at < 1.1V with loose timings
06:32:24 368 h/s per thread means they need memory with < 150 ns latency which is 2 times more than a regular DDR4 latency
06:32:48 DDR4 latency is 40-80 ns depending on board and CPU design
07:13:03 Found the memory sticks they use: https://jm.pl/en/8gbddr43200mhz260p-memory-module-ddr4-so-dimm-2133mhz-8gb-512mx8/9208147/produkt/
07:13:07 M4SE-8GSSOC0M-F
07:13:35 VDD=VDDQ= 1.2 Volt (1.14V~1.26V)
07:13:43 They definitely set it to 1.14V or even lower
07:13:53 "Low-Power auto self-refresh (LPASR)"
07:14:46 These are Innodisk M4SE modules: https://www.rosch-computer.de/produkt/speicher-ssd-ram/dram-module/embedded/so-dimm/ddr4/innodisk/DDR4-M4SE
07:15:40 $107.5 per stick: https://www.wiredzone.com/shop/product/10022826-innodisk-m4se-8gssoc0m-fs168-memory-8gb-ddr4-3200mhz-2rx8-sodimm-mem-dr480l-il02-so32-9241
07:19:26 I can't even find what timings these modules have :D
07:30:09 they must have found a source of memories like used corporate laptops to source it cheap
07:30:52 They've been mining since December 2021, and if my
math is correct, each device mined 20-30 XMR since then. Even if they sell them for $3k, they got no more than $8k from each device, and I doubt Bitmain even got their money back, lol
07:31:44 Of course if you order 3000*18 = 54k DDR4 sticks, they will be cheaper than $100 per piece
07:32:08 and I doubt that they tore down 27k laptops :D
07:32:13 It was just a bulk order
07:32:46 I wonder if they utilize this design for data centers etc. looks very much like a very "general purpose" ai board
07:33:44 They are selling them for 3k and just the memory costs 2k? lol
07:35:01 maybe they are pulling an amd athlon and disabling bad cores on the 64 core soc to utilize bad batches
07:36:03 Judging by the performance, the SG2042R chip would have a market price of at least $500 a piece.
07:39:23 If they sold the miner at the market price of the components, it should be at least $12k. So there is no real cost advantage compared to a dual Epyc setup.
07:40:16 tevador unrelated to the current discussion: did you check how many more infinities we get with 288 program size?
07:40:42 I also just realized that since the AES tweak mixes F and E registers better, infinities don't hurt scratchpad entropy anymore
07:41:15 btw I finished my RandomX v2 aarch64 implementation
07:42:21 elucidator it can be an early batch of SG2042 with added hardware AES. Yes, they can have 64 cores on chip, but only 32 cores active - putting a bad batch to use, as you say
07:45:00 I will check the effect of the 256->288 change later. I first need to integrate the VM/JIT changes.
07:46:30 RandomX v2 will not brick Bitmain X5, based on what I know so far
07:46:46 It must have hardware AES to get this level of performance
07:47:47 368 h/s per thread = 150 ns per loop iteration, so they are limited by CPU loop execution, not by memory latency.
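The loop-iteration estimate can be roughly reproduced from RandomX's default configuration (8 programs per hash, 2048 loop iterations per program). The Python sketch below also applies the 256 -> 288 program size tweak mentioned earlier in this log, under the assumption that the X5 is purely execution-bound so that time per hash scales linearly with instruction count, plus an assumed 1.5% AES overhead (the midpoint of a 1-2% guess). The function names and the linear model are illustrative assumptions, not part of RandomX or xmrig.

```python
# Rough model of an execution-bound RandomX thread.
# PROGRAM_COUNT and PROGRAM_ITERATIONS are RandomX default configuration values;
# the linear slowdown model and the AES overhead figure are assumptions.

PROGRAM_COUNT = 8            # programs executed per hash
PROGRAM_ITERATIONS = 2048    # loop iterations per program


def ns_per_loop_iteration(hashes_per_second: float) -> float:
    """Average time budget per main-loop iteration for one thread, in ns."""
    iterations_per_second = hashes_per_second * PROGRAM_COUNT * PROGRAM_ITERATIONS
    return 1e9 / iterations_per_second


def estimate_v2_hashrate(base_khs: float,
                         old_size: int = 256,
                         new_size: int = 288,
                         aes_overhead: float = 0.015) -> float:
    """Scale hashrate assuming hash time grows linearly with program size,
    then add a small extra cost for AES in the main loop."""
    time_factor = (new_size / old_size) * (1.0 + aes_overhead)
    return base_khs / time_factor


print(f"{ns_per_loop_iteration(368):.0f} ns per iteration at 368 h/s")
print(f"{estimate_v2_hashrate(212.0):.1f} kh/s estimated after the tweaks")
```

Under these assumptions the per-iteration budget comes out in the 160-170 ns range, the same ballpark as the ~150 ns quoted above and well above the 40-80 ns DDR4 latency mentioned earlier, and the estimated post-tweak hashrate lands close to 185 kh/s.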
Increasing program size to 288 will automatically slow it down by 12.5%
07:48:26 And adding AES in the main loop will also slow it down by 1-2%
07:48:38 So I expect RandomX v2 will slow down X5 by ~15%
07:53:28 now only the hardware aes is left to wonder about
07:55:06 I wonder why the board has this chip: https://en.sophgo.com/product/introduce/cv1835.html
07:55:15 perhaps for AES?
07:56:10 Ah, it's the controller board.
08:00:59 elucidator About hardware aes - the only way to find out is to disassemble their risc-v firmware code
08:05:10 they must have bought or had manufactured all of their hardware in huge bulks... or it "fell off the back of a production line" somewhere
08:07:39 In other news, Raspberry Pi 5 finally has AES: https://hackster.imgix.net/uploads/attachments/1634129/image_7QeR7W8qx0.png?auto=compress%2Cformat&w=1280&h=960&fit=max
08:08:36 X5 costs $20k if you sell individual components. In other words, Bitmain could get $20k if they sold components of X5, not X5
08:08:48 It definitely costs them less than $20k to produce
08:09:53 Since RPi5 finally has AES, RandomX v2 AES tweak is just in time :D
08:38:11 RPi5 is probably going to sell at around the same price as a Lychee 4 I'd imagine, if not higher :/
08:42:12 sech1: for the life of me, I can't find the source of the rV binary anywhere on that firmware image.
08:43:10 xmrig gen binary does some spi operations but can't see anything related to the actual operations. just fetch run fetch run
08:47:29 paulio_uk: Raspberry Pi 5, coming at the end of October. Priced at $60 for the 4GB variant, and $80 for its 8GB sibling
08:53:25 RPi5 looks juicy for Monero + p2pool + xmrig combo
08:54:33 Even 4GB variant can run all three, if p2pool runs with "--no-cache --no-randomx"
08:54:48 and it has AES, so it should have decent hashrate
08:55:04 \o/
08:56:46 https://qu.ax/XEp.jpg m2 hat looks like this, slap a 1 TB and long term local node
08:56:56 I'll order 8GB variant.
monerod will run with 2 GB dataset for fast block verification, p2pool will run with "--no-cache --no-randomx", and there will still be 6 GB left for xmrig
08:57:20 Question is, does it support 256 GB microSD cards?
08:59:03 It can also boot from external SSD connected to USB3
08:59:48 And it's fast enough: https://youtu.be/9hYfQ7bRgZg?t=794
09:04:47 sech1: all the raspi models can boot from usb hdd
09:05:00 I do it with rpi4 at home rn
09:05:25 Is microSD card fast enough for Monero node?
09:05:31 Or external SSD is the way to go?
09:06:01 never tried, I don't use big SD cards. don't trust their reliability. either only use it for redirecting the boot or don't use it at all
09:06:17 afaik microsds are pretty slow and degrade pretty quickly
09:06:22 $80? That surprises me. Guess they'll do the "We're out of stock" thing 10 seconds after it launches
09:06:40 definitely ^
09:06:49 5 minutes later... RPi5 for $400 on ebay :D
09:07:00 why break with traditions huh? :D
09:07:15 I still cannot comprehend why they are so expensive
09:07:24 I got a 3b+ in 2018 for $30
09:07:50 and for the first 6 months, you can't order a board but you can order a kit with extra 10 LEDs and resistors and a crappy how to blink led book for $999.99
09:08:16 it's a "kit" so parts add value ofc
09:09:16 https://matrix.monero.social/_matrix/media/v1/download/agoradesk.com/XOQnHbXegiAjVUGLJCozKtoF
09:09:24 https://matrix.monero.social/_matrix/media/v1/download/agoradesk.com/EqMjskpfrSawujoyYJXXULDR
09:09:33 Source: https://hardforum.com/threads/please-help-test-my-microsd-card-crystaldiskmark.1999752/
09:11:37 I guess microSD is fine for running an already synchronized node (do sync on desktop, then just copy over the blockchain)
09:15:05 yeah probably, but still you better install log2ram and enable zram instead of swap for longer SD card life
09:15:58 actually it's better if you just enable overlayfs on SD card if OS is running on it, then keep the blockchain and node files on external drive
09:21:42 lol, log2ram reminded me of the old days where I put Windows temp folders on the ramdrive to speed up the system (it was running on an HDD, SSDs weren't a thing yet)
09:28:48 Oh lol, I used to do that with DJGPP on DOS before I switched to linux :D Flood of memories...
09:30:01 All system includes on a ramdisk. I loved DOS, you could understand the OS back then...
09:35:23 I even remember experimenting with WinPE (Windows version that could boot from CD and run entirely on the ramdrive)
09:52:55 Only 150 kh/s? https://cdn.discordapp.com/attachments/789946262203924530/1156703452362182696/IMG_5347.png
09:54:19 Theoretical hashrate 71.94 kh/s per board, 11.99 kh/s per chip
09:54:36 But it does only half that
09:54:43 36-37 kh/s per board
09:55:21 110 kh/s based on "Real Time Hashrate" numbers
09:57:11 ok, "Chain's rate" graph shows that it started at 70 kh/s, but then slowed down to 36. Overheating probably
10:00:12 Discord messages say that it only happens on Zephyr. So non-Monero coins make it blow up :D
10:00:34 Need to wait for Monero tests - actual sustained hashrates and power usage numbers
10:21:40 hmm yeah total of "real" is more like ~110k
10:32:10 That would be fun if it can theoretically do 212 kh/s, but overheats and drops to 110k :D
10:43:11 turns out managing 18 cpus is difficult
10:44:30 Cat Processing Unit
10:45:34 based on the images of the hardware, does it look repurposable? capable of being repurposed
10:49:30 i don't see any sata ports or even USB headers
10:51:21 i guess if it was cracked, though, you could feed it data to crunch over the network. I mean, if it's a modded CPU at 32 threads, that's still 576 threads per unit.
10:59:46 or if they are original spec, 1152 cores in 22620 cm^3
11:00:13 erg, threads
11:08:26 You could at least rip out DDR4 sticks and sell them for $2k :D
11:09:13 Their control board runs some flavor of Linux, but I don't think they run any OS on SG2042 boards.
They just straight up upload RandomX code there and run it
11:09:34 If someone hacks the firmware and finds a way to modify it, it could be a nice number cruncher with 576 threads
11:11:02 Compile your workload with the rx VM as target and send it as a "mining" job :D
11:13:48 On the other hand, if you need 576 threads to compute something, you probably need a GPU
11:14:08 RandomX is unique because it's not SIMD, and 99.9% of heavily threaded workloads are SIMD
11:14:33 GPUs will crush this potential "hacked X5" on these workloads
11:16:22 you'd be surprised how useful pure thread count can be for scientific workloads. genetic sequence alignment, for example.
11:19:28 I'm sure it can be ported to GPU
11:19:51 X5 only has 32 (possibly 64) threads and 8 GB RAM per chip, so it's limited in what it can do
11:23:05 yeap https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3086-9
11:23:40 "Overall on Geforce GTX 1080 Ti GPU, GASAL2 is up to 21x faster than Parasail on a dual socket hyper-threaded Intel Xeon system with 28 cores"
11:29:05 I can't come up with a workload that requires CPU specifically.
Maybe code compilation or databases, but 8 GB per chip is not enough
12:29:59 There are some numerical algorithms that cannot be sped up through parallelization, so they require faster single-thread performance rather than lots of threads
12:30:53 Though I'm not sure that the SG2042 could compete even in that
12:40:56 It's known that nothing is known about possible parallelization :D https://en.wikipedia.org/wiki/P-complete#Motivation
12:41:52 Even if some algorithm is sequential, you can run 576 different instances of it on X5
12:44:24 lol, RIP https://twitter.com/MiningRabid/status/1707237819373293758
12:52:49 https://en.m.wikipedia.org/wiki/Parallel_algorithm#Parallelizability
12:59:50 Yes, it's the classic "dependency chain" issue in parallelization
13:00:20 But the other wiki page says "it is not known whether there are any tractable problems that are inherently sequential"
13:00:36 so even Newton's method can be theoretically parallelized (or at least it's not disproven yet)
13:02:45 Sure, but "let's buy GPUs to run this because it's not proven it will always be slow" isn't a great pitch to the boss :)
13:03:26 What's of salience here is practical speed now (or at least in the near future) no?
13:03:46 yes
13:04:38 I just didn't hear much about people buying i9-13900ks in bulk, LN2 cooling and overclocking them to speed up their sequential algorithms :D
13:04:50 But GPUs in datacenters are everywhere
13:06:05 yup yup, our place has 4 12u racks full of the nvidia non-gpu gpu processing cards
13:06:41 damn things sit idle a lot of the time
13:15:15 AWS and Azure (and I'm pretty sure GCP too) have a "compute optimized" tier of VMs, offering fewer cores but with higher clock speeds ("up to" 3.5-3.6 GHz on AWS) compared to the other tiers
13:16:25 (Although "fewer" is a relative term, given that the biggest amd ones still reach 192C/384T)
13:17:06 could still host a decent network doom tournament on them
13:18:08 Though I guess if you actually need that much speed, you're probably better off looking into some dedicated fpga/asic to optimize that specific compute step?
13:19:27 pretty sure my place only bought the kit outright because we started looking into AI automation, and someone high up had a friend whose neighbour's dogwalker's aunt's first cousin's son runs a hardware firm that deals in them, and they probably had shares in it, so why not spend the public money on it?
13:22:30 moneromooo, cat processing unit <3
13:24:49 The more watts, the better for cat
13:25:46 yes, I think if some sequential algorithm is very important, there will be some specialized chips for it
14:05:05 RIP https://www.youtube.com/watch?v=WRLZitXASaM
14:18:29 dayum
14:18:55 It's locked to Monero only :D
14:19:39 such enterprise much wow
14:19:54 it makes sense, other coins use different RandomX configs
14:20:30 most, but Zephyr seems to use an unmodified rx, unfortunately
14:20:31 No, it can't even mine Zephyr which uses stock RandomX
14:21:03 They are either very bad at writing software and hard-coded everything for Monero, or they deliberately locked it to Monero
14:23:14 maybe they hard locked it to XMR so they can mine Zeph with less competition on their next system =p
14:23:17 But they advertised it as multi-coin miner. False advertising? Selling used hardware? Classic Bitmain
14:23:38 tru
14:24:12 if zephyr is the same PoW as monero, I don't see why it would not work unless it's intentionally only working with XMR addresses
14:24:35 The theory is that their firmware can't process address formats other than XMR
14:24:46 Or they intentionally locked it
14:25:11 Mining job header has fork version and block height, so it's easy to lock it to XMR
14:26:01 I suggested that he try connecting to xmrig-proxy with a fake XMR address, but it "didn't work"
14:29:03 most HPC/scientific computing tasks are easily parallelizable because it's all matrix math
14:39:32 I saw N-body problem as an example of "hard to parallelize" algorithm
14:39:50 Because each body interacts with all other bodies
14:40:39 I see why
14:41:00 all parallel threads will have to access and modify the same data structure representing the N bodies
15:18:03 or do two phases, with redundant copies of all state. 2nd phase to coalesce the multiple views back into a single one.
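The two-phase idea for N-body can be sketched as follows. This is a minimal illustration in Python, not anyone's actual implementation: phase 1 lets each worker compute accelerations for its own slice of bodies while reading a shared immutable snapshot, and phase 2 merges the partial results and advances the state in a single thread. The names (`accelerations_for_chunk`, `step`), the 2D layout, and the unit gravitational constant are all assumptions made for the example. CPython threads will not give a real speedup here because of the GIL; the point is only the structure.

```python
from concurrent.futures import ThreadPoolExecutor

G = 1.0  # gravitational constant in arbitrary units (illustrative)


def accelerations_for_chunk(snapshot, start, stop):
    """Phase 1 worker: compute accelerations for bodies [start, stop).
    Only reads the shared snapshot, never writes to it."""
    out = []
    for i in range(start, stop):
        xi, yi, _ = snapshot[i]
        ax = ay = 0.0
        for j, (xj, yj, mj) in enumerate(snapshot):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            dist3 = (dx * dx + dy * dy) ** 1.5  # assumes no two bodies coincide
            ax += G * mj * dx / dist3
            ay += G * mj * dy / dist3
        out.append((ax, ay))
    return out


def step(bodies, velocities, dt, workers=4):
    """Advance the system by one time step.
    bodies: list of (x, y, mass); velocities: list of (vx, vy)."""
    snapshot = tuple(bodies)  # immutable view shared by all workers
    n = len(snapshot)
    chunk = max(1, (n + workers - 1) // workers)
    ranges = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]

    # Phase 1 (parallelizable): independent reads of the snapshot.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: accelerations_for_chunk(snapshot, *r), ranges))

    # Phase 2 (sequential): coalesce partial results into one new state.
    acc = [a for part in parts for a in part]
    new_v = [(vx + ax * dt, vy + ay * dt)
             for (vx, vy), (ax, ay) in zip(velocities, acc)]
    new_b = [(x + vx * dt, y + vy * dt, m)
             for (x, y, m), (vx, vy) in zip(snapshot, new_v)]
    return new_b, new_v


bodies, velocities = step([(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
                          [(0.0, 0.0), (0.0, 0.0)], dt=0.1)
print(bodies)
```

The per-step work is still O(N^2), and every step ends with a synchronization point where all workers must finish before the state can be merged.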
15:19:30 but now we're actually talking rocket science, of course it's not straightforward or easy
15:20:18 even solving the original rocket equation isn't easy.
15:23:23 One phase (where you calculate next positions) is easy to parallelize, but the other phase where you have to sync all threads, is mostly sequential
15:32:55 yes
15:33:06 There's no escaping Amdahl's Law
15:40:31 btw, bitmain > apple, RAM not soldered to the board!
15:51:44 lol
15:58:10 At least buyers can recoup some costs after it gets useless :D
15:58:17 18 DDR4 sticks :D
16:03:08 heh but ddr4-3200 is already obsolete
16:04:55 if someone jailbreaks it, at least they can upgrade the RAM to a usable amount for some number crunching
16:09:44 That would require not only jailbreak, but also a custom BIOS. They hardcoded their firmware for these specific modules (timings, voltage, frequencies etc)
16:10:11 It's a shame Bitmain doesn't opensource their firmware
16:10:18 And a breach of GPLv3 :D
16:10:34 They use some "xmrig-mango" binary
16:31:01 "3 people now with dead hashboards" from discord
16:40:52 live now https://www.youtube.com/watch?v=VdcK1QMDA1E
17:12:46 It takes 1 minute before it starts mining... Do they generate the dataset on control board and then upload it to risc-v boards? lol
17:17:50 They probably didn't bother implementing JIT compiler for SuperscalarHash. My quad core risc-v board takes 9 minutes to initialize the dataset.
17:22:26 oh, so they run it without JIT for dataset
17:31:33 Quality! https://p2pool.io/u/ef6a78845da8a060/image.png
17:51:58 well they'd already beat on these things for 2 years. thermal stress was bound to destroy them
17:57:21 I read in discord that people set fans to 50% and then VRMs overheated
18:10:28 so they did it to themselves? should have left the fans alone?
18:11:14 Probably.
But consumer products shouldn't just die after poking around in the official web gui :D 19:44:39 why is it every unboxing video/notes I've seen all complain about it only being able to mine XMR? I mean thats what it's sold as :P 19:45:11 now if BitMain threw in a p2pool node and remote monero node into the X5, that would have been pretty damn nice 20:02:33 They advertised it as "multi currency miner" 20:09:50 ah well thats worth a damn slap then :| 21:46:49 sech1: given the excellent ROI for Bitmain, is it even worth it to hamstring Rx on the X5? 23:14:01 Todd doubling down on his weird takes "it's a significant step towards the creation of a Monero ASIC" 23:15:40 s/weird/dumb/ 23:17:20 I have to say I'm pretty impressed how accurately sech1, hyc, and tevador guessed at what was in these things before seeing a teardown 23:19:00 well the SG2042R is a decent step towards creating a processor nicely suited to RandomX, but I still think if ARM was a bit more open source, it could be a better route to take 23:21:01 the X5 however, is more of a significant step towards bankruptcy than a Monero ASIC 23:21:59 It’s just a step towards next gen cpus 23:23:46 well given they (the subsidiary) are developing the chips for wider use cases, it's really a case of how much did Bitmain spend on all the other bits (e.g. 
RAM)
23:25:54 but yeah, even 2 years mining on them then offloading at 3k per unit, they surely must have lost money on this endeavor
23:26:21 Won’t be surprised if it’s state funded r&d for cpus
23:26:39 that's a good point
23:34:14 ^^ https://www.hpcwire.com/2023/07/19/how-china-is-building-an-open-national-chip-plan-around-risc-v/
23:37:25 Yup x6 might already be on the network, they need to stress test their cpus
23:40:50 just as rx should form part of the common CPU benchmarks, chip manufacturers should design around rx ;)
23:42:10 it was always hyc's retort, to build a rx asic is basically building a better cpu
23:43:06 Yup better cpus that is and small rx tweaks are fine