09:00:36 <sech1> @hyc someone tested SG2042 with tevador's code: https://user-images.githubusercontent.com/1006477/275177467-cc78752e-28d5-4293-b18e-787566b246ba.png
09:01:09 <sech1> Only 1356 h/s. It means SG2042 in X5 must have hardware AES and FP vector instructions
09:01:25 <sech1> Because it does 11780 h/s
09:01:35 <sech1> 8-9 times faster
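The "8-9 times faster" figure follows directly from the two hashrates quoted above. A minimal sketch of the arithmetic; the variable names are only illustrative, and the 1356 h/s figure comes from a 64-thread run, which may not be the best-case software result:

```python
# Hashrates quoted in the chat, in H/s.
software_only = 1356   # SG2042 running tevador's code
x5_reported = 11780    # figure quoted for the SG2042 in the X5

# Ratio between the two figures.
ratio = x5_reported / software_only
print(f"speedup: {ratio:.2f}x")
```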
09:05:34 <hyc> yah just saw that
09:06:28 <hyc> I wonder if the Zb extensions would have made much difference
09:06:42 <sech1> Not much, they only save a few instructions
09:06:53 <sech1> but vector extensions can make huge difference
09:08:13 <hyc> sg2042 / C920 definitely has vector
09:09:07 <hyc> that's the main difference to C910
09:12:38 <hyc> "C920 adopts a state of the art 12-stage out-of-order multiple issue superscalar pipeline with high frequency, IPC, and power efficiency, with a 128-bit vector unit implementing the RISC-V V Extension 0.7.1."
09:13:56 <hyc> of course the current spec is 1.1, and will be 2.0 by the time it's finalized
09:16:18 <sech1> he did run with 64 threads
09:16:30 <sech1> that explains such low hashrate
09:17:57 <hyc> https://xrvm.com/product/xuantie/4224888731980599296
09:18:07 <hyc> fp16 and fp32 vector
09:18:48 <sech1> SG2042 also has L3 cache slicing (each group of 4 cores has "near" 4 MB of L3 cache)
09:19:01 <sech1> I suspect the latency for these 4 MB will be much smaller
09:19:27 <sech1> so scratchpad allocation will need to make OS API calls to arrange that
09:23:34 <hyc> 1MB L2 per 4-core cluster, 64MB L3 for entire chip
09:25:43 <hyc> should support numactl etc
09:30:48 <hyc> I thought only argon2 would take advantage of vector instructions?
09:32:49 <hyc> https://github.com/riscv/riscv-v-spec/releases/tag/0.7.1
09:32:58 <hyc> it's the oldest release there...
10:28:40 <pauliouk> you mean my poor little c910 beaglev-ahead is only going to scrape 100h/s :(
11:11:20 <elucidator> ROI in 2e2 years
11:22:15 <pauliouk> to be fair, the IO is going to come in useful, although it's a bit overkill for controlling my greenhouse :D got some pi-zeros that'd do the job just as well :D
12:15:54 <sech1> hyc vector instructions are used for regular RandomX instructions too, all RandomX FP registers are 128-bit (2x64)
12:46:23 <Inge> so everyone is going to test out the pi5 then?
14:42:46 <hyc> hmm. so one of the older chips, C906 also had RVV 0.7.1. Some folks have suggested C910 has it too, they just didn't document it.
14:43:54 <pauliouk> just got my amazon order of the bits I needed for the beaglev, if you want me to check anything on it?
14:46:34 <hyc> and this older doc also says it explicitly https://ftp.libre-soc.org/466100a052.pdf
14:48:23 <hyc> ah yes, and this review says it too https://linuxgizmos.com/dev-kit-debuts-risc-v-xuantie-c910-soc-with-a-3d-gpu-and-android-and-linux-support/
14:48:50 <hyc> so it looks like it'll be worthwhile to implement RVV 0.7.1 and see what difference it makes
14:49:32 <tevador> vector 0.7.1 is an incompatible draft version, I'm not planning to implement it
14:49:44 <hyc> hmmm. I need a few more ports on my ethernet switch...
14:50:17 <hyc> tevador: yeah I know it's incompatible but nobody has the later versions anywhere do they?
14:50:35 <hyc> and these chips are on the market now. might as well see how much it helps
14:50:55 <tevador> the first chips with V 1.0 are expected perhaps next year
14:51:39 <hyc> I may take a stab at it. we can always #ifdef it away later.
14:54:23 <tevador> It will be hard to do. Some asm mnemonics are the same for V0.7.1 and V1.0, but the binary code is different.
14:56:41 <tevador> It could make sense for xmrig, but I don't think it's worth it for the RandomX library. And you can't expect competitive hashrates without hardware AES anyways.
14:57:15 <hyc> good point
14:58:14 <tevador> SG2042R - the "R" could mean RandomX and they could have some custom extensions for it. RISC-V has opcode space reserved for custom instructions.
15:13:16 <sech1> Do you mean they could've implemented some RandomX instructions 1:1?
15:13:23 <sech1> Not just AES?
15:31:09 <hyc> heh then it really would be an ASIC after all 
15:31:19 <tevador> Yes, some helper instructions for RandomX. We can't rule that out without seeing the risc-v binaries.
15:32:02 <hyc> well I've booted up my licheepi4a but they recommend I update the firmware image. it's got debian preloaded
15:33:36 <Lyza> maybe we can live in a world where all major CPU manufacturers add RandomX helper instructions (:
15:34:44 <sech1> A helper instruction to load data from scratchpad masked address (L1/L2 mask) would be very useful
15:34:56 <sech1> It could save 2-3 instructions on every scratchpad reading instruction
15:35:01 <sech1> tevador ^
15:35:16 <tevador> Yes, that's one of the instructions I had in mind.
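The masked scratchpad load discussed above can be sketched as three separate steps (add, mask, load), which is roughly the work a single helper instruction would fold together. This is only an illustration: the 16 KiB L1 size, the 8-byte alignment, and the name `masked_load` are assumptions made for the sketch, not taken from the chat.

```python
import struct

SCRATCHPAD_L1 = 16 * 1024            # assumed L1 region size in bytes
L1_MASK = (SCRATCHPAD_L1 - 1) & ~7   # stay inside the region, 8-byte aligned

def masked_load(scratchpad, reg, imm, mask=L1_MASK):
    """Hypothetical helper: load a 64-bit word from a masked address."""
    # Step 1: add the immediate to the register value (32-bit wraparound).
    addr = (reg + imm) & 0xFFFFFFFF
    # Step 2: apply the cache-level mask.
    addr &= mask
    # Step 3: the actual little-endian 64-bit load.
    return struct.unpack_from("<Q", scratchpad, addr)[0]

scratchpad = bytearray(SCRATCHPAD_L1)
struct.pack_into("<Q", scratchpad, 8, 0xDEADBEEF)
# 0x400A + 5 = 0x400F, which the mask reduces to offset 8.
value = masked_load(scratchpad, 0x400A, 5)
```

An L2 variant would only differ in the mask passed in.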
15:40:21 <hyc> randomx-tests passes. skipped randomx_reciprocal_fast and cache init sse /avx. no surprise there
15:40:57 <tevador> Yes, those are x86-only.
15:41:32 <hyc> running the 10M benchmark now 
15:43:57 <hyc> mem init was 29.5805s with 4 threads
15:44:00 <tevador> What is your hashrate?
15:44:11 <tevador> 10M might take 1 day or more
15:44:17 <hyc> I should've tested 1M first 
15:44:19 <hyc> lemme kill this
15:44:37 <tevador> for me 10M took about 38 hours.
15:45:34 <hyc> yeah. I only got 46.77 H/s
15:45:40 <hyc> on 1000 nonces
15:46:37 <tevador> 1 thread?
15:46:44 <hyc> 4 threads
15:47:16 <hyc> and with hugepages
15:47:41 <tevador> Should be more. It's a more powerful chip than the JH7110 I have.
15:47:49 <hyc> 23.933 H/s with 2 threads
15:48:15 <hyc> yeah seems a bit too slow
15:48:31 <tevador> Should be > 100 H/s without large pages based on the results from felixonmars.
15:48:51 <hyc> maybe the newer firmware will improve that
15:49:20 <hyc> I don't believe the chip is throttling, I've got the heatpad and fan mounted
15:51:45 <hyc> but the results are consistent. 11.998H/s 1 thread
15:53:15 <hyc> gcc (Debian 13.2.0-4revyos1) 13.2.0
15:53:25 <hyc> I wonder if the compiler is just bad
16:04:29 <hyc> 11.28H/s without largepages
16:05:05 <hyc> memory init 38.6s
16:13:32 <tevador> Lichee Pi 4A is TH1520 quad core. The benchmarks in the PR show 35 H/s with 1T and 104 H/s with 4T. 
16:14:05 <tevador> My tests are with gcc (Debian 12.2.0-10) 12.2.0
16:26:33 <hyc> yeah I wonder which board felixonmars used
16:27:29 <hyc> looks like they use the same u-boot setup as arm android. flash mode comes up as an android fastboot usb device. same tool is used.
16:35:53 <pauliouk> hmm time to get the beagle started up... turns out the random micro-usb cable I had in my drawer (don't remember buying it, so probably found it years and years ago) is urm, well poo. So let's try a new shiny one :D
16:37:03 <hyc> booted up again with fresh firmware
16:38:01 <hyc> got 34.89H/s this time, without largepages
16:38:40 <hyc> it's a newer kernel build, september vs july.
16:39:29 <hyc> 43.35H/s with largepages. 1 thread.
16:40:31 <hyc> 78.71H/s 2 threads
16:41:26 <hyc> 132.54H/s 4 threads
16:42:12 <hyc> that 34.89 matches felixonmars' result without largepages
16:45:13 <hyc> I guess I'll run a 1M and 10M now
16:46:16 <hyc> should be a little over 2 hours for 1M
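The "little over 2 hours" estimate and the thread scaling can be checked from the large-page hashrates quoted above. A rough sketch, assuming the hashrate stays constant for the whole run; `hours_for` is an illustrative helper, not part of any benchmark tool:

```python
def hours_for(nonces, hashrate):
    """Estimated run time in hours at a constant hashrate (H/s)."""
    return nonces / hashrate / 3600

# Large-page results from the chat: threads -> H/s.
rates = {1: 43.35, 2: 78.71, 4: 132.54}

one_m = hours_for(1_000_000, rates[4])
ten_m = hours_for(10_000_000, rates[4])
print(f"1M nonces: {one_m:.2f} h, 10M nonces: {ten_m:.1f} h")

# Scaling efficiency relative to perfect linear scaling of the 1-thread rate.
efficiency = {t: r / (t * rates[1]) for t, r in rates.items()}
for threads, eff in efficiency.items():
    print(f"{threads} threads: {eff:.0%} of linear")
```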
16:48:46 <pauliouk> hmm my android tv boxes got 1M in around an hour
16:49:19 <hyc> sure but they have all the acceleration goodies
16:49:39 <tevador> hyc: cool
16:54:07 <tevador> hyc: is this the default build or native?
16:54:26 <hyc> I just used "cmake .." no other options
16:54:37 <tevador> OK, so the default one (rv64gc)
16:56:33 <hyc> I guess we don't care about the 1M or 10M result on default build now? should I just stop this and rebuild with native?
16:59:39 <tevador> The default build is more important because monerod will ship with it.
16:59:48 <hyc> ok
17:00:14 <hyc> I'll get the 1M and 10M results on default build then.
17:03:03 <tevador> It seems that monero doesn't release binaries for risc-v yet. But it might be a good time to start. https://github.com/monero-project/monero/releases/tag/v0.18.3.1
17:04:01 <hyc> will have to see how stable the toolchains are. sipeed is still maintaining their own patches to gcc
17:05:01 <hyc> I guess if we're doing generic rv64gc that shouldn't matter
17:05:16 <tevador> yes, I think rv64gc has been stable for quite some time
17:06:15 <hyc> about as appealing as raspberry pi ... bleah
17:07:39 <pauliouk> meh, well beagle isn't auto connecting via ethernet, boots damn quick though. I just don't have a means of connecting to the damn thing. Got the screen hooked up, but can't think of a way off hand of 'forwarding' my keyboard to it through the usb-b micro cable :/
17:07:43 <tevador> IMO it makes more sense than the armv7 build
17:08:20 <pauliouk> and the uart->usb isn't connecting either by the looks of it
17:11:35 <hyc> I have a bunch of wireless airmouse minikeyboards for my tvboxes. plugged one of those into usb
17:11:49 <hyc> but the one I used right now, the mouse pointer isn't working. oh well
17:13:09 <hyc> tevador yeah we should prob think about dropping all of the 32bit builds
17:14:16 <pauliouk> beagle doesn't come with usb2/3 :| well it has, but I'm powering it from my usb3 on the 'host' machine
17:16:59 <tevador> When this RandomX patch is included in monerod, I'm going to run a node on my risc-v board.
17:19:04 <hyc> I'm prob gonna check how well it streams movies :P
17:20:00 <hyc> I wonder if termux has risc-v binaries yet. if I decide to install android
17:25:45 <pauliouk> finally! damn thing decided to wake up
17:26:56 <pauliouk> Linux BeagleV 5.10.113-g52fbe8443ea1-dirty #1 SMP PREEMPT Tue Jul 11 17:16:44 UTC 2023 riscv64 riscv64 riscv64 GNU/Linux
17:30:14 <tevador> cool
17:31:34 <tevador> hyc: does your kernel list any isa extensions in /proc/cpuinfo?
17:53:51 <pauliouk> It still astounds me that this little circuit board, powered by a USB cable, runs faster than and does way more things than the $1600 PC from 25 years ago sitting in the cupboard next to me could ever dream of doing
17:59:09 <pauliouk> granted, takes a bit longer to build xmrig from src on this than it does on a 7950x... but heck, it's smaller than my work pass
18:00:57 <selsta> tevador: our depends build system allows for cross compiling to risc-v, we just don't have reproducible builds set up yet so no release https://github.com/monero-project/monero/actions/runs/6384731213/job/17327984104
18:02:52 <tevador> What needs to be done to get reproducible builds for riscv64?
18:03:10 <pauliouk> hyc, quick question - what flags should I use for compiling xmrig on this C910?
18:06:28 <selsta> Would this be a Linux risc-v release? or which OS do most people use?
18:07:36 <pauliouk> just grabbed the git and built libuv, openssl and hwloc as static libraries
18:08:44 <tevador> selsta: yes, I would assume that linux-riscv64 would be the most useful release.
18:09:30 <selsta> I think we would have to add g++-riscv64-linux-gnu here, and a couple more steps that hyc knows best https://github.com/monero-project/monero/blob/master/contrib/gitian/gitian-linux.yml
18:10:25 <selsta> and something to HOSTS
18:18:39 <tevador> selsta: I found a related bitcoin PR https://github.com/bitcoin/bitcoin/pull/13665
18:24:52 <selsta> nice, shouldn't be too much work to adapt for our codebase. we also don't need changes for qt and related packages.
20:49:42 <hyc> tevador: /proc/cpuinfo https://paste.debian.net/hidden/5f97a3f0/
20:49:49 <hyc> it's a 5.10 kernel
20:51:03 <hyc> selsta yeah I think mostly it should be a simple dropin to gitian-linux.yml
20:52:22 <hyc> I tried cmake -DARCH=native and it still just did -march=rv64gc ... hmm
20:55:42 <hyc> ah yeah both zba and zbb give illegal instruction
21:01:57 <hyc> so default and native are identical here
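Whether Zba or Zbb is advertised can also be checked from the `isa` string before trying a build. A minimal sketch of parsing such a string, assuming the usual layout of single-letter extensions after `rv64` followed by underscore-separated multi-letter extensions; the sample strings are hypothetical and are not the contents of the paste above:

```python
def parse_isa(isa):
    """Split an ISA string into (base, single-letter set, multi-letter set)."""
    base, *multi = isa.strip().lower().split("_")
    if not base.startswith(("rv32", "rv64")):
        raise ValueError(f"unexpected ISA string: {isa!r}")
    return base[:4], set(base[4:]), set(multi)

def has_ext(isa, ext):
    """True if the extension name appears in the ISA string."""
    _, single, multi = parse_isa(isa)
    return ext in single or ext in multi

# Hypothetical examples, not taken from a real /proc/cpuinfo.
plain = "rv64imafdc"
with_bitmanip = "rv64imafdc_zba_zbb"
print(has_ext(plain, "zbb"), has_ext(with_bitmanip, "zbb"))
```

A kernel as old as 5.10 may not report multi-letter extensions at all, so a missing entry here is not proof that the hardware lacks the extension.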
21:07:42 <hyc> looking at that bitcoin PR, a lot of it is just updating autoconf to detect rv64 machine
21:17:36 <hyc> heh. so we already have a 64core chip here. that 64bit thread affinity mask is going to need increasing soon ;)
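The limit mentioned above is easy to see in a sketch: with one bit per core, a fixed 64-bit mask covers cores 0 to 63 exactly, so a 64-core chip fills it and anything larger has no bit left. `MASK_BITS` and `affinity_mask` are illustrative names, not the actual code being referred to:

```python
MASK_BITS = 64  # width of the fixed affinity mask assumed in this sketch

def affinity_mask(cores, bits=MASK_BITS):
    """Build a bitmask with one bit set per core index."""
    mask = 0
    for core in cores:
        if not 0 <= core < bits:
            raise ValueError(f"core {core} does not fit in a {bits}-bit mask")
        mask |= 1 << core
    return mask

all_64 = affinity_mask(range(64))   # every bit set, the mask is full

try:
    affinity_mask([64])             # a 65th core cannot be represented
    overflowed = False
except ValueError:
    overflowed = True
```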
21:22:02 <hyc> anyway, my 1M run matched
21:37:58 <selsta> I'll try in the next days to add riscv to gitian