16:35:43 Btw, this whole thing is one FDIV_M RandomX instruction for RV64GC: https://paste.debian.net/hidden/93bdf103/
16:45:34 Where did you get this code? RV64GC + vector instructions will be more compact
16:50:18 I wrote it. We have to target RV64GC, which has no vector support.
16:50:36 RV64GC is the "base" ISA for Linux.
16:53:43 When I compile some code with GCC, it will use -march=rv64imafdc_zicsr_zifencei, so this is our base ISA, to be more precise. It has RV64GC + CSR + FENCE.I.
16:55:19 It's safe to assume that anything that can run Linux will be at least rv64imafdc_zicsr_zifencei since that's what the linux kernel uses.
16:56:15 X5 has vector instructions by the way
16:56:37 then it must be C920, not C910
16:56:43 Yes, I don't think you can get anywhere near x86 performance without the vector extension.
16:57:01 are you using qemu, or did you get a real risc-v board?
16:57:34 I have a board with SiFive U74
16:58:13 nice
16:58:39 qemu is too slow even on 7950X. But being able to run 32 threads kind of fixes it :D
16:59:03 I'm still waiting for delivery of my lichee pi 4
17:00:33 https://paste.debian.net/hidden/0faa7063/
17:01:39 isa-ext seems rather unhelpful
17:04:23 based on my research, the only reliable way to detect extensions is to run it and catch SIGILL...
17:06:52 I'm really starting to appreciate the x86 cpuid instruction.
17:09:24 Just assume RV64GC
17:09:35 as minimum supported for RandomX
17:09:42 and yes, test everything else
17:10:43 As far as I can see, you only need to test vector instructions and aes instructions
17:10:53 Everything else is covered by RV64GC
17:17:21 I started working on ARM64 code for RandomX CFROUND and AES tweaks
17:17:50 Then I realized my RPi doesn't have AES, so I spent the day setting up aarch64 ubuntu in qemu
17:23:09 RV64GC has no rotate or scaled addition; these need the Zbb and Zba extensions.
17:23:30 hmm, interesting
17:23:37 no rotate, as in ROR/ROL?
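The "run it and catch SIGILL" approach from 17:04:23 can be sketched in POSIX C as below. This is a minimal illustration, not RandomX code: `cpu_supports`, `probe_zbb` and the two self-test probes are hypothetical names. The only real-hardware probe shown is for Zbb, using a hand-encoded `ror t0, t0, t0` via the GNU assembler `.insn` directive so that it assembles even when `-march` lacks Zbb. Vector and AES probes would follow the same pattern, but their encodings are not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Jump target shared between cpu_supports() and the SIGILL handler. */
static sigjmp_buf probe_env;

static void on_sigill(int sig) {
    (void)sig;
    siglongjmp(probe_env, 1); /* abandon the probe that trapped */
}

/* Runs probe() with a temporary SIGILL handler installed.
   Returns true if probe() returned normally, false if it raised SIGILL. */
static bool cpu_supports(void (*probe)(void)) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return false;

    volatile bool ok = false;
    if (sigsetjmp(probe_env, 1) == 0) { /* 1: restore the signal mask on jump */
        probe();
        ok = true;
    }
    sigaction(SIGILL, &old, NULL); /* put the previous handler back */
    return ok;
}

#if defined(__riscv)
/* Zbb probe (hypothetical helper): R-type opcode 0x33, funct3 5, funct7 0x30
   encodes "ror t0, t0, t0"; it traps with SIGILL on cores without Zbb. */
static void probe_zbb(void) {
    __asm__ volatile(".insn r 0x33, 0x5, 0x30, t0, t0, t0" ::: "t0");
}
#endif

/* Host-independent probes, only for checking the harness itself. */
static void probe_nothing(void) {}
static void probe_raise_sigill(void) { raise(SIGILL); }
```

On a RISC-V build, `cpu_supports(probe_zbb)` would then select between a Zbb code path and the RV64GC fallback at startup.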
17:23:42 As for AES, I don't know any chips that support it.
17:24:04 not a big deal, ROR/ROL can be replaced by a couple bitshifts + logical or
17:24:10 Yes, rotate has to be emulated with shifts.
17:24:43 4 instructions instead of 1
17:25:09 risc-v cryptography extensions are ratified already, so future chips will have aes
19:37:45 Btw, there are at least 2 incompatible vector extensions being used in the wild: version 0.7.1 (SG2042 has it) and version 1.0.0. https://arxiv.org/abs/2304.10324
21:31:21 sech1 you don't have an arm64 smartphone to run on?
21:36:39 I have
21:39:04 But I need to ssh into it somehow to use it as an "aarch64 dev machine", and I'm always too lazy to set it up. I only use it for final testing in termux
21:39:58 btw I get 22 h/s in qemu on 7950X (single thread JIT)
21:40:20 and overall it's fast enough to not be annoying :D
21:42:26 heh
21:42:44 I build it all on termux.
21:42:55 doesn't take much to get the build env set up
21:43:13 and you can install openssh in termux too
21:43:20 ssh in is easy enough
21:44:07 of course I usually clone the repos from my laptop. don't have the patience to download everything from the web again
21:45:39 the other way I go, if it's just a quick compile/test, is leave the binary on my laptop, running tinyhttpd
21:45:50 then just grab the binary on the phone using any browser
21:49:13 plus, I can run 32 threads in randomx-benchmark (dataset init is only 7 seconds)
21:49:16 and 32 threads when compiling
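The shift-based rotate emulation discussed at 17:24 looks roughly like this in portable C. The helper names are illustrative. For a variable count on RV64GC, a compiler will typically lower `rotr64` to something like `neg`, `srl`, `sll`, `or`. RISC-V shifts only use the low 6 bits of the count, so the masks usually cost nothing. That lines up with "4 instructions instead of 1", though exact codegen depends on the compiler. With Zbb enabled (e.g. `-march=rv64gc_zbb`), GCC and Clang generally recognize this idiom and emit a single `ror`/`rol`. The scaled-addition helper shows the two-instruction (shift + add) fallback for what Zba's `sh1add`/`sh2add`/`sh3add` do in one.

```c
#include <stdint.h>

/* Rotate right by n (mod 64) using only shifts and OR.
   Masking (64 - n) with 63 keeps the count in range, so n == 0 does not
   hit undefined behaviour (x << 64). */
static inline uint64_t rotr64(uint64_t x, unsigned n) {
    n &= 63;
    return (x >> n) | (x << ((64 - n) & 63));
}

/* Rotate left by n (mod 64), same construction mirrored. */
static inline uint64_t rotl64(uint64_t x, unsigned n) {
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}

/* Scaled addition b + (a << shift), i.e. what Zba's shNadd computes for
   shift = 1..3; without Zba it becomes a shift followed by an add. */
static inline uint64_t shadd(uint64_t a, uint64_t b, unsigned shift) {
    return (a << shift) + b;
}
```

For example, `rotr64(0x0123456789abcdefULL, 8)` moves the low byte `0xef` to the top, giving `0xef0123456789abcd`.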