-
tevador
Btw, this whole thing is one FDIV_M RandomX instruction for RV64GC:
paste.debian.net/hidden/93bdf103
-
sech1
Where did you get this code? RV64GC + vector instructions will be more compact
-
tevador
I wrote it. We have to target RV64GC, which has no vector support.
-
tevador
RV64GC is the "base" ISA for Linux.
-
tevador
When I compile some code with GCC, it will use -march=rv64imafdc_zicsr_zifencei, so this is our base ISA, to be more precise. It has RV64GC + CSR + FENCE.I.
-
tevador
It's safe to assume that anything that can run Linux will be at least rv64imafdc_zicsr_zifencei since that's what the linux kernel uses.
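For illustration, the -march string above can be read as the base "rv64", then the single-letter extensions "imafdc", then multi-letter extensions separated by underscores. A minimal sketch of a lookup over such a string follows. The helper name isa_has_ext is hypothetical, and it assumes a lowercase, unversioned string like the one quoted in the chat (no "g" shorthand, no version suffixes such as "2p1").

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if an ISA string such as
 * "rv64imafdc_zicsr_zifencei" names the given extension.
 * Single-letter extensions ("c", "v") are looked up in the run of
 * letters between the "rv32"/"rv64" prefix and the first '_'.
 * Multi-letter extensions ("zbb") must match a whole '_'-separated token. */
static bool isa_has_ext(const char *isa, const char *ext)
{
    assert(isa != NULL && ext != NULL);

    if (strncmp(isa, "rv32", 4) != 0 && strncmp(isa, "rv64", 4) != 0)
        return false;

    const char *p = isa + 4;              /* skip the "rv64" prefix */
    size_t base_len = strcspn(p, "_");    /* length of the single-letter run */
    size_t ext_len = strlen(ext);

    if (ext_len == 0)
        return false;
    if (ext_len == 1)
        return memchr(p, ext[0], base_len) != NULL;

    p += base_len;
    while (*p == '_') {
        p++;                              /* step over the separator */
        size_t tok = strcspn(p, "_");
        if (tok == ext_len && strncmp(p, ext, tok) == 0)
            return true;
        p += tok;
    }
    return false;
}
```

With the string from the chat, isa_has_ext(s, "c") and isa_has_ext(s, "zifencei") hold, while "v" and "zbb" are reported as absent. This only describes what a string claims; it does not prove the hardware executes the instructions.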
-
sech1
X5 has vector instructions by the way
-
hyc
then it must be C920, not C910
-
tevador
Yes, I don't think you can get anywhere near x86 performance without the vector extension.
-
sech1
are you using qemu, or did you get a real risc-v board?
-
tevador
I have a board with SiFive U74
-
sech1
nice
-
sech1
qemu is too slow even on 7950X. But being able to run 32 threads kind of fixes it :D
-
hyc
I'm still waiting for delivery of my lichee pi 4
-
hyc
isa-ext seems rather unhelpful
-
tevador
based on my research, the only reliable way to detect extensions is to run it and catch SIGILL...
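A rough sketch of the "run it and catch SIGILL" approach, assuming a POSIX system. The names runs_without_sigill, probe_supported and probe_unsupported are hypothetical. On a real RISC-V target the probe would contain a single instruction from the extension being tested; here the "unsupported" probe simply raises SIGILL so that the sketch runs on any machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static sigjmp_buf probe_env;

/* SIGILL handler: abandon the probe and jump back to the caller. */
static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Runs probe() with a temporary SIGILL handler installed.
 * Returns true if probe() returned normally, false if it trapped
 * (or if the handler could not be installed). */
static bool runs_without_sigill(void (*probe)(void))
{
    assert(probe != NULL);

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return false;

    volatile bool ok = false;             /* survives the siglongjmp */
    if (sigsetjmp(probe_env, 1) == 0) {   /* 1 = also save the signal mask */
        probe();
        ok = true;
    }
    sigaction(SIGILL, &old, NULL);        /* restore the previous handler */
    return ok;
}

/* Stand-ins for real probes (assumption: a real probe would execute
 * one instruction from the extension under test). */
static void probe_supported(void) { }
static void probe_unsupported(void) { raise(SIGILL); }
```

The static jump buffer makes this single-threaded by design; a JIT would likely run such probes once at startup and cache the results.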
-
tevador
I'm really starting to appreciate the x86 cpuid instruction.
-
sech1
Just assume RV64GC
-
sech1
as minimum supported for RandomX
-
sech1
and yes, test everything else
-
sech1
As far as I can see, you only need to test vector instructions and aes instructions
-
sech1
Everything else is covered by RV64GC
-
sech1
I started working on ARM64 code for RandomX CFROUND and AES tweaks
-
sech1
Then I realized my RPi doesn't have AES, so I spent the day setting up aarch64 ubuntu in qemu
-
tevador
RV64GC has no rotate or scaled addition; these need the Zbb and Zba extensions, respectively.
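For reference, "scaled addition" here means what the Zba instructions sh1add/sh2add/sh3add compute: rd = rs2 + (rs1 << k) for k = 1, 2 or 3. Without Zba this takes a shift followed by an add. A minimal C sketch of the operation (the function name scaled_add is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* base + (index << shift), wrapping modulo 2^64.
 * shift is limited to 1..3, matching sh1add/sh2add/sh3add.
 * On plain RV64GC a compiler would typically emit slli + add. */
static uint64_t scaled_add(uint64_t base, uint64_t index, unsigned shift)
{
    assert(shift >= 1 && shift <= 3);
    return base + (index << shift);
}
```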
-
sech1
hmm, interesting
-
sech1
no rotate, as in ROR/ROL?
-
tevador
As for AES, I don't know any chips that support it.
-
sech1
not a big deal, ROR/ROL can be replaced by a couple bitshifts + logical or
-
tevador
Yes, rotate has to be emulated with shifts.
-
tevador
4 instructions instead of 1
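A sketch of the shift-based emulation for a variable rotate count. The function names are hypothetical. On RV64 the shift instructions only look at the low 6 bits of the count, so the expression below would typically map to a negate, two shifts and an OR, which matches the count of 4 instructions versus a single ror/rol with Zbb.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits using only shifts and OR.
 * Masking with 63 keeps both shift counts in 0..63, so n == 0
 * (and n == 64) are well defined in C. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x >> n) | (x << ((64 - n) & 63));
}

/* Rotate left, the mirror image of rotr64. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}

/* Small usage example. */
static void rotate_selfcheck(void)
{
    assert(rotr64(1, 1) == 0x8000000000000000ULL);
    assert(rotl64(0x8000000000000000ULL, 1) == 1);
    assert(rotr64(0x0123456789ABCDEFULL, 4) == 0xF0123456789ABCDEULL);
}
```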
-
sech1
risc-v cryptography extensions are ratified already, so future chips will have aes
-
tevador
Btw, there are at least 2 incompatible vector extensions being used in the wild: version 0.7.1 (SG2042 has it) and version 1.0.0.
arxiv.org/abs/2304.10324
-
hyc
sech1 you don't have an arm64 smartphone to run on?
-
sech1
I have
-
sech1
But I need to ssh into it somehow to use it as an "aarch64 dev machine", and I'm always too lazy to set it up. I only use it for final testing in termux
-
sech1
btw I get 22 h/s in qemu on 7950X (single thread JIT)
-
sech1
and overall it's fast enough to not be annoying :D
-
hyc
heh
-
hyc
I build it all on termux.
-
hyc
doesn't take much to get the build env set up
-
hyc
and you can install openssh in termux too
-
hyc
ssh in is easy enough
-
hyc
of course I usually clone the repos from my laptop. don't have the patience to download everything from the web again
-
hyc
the other way I go, if it's just a quick compile/test, is leave the binary on my laptop, running tinyhttpd
-
hyc
then just grab the binary on the phone using any browser
-
sech1
plus, I can run 32 threads in randomx-benchmark (dataset init is only 7 seconds)
-
sech1
and 32 threads when compiling