- 
sech1 hyc you're right, this simple program works fine:  paste.debian.net/hidden/fb9a8aea
- 
sech1 and if you comment the second sys_icache_invalidate, it starts to output random numbers in the second printf 
- 
sech1 I suspect we have an undefined behavior somewhere in the code, and compiler takes "advantage" of it :D 
- 
sech1 well, ubsan did find something:  paste.debian.net/hidden/70411197
- 
sech1 fixing it didn't help 
- 
sech1 lol, it produces different hashes even if I just run JIT compiler once and then just call the generated program repeatedly :D 
- 
sech1 yep, the code is generated only once, write protection is on but hashes still change occasionally:  paste.debian.net/hidden/cb3e2e49
- 
sech1 magic 
- 
sech1 I think the hash gets wrong when the thread is rescheduled to another CPU core. If I increase RANDOMX_PROGRAM_ITERATIONS, I get more incorrect hashes. If I decrease it 4 times, all hashes are corrrect 
- 
sech1 I think it's an unaligned memory read or write somewhere in the JIT code. M1 doesn't segfault on this, it just silently reads garbage :D 
- 
sech1 hyc selsta "On AArch64 X18 is used as a platform register for some platforms, so to generate portable executables it should not be used by the compiler" 
- 
sech1 lol 
- 
sech1 RandomX aarch64 code uses it of course 
- 
sech1 I guess new macOS SDK uses it too and changes it 
- 
sech1 I'm not sure what's happening, but it looks like x18 can change right during the program execution. And it's not related to JIT at all. 
- 
sech1 This program produces different values:  paste.debian.net/hidden/03d0943c (nop instructions are to get correct timings for the bug to happen) 
- 
sech1 Even without JIT, when I call it directly 
- 
sech1 If I change x18 to x10 in the program code, everything starts working magically 
- 
sech1 I was wrong, the previous paste is the working code 
- 
sech1 
- 
sech1 
- 
sech1 find 2 differences :D 
- 
sech1 Now I have to rewrite the whole aarch64 JIT to not use x18, that's unfortunate 
- 
selsta seems Gimp had the same issue when porting to M1 Mac 
- 
selsta also interesting that this didn't trigger on any other aarch64 system yet, seems modern Android also has x18 reserved 
- 
selsta it was documented here, since we thought it's JIT related we kept looking at the wrong documentation  developer.apple.com/documentation/x…ting-arm64-code-for-apple-platforms
- 
sech1 yes, but now I need to rewrite the JIT 
- 
sech1 It still fails now, but at least it's consistent and in one place. So probably some other bug somewhere in the rewritten code. 
- 
sech1 *and fails in the same place 
- 
sech1 github.com/SChernykh/RandomX/tree/fix-x18 - the randomness went away, it always fails at "[85] Hash test 2b (compiler)" now 
 
- 
sech1 Probably some RandomX instruction is not fixed properly 
- 
sech1 But Hash test 2a (compiler) always passes, that's a good sign 
- 
sech1 I'll come back to it in the evening, exhausted now. Maybe hyc or tevador can spot the bug in the meantime. 
- 
selsta also fails at 2b here at 300a0adb47603dedb42228ccb2b211104f4da45af709cd7547cd049e9489c969 
- 
sech1 
- 
sech1 It used x18 and x19 before, then I changed it to x20, x21 but it should've been x19, x20 
- 
sech1 Now it passes all tests 
- 
m-relay <plowsof:matrix.org> 8-D 
- 
sech1 You can make fresh git clone then git checkout fix-18 and try it 
- 
selsta All tests PASSED 
- 
selsta nice :D 
- 
sech1 
- 
sech1 can you try to run the benchmark too? 
- 
sech1 ./randomx-benchmark --mine --init 8 --threads 8 --jit 
- 
sech1 
- 
selsta 
- 
sech1 nice 
- 
sech1 So it wasn't JIT after all :D 
- 
selsta C/C++ CI / build-alpine (aarch64, latest-stable) fails, is this new? 
- 
sech1 Interesting 
- 
sech1 It crashed in dataset initialization 
- 
sech1 But it doesn't crash on M1, weird 
- 
sech1 wtf 
- 
sech1 It looks like I PR'd wrong code 
- 
sech1 no, all good on M1 
- 
sech1 well, now I have to debug it in qemu because M1 works 
- 
sech1 selsta I fixed it, all green now 
- 
selsta once this is merged 4 issues can be closed 
- 
selsta will retest once I'm home 
- 
hyc nice work sech1 
- 
sech1 yeah 
- 
sech1 but it's weird 
- 
sech1 asm code saved and restored register x18 
- 
sech1 it's almost like some kernel interrupt uses it, so it can change in the middle of a function 
- 
sech1 whatever, if Apple say "no touch" that's what we do :D 
- 
hyc ;) 
- 
sech1 my faith in JIT restored 
- 
hyc :) 
- 
selsta this bug resulted in so much weird behavior lol 
- 
selsta even with the old SDK it would crash when started as a detached process 
- 
selsta glad all of this is fixed now and I don't have to constantly change the SDK 
- 
sech1 Apple docs also mention register x29, but it's only used to restore nice callstacks, so we can ignore it 
- 
sech1 p2pool and xmrig also suffered from it (I found reported issues on Github) 
- 
hyc ok, yeah randomx-tests passed here on my M1 
- 
sech1 randomx-benchmark is more relevant, it computes 1000 hashes 
- 
sech1 
- 
hyc I'm still on older SDK which never crashed before 
- 
hyc 1000 nonces ran fine here 
- 
sech1 I tested with newer SDK where it crashed or produced invalid hashes without the fix 
- 
hyc but I guess now that we have this nailed I feel safe in upgrading my machine ;) 
- 
sech1 btw I think we have some unnecessary icache flushes in the code 
- 
sech1 it flushes cache twice 
- 
hyc probably we added extra in prior attempts to kill the bug 
- 
sech1 The one in setPagesRX should be enough 
- 
sech1 I'll remove them later in a separate PR 
- 
sech1 Actually, __builtin___clear_cache in setPagesRX only works on macOS, so __builtin___clear_cache in JIT compiler is still needed :D 
- 
tevador so it was a RandomX bug after all 
- 
sech1 yes 
- 
sech1 
- 
sech1 But I didn't pay attention, I though that saving and restoring it before returning is enough 
- 
sech1 But it's actually a "volatile" register 
- 
sech1 I wouldn't call it a bug. The code correctly saved and restored x18, it was just not designed to be interrupted by kernel (or whatever else) and x18 being different after the interrupt 
- 
tevador 
- 
sech1 Even in this code:  paste.debian.net/hidden/51dbf4fa where x18 is read right after it's written, it can still fail 
- 
tevador The kernel has an explicit exception for it 
- 
hyc gawd 
- 
sech1 
- 
sech1 so it's used in task switching 
- 
sech1 that explains it 
- 
sech1 then it's indeed a volatile register from used code's point of view 
- 
sech1 *user code 
- 
tevador yes, so you'll get wrong hashes if there is a context switch in the middle of a program run 
- 
hyc if you wrote that register you prob wouldn't survive a context switch 
- 
sech1 No, that would be a potential kernel exploit :D 
- 
sech1 Kernel initializes it before using it 
- 
tevador it will probably just store some garbage in the register 
- 
hyc if all it did was corrupt a hash calc, why did users get SEGVs then 
- 
sech1 because old RandomX code used x18 in dataset loads 
- 
tevador if x18 was used to access memory in RandomX code, it could cause a segfault 
- 
sech1 x18 was used everywhere, it was a "temporary register" 
- 
hyc ah. very temporary. 
- 
sech1 most places didn't access memory, so they would just cause an invalid hash 
- 
sech1 but a couple places used x18 for memory loads 
- 
sech1 you can just check the patch 
- 
tevador so if there was a context switch between setting x18 and reading memory, boom 
- 
tevador reminds me of my RISC-V trouble with the x4 register, but it's used for a different purpose there 
- 
sech1 
- 
sech1 boom 
- 
hyc aha 
- 
selsta next release we can add both risc-v support and fix macOS JIT 
- 
hyc great 
- 
selsta I didn't get risc-v gitian working yet but it's ready on RandomX side 
- 
hyc since #281 is merged, shouldn't #262 be closed? it didn't auto-close because it wasn't mentioned in commit msg 
- 
sech1 yes 
- 
sech1 UB sanitizer also found a couple of unaligned memory reads in aarch64 JIT, but I'll fix it in a separate PR tomorrow 
- 
sech1 low priority because it's accessing 8 bytes which are only 4-byte aligned. But all current ARM CPUs don't mind it 
- 
hyc cool