-
l-m
hello all. i am the creator of RandomX.js (github.com/l1mey112/randomx.js) and i would like to ask for feedback
-
l-m
i am currently in the process of speculating how i can make performance improvements, being that my library is only 5 times slower than the reference implementation. if i can make 10% improvements here and there, i would be happy to reach even 25-35 H/s at most
-
l-m
i have an issue tracker here for performance tweaks:
l1mey112/randomx.js #1
-
l-m
there has not been a webmining implementation of the monero POW in years, would anyone be able to gauge the performance of previous cryptonight POW webminers compared to the hashrate of mine?
-
sech1
Kudos for the effort, although the biggest performance hit is the inability to use the 2GB full dataset mining mode.
-
sech1
fused multiply-add instructions can't be used because they don't do rounding between MUL and ADD, so the end result will be different
-
l-m
i use FMA instructions for efficient emulation of the differing rounding modes. for example you can take a multiplication and subtract the result of that multiplication from itself "without an intermediate round", to get the error in that operation. you can take that error and adjust the final floating point number by branching on the sign of that
-
l-m
error term. look up compensated summation/two-sum, two-product, error free transforms.
indico.cern.ch/event/313684/contrib…nts/600513/826490/FPArith-Part2.pdf for an introduction
-
l-m
implementations of different rounding modes with FMA are just a couple cycles slower, but without FMA you can still emulate it effectively with a ~10 FP operation overhead, even less on superscalar + out of order machines.
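
a minimal sketch of the no-FMA path described above, in TypeScript (JS numbers are doubles and have no FMA). it uses the textbook Veltkamp split and Dekker two-product to recover the rounding error of a multiplication, then branches on the sign of that error to get round-down and round-up results. the names `split`, `twoProduct`, `nextUp`, `mulRoundDown` and `mulRoundUp` are made up for illustration and are not taken from RandomX.js. it assumes finite inputs with no overflow or underflow in the intermediate steps.

```typescript
// Veltkamp split: break a double into two halves whose pairwise products are exact.
function split(a: number): [number, number] {
  const c = 134217729 * a; // 2^27 + 1
  const hi = c - (c - a);
  return [hi, a - hi];
}

// Dekker two-product: p is the round-to-nearest product, e is its exact error,
// so p + e equals a * b exactly (assumption: no overflow/underflow).
function twoProduct(a: number, b: number): [number, number] {
  const p = a * b;
  const [ah, al] = split(a);
  const [bh, bl] = split(b);
  const e = (((ah * bh - p) + ah * bl) + al * bh) + al * bl;
  return [p, e];
}

const scratch = new DataView(new ArrayBuffer(8));

// Next representable double above x, by stepping the raw 64-bit pattern.
function nextUp(x: number): number {
  if (x !== x || x === Infinity) return x; // NaN and +Infinity stay put
  if (x === 0) return Number.MIN_VALUE;
  scratch.setFloat64(0, x);
  let hi = scratch.getUint32(0);
  let lo = scratch.getUint32(4);
  if (x > 0) {
    lo = (lo + 1) >>> 0;
    if (lo === 0) hi = (hi + 1) >>> 0; // carry into the high word
  } else if (lo === 0) {
    hi = (hi - 1) >>> 0; // borrow from the high word
    lo = 0xffffffff;
  } else {
    lo = lo - 1;
  }
  scratch.setUint32(0, hi);
  scratch.setUint32(4, lo);
  return scratch.getFloat64(0);
}

function nextDown(x: number): number {
  return -nextUp(-x);
}

// Round toward -Infinity: if the nearest result overshot (e < 0), step down once.
function mulRoundDown(a: number, b: number): number {
  const [p, e] = twoProduct(a, b);
  return e < 0 ? nextDown(p) : p;
}

// Round toward +Infinity: if the nearest result undershot (e > 0), step up once.
function mulRoundUp(a: number, b: number): number {
  const [p, e] = twoProduct(a, b);
  return e > 0 ? nextUp(p) : p;
}
```

with a hardware FMA the error term is a single instruction (`fma(a, b, -p)` in C), which is where the "couple cycles" versus "~10 FP operations" difference comes from. add, sub, div and sqrt need their own error terms and are not covered here.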
-
l-m
directed rounding isn't my problem really, it's just the AES. though there is no way to be sure without instrumentation/performance counters, which i am working on currently
-
sech1
Ah, FMA is for this purpose. Then it should work. I used it for the same purpose in my RandomX OpenCL code, it works and it's deterministic.
-
sech1
Software AES gives ~30% slowdown in the native code, it should probably be the same in WebAssembly
-
l-m
given this then, i'm guessing all of the overhead comes from having to instantiate the JIT-generated WASM code 8 times per chained VM execution.
-
l-m
there is just too much overhead performing `new WebAssembly.Module()`
-
l-m
there is work done for WebAssembly baseline JITs to make this faster, but in reality the library incurs cost generating the WASM, then allowing the host to generate the native code
-
sech1
That 1 h/s per thread estimate was made way back in 2019, WASM was probably in much worse shape back then
-
sech1
Or I think the estimate was made for a pure JS interpreter
-
l-m
pure JS interpreter most likely
-
sech1
There was even an implementation made in 2019 by someone, I remember checking that website and it indeed was less than 1 h/s
-
sech1
But they didn't even verify correctness of the hashes
-
l-m
i could have seen RandomWOW being much faster than RandomX due to its "light" implementation, but the fact that it uses 16 chained executions means i lose out on all gains
-
l-m
i just checked, and on the randomwow branch after adjusting the amount of program iterations from 16 to 8 i achieve 48 H/s
-
l-m
reaching a true "light" randomx
-
sech1
"there is just too much overhead performing `new WebAssembly.Module()`" can't you keep the instance between RandomX VM executions? I'm not very familiar with WASM
-
l-m
sech1 the JIT generates a randomx program in WASM, which needs to be instantiated to be converted to native code, 8 times per chained execution. JIT -> WASM -> new WebAssembly.Module() -> native code. the JS runtime/host needs to perform another step converting the WASM to native code. in a native environment, you can just JIT the native code directly.
-
l-m
to run WASM code, the host needs to compile it some way into native code. there is quite an overhead doing so, you don't get access to native code immediately
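
a minimal sketch of that pipeline in TypeScript, to show where the per-program cost sits. `buildModuleBytes` is a hypothetical stand-in for the JIT: it hand-assembles a tiny WASM module whose only export, `run`, returns a constant, so each of the 8 "programs" is a different module that has to be compiled from scratch. this is not the code RandomX.js generates, only the same sequence of host calls.

```typescript
// Hypothetical stand-in for a JIT-generated program: a module exporting
// run(): i32 that returns k (0 <= k < 64 so the constant fits one LEB128 byte).
function buildModuleBytes(k: number) {
  return new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
    0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,             // type section: () -> i32
    0x03, 0x02, 0x01, 0x00,                               // function 0 has type 0
    0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x00, // export "run" = function 0
    0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, k & 0x3f, 0x0b,   // code: i32.const k; end
  ]);
}

// Each program is generated, compiled by the host, instantiated, run once,
// and then thrown away before the next one in the chain.
function runChain(programs: number): number[] {
  const results: number[] = [];
  for (let i = 0; i < programs; i++) {
    const bytes = buildModuleBytes(i);                 // JIT -> WASM bytes
    const compiled = new WebAssembly.Module(bytes);    // host compiles WASM -> native
    const instance = new WebAssembly.Instance(compiled);
    const run = instance.exports.run as () => number;
    results.push(run());                               // executed exactly once
  }
  return results;
}

const chainResults = runChain(8);
```

keeping the `Instance` around does not help here, because every program in the chain is a different module, so the compile step repeats for each one.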
-
sech1
So it's kind of running a compiler to get native code, every time. Yes, it will be slow.
-
l-m
especially if you're only using it once and throwing it away, there is no chance to optimise at all and you'll be left with the subpar baseline JIT
-
l-m
v8.dev/blog/liftoff - v8 for example would generate subpar code for the randomx VM
-
hyc
well yes, that's all to be expected. the point was making each program run only once before chaining to the next one, so none of the first is reusable
-
l-m
hello again. can someone provide me resources on how to better understand how xmrig works and the stratum protocol? i am looking to reimplement the randomx part of xmrig to run in the browser, and release it as open source software (with a small dev fee, like xmrig)
-
l-m
i am quite new to the implementations of cryptocurrency mining. it would be good to understand how block templates work and how nonces are used when mining, i'm very new to all of this
-
l-m
i saw that sech1 is a developer on xmrig, maybe you could give me a small rundown?