03:14:49 that is fire. haven't looked at zluda in a while. amazing that it works
03:22:25 would prefer webgpu compute shader over opencl. recently built llama.cpp with opencl support for my phone and it turns out the performance is not much different than just using the cpu. It seems that valve is more active than the npu divisions (npus might be wasted silicon ngl) at qualcomm. [dawn uses vulkan]
03:26:26 it seems like as a platform webgpu + vulkan stack is the better choice. Easier to distribute. Also works in the browser out of the box + there are runtimes for the backend that support it.
03:47:04 There are a couple of sufficiently portable options for writing code for GPUs. No one has done it though. I'd probably defer to OpenCL, but WebGPU is another obvious candidate.
03:47:23 But it's less about pipeline nowadays and more about how no one has done it.
04:00:30 There's already a very notable improvement that can be made to scanning via batch point compression for the ECDH, which I haven't seen implemented anywhere.
04:17:50 Does that use batch invert?
04:18:47 Where multiple ECDHs are represented in projective form, and then batch inverse is applied to the denominator of each ECDH?
04:18:53 That's a great idea
04:19:29 I wonder what % of the runtime of X25519 is taken by inversion
04:37:30 Compression uses an inversion. Scanning outputs requires hashing shared secrets, which requires compressing the points to bytes. We can use a batch inversion there across a set of outputs being scanned to only require a single inversion for the conversion to bytes.
04:37:59 We can also speed up CLSAG verification and tree building with a batch hash to point.
04:49:50 https://github.com/monero-oxide/monero-oxide/issues/128
04:49:51 https://github.com/monero-oxide/monero-oxide/issues/130
08:23:24 my recent batch inversion code however hasn't been ported to GPU yet, but yeah, that could allow the batches
08:24:27 tbh moving the ECDH batch inversion + keccak step to GPU and doing rest on CPU would already give quite many benefits. especially post carrot with 3-byte view tags
09:07:51 Added batched `Bytes()` and `BytesMontgomery()` to my edwards25519 library fork, https://git.gammaspectra.live/P2Pool/edwards25519/commit/0a9ae297f83be25b93990aa43c052f969d38333f so I'll be tinkering with this
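
For anyone reading the log later, here is a rough sketch of the batch inversion idea discussed above: Montgomery's trick replaces n field inversions with one inversion plus about 3n multiplications, and the resulting inverses of the Z coordinates turn projective points into compressed bytes. The names `batch_invert` and `batch_compress` are made up for illustration and are not the API of the edwards25519 fork or of monero-oxide. It assumes points arrive as plain `(X, Y, Z)` integer tuples with nonzero Z, uses the standard Ed25519 encoding (y little-endian, sign of x in the top bit), and uses Python integers, so it is not constant-time.

```python
P = 2**255 - 19  # field prime used by Curve25519 / edwards25519


def batch_invert(values):
    """Invert every value mod P using a single field inversion (Montgomery's trick)."""
    n = len(values)
    if n == 0:
        return []
    prefix = [1] * n  # prefix[i] = product of values[0..i-1]
    acc = 1
    for i, v in enumerate(values):
        if v % P == 0:
            raise ValueError("cannot invert zero")
        prefix[i] = acc
        acc = acc * v % P
    inv = pow(acc, P - 2, P)  # the only inversion, via Fermat's little theorem
    out = [0] * n
    for i in range(n - 1, -1, -1):
        out[i] = inv * prefix[i] % P  # equals 1 / values[i]
        inv = inv * values[i] % P     # now 1 / (values[0] * ... * values[i-1])
    return out


def batch_compress(points):
    """Encode projective (X, Y, Z) points to 32 bytes each, sharing one inversion."""
    z_invs = batch_invert([z for (_, _, z) in points])
    encoded = []
    for (x_proj, y_proj, _), z_inv in zip(points, z_invs):
        x = x_proj * z_inv % P
        y = y_proj * z_inv % P
        # Ed25519 encoding: 255 bits of y, with the low bit of x as the top bit.
        encoded.append((y | ((x & 1) << 255)).to_bytes(32, "little"))
    return encoded
```

In a scanning loop the input would be the ECDH results for a whole set of outputs, left in projective form, and the returned byte strings would then go into the hash. The per-point cost after the shared inversion is a handful of multiplications, which is what would make the step a candidate for moving to GPU alongside keccak.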