01:14:54 ok, first showing the speed-up of removing the extra ops in `ge25519_frombytes_negate_vartime` using what I believe is what UkoeHB intended in that snippet: https://paste.debian.net/1229132/
01:15:12 that speeds up the conversion from ed25519->curve25519 by ~85%
01:15:25 will share the view tag included results in a sec
01:22:52 actually not sure if that MSB zero'ing out is right. shouldn't it be `ed25519_pk_copy[0] &= UCHAR_MAX >> 1;` and not `ed25519_pk_copy[31] &= UCHAR_MAX >> 1;`?
01:23:30 most significant bit means the last bit
01:23:41 little endian
01:23:49 unless I am mistaken about the endianness..
01:25:32 > memcpy(&ed25519_pk_copy, &ed25519_pk, 32);
01:25:32 address-of an array pointer? don't think this is right
01:26:07 ah true, going too fast
01:28:11 I see how I can check this against libsodium too. i'll figure it out
01:57:03 you were right on both, fixed: https://paste.debian.net/1229133/
01:57:09 speed-up result is same
02:51:15 hi guys
02:52:10 Question for hyc or anyone that could answer regarding https://github.com/monero-project/monero/pull/4694
02:54:37 I am unable to find any usage information for the tool. I see outputs everywhere but cannot seem to find any information on actually running it or flags needed to run specific outputs only, or if it exports full data from block 1, etc..
02:54:49 * outputs everywhere online but cannot
03:23:43 nm, think I figured it out. was able to export data from utility to excel
03:47:33 that's what --help is for
03:51:28 lol I did not even think about that
03:53:12 you smart man you
06:18:55 knaccc: When including the view tag check, I observed a 55% speed-up from `ed25519 scalar mult -> view tag check` compared to the faster `ed25519->x25519 -> view tag check -> if match, ed25519 scalar mult`: https://paste.debian.net/1229134/
06:19:13 But there's one aspect to the results that doesn't make sense to me. It seems the view tag check slows the test down by ~5% across both types, rather than a fixed ~10ms as I would have expected. Something interesting going on in there
06:19:30 Also, here's time to hash view tag per 10,000 view tags (keccak, siphash 2-4, blake2, blake3): https://github.com/j-berman/monero/blob/b48a5a3a82ff3945380c408350e95a9b3b670b6a/tests/performance_tests/derive_view_tag.h#L217-L236
06:26:01 here's a slightly cleaner view tag check 55% speed-up paste: https://paste.debian.net/1229139/
06:56:33 also realized I left out `curve25519 scalar mult -> view tag check`. Here it is on github: https://github.com/j-berman/monero/blob/curve25519-benchmark/tests/performance_tests/curve25519.h#L1171-L1198
08:14:17 tevador reporting that the speed-up in assembly implementations on the conversion step is not as significant: https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024?permalink_comment_id=4048951#gistcomment-4048951
10:47:19 jberman great work, thanks! it'll be a huge disappointment if it turns out that the fastest ed25519 lib available is only 12% slower than x25519. that doesn't sound like it might be a compelling-enough improvement
10:52:44 jberman looks like sandy2x isn't all that much faster than amd64-64 :( https://bench.cr.yp.to/impl-scalarmult/curve25519.html
10:56:06 I also realized libsodium is already using it :/ https://github.com/jedisct1/libsodium/blob/dce3bca3bad8553fb942603f6afb8289fb418459/src/libsodium/crypto_scalarmult/curve25519/scalarmult_curve25519.c#L54-L58
11:03:07 jberman so then i guess the problem is that libsodium does not have the fastest ed25519 implementation?
11:04:11 ya I'm finishing up swapping out for Monero functions now and the results seem in line with tevador's conclusion :(
11:04:34 jberman :'(
11:05:06 it was a fun day
11:07:23 jberman i wonder then if the idea that using two different curves depending on whether you're signing or doing ecdh is just rooted in historical speed differences, and if everyone would have just stuck to ed25519 if they could have foreseen these performance improvements...
11:39:39 hmm, you're saying as in the people who in the past decided to implement the ed25519->curve conversion in their systems may not have done so had they foreseen these improvements in ed25519? are these new improvements in the grand scheme of things?
11:40:16 well i wonder if they would have seen a 12% improvement as worth the complexity of dealing with two different curves
11:40:23 which are not interchangeable
11:40:33 not fully interchangeable i mean
11:41:19 i think the argument can still be made that the world's best efforts will always be on improving varbasescalarmults for x25519 and not ed25519
11:41:48 so we can make a strategic choice to switch to x25519 in expectation of taking advantage of those advancements
11:42:10 e.g. the person that started writing GPU code to speed up scalarmults did it for x25519 and not ed25519
11:43:03 here's an article from someone briefly talking about the decision to use both: https://words.filippo.io/using-ed25519-keys-for-encryption/
11:43:12 doesn't really say much though haha, they just wanted to reuse their x25519 code and needed ed25519 for signing
11:44:03 did you see this link i posted earlier btw https://research.nccgroup.com/2020/09/28/faster-modular-inversion-and-legendre-symbol-and-an-x25519-speed-record/
11:44:20 that's an example of all of the optimization effort going into x25519
11:44:45 and it'd be great to not have to constantly figure out ourselves how to backport those improvements into the ed25519 version
11:44:54 with the possibility of error that would bring
11:45:57 knaccc: seems reasonable to relatively novice me
11:46:17 ooh
11:46:23 here is an interesting point:
11:46:48 i'm not sure, but i think x25519 scalarmults are constant time, and ed25519 scalar mults aren't
11:46:54 i'm not sure about the latter
11:47:05 but that's an interesting security consideration to avoid leaking the private view key
11:47:55 maybe it's possible to make ed25519 scalarmults happen in constant time
11:48:21 but that would be another example of us constantly putting the ed25519 round peg into the x25519 square hole
11:49:01 so maybe the strategic move would be to use x25519 even if it were 12% *slower* than ed25519!
11:49:32 just for ecdh stuff of course. not elsewhere
12:13:09 what do you think the logic was in choosing to use ed25519 from the start? x25519 was around back then too and it seems like it's all the rage all over the place for key exchange
12:15:50 jberman ed25519 is faster for signature verification, which speeds other things up
12:17:07 schnorr signatures involve fixed base scalarmult added to a variable base scalar mult, and ed25519 does that all at once faster than curve25519
12:20:44 so essentially the tradeoff in the decision was nodes can verify faster but wallets could take longer to scan
12:21:30 jberman yes it looks to me that way. the ring signatures are schnorr-based, and there are a ton of ec ops going on there
12:42:12 seems perfectly sensible to me. gonna head to bed. I think it's a funny coincidence the article ends saying 10% improvements are worth the implementation effort lol. from what I can gather it seems there's a solid case brewing for x25519
12:49:17 this file's the latest I've got and seems to generally line up with tevador's conclusion: https://github.com/j-berman/monero/blob/curve25519-benchmark/tests/performance_tests/curve25519.h
16:47:16 jberman i'm a little confused - ed25519 mult including view tag check is twice as fast as just an ed25519 mult?
17:34:23 In this version I added the equivalent check to get the output's public key in the final step. In the view tag check, it only needs to do that 1/256 times. That check is an extra ed25519 scalar mult base and addition
17:39:56 Now that I think about it doesn't make sense to include that in the curve25519 version of the test. 1 sec I'll just remove that check
17:40:34 jberman[m]: why are you taking address-of here? https://github.com/j-berman/monero/blob/366d43d3094a336296815c04e99f4e7dc94f4129/tests/performance_tests/curve25519.h#L1000
18:02:11 UkoeHB: hm, I see what you're saying. `ed25519_pk_copy` is already an address location pointing to the start of the char array, so no need to pass the address of the address. Interesting the result on my machine is the same either way there, but adding the ref to the pointer `ed25519_pk` param changes the result
18:02:13 did some reading, c_arr decays to &c_arr when passed to a function
18:02:28 so it is equivalent (just unneeded syntax)
18:02:48 ah, got it
18:02:55 the ed25519_pk is already an address, it isn't a c array
18:03:04 a pointer to the first element *
18:03:13 right
18:05:46 btw on stackexchange someone pointed out that libsodium uses crypto_core_hchacha20 on the ecdh shared secret for the famous libsodium secretbox
18:05:51 and that's lightning fast apparently
18:06:32 although i'm not sure about the security guarantees where you have the possibility of two IKMs that differ by only a byte (i.e. the concatenated output index varint)
18:07:12 so i'd not sleep well at night unless someone really fully understands the implications of using hchacha20
18:07:39 * moneromooo makes bank by selling knaccc bulk sleeping pills
18:08:48 lol let's not tempt fate that a bottle of sleeping pills may become the MRL mascot :)
18:10:55 lmao ok the latest file removes the output pub key check and the unneeded syntax, back to sleep for me
18:12:25 (i could prolly use some of those too)
18:12:41 :)
18:26:36 UkoeHB: when I try to do something like this `&ed25519_pk_copy == ed25519_pk`, I get this compiler error: `error: comparison between distinct pointer types 'unsigned char (*)[32]' and 'unsigned char*' lacks a cast`. which seems to suggest to me that both actually have the same value (the address of the start of the array), but the former is just a weird and unnecessary way of referencing that value. I don't see what advantage the former type has over the latter
18:27:56 just a weird C thing
19:02:24 here are results of the most apples-to-apples comparison I think is possible (ref10 implementations): https://paste.debian.net/1229227/
22:37:57 whoa this nano vanity address generator does 2 million curve25519 scalarmultbases per second on a GPU! https://github.com/PlasmaPower/nano-vanity/
22:51:11 > Intel GPUs are not supported, as in most cases running the code on the integrated GPU is no faster than running it on the CPU.
22:51:11 related note: would seem to suggest commodity hardware integrated GPUs may not be particularly useful for verifying the chain