01:14:54 <jberman[m]> ok, first showing the speed-up of removing the extra ops in `ge25519_frombytes_negate_vartime` using what I believe is what UkoeHB intended in that snippet: https://paste.debian.net/1229132/
01:15:12 <jberman[m]> that speeds up the conversion from ed25519->curve25519 by ~85%
01:15:25 <jberman[m]> will share the view tag included results in a sec
01:22:52 <jberman[m]> actually not sure if that MSB zeroing is right. shouldn't it be `ed25519_pk_copy[0] &= UCHAR_MAX >> 1;` and not `ed25519_pk_copy[31] &= UCHAR_MAX >> 1;`?
01:23:30 <UkoeHB> most significant bit means the top bit of the last byte
01:23:41 <UkoeHB> little endian
01:23:49 <UkoeHB> unless I am mistaken about the endianness..
01:25:32 <UkoeHB> > memcpy(&ed25519_pk_copy, &ed25519_pk, 32);
01:25:32 <UkoeHB> address-of an array pointer? don't think this is right
01:26:07 <jberman[m]> ah true, going too fast
01:28:11 <jberman[m]> I see how I can check this against libsodium too. i'll figure it out
01:57:03 <jberman[m]> you were right on both, fixed: https://paste.debian.net/1229133/
01:57:09 <jberman[m]> speed-up result is same
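For readers following the MSB exchange above, here is a minimal sketch of the operation being discussed. It assumes the standard 32-byte little-endian Ed25519 point encoding, where the most significant byte is index 31 and its top bit carries the sign of x. The helper names `copy_and_clear_msb` and `top_bit` are hypothetical and do not come from either paste.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: copy a 32-byte little-endian encoding and clear its
 * most significant bit. In little-endian order the most significant byte is
 * the LAST one (index 31), so the top bit of the value is bit 7 of byte 31. */
static void copy_and_clear_msb(unsigned char out[32], const unsigned char *in)
{
    assert(CHAR_BIT == 8);          /* the mask below assumes 8-bit bytes */
    memcpy(out, in, 32);            /* `out` is already a pointer, no & needed */
    out[31] &= UCHAR_MAX >> 1;      /* 0x7f: keep the low 7 bits of byte 31 */
}

/* Hypothetical helper: read the bit that the function above clears. */
static int top_bit(const unsigned char *in)
{
    return in[31] >> 7;
}
```

Masking index 0 instead would clear bit 7 of the least significant byte, which changes the encoded y-coordinate rather than removing the sign bit.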
02:51:15 <bigslim[m]> hi guys
02:52:10 <bigslim[m]> Question for hyc or anyone that could answer regarding https://github.com/monero-project/monero/pull/4694
02:54:37 <bigslim[m]> I am unable to find any usage information for the tool. I see outputs everywhere online but cannot seem to find any information on actually running it, the flags needed to run specific outputs only, or whether it exports full data from block 1, etc.
03:23:43 <bigslim[m]> nm, think I figured it out. was able to export data from utility to excel
03:47:33 <hyc> that's what --help is for
03:51:28 <bigslim[m]> lol I did not even think about that
03:53:12 <bigslim[m]> you smart man you
06:18:55 <jberman[m]> knaccc: When including the view tag check, I observed a 55% speed-up going from `ed25519 scalar mult -> view tag check` to the faster `ed25519->x25519 -> view tag check -> if match, ed25519 scalar mult`: https://paste.debian.net/1229134/
06:19:13 <jberman[m]> But there's one aspect to the results that doesn't make sense to me. It seems the view tag check slows the test down by ~5% across both types, rather than a fixed ~10ms as I would have expected. Something interesting going on in there
06:19:30 <jberman[m]> Also, here's time to hash view tag per 10,000 view tags (keccak, siphash 2-4, blake2, blake3): https://github.com/j-berman/monero/blob/b48a5a3a82ff3945380c408350e95a9b3b670b6a/tests/performance_tests/derive_view_tag.h#L217-L236
06:26:01 <jberman[m]> here's a slightly cleaner view tag check 55% speed-up paste: https://paste.debian.net/1229139/
06:56:33 <jberman[m]> also realized I left out `curve25519 scalar mult -> view tag check`. Here it is on github: https://github.com/j-berman/monero/blob/curve25519-benchmark/tests/performance_tests/curve25519.h#L1171-L1198
08:14:17 <jberman[m]> tevador reporting that the speed-up in assembly implementations on the conversion step is not as significant: https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024?permalink_comment_id=4048951#gistcomment-4048951
10:47:19 <knaccc> jberman great work, thanks! it'll be a huge disappointment if it turns out that the fastest ed25519 lib available is only 12% slower than x25519. that doesn't sound like a compelling-enough improvement
10:52:44 <knaccc> jberman looks like sandy2x isn't all that much faster than amd64-64 :( https://bench.cr.yp.to/impl-scalarmult/curve25519.html
10:56:06 <jberman[m]> I also realized libsodium is already using it :/ https://github.com/jedisct1/libsodium/blob/dce3bca3bad8553fb942603f6afb8289fb418459/src/libsodium/crypto_scalarmult/curve25519/scalarmult_curve25519.c#L54-L58
11:03:07 <knaccc> jberman so then i guess the problem is that libsodium does not have the fastest ed25519 implementation?
11:04:11 <jberman[m]> ya I'm finishing up swapping out for Monero functions now and the results seem in line with tevador's conclusion :( 
11:04:34 <knaccc> jberman :'( 
11:05:06 <jberman[m]> it was a fun day
11:07:23 <knaccc> jberman i wonder then if the idea that using two different curves depending on whether you're signing or doing ecdh is just rooted in historical speed differences, and if everyone would have just stuck to ed25519 if they could have foreseen these performance improvements...
11:39:39 <jberman[m]> hmm, you're saying as in the people who in the past decided to implement the ed25519->curve conversion in their systems may not have done so had they foreseen these improvements in ed25519? are these new improvements in the grand scheme of things?
11:40:16 <knaccc> well i wonder if they would have seen a 12% improvement as worth the complexity of dealing with two different curves
11:40:23 <knaccc> which are not interchangeable
11:40:33 <knaccc> not fully interchangeable i mean
11:41:19 <knaccc> i think the argument can still be made that the world's best efforts will always be on improving varbasescalarmults for x25519 and not ed25519
11:41:48 <knaccc> so we can make a strategic choice to switch to x25519 in expectation of taking advantage of those advancements
11:42:10 <knaccc> e.g. the person that started writing GPU code to speed up scalarmults did it for x25519 and not ed25519
11:43:03 <jberman[m]> here's an article from someone briefly talking about the decision to use both: https://words.filippo.io/using-ed25519-keys-for-encryption/
11:43:12 <jberman[m]> doesn't really say much though haha, they just wanted to reuse their x25519 code and needed ed25519 for signing
11:44:03 <knaccc> did you see this link i posted earlier btw https://research.nccgroup.com/2020/09/28/faster-modular-inversion-and-legendre-symbol-and-an-x25519-speed-record/
11:44:20 <knaccc> that's an example of all of the optimization effort going into x25519 
11:44:45 <knaccc> and it'd be great to not have to constantly figure out ourselves how to backport those improvements into the ed25519 version 
11:44:54 <knaccc> with the possibility of error that would bring
11:45:57 <jberman[m]> knaccc: seems reasonable to relatively novice me
11:46:17 <knaccc> ooh
11:46:23 <knaccc> here is an interesting point:
11:46:48 <knaccc> i'm not sure, but i think x25519 scalarmults are constant time, and ed25519 scalar mults aren't
11:46:54 <knaccc> i'm not sure about the latter
11:47:05 <knaccc> but that's an interesting security consideration to avoid leaking the private view key
11:47:55 <knaccc> maybe it's possible to make ed25519 scalarmults happen in constant time
11:48:21 <knaccc> but that would be another example of us constantly putting the ed25519 round peg into the x25519 square hole
11:49:01 <knaccc> so maybe the strategic move would be to use x25519 even if it were 12% *slower* than ed25519!
11:49:32 <knaccc> just for ecdh stuff of course. not elsewhere
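As background to the constant-time point above: the X25519 Montgomery ladder is usually written around a conditional swap that touches the same memory and runs the same instructions whichever way the secret bit goes. The sketch below shows only that one primitive, not a full scalar mult. The names `ct_cswap` and `ct_cswap_selfcheck` are hypothetical, and a C compiler is not strictly obliged to keep code like this branch-free, so treat it as an illustration of the idea rather than a hardened implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: swap a and b when bit == 1, leave them alone when
 * bit == 0, without branching on `bit` and without bit-dependent indexing. */
static void ct_cswap(uint8_t *a, uint8_t *b, size_t len, uint8_t bit)
{
    /* 0 - 1 wraps to all ones, so mask is 0x00 for bit 0 and 0xff for bit 1 */
    uint8_t mask = (uint8_t)(0u - (bit & 1u));
    for (size_t i = 0; i < len; i++) {
        uint8_t t = mask & (a[i] ^ b[i]);   /* 0 when not swapping */
        a[i] ^= t;
        b[i] ^= t;
    }
}

/* Small usage example: both calls do the same amount of work. */
static void ct_cswap_selfcheck(void)
{
    uint8_t a[4] = {1, 2, 3, 4};
    uint8_t b[4] = {9, 8, 7, 6};
    ct_cswap(a, b, sizeof a, 0);            /* no swap */
    assert(a[0] == 1 && b[0] == 9);
    ct_cswap(a, b, sizeof a, 1);            /* swap */
    assert(a[0] == 9 && b[0] == 1);
}
```

A variable-time routine would instead use something like `if (bit) swap(...)`, which is the kind of secret-dependent branch that can leak through timing.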
12:13:09 <jberman[m]> what do you think the logic was in choosing to use ed25519 from the start? x25519 was around back then too and it seems like it's all the rage all over the place for key exchange
12:15:50 <knaccc> jberman ed25519 is faster for signature verification, which speeds other things up
12:17:07 <knaccc> schnorr signatures involve fixed base scalarmult added to a variable base scalar mult, and ed25519 does that all at once faster than curve25519
12:20:44 <jberman[m]> so essentially the tradeoff in the decision was nodes can verify faster but wallets could take longer to scan
12:21:30 <knaccc> jberman yes it looks to me that way. the ring signatures are schnorr-based, and there are a ton of ec ops going on there
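To make the "fixed base scalarmult added to a variable base scalar mult" remark concrete, here is a toy Schnorr sign/verify over a tiny multiplicative group (p = 23, subgroup order q = 11, generator g = 2). This is an assumption-heavy sketch: the group is far too small to be secure, `toy_challenge` is a hypothetical stand-in for a real hash, and none of the names come from Monero or ref10. In elliptic-curve notation the check `g^s == R * y^c` reads `s*G == R + c*P`, i.e. one fixed-base term (G) and one variable-base term (P).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_P 23u   /* toy prime modulus */
#define TOY_Q 11u   /* order of the subgroup generated by TOY_G */
#define TOY_G 2u    /* 2^11 mod 23 == 1 */

/* Square-and-multiply modular exponentiation in the toy group. */
static uint32_t pow_mod(uint32_t base, uint32_t exp)
{
    uint32_t r = 1;
    base %= TOY_P;
    while (exp) {
        if (exp & 1u) r = r * base % TOY_P;
        base = base * base % TOY_P;
        exp >>= 1;
    }
    return r;
}

/* HYPOTHETICAL stand-in for a hash-to-scalar; not a real hash function. */
static uint32_t toy_challenge(uint32_t R, uint32_t msg)
{
    return (R * 7u + msg) % TOY_Q;
}

/* Sign with secret x and nonce k: R = g^k, s = k + c*x mod q. */
static void toy_sign(uint32_t x, uint32_t k, uint32_t msg,
                     uint32_t *R, uint32_t *s)
{
    assert(x > 0 && x < TOY_Q && k > 0 && k < TOY_Q);
    *R = pow_mod(TOY_G, k);
    *s = (k + toy_challenge(*R, msg) * x) % TOY_Q;
}

/* Verify against public key y = g^x: accept when g^s == R * y^c. */
static int toy_verify(uint32_t y, uint32_t msg, uint32_t R, uint32_t s)
{
    uint32_t c = toy_challenge(R, msg);
    uint32_t fixed_base = pow_mod(TOY_G, s);    /* base known in advance */
    uint32_t variable_base = pow_mod(y, c);     /* base differs per signer */
    return fixed_base == R * variable_base % TOY_P;
}
```

Every verification pays for one term of each kind, which is why a curve with a fast combined routine for that pair helps nodes that verify many signatures.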
12:42:12 <jberman[m]> seems perfectly sensible to me. gonna head to bed. I think it's a funny coincidence the article ends saying 10% improvements are worth the implementation effort lol. from what I can gather it seems there's a solid case brewing for x25519
12:49:17 <jberman[m]> this file's the latest I've got and seems to generally line up with tevador's conclusion: https://github.com/j-berman/monero/blob/curve25519-benchmark/tests/performance_tests/curve25519.h
16:47:16 <knaccc> jberman i'm a little confused - ed25519 mult including view tag check is twice as fast as just an ed25519 mult?
17:34:23 <jberman[m]> In this version I added the equivalent check to get the output's public key in the final step. With the view tag check, it only needs to do that for about 1 in 256 outputs. That check is an extra ed25519 scalar mult base and addition
17:39:56 <jberman[m]> Now that I think about it, it doesn't make sense to include that in the curve25519 version of the test. 1 sec I'll just remove that check
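The filtering idea described above can be sketched as follows: compute a 1-byte tag from the shared-secret derivation and the output index, compare it with the tag stored on the output, and run the expensive public-key check only on a match. Everything here is an assumption for illustration. `toy_view_tag` uses FNV-1a purely as a placeholder and is none of the hashes benchmarked above (keccak, siphash 2-4, blake2, blake3), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL 1-byte tag: FNV-1a over the derivation and output index.
 * A placeholder only; not the hash any real implementation uses. */
static uint8_t toy_view_tag(const uint8_t derivation[32], uint32_t output_index)
{
    uint32_t h = 2166136261u;
    assert(derivation != NULL);
    for (size_t i = 0; i < 32; i++) {
        h ^= derivation[i];
        h *= 16777619u;
    }
    for (int i = 0; i < 4; i++) {
        h ^= (uint8_t)(output_index >> (8 * i));
        h *= 16777619u;
    }
    return (uint8_t)h;
}

/* Returns 1 when the expensive full check (scalar mult base + addition)
 * should run for this output, 0 when the output can be skipped. */
static int passes_view_tag(const uint8_t derivation[32], uint32_t output_index,
                           uint8_t tag_on_output)
{
    return toy_view_tag(derivation, output_index) == tag_on_output;
}

/* Of the 256 possible tag bytes, count how many pass for one derivation. */
static unsigned count_passing_tags(const uint8_t derivation[32],
                                   uint32_t output_index)
{
    unsigned n = 0;
    for (unsigned t = 0; t < 256; t++)
        n += (unsigned)passes_view_tag(derivation, output_index, (uint8_t)t);
    return n;
}
```

Exactly one of the 256 possible tag values passes for a given derivation, so for outputs that belong to someone else (whose tags look random to the scanner) roughly 1 in 256 reach the full check.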
17:40:34 <UkoeHB> jberman[m]: why are you taking address-of here? https://github.com/j-berman/monero/blob/366d43d3094a336296815c04e99f4e7dc94f4129/tests/performance_tests/curve25519.h#L1000
18:02:11 <jberman[m]> UkoeHB:  hm, I see what you're saying. `ed25519_pk_copy` is already an address location pointing to the start of the char array, so no need to pass the address of the address. Interesting the result on my machine is the same either way there, but adding the ref to the pointer `ed25519_pk` param changes the result
18:02:13 <UkoeHB> did some reading, c_arr decays to &c_arr[0] when passed to a function, which is the same address as &c_arr
18:02:28 <UkoeHB> so it is equivalent (just unneeded syntax)
18:02:48 <jberman[m]> ah, got it
18:02:55 <UkoeHB> the ed25519_pk is already an address, it isn't a c array
18:03:04 <UkoeHB> a pointer to the first element *
18:03:13 <jberman[m]> right
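A small sketch of the distinction worked out above, assuming a function shaped like the one under discussion (a local 32-byte array plus a pointer parameter). The function names are hypothetical. For a real array, `&arr` and `arr` yield the same address, so either works as a `memcpy` argument. For a pointer parameter, `&pk` is the address of the pointer variable itself, which is a different location from the key bytes it points to.

```c
#include <assert.h>
#include <string.h>

/* For a true array, &copy and copy (decayed) refer to the same address. */
static int addr_of_array_equals_decayed(void)
{
    unsigned char copy[32];
    return (void *)&copy == (void *)copy;
}

/* For a pointer parameter, &pk is where the pointer is stored, not where
 * the key bytes live, so the two addresses differ. */
static int addr_of_param_equals_value(const unsigned char *pk)
{
    return (const void *)&pk == (const void *)pk;
}

/* The plain form: pass both pointers as they are, no address-of needed. */
static void copy_pk(unsigned char *copy, const unsigned char *pk)
{
    assert(copy != NULL && pk != NULL);
    memcpy(copy, pk, 32);
}
```

That would explain why adding `&` to the local array made no difference, while adding it to the `ed25519_pk` parameter changed the result: the latter copies from the pointer variable instead of from the key.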
18:05:46 <knaccc> btw on stackexchange someone pointed out that libsodium uses crypto_core_hchacha20 on the ecdh shared secret for the famous libsodium secretbox
18:05:51 <knaccc> and that's lightning fast apparently
18:06:32 <knaccc> although i'm not sure about the security guarantees where you have the possibility of two IKMs that differ by only a byte (i.e. the concatenated output index varint)
18:07:12 <knaccc> so i'd not sleep well at night unless someone really fully understands the implications of using hchacha20
18:07:39 * moneromooo makes bank by selling knaccc bulk sleeping pills
18:08:48 <knaccc> lol let's not tempt fate that a bottle of sleeping pills may become the MRL mascot :)
18:10:55 <jberman[m]> lmao ok the latest file removes the output pub key check and the unneeded syntax, back to sleep for me
18:12:25 <jberman[m]> (i could prolly use some of those too)
18:12:41 <knaccc> :)
18:26:36 <jberman[m]> UkoeHB: when I try to do something like this `&ed25519_pk_copy == ed25519_pk`, I get this compiler error: `error: comparison between distinct pointer types 'unsigned char (*)[32]' and 'unsigned char*' lacks a cast`. which seems to suggest to me that both actually have the same value (the address of the start of the array), but the former is just a weird and unnecessary way of referencing that value. I don't see what advantage the former type has over the latter
18:27:56 <UkoeHB> just a weird C thing
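On the question of what `unsigned char (*)[32]` offers over `unsigned char *`: the pointer-to-array type remembers the array length, so `sizeof` and pointer arithmetic work in units of whole 32-byte arrays instead of single bytes. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Stepping a pointer-to-array by one moves a whole 32-byte array forward. */
static size_t stride_of_array_pointer(void)
{
    unsigned char buf[2][32];
    unsigned char (*whole)[32] = &buf[0];
    assert(sizeof *whole == 32);    /* the pointee is the entire array */
    return (size_t)((unsigned char *)(whole + 1) - (unsigned char *)whole);
}

/* Stepping a pointer-to-element by one moves a single byte forward. */
static size_t stride_of_element_pointer(void)
{
    unsigned char buf[32];
    unsigned char *first = buf;     /* array decays to &buf[0] */
    assert(sizeof *first == 1);     /* the pointee is one element */
    return (size_t)((first + 1) - first);
}
```

The two pointers hold the same address but have different types, which is why the compiler rejects comparing them without a cast.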
19:02:24 <jberman[m]> here are results of the most apples-to-apples comparison I think is possible (ref10 implementations): https://paste.debian.net/1229227/
22:37:57 <knaccc> whoa this nano vanity address generator does 2 million curve25519 scalarmultbases per second on a GPU! https://github.com/PlasmaPower/nano-vanity/
22:51:11 <jberman[m]> > Intel GPUs are not supported, as in most cases running the code on the integrated GPU is no faster than running it on the CPU.
22:51:11 <jberman[m]> related note: would seem to suggest the integrated GPUs in commodity hardware may not be particularly useful for verifying the chain