15:29:03 Hi there
15:53:12 Will the meeting be here?
15:53:59 FAA45: Yes, in about one hour. Welcome.
16:15:40 Possible meeting agenda item: Jamtis address tag size. Discussion starts here: https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024#gistcomment-4237034
16:59:45 meeting time https://github.com/monero-project/meta/issues/720
16:59:45 1. greetings
16:59:45 hello
16:59:57 Hi
17:00:03 hi
17:00:06 Hello
17:00:13 Hi
17:00:23 hi
17:00:23 Hi
17:00:33 hi
17:02:13 2. updates, what's everyone working on?
17:03:54 Working on an X25519 implementation for Jamtis to replace ed25519 in shared secret calculations.
17:04:40 me: did some planning and initial work for integrating legacy cryptonote outputs into my seraphis lib. My goal is to provide enough utilities that a standalone wallet can be built, completely independent of wallet2. So, the new lib should be able to do balance recovery on the entire legacy chain plus spend any of those old outputs.
17:04:51 not research related, but continuing review on 8076 (hopefully will be complete today), moving on to 7999/vtnerd's alternative next, and opened a CCS
17:04:58 Working on OSPEAD. I reduced an algorithm from a naive implementation with O(N^2) computational cost to approximately O(N) 😎
17:05:33 I am also putting the research computing server to the test with https://github.com/Rucknium/misc-research/tree/main/General-Blockchain-Age-of-Spent-Outputs
17:05:41 I have been trying to solve the mystery of a wrong signature validation using my Python tools and found out that Monero is performing a wrong operation in certain cases, which leads to a malleability issue. Not exploitable or exploited as far as I know. I will publish the details on my website and on github soon (tonight or tomorrow morning). :)
17:05:41 got the storage up and running (14T on SSD, 76T on HDD). reminder that if anyone needs compute resources, feel free to contact me.
17:08:26 3. discussion; today we should start by returning to the jamtis address index/tag question (whether to use 64 bits for the tag -> 56-bit address index + 8-bit MAC, or increase to something like 144 bits -> 128-bit index + 16-bit MAC); this was previously discussed in this meeting https://github.com/monero-project/meta/issues/697 and also here https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024
17:10:11 it's a costs vs benefits question
17:12:04 personally, some extra bytes in outputs and addresses is worth it for peace of mind
17:12:24 my more technical arguments are in the jamtis gist
17:14:17 I guess, does anyone have any questions on that topic? Or is everyone reading? lol
17:14:40 I think the benefit of secure client-side random address generation compatible with a ubiquitous, simple global standard like UUIDs (a 128-bit index achieves this) + reducing/removing the incentive for a 3rd-party scanner to collect full balance keys to offer better UX (a 16-bit MAC achieves this) = significant
17:15:42 UkoeHB: I'm not familiar with it yet, sorry :p
17:16:43 We need to get ISO and NIST in on this ;)
17:16:43 More seriously, I do not feel that I have the necessary knowledge base or skills to have much of an opinion on it.
17:16:55 jberman[m]: can you elaborate on the 16-bit MAC argument?
17:18:23 you mean people would give up their view-balance key to speed up their wallet sync?
17:19:27 "Does saving 5 seconds per wallet sync for a subset of Monero users justify the costs?" > I think the answer to this question is yes.
I think a MyMonero-like service would emerge that takes view-balance keys, and users wouldn't care
17:19:47 users of that service
17:20:35 ideally the UX is as close to 0 time spent syncing the wallet as possible, so that kind of service has no incentive to do it
17:21:39 An implementation of fee fingerprinting for MyMonero transactions could help us estimate the current on-chain transaction volume (if not proportion of users) of lws-like services.
17:22:24 I also think the "light wallet scanning" use case where you run a server for family and friends, without it having access to amounts or definitive knowledge of spends/receives, is highly attractive. I think this use case *could* end up a larger subset of Monero users and is worth optimizing for
17:23:05 Fair points.
17:23:21 5 seconds on how much ? Half an hour ? I've not synced a wallet in ages.
17:24:24 it would be something like 5 seconds vs 0.1 seconds with the larger MAC
17:24:49 On top of how much ?
17:25:45 I mean, you can optimize a step from 5 seconds to 0 seconds, but if the other steps take half an hour, it's de minimis.
17:25:49 This is the whole sync time from the moment you log in to the remote service.
17:26:32 moneromooo: it's for view scanning with a remote helper like MyMonero (lws)
17:26:52 local full-scanning wouldn't be materially impacted
17:26:58 I assume you're assuming new wallets, created after jamtis would be added then ?
17:27:36 yes, this is only for scanning seraphis enotes
17:28:53 I think this assumed ~250M seraphis outputs. The view tag will filter that to 1M, the 8-bit MAC to 3.9K, and the 16-bit MAC to a handful.
17:28:57 it would also impact old wallets that get migrated and then get set up with a remote scanner
17:30:32 We could also do 48+16 if we really wanted a 16-bit MAC, but decided to keep the address tag at 8 bytes.
17:30:45 Does the client scan the 3.9k ? If so, it provides the client with some layer of privacy vs winnowing really really well.
17:31:16 winnowing?
17:31:25 pruning ?
17:31:29 Filtering
17:31:36 The client scans the 1M outputs.
17:32:27 3.9K outputs would pass the 8-bit MAC, so the client would have to calculate the output key and check it
17:32:56 with the 16-bit MAC, there would be almost no false positives
17:34:15 There is a separate key that I could hand over to somebody to do MAC scanning for me, right?
17:34:18 I guess the actual sync time would be determined by the time to download the several MB of candidate outputs
17:34:28 So in addition to the 5 seconds or 0.1 seconds, you have to also add the download time for those million outputs too.
17:35:13 rbrunner: that's the full view key
17:35:25 you would not normally give it away
17:35:58 Ah, ok, that's too powerful then. But what is then the least powerful key I can give to any kind of third-party service?
17:36:27 the key to calculate the shared secret and the view tag
17:37:20 I see, thanks. So that will filter down to 1M in your example.
17:37:20 the "find-received key"
17:38:22 it would be roughly 100 MB
17:39:10 tevador: this CBC ciphertext stealing is an ugly mess for our use case, can I just stick with my original idea? It's way easier to understand, read, and validate.
17:39:49 what is the problem with it? I think it just swaps the order of the last two blocks
17:40:35 yes, swapping the data around makes it an ugly mess, because with my version you can do operations on the ciphertext/plaintext directly
17:40:45 so that the block that contains the MAC is the first 16 bytes of the ciphertext
17:42:27 but with this CBC cts you have to move data around in buffers...
17:42:37 It seemed like a good idea to use an existing standard for the encryption. Not rolling our own.
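[editor's note] The filtering figures quoted above (~250M outputs -> ~1M view-tag matches -> ~3.9K past an 8-bit MAC) can be reproduced with a quick expected-value calculation. This is an illustrative sketch assuming uniformly distributed 8-bit view tags and n-bit MACs; the function name and the 250M figure are the chat's hypothetical, not seraphis library code.

```python
# Expected number of candidate outputs surviving each scan filter,
# assuming view tags and MAC bytes are uniformly distributed.

def expected_candidates(total_outputs: int, view_tag_bits: int, mac_bits: int):
    """Return (expected view-tag matches, expected MAC false positives)."""
    view_tag_matches = total_outputs / 2 ** view_tag_bits
    mac_matches = view_tag_matches / 2 ** mac_bits
    return view_tag_matches, mac_matches

total = 250_000_000  # hypothetical future seraphis output count from the chat
for mac_bits in (8, 16):
    vt, fp = expected_candidates(total, view_tag_bits=8, mac_bits=mac_bits)
    print(f"{mac_bits:2}-bit MAC: ~{vt:,.0f} view-tag matches, ~{fp:,.1f} pass the MAC")
```

With these assumptions the 8-bit MAC leaves roughly 3.8K false positives for the client to resolve by computing output keys, while the 16-bit MAC leaves around 15.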
17:43:42 in practice, everyone who looks at this does the same set of steps: understand why the method works, then read the code to see if it does that
17:44:03 CBC ciphertext stealing is more complicated and harder to read, but does the same thing
17:45:13 I'd rather say "equivalent to CBC ciphertext stealing" and then anyone who cares about the standard can go look at it
17:45:30 OK
17:46:17 sweet, thank you! and thanks for pointing out this standard, I had no idea it existed
17:47:52 Does something get slightly longer now with this decision? The address, because of some more bytes?
17:48:02 tevador: "rbrunner: that's the full view key" -> actually you can do this with the generate-address key
17:48:28 but there is not much advantage there, because MAC scanning is so cheap
17:48:40 I guess data transmission*
17:48:44 > My point was that MAC addresses work with a global 48-bit space.
17:48:45 AFAIU MAC addresses need only be unique to a local network, and it's not generally expected that local networks will have millions of devices connected in the next 100 years
17:48:49 there is an advantage for the download size
17:49:00 On the contrary, it's easy to envision internet businesses generating millions of addresses. The 3 bullets here make me feel uncomfortable even with a 56-bit address space: https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024?permalink_comment_id=4238074#gistcomment-4238074
17:49:27 And a collision is harmful to privacy; it doesn't only have the propensity to bork a distributed system that doesn't properly account for it. The downsides seem more significant in our case
17:50:10 tevador: ah, actually download size isn't helped, due to the self-send optimization
17:50:18 you need all view tag matches to find selfsends
17:51:58 I am a bit lost: The proposal that is now "on the table", how many bits does that have in this regard? More than those 56 bits?
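[editor's note] The collision worry above can be made concrete with the standard birthday-bound approximation P(collision) ≈ 1 - exp(-n²/2^(k+1)) for n randomly generated indices in a k-bit space. A sketch (the 10M-address figure is an illustrative assumption, in line with the "millions of addresses" scenario mentioned):

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision) for n random
    values drawn from a 2**bits space; accurate when n << 2**bits."""
    return -math.expm1(-(n * n) / 2 ** (bits + 1))  # 1 - exp(-n^2 / 2^(k+1))

for bits in (56, 128):
    p = collision_probability(10_000_000, bits)
    print(f"{bits}-bit index space, 10M addresses: P(collision) ~ {p:.3e}")
```

A 56-bit space gives a small but non-negligible collision probability at that scale, while 128 bits makes it astronomically unlikely.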
17:52:17 rbrunner: I proposed 128-bit address indices
17:52:47 rbrunner: with the 64-bit index, addresses are 180 characters, and with the 144-bit index, they are slightly longer at 196 characters
17:53:09 output size is also slightly larger with the longer tag
17:53:14 That clears it up, thank you.
17:54:02 Seems to me if we accept 180 characters we can also accept 196 if it has tangible benefits. This is anyway already beyond bad :)
17:54:57 A bit unfortunate for QR codes, however
17:55:40 I think there still isn't a good argument for the 16-bit MAC. Lazy users might still want to give up their view keys to avoid the 100MB data download.
17:56:51 the collision resistance of the longer tag is a clear benefit, so perhaps something like a 120+8 setup might be better, even if that wastes 1 bit in the base32 encoding
17:57:21 128+8?
17:57:42 120+8, so we can use one block of the cipher
17:57:50 Ah, ok
17:57:51 UUIDs are 122 bits anyways
17:59:19 Your argument is basically "Whoever spends the time to download 100MB does not care too much whether scanning through that takes 5s or 0.1s". Did I get this right?
17:59:33 I prefer a full 128 bits for the index, to avoid imposing a 'magic number' on users.
18:00:06 rbrunner: exactly
18:00:23 UkoeHB: the index is hidden from users
18:00:28 16 bytes is a universal size, 15 bytes would be a special magic size
18:00:50 like I said in the gist, "users" as in whoever is trying to use the index in their application
18:01:08 they will not use the index, the API will return some other identifier
18:01:29 byte field, whatever
18:01:29 just like they will not be applying the Twofish cipher
18:01:50 the returned byte field can be 16 bytes
18:02:17 So it's down to API implementers who have to know about those 120 bits?
18:02:24 you can stuff the 120 bits into a UUID, for example
18:02:40 and set two bits from the network flag
18:02:44 I'm saying the API user has to know how many bytes they get to define...
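[editor's note] The base32 arithmetic behind the 120+8 suggestion: each base32 character carries 5 bits, so a field encodes without padding exactly when its bit length is a multiple of 5. A quick check of the tag sizes discussed (counting the tag in isolation; the full address encoding packs other fields too, so the waste counted here need not match the address-level figure):

```python
import math

def base32_chars(bits: int):
    """Return (base32 characters needed, padding bits wasted) for a field."""
    chars = math.ceil(bits / 5)      # 5 bits per base32 character
    return chars, chars * 5 - bits

for index_bits, mac_bits in ((56, 8), (120, 8), (128, 8), (128, 16)):
    chars, wasted = base32_chars(index_bits + mac_bits)
    print(f"{index_bits}+{mac_bits} tag: {chars} base32 chars, {wasted} bit(s) wasted")
```

This is also why tevador's id32 scheme uses 120-bit identifiers: 120 is divisible by 5, so it encodes to exactly 24 characters with zero waste.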
18:02:54 16 is a universal amount, 15 is a magic number
18:03:06 Maybe you two talk about different APIs?
18:03:14 The API user doesn't define. They ask for an address.
18:03:47 I'm talking about the deeper API of the protocol itself, not any higher layer
18:03:56 15 is the max number you can represent in a nybble in 2's complement. A very universal number.
18:04:09 :)
18:04:27 moneromooo: lol
18:05:13 Actually, scratch the 2's complement part.
18:05:21 Might be that in the real world pretty few people will ever step down so deep that they will see this 120. A handful of library implementers, maybe.
18:05:30 IMHO, if 15 is advantageous compared to 16, my low-info preference is to go with 15. Satoshi was the first to use base58 after all, since it fit a specific need.
18:06:02 the only advantage here is saving 2 bytes
18:06:06 120+8 avoids the block cipher gymnastics
18:06:15 and a slightly less complicated cipher
18:06:24 Sounds good?
18:06:37 not to me lol
18:07:11 Yeah, because you are one of those unlucky few library implementers who get to see the sheer ugliness of 120 :)
18:07:45 well, so far I have avoided making ugly decisions, after 35k+ lines of code, and I certainly don't want to start...
18:07:46 I'm still not convinced we are not leaking information by using ECB on 2 overlapping blocks.
18:08:12 even if the first block covers the whole secret index
18:09:00 tevador: it's not ECB, it's an overlapping CBC
18:09:12 ECB doesn't have any XORing
18:09:20 it's CBC with an IV of all zeroes
18:11:54 btw 120 bits is not ugly, I'm using it for the hash identifier here since it's divisible by 5 bits for easy base32 encoding: https://github.com/tevador/id32
18:12:08 Maybe a stupid question, but what happens if somebody cracks that encryption and gets to see those bits in the clear? How bad would that be?
18:12:29 it's ugly for a generic byte field that anyone can freely define
18:12:38 Seraphis is currently 35k+ lines of code? Impressive, koe
18:12:54 Rucknium[m]: well, that's the diff on my branch vs master
18:13:34 rbrunner: you learn the address index, that's all - it depends on how that index was defined, whether it means something
18:14:20 how about a UUID that has 122 bits you can define? :P
18:14:40 Ok. But of course still bad if somebody would catch us leaking some info here. And the obligatory "Monero is finally cracked" articles ...
18:15:00 tevador: the overlapping cipher is similar to ciphering the same block twice in a row; if that could leak information, then I don't see how ciphering would be secure
18:15:18 The ease of development: generate UUID -> address index, sounds very nice to me as well, fwiw
18:16:40 But for that we would need the full 128 bits, right?
18:17:28 because 122
18:18:44 I should say, the ease of development: generate UUID v4 -> address index...
18:19:39 128 bits would have the advantage of supporting all versions of UUIDs
18:19:46 Hmm, maybe a bit overkill if you could just as well randomly choose those bits?
18:20:01 Or I don't fully get your idea yet
18:20:18 "128 bits would have the advantage of supporting all versions of UUIDs" -> true
18:24:14 Compatibility with a UUID system enables e.g. a merchant system to use UUIDs to identify orders in their system and then use them as input to generate addresses. It's more of a compatibility/standardized-integration thing. Sure, people could just use random bytes instead of UUIDs in all cases, but UUIDs are a ubiquitous identifier standard people opt for instead
18:24:57 I see
18:25:36 Also has trust on its side
18:25:44 As some sort of psychological benefit
18:26:09 And the DB would already complain if there was a collision, right?
18:26:54 But well, it would do that also with the corresponding number of random bytes ...
18:29:18 We are well past the hour on the meeting, so I think I'll call it here. The subject of jamtis address tags hasn't found a conclusion, but at least it's quite easy to change the implementation if needed.
18:29:22 Thanks for attending, everyone
18:30:28 "I think there still isn't a good..." <- 1M view-tag-matched outputs = 256M global outputs. After 5.5 years of RingCT, we have ~58M global outputs
18:30:55 Bandwidth will likely improve by the time users need to sit and wait to scan 1M outputs locally with any sort of frequency, at which point the benefit of a larger MAC would presumably be more significant. But, worth looking deeper at bandwidth trends. Perhaps this line of reasoning is moot and we should always expect bandwidth to be the bottleneck by a wide margin
18:31:15 UkoeHB: When you post the meeting logs to GitHub, is there an easy way to make the text wrap? It's hard to read otherwise. I know I can copy-paste etc., but I also post links to logs sometimes to answer community questions.
18:32:35 You should probably get the average outputs per unit of time within the last month or so, then scale, instead of using outputs since the start of the chain, which includes the desert at the start.
18:32:52 True
18:33:45 Wait. 5.5 years of ringct. You did not include the desert. I'm kinda amazed rct is already 5.5 years...
18:40:47 Rucknium[m]: uh, doesn't seem like it; your best bet would be plowsof's script https://github.com/plowsof/post-libera-meeting-logs (I haven't been using it because the logs I post only take me 30s or so)
19:05:33 what's the link to the chat archive?
19:10:01 FAA23: https://libera.monerologs.net/monero-research-lab/20220720
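[editor's note] Looping back to the "generate UUID -> address index" idea from earlier in the discussion: with a full 128-bit index, any UUID maps onto an index directly. A sketch using the Python stdlib; the helper name is illustrative only, not part of any Jamtis API.

```python
import uuid

def index_from_uuid(u: uuid.UUID) -> bytes:
    """Treat an application-level UUID (e.g. a merchant's order id)
    directly as a 16-byte (128-bit) address index."""
    return u.bytes

order_id = uuid.uuid4()            # any UUID version works with 128 bits
index = index_from_uuid(order_id)
assert len(index) == 16
assert uuid.UUID(bytes=index) == order_id  # lossless round trip
```

With a 120-bit index, by contrast, 8 of these bits would have to be dropped or repurposed, which is the compatibility loss rbrunner and jberman were weighing.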
19:47:00 it's probably not enough for a meaningful attack, but the leak is there
19:48:31 sorry, what information is leaked exactly?
19:51:58 You leak x and enc(x). Given enough of these, some attacks on ciphers are possible.
19:52:14 Even if x itself is not meaningful
19:52:29 you leak the MAC? I'm not following
19:53:11 if we have a 16-byte index and a 16-byte MAC, the tag would be enc(index), enc(enc(index))
19:53:18 x = enc(index), you leak x and enc(x)
19:54:04 with the overlapping tag, the amount of information is obviously much less than 1 block
19:55:58 are you claiming enc(enc(x)) == dec(enc(x))?
19:56:14 looking at the implementation, that doesn't seem to be right
19:56:53 no, I'm claiming that if an attacker has access to many pairs of [plaintext, ciphertext] with the same key, some attacks are possible
19:57:40 https://en.wikipedia.org/wiki/Differential_cryptanalysis
20:02:18 https://en.wikipedia.org/wiki/Twofish "The paper claims that the probability of truncated differentials is 2^−57.3 per block and that it will take roughly 2^51 chosen plaintexts (32 petabytes worth of data) to find a good pair of truncated differentials."
20:02:30 I think this could be fixed if the MAC was not simply zeroes, but some hash value calculated over the index.
20:02:40 then you would not leak anything
20:03:15 maybe siphash?
20:04:12 the information leak sounds like speculation - is there an actual known attack on twofish that can do this with a reasonable number of known plaintexts?
20:04:55 iirc siphash is much more expensive than twofish
20:05:19 No, I'm talking about a general attack. No idea if there are known attacks against a particular cipher.
20:05:46 in crypto, you generally take the cautious approach
20:10:56 I mean, in the wiki article you link it says "The AES non-linear function has a maximum differential probability of 4/256 (most entries however are either 0 or 2). Meaning that in theory one could determine the key with half as much work as brute force, however, the high branch of AES prevents any high probability trails from existing over multiple rounds". So differential analysis of AES is only 2x as good as brute force...
20:10:56 Reading that (and supposing we were using AES instead of twofish here), would you then say "well, we'd better be cautious and include some random data"?
20:11:00 btw siphash-24 takes about 45 cycles for 16 bytes
20:16:18 so what is the worst that can happen if we make the MAC siphash13(index)?
20:16:54 that's about an extra 10ns per output
20:17:02 in that case you have to decipher both blocks to check the MAC
20:20:35 then siphash just 8 bytes of the index
20:21:03 no can do, the first block decipher only gives you the MAC plus 14 bytes of the first block's ciphertext
20:22:36 actually, there is no reason we couldn't siphash 8 bytes of the encrypted index
20:22:37 although I guess you could siphash some of those bytes
20:23:32 but if you're going that far, why not just duplicate some of the encrypted index bytes?
20:23:38 what's the difference?
20:25:10 then you have a plaintext with a known relation, or possibly a partially known plaintext
20:27:45 Anyways, it's just an idea. I'm just not feeling comfortable about the encryption process for the tag. The 64-bit tag was straightforward.
20:40:20 To be clear then: you want to avoid a hypothetical differential analysis on a decades-old cipher that can use a 2-out-of-16-bytes known plaintext to extract bits of the cipher key (presumably more than 1 or 2 bits, and using on the order of only 2^20 unique ciphertexts)? If such an attack were possible, then twofish would be incredibly broken for 16-byte known plaintexts, which seems very unlikely.
20:43:41 We had a similar discussion about information leakage in PR 8061 and in the end, the most overkill solution was selected.
20:44:14 I think this will definitely come up in a future review.
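[editor's note] The "MAC = hash(index)" idea proposed above, sketched in miniature: instead of encrypting an index padded with fixed zero bytes (which hands an attacker a partially known plaintext), the check bytes are derived from the index itself. Python's stdlib has no siphash, so this uses keyed blake2b purely as a stand-in for a short keyed hash; all names and sizes here are illustrative, not from the Jamtis spec.

```python
import hashlib

MAC_BYTES = 2  # a 16-bit check value, as in the 16-bit MAC proposal

def address_mac(mac_key: bytes, index: bytes) -> bytes:
    """Short keyed hash over the index (stand-in for siphash13(index))."""
    return hashlib.blake2b(index, key=mac_key, digest_size=MAC_BYTES).digest()

def make_tag_plaintext(mac_key: bytes, index: bytes) -> bytes:
    """index || MAC(index): no plaintext byte is a fixed known value."""
    return index + address_mac(mac_key, index)

def check_tag_plaintext(mac_key: bytes, plaintext: bytes) -> bool:
    index, mac = plaintext[:-MAC_BYTES], plaintext[-MAC_BYTES:]
    return address_mac(mac_key, index) == mac
```

The cost tevador notes still applies: the tag plaintext would be encrypted afterwards, so a scanner must decipher before it can recompute and compare the MAC.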
20:45:29 doesn't mean the solution was right
20:56:58 I have updated my findings about the malleability issue
20:57:07 https://github.com/monero-project/monero/issues/8438
20:59:12 It was pretty shocking to me at the beginning of the week. I was kind of panicking, but now I am getting more used to the idea that it is just a huge bug IMHO :p
21:00:24 I would like to thank moneromooo, koe and luigi for their support in trying to figure this out together.
21:00:58 I think I importuned them quite a lot on Monday :p
21:02:57 And let me know your crypto thoughts about that ;)
21:11:14 can we prove that no coins were created in those outputs?
21:14:43 That's the question :)
21:14:53 What do you think? I would love to hear.
21:16:05 the attacker's wiggle room would be just 1 bit, so I think it's safe
21:16:46 It is kinda my feeling also now
21:17:14 But living with the doubt is pretty bad :p
21:23:23 dangerousfreedom: you should include the code I sent, which demonstrates the real scalar can be reconstructed
21:24:24 the issue is effectively a serialization mis-representation of proof scalars that are non-reduced, which is definitely nothing close to an exploit
21:26:45 UkoeHB: Can you put a detailed answer on github? I would be happy to check exactly the part of the wrong code. And I also think it would be useful for educational purposes and for blockchain sanity.
21:29:57 Can I once again advocate for mandated canonical encodings of all ECC values at a fundamental level?
21:30:31 I do hear this is solely a serialization issue and I'm fine with that. I'm just frustrated we now have 3 different scalar deserialization behaviors
21:30:44 Strict, reduce, and whatever algorithm this did end up applying
21:31:28 *I also ack this is scalar, not point, and my general ristretto advocacy is point-related.
21:31:55 kayabanerve[m]: scalar reduction is already enforced for everything post-2016
21:32:06 it's just this Borromean proof that was implemented very naively...
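[editor's note] A toy illustration (not the monero code) of the malleability class being discussed: any 32-byte scalar encoding s with s >= l has a smaller alias s mod l, so a deserializer that reduces before use accepts two distinct wire encodings of the same scalar. A strict canonicity rule ("reject s >= l") removes the ambiguity.

```python
# l is the order of the ed25519 prime-order subgroup.
L = 2**252 + 27742317777372353535851937790883648493

def encode_scalar(x: int) -> bytes:
    return x.to_bytes(32, "little")

def is_canonical(b: bytes) -> bool:
    """Strict rule: only encodings already reduced mod L are accepted."""
    return int.from_bytes(b, "little") < L

s = 123456789
a, b = encode_scalar(s), encode_scalar(s + L)
assert a != b                                  # two distinct wire encodings...
assert int.from_bytes(a, "little") % L == int.from_bytes(b, "little") % L  # ...same scalar
assert is_canonical(a) and not is_canonical(b)  # the strict rule rejects the alias
```

This is the gap between the "strict" and "reduce" deserialization behaviors kayabanerve lists; the naive Borromean path was effectively a third behavior.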
21:32:31 👍 but I'm still using this as ristretto propaganda :p
21:33:06 I more meant to demonstrate we wouldn't leave the opportunity to have this done naively. While we're not naive anymore, my advocacy remains removing the opportunity, especially due to performance benefits
21:33:16 *but yes, at this point, it's my job to submit the PR, I know*
21:36:01 I totally agree with kayabanerve. Using LibSodium or Ristretto would not even allow me to get these errors, as it is enforced at the fundamental level. I feel like my dirty Python tools are now more reliable than Monero (it is a joke :p)
21:36:13 dangerousfreedom: I believe the comment is that instead of x, or x % l, you should be able to define an invalidly_assume_normalized which transforms the scalar to how it was interpreted. If you have the link to koe's code, I may be able to take a stab
21:36:25 I literally have an issue in my work saying I need three different transaction types
21:36:55 One for canonical transactions, one for verifiable transactions, and one for wallet transactions
21:37:42 When building wallet code, I decided not to deal with 'malformed' txs and just reject them. The issue is monero is full of them :/
21:38:26 So I could merge canonical + verifiable, but then I'm applying a bunch of checks and deserializations on proof data I don't need just for wallet functionality
21:38:45 I totally see.
21:38:57 And I need to be sure I'm right if I move from bytes to actual ECC types. As shown here, we have three different scalar deser algorithms .-.
21:39:21 So I understand this is historical, we haven't been hacked, and we'll survive. I'm trying to highlight the value in rigidity and simplicity
21:39:44 kayabanerve[m]: Can't agree more.
21:40:11 dangerousfreedom: want to move your python to rust 👀
21:40:37 :p I'll try to poach you later. PM me a link to koe's snippet and I'll try to help you when I get back home in ~25, if you need it ;)
21:40:51 But it sounds like you'll be able to handle it fine tbh :)
21:41:35 kayabanerve[m]: I was actually talking to a very close friend who is much better at programming than me, and he was very excited about it, and I think it would be good for us if we can do it :) I will message you later
22:02:56 tevador: on my machine it is about 105ns to do one twofish block decipher, and 40ns to do one halfsiphash hash
22:31:48 ah, actually more like 95ns for the twofish decipher; there seems to be a 20ns overhead from other stuff during the address tag decipher