15:03:09 meeting 2hr 16:17:21 Sorry everybody, I'm a rookie and maybe with this post I'm making more noise than necessary, but I wanted to signal this (just seen): 16:17:22 https://blog.trailofbits.com/2022/04/13/part-1-coordinated-disclosure-of-vulnerabilities-affecting-girault-bulletproofs-and-plonk/ 16:17:22 (btw, cross-posting to monero-dev) 16:24:12 correction: not cross-posted, 'cause I have seen @jrg was 1 hour ahead of me ;) 16:26:22 beat me to it as well though 16:33:04 I hear that it doesn't affect Monero though 16:33:46 😅 16:58:01 baro77[m]: https://nttr.stream/benediktbuenz/status/1514270574557769730 16:58:01 Luckily Monero/Ristretto/zkcrypto/libsecp256k1 had it right from the beginning and were not affected. This shows how error-prone implementing Fiat-Shamir is and it's best to use a library like Merlin github.com/zkcrypto/merlin 16:59:34 👍 I showed my rookie-ness ;) 17:00:00 meeting time https://github.com/monero-project/meta/issues/688 17:00:00 1. greetings 17:00:00 hello 17:00:16 👍 17:00:21 Hi 17:00:41 Hello 17:01:12 hey 17:02:53 howdy 17:02:55 2. in case anyone missed it, we can get this out of the way https://blog.trailofbits.com/2022/04/13/part-1-coordinated-disclosure-of-vulnerabilities-affecting-girault-bulletproofs-and-plonk/ 17:03:05 short story short: does not affect monero 17:03:58 any questions on that disclosure? 17:04:58 Not being as familiar with bulletproofs, why exactly does it not affect Monero? 17:05:37 Monero's was written by Sarang, who was careful. 17:08:19 The blog post claims: 17:08:19 >Why is this type of vulnerability so widespread? It really comes down to a combination of ambiguous descriptions in academic papers and a general lack of guidance around these protocols. 17:08:51 Is it true that there is this shortcoming in the cryptography space? 17:09:27 xmr-ack[m]: The issue was the paper did not have a complete implementation guideline, so anyone going directly off the paper would have made a certain small error in the Fiat-Shamir transform (neglecting to hash some public variables). Monero's implementation followed best practices for FS transforms, which prevented that issue. 17:11:56 Rucknium[m]: yes, best practices for implementing cryptographic protocols are not well-disseminated among general developers 17:13:52 Also, afaik most crypto professionals aren't implementing advanced algorithms, they are dealing with data encryption and encrypted channels. It's mainly in the cryptocurrency and experimental crypto spaces that these big protocols are implemented. 17:14:28 UkoeHB: Does it list the projects that are affected somewhere? 17:14:40 yes it's in the blog post 17:15:50 Most crypto professionals aren't rolling their own crypto from scratch *. They are following some highly-scrutizined specification. 17:16:31 Anyway, we can move on 17:16:55 3. updates, what is everyone working on? monerotopia was last week, it was great to meet several people 17:18:37 Aside from monerotopia which was sweet, been working on PR review (my list atm: 8046 [done], 7760 [in progress], 8076 [in progress]), digging into a sporadic bug with tx broadcast (8251, 6929), and submitted an updated proposal for subaddresses in light wallet servers in collab with endogenic here: https://github.com/monero-project/meta/pull/647 17:19:24 me: I'm continuing work on my seraphis library. I realized getting an accurate minimum fee oracle will be a huge pita, so I'll probably just make a trivial mockup and move on... 17:19:26 Working with jberman and mj-xmr on translating some C++ to statistics language. And doing some prototyping of statistical models with some simulated data. 17:21:56 Im writing some code and text to understand and explain mlsag and borromean ring signatures. I think I will have something to present by the end of the month. 17:22:59 3. discussion: any topics to discuss? questions/comments? 17:23:52 UkoeHB: Thanks 17:23:59 Me: Unfortunately, I couldn't attend Monerotopia :(. But I do have some updates in regards to my work with MAGIC. I have a working implementation of two rudimentary machine learning models and have been training them on a preliminary dataset. I want to clarify that the preliminary dataset has many flaws... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/581aa6186c27f11139c90f09926efa9e71009791) 17:24:30 * Me: Unfortunately, I couldn't attend Monerotopia :(. But I do have some updates in regards to my work with MAGIC. I have a working implementation of two rudimentary machine learning models and have been training them on a preliminary dataset. I want to clarify that the preliminary dataset has many flaws and that most of the transactions were collected with an (unrealistic) 20-minute uniform delay. I am currently refining the 17:24:30 process of collecting a new dataset that does reflect real user spending patterns. 17:24:30 That being said, after training on an undersampled dataset removing the possibility of a guess newest heuristic, I am seeing accuracies upwards of three times random guessing classifying the true spend of an arbitrary ring with no outside information. 17:24:30 Metrics: 17:24:30 Random Forest Testing Accuracy: 23.41% 17:24:30 Random Forest Weighted F-1 Score: 19% 17:24:30 Gradient Boosted Classifier Testing Accuracy: 27.40% 17:24:31 Gradient Boosted Classifier Weighted F-1 Score: 26% 17:24:31 Feature Importance (Top 5): 17:24:32 Inputs.1.Time_Delta_From_Newest_Ring_To_Block: 13.07% 17:24:32 Total_Block_Tx_Fees: 10.97% 17:24:33 Inputs.0.Time_Deltas_Between_Ring_Members.9_10: 4.98% 17:25:46 * xmr-ack[m] uploaded an image: (112KiB) < https://libera.ems.host/_matrix/media/r0/download/matrix.org/dqAGJKuAviBOFlLETUwqOVcZ/image.png > 17:27:57 My commentary (I've seen these results prior to now): This underlines again that when there is a substantial mismatch between the decoy selection algorithm and the real spend age distribution, statistical methods can be used to narrow down which ring member is the real spend. 17:28:52 And here there was a large mismatch, since the "real user behavior" in this synthetic data was uniform over just 20 minutes. 17:28:53 Thanks xmr-ack[m] 17:30:34 Also the feature importance is interesting, but it is hard to know if there is any real problem with some of these features without some sort of confidence interval or p-value on these estimated feature importance results. Since the feature importance may be nonzero just due to sampling error. 17:31:56 On a different topic, I want to point out that Monero multisig is capped at 16 group members for efficiency reasons during account generation. I was assuming people who want to use multisig would see that limit in the code, but maybe not... (looking at Thorchain). A different approach to key generation (see the FROST paper) is much more efficient when M < N - 1 (enabling larger groups generically), but my focus was on 17:31:56 N-of-N and (N-1)-of-N for efficient escrow. I will probably implement a FROSTy key generation backend for multisig in my seraphis branch (but probably won't port that to master, since current multisig uses round-robin signing but you need aggregation signing for FROST - which I have in my seraphis branch). 17:32:00 In other words, if there is some feature that the machine learning algorithm identifies as helping classify the real spend, then it would make sense to investigate further and see if there is some sort of problem with the code or the protocol. 17:33:35 Rucknium[m]: I think most features that can affect distinguishability can be manually identified, although it would be helpful to get a sense of their significance in practice. 17:35:10 isthmus and neptune have done a bunch of research into distinguishable features (numerical counts of different fingerprints), but did not extend that to transaction tracing 17:35:21 One problem is that ground truth is kinda biased due to self selection (I forget the exact term). 17:36:03 ie, the rings you know the real spend for will not be randomly selected from the set of all rings. 17:36:24 So using this as a proxy for ground truth will bias you. 17:36:46 I think. Not a statistician. etc. 17:37:22 makes sense to me 17:38:26 The more I think we need to look for the ground truth in what the code generates and simulate the situations, rather than only relying on past data. 17:39:01 (in search of ground truth, I mean) 17:39:56 I want to implement binning in seraphis this week. There are two choices: A) completely deterministic binning (selection distribution is baked into consensus), B) partially deterministic binning (only bin members are deterministically distributed around their bin locus, but each bin's locus is manually specified). 17:39:59 seems a +1 for binning to me too, because the gamma AFAIU is pulled from just a subset of rings in which the real could be identified, rather than all rings. binning mitigates weakness in an incorrect gamma 17:42:10 reasoning for B was to avoid tying the algo to consensus to allow changes without a hard fork right? 17:42:19 UkoeHB: To go back to the manual identification vs. machine learning identification: the 5th most important feature, according to these preliminary results, is transaction size. I don't know of any obvious issue with transaction size that would help identify the real spend, so this ML approach can extend our knowledge of transaction uniformity defects beyond what was previously theorized by isthmus, neptune, and others. 17:42:27 The advantages of (A): less data in a tx (maybe 30-80 bytes), less room for implementation variance that can open anonymity puddles. 17:42:27 The advantages of (B): we aren't stuck with a fixed selection distribution, it's a lot easier to implement. 17:43:31 I can think of one reason tx size identifies the real spend: 17:44:23 tx size can be decomposed to input/output count + tx extra fields. 17:44:33 A common reason for large txes is many input txes, which are typically made to consolidate small outputs, like dust etc. These are usually old and you consolidate once you have to after spending the rest. 17:44:56 Another is pool payments, paying to many people. High churn so typically new outputs. 17:45:25 First case will have two outputs, first ring member. Second case, more than two outs, last or close to last ring member. 17:46:03 * xmr-ack[m] sent a code block: https://libera.ems.host/_matrix/media/r0/download/libera.chat/7e23c0d3faf278d7169c019a2d3614221d8e5a9b 17:46:04 s/many intput txes/many inputs/ 17:46:14 UkoeHB: How does deterministic ring member selection affect tx verification time? Also, I have been trying to understand your deterministic ring selection proposal on github, but I feel that I'm missing something. Is there some cryptographic knowledge I need to understand it? 17:46:26 BTW, if you write too long lines, your message gets omitted, and replaced by a url. 17:46:43 there's no cryptography other than a hash function 17:48:01 it shouldn't have a noticeable effect on tx verification time 17:50:13 Last time we talked about enforcement of decoy selection, there was enthusiasm for enforcement at the node level rather than the consensus level. Could Seraphis's deterministic decoy selection be written in a way that `monerod` checks it in the mempool, but does not declare blocks invalid if there are txs mined that do not conform? 17:51:30 You could do that, but I feel it is a bit of scope creep for nodes 17:52:48 "UkoeHB: How does deterministic..." <- The mapping probability to cdf to output index was the most difficult part understanding it for me. I wrote up some code that made it click for me a while back that might help, will share once I find it Rucknium 17:53:28 This paper is super helpful: https://spar.isi.jhu.edu/~mgreen/mixing.pdf , and a presentation on it was given that helped me too that I can't find right now 17:54:44 jberman: Mainly I don't understand how the real spend is made indistinguishable from the deterministically-selected decoys. Trying to understand that. 17:56:46 jberman[m]: damn I lost that presentation... 17:57:56 The obvious method is to include an offset, selected to that one of the deterministic picks matches the real spend. 17:58:09 selected so that... 17:58:15 Rucknium[m]: you generate a random set, then shift that set so a random member lands on the real spend 18:00:19 Ah ok. Great. Might that give you gaps in the beginning or end of the distribution, though? 18:00:38 why? 18:00:44 The offset will tend to be large for old spends. 18:01:04 And it seems more feasible with Seraphis-sized ring member counts rather than 11 or 16 18:01:16 you transform the random set into a uniform space before rotating, then transform back into the real selection space 18:01:18 Though I suppose that, depending on the generation algorithm, you might be able to brute force one with an old spend and small offset... 18:01:58 we are at the end of the meeting, thanks for attending everyone 18:04:08 High level summary of what I did in my code sample that explains how it can mesh with the gamma: assume 16 ring members. Use a random seed to generate 16 real numbers between 0-100. Those values are the starting spots in the gamma CDF that we will offset in a later step. Take the real output, map that into its spot in the CDF. Take 1 spot from the original 16, offset it to match the real's CDF, and then offset all other spots by 18:04:08 the same amount. Then all you need to store is the offset and the random seed, and you can deterministically find all 16 ring members 18:04:48 there are floating point calculations tho, which I don't know how to avoid 18:06:47 Maybe boost multiprecision stuff ? 256 bit numbers. Can be used for fixed point. 18:07:11 And you don't care about imprecision, as long as it's deterministic. 18:13:43 seems reasonable to me. also need to make sure that every output in the chain can plausibly be referenced as a decoy, just gotta keep that in mind/double check when implementing 18:16:10 i don't know if my code sample guaranteed that/what bit precision in the real might be necessary to guarantee that 18:16:45 jberman[m]: I'll try to implement the fully deterministic version, and if it seems too difficult to get right I'll just give up and go with partially deterministic. 18:17:24 One additional complication is you want a gamma over block heights, rather than over output indices.. 18:18:22 I strongly suspect that if imprecision means some outputs can't be selected, changing height will change the selectable part and yo'll get full coverage. Though you do bring up an excellent point that it does alter the probability distribution... 18:24:23 "One additional complication is..." <- That PoC (MRL #88) was for pure client-side + assuming unlock times exist. I think you're good to stick with output indices in a deterministic approach + if we assume unlock times are gone or hidden 18:26:01 but, what we do today where we first gamma select an output from across the chain, then uniform random select an output from the block that gamma selected output is in.. that would be complicated to replicate I think 18:30:26 jberman[m]: I think we could cut out the part about randomly selecting in the block (not sure why we do that). 18:31:44 IIRC, it was to counter some bias due to high variance in the number of outputs per block... 18:32:45 Ah, yes: the Miller et al paper had a time distribution. Time is block (if you squint just right). Output based will bias as average output count changes increases over time. 18:35:53 https://github.com/monero-project/monero/pull/5389#discussion_r271668970 18:38:03 That's not what I said above, and I think what I said above is way more significant... Hmm. Maybe I did not understnad what sarang said at the time... 18:40:30 ah, if txs in a block are randomly ordered then the tx members don't directly map to a timing distribution, ok 18:43:34 moneromooo: I think what you were saying above is rationale for why to use the "output lineup" method in general, rather than point to blocks 18:43:56 and the comment in that PR is an explanation for why to do that single last step in the algo where it randomly selects an output from the block 18:44:11 after gamma selecting an output from across the chain using the "output lineup" method 18:52:38 Rucknium UkoeHB here's the presentation on that paper: https://alishahc.com/slides/mixing_pres.pdf 18:54:09 jberman[m]: nice find! google totally failed me 18:55:55 found it on the author Alishah Chator's professional website haha