00:09:27 I'm glad that people enjoyed the article :- ) 00:09:27 Please note that there have been two minor revisions since initial publication (though neither impacting the overall conclusion) 00:09:27 1. Clarified some imprecise language, where I said “churn” when what I meant was to `transfer` repeatedly to wallet(s) controlled by yourself. 00:09:27 2. The ability to ascertain that a transaction doesn’t have any subaddress recipients based on the absence of additional keys does not apply to 2-output transactions (only to 3+ outputs). So the comments about primary and subaddresses have been removed. h/t @UkoeHB for catching this 00:09:27 Let me know if you catch anything else, or have ideas for extensions. 04:56:09 Has anyone on the Monero side of things considered an openly developed chain analysis software? Being able to determine how far software can go, and what patterns are effective, would be a valuable piece of knowledge for XMR as a whole. 04:56:44 The one concern I'd have with such a project would be the inability to keep up if Monero successfully evolves its technology, while I'm notably thinking of Triptych. 04:57:50 But the software is out there to some degree by some people. A public form, which at best serves as proof of the software's impracticality as well as red team research, can't really make the situation worse. 04:58:15 (standard disclosure policy on issues would still be followed, of course) 05:04:04 kayabaNerve: I am currently red-teaming Monero. And BCH CashFusion. Everyone is getting the red team treatment :) 05:04:31 Rucknium[m]: Have any information on that/published work? 05:04:46 I did hear about your CashFusion work which we've also discussed :P 05:04:49 Open development of chain analysis software for Monero would seem too risky. What if we stumble upon something truly harmful? 05:05:02 "standard disclosure policy" 05:05:21 Would you rather we never do and someone else does? 
05:05:23 kayabaNerve: I recently submitted a 28-page document to Monero's Vulnerability Response Process via HackerOne 05:05:40 Oh, the recent mixin selection commentary? 05:06:15 Tbc, I'm not saying I've read, or even have access to (I do not), the doc. I just remember that event coming up a few weeks ago 05:06:23 You can check the #monero-dev logs from 24-48 hours ago for some limited open discussion about it. 05:06:50 It involves the mixin selection algorithm, yes. 05:07:39 Thanks for the heads up. I do stand by believing open analysis tools would be beneficial. The sole question is it worth the benefit, given the difficulty of creating one. 05:08:17 "What if we stumble upon something truly harmful?", you say as you have already stumbled upon something worth improving and accordingly improved Monero :P 05:08:32 kayabaNerve: I don't think it would be too difficult. 05:09:05 kayabaNerve: Right, but if it happened in the open, Monero's adversaries would already have access to it. Instead, I am working on a "patch" 05:09:43 For basic commentary? Surely not. For multi-TX analysis and getting close to the best accuracy possible? Definitely. 05:10:03 Except Monero's adversaries already have access. It's we who don't. 05:10:41 They have access to knowledge that hasn't necessarily been created yet? 05:10:47 That's my stance. We know one company actively working on such software, and I'd presume 1-2 more. Combined with the... IRS was it? bounty and them selling access... 05:11:34 They have access to internal research on Monero's privacy, a direct profit incentive never to report it, and tooling created via their research which allowed practical refinement and continuation of their research. 05:12:33 We have some level of research, obviously, yet we haven't taken the next step into practical application and confirming hypothesis of de-anonymizing XMR, in order to better understand viable attacks and optimal pattern analysis. 
If we knew optimal pattern analysis, we could more efficiently design a new pattern. 05:14:09 Right now, we're left discussing theory and paper, with no consideration for the practicality of how XMR is used. I know I've made TXs within 30 minutes of each other as I've moved funds. Papers don't cover such real world aspects in regards to what percentage of TXs exhibit sloppy behavior AND what percentages can be calculated from that (except perhaps basic commentary on "these are bad"). 05:14:25 Any paper which does would likely have implemented chain analysis scripts to provide their extended commentary. 05:14:34 And what's that? Chain analysis software? :O 05:14:56 kayabaNerve: Let's agree to disagree about the open development part. I'm working on what you're talking about, if I understand you correctly: 05:14:56 https://github.com/monero-project/research-lab/issues/86#issuecomment-905800761 05:15:11 I will submit a CCS proposal to execute it within the next few days 05:15:26 It wouldn't necessarily have to be open. It's just a founding principle of OSS and the best way to get participants and review. 05:16:08 Tbc, I'm suggesting you publicize your findings... post-patch 05:16:13 Same disclosure policy. 05:16:50 Like you're saying you want to keep this private until researched and improved. Great. I support that. 05:17:37 The problem with post-patch disclosure is that the blockchain is forever. Past txs remain vulnerable. in the recent discussion in #monero-dev , selsta suggested that my attack should not be published. I agree with selsta on this one. 05:18:02 I'm not suggesting we just push to Git all new functions and tuning parameters. I'm suggesting a CA tool with established theory for use as a branching off point by teams for new attacks. If they achieve incremental improvements with no fix immediately available, it's submitted. If they start getting incredibly high success rates, that's a major issue, and the relevant channels occur. 
Post-patch, the tooling is published, and the 05:18:03 cycle begins again. 05:18:19 The problem with never disclosing is no one can learn from what happened and it'll just happen again. 05:18:43 Combined with the fact blockchains suck for privacy and we need to accept that, not run away from it. Quantum computers will destroy all of this anyways, no? 05:18:57 So we're already discussing a... 20 year timeline until we lose all privacy? 05:18:57 I am not sure. This is really my first foray into white hat hacking, I suppose 05:20:10 And if we adopt a quantum secure privacy algorithm, which I'd immensely love, we're still faced with entropy issues IIRC. 05:20:41 So all of this has a timeline. The question is do we accelerate the timeline in order to ensure everyone can learn from it to do better, or do we effectively lie to users 05:21:48 If there's legitimately a security issue you have, right now, that's extremely effective for CA purposes, the extent of it must be disclosed in order to maintain a fair community. While I agree that's distinct from how, I don't believe anyone as large as ChainAnalysis would have an issue reviewing every single change on Git and picking it up. From there, it's not really a secret. More just something that isn't talked about often. 05:21:48 This will lead to a lack of education and application 05:21:55 I think if and when a patch is developed and deployed, and we fully understand everything, it could be appropriate to release a general advisory to users about what types of past transactions may be vulnerable. 05:21:56 That's my stance, anyways. Will read up in #monero-dev 05:23:23 kayabaNerve: Look, these decisions are "above my paygrade". I am inexperienced in all of this. 05:23:26 As one other note, doesn't mixin selection work by choosing 10 random mixins and walking away, without context on where the real mixin will be comparatively?
05:24:05 Rucknium[m]: This is a public channel and I'm laying out my thoughts not only to try to sway you to some degree, but also in case anyone else reads through this and has something to comment. 05:24:17 Could you rephrase your question? 05:25:19 When selecting the 11 ring members, doesn't XMR: 1) Take the real mixin. 2) Select 10 random mixins after just being told to grab 10 random mixins, without any idea where the real one is. 3) Combine the two. 05:26:07 I do understand the actual process of selecting those 10 mixins is complicated. I'm commenting it doesn't use any context of what the real mixin is. 05:26:16 Unless I forget/misread the relevant code. 05:26:59 Yes. But the real spend and the mixins have, in a sense, metadata. One element of this metadata is how hold each input is. The flawed mixin selection algorithm from pre-2018 was exploited by the analysis of Moser et al. (2018) An Empirical Analysis of Traceability on the Monero Blockchain 05:26:59 Or a title to that effect 05:27:20 ahem, "how old each input is" 05:27:29 ... that's my comment. 05:27:37 I got "hold" on the brain, apparently. 05:28:02 I was going to say an extremely simple comment/question of "Why don't we select 11 mixins randomly, substituting the real one for the closest fake one?" 05:28:40 On the topic of quantum secure privacy algorithms, 05:28:40 They are selected randomly. 05:28:56 ... 11 05:29:03 Not 10. 10 are selected randomly. 05:29:28 Ok, imagine a simple thought experiment. And this thought experiment isn't too different from how Monero used to work in the bad old days: 05:30:12 Mixins are selected. Every output on the blockchain since the genesis block is eligible, with equal probability, to be selected.... 05:30:53 Now, typical users actually spend their received coins within, say, a week of receiving them. In that case, an attacker could guess that the real spend was just the most recent ring member. 05:31:03 That's it.
That's the problem in a nutshell 05:31:23 If I randomly select a mixin from the past month, yet have my real TX in the last month, there's now two outputs from the last month. If that's statistically unlikely using the standing mixin selection algorithm, then it's a giant red flag the real mixin is one of those two. IIRC, the current mixin algorithm has no context on the real output. I'm asking, both as a naive comment and trying to learn, if randomly selecting all 11 and 05:31:23 Since the "fake" mixins would all be much older than the real spent output 05:31:23 then substituting the real one for the closest fake one would help or not. That said, I'm truly not a statistician and can't comment on current distribution/selection, so this may just be very over my head, leaving me to discuss red team theory. 05:31:36 I know this. 05:31:40 I promise you, I do know this. 05:32:46 I don't know how the current algorithm works, as it's been ages since I've read up on it (and even then decided it was too complicated to impl for my own work before performing a naive selection algorithm). I do know what it looks like to send up such a signal flare though. 05:33:13 I also do remember the mixins must have an average age < X, which was healthy to see. 05:33:30 So you're saying that some info on the real spend should be used to select the mixins? My intuition is that would lead to an attacker having greater probability of detecting the real spend, since the mixins would in some ways have a pattern linked with the real spend. 05:33:59 Yes and no. Selection wouldn't be changed at all. The amount selected would be. 05:34:47 Then, if there's something randomly within the last 30 minutes (already statistically likely to be the real spend), there's not two entries from within the last 30 minutes (almost definitely one of those two). There's just one, as it replaces its closest randomly selected entry.
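The substitution idea described between 05:28:02 and 05:34:47 (draw 11 ring members at random, then overwrite the one closest in age to the real spend) can be sketched next to the independent 10-plus-1 construction it is being compared with. This is only an illustration: `sample_decoy_age`, `ring_independent`, and `ring_replace_closest` are hypothetical names, ages are in arbitrary units, and the exponential sampler in the usage note is a stand-in, not Monero's actual decoy distribution. Whether the substitution helps or hurts privacy is exactly the open question in the discussion; the code does not answer it.

```python
import random

def ring_independent(real_age, sample_decoy_age, decoys=10, rng=random):
    """Draw decoys without looking at the real spend, then add the real spend."""
    ages = [sample_decoy_age(rng) for _ in range(decoys)]
    ages.append(real_age)
    return sorted(ages)

def ring_replace_closest(real_age, sample_decoy_age, ring_size=11, rng=random):
    """Draw a full ring of decoys, then overwrite the decoy closest in age to the real spend."""
    ages = [sample_decoy_age(rng) for _ in range(ring_size)]
    # Index of the sampled age nearest to the real spend's age (hypothetical tie rule: first wins).
    closest = min(range(ring_size), key=lambda i: abs(ages[i] - real_age))
    ages[closest] = real_age
    return sorted(ages)
```

For example, `ring_replace_closest(5.0, lambda r: r.expovariate(1 / 1000.0))` returns 11 sorted ages that include 5.0, and `ring_independent` called the same way gives the 10-plus-1 ring for comparison.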
05:35:12 It was a probably naive idea I wanted to throw out there as I found it odd it wasn't already done that way. 05:35:29 Because seriously, I am not the person to do any statistical analysis here. I trust you for that if I ever need it ;) 05:36:32 But I did want to discuss red team management and I am interested in learning more about the theory, which I have a basic understanding of. That's why my naive suggestion is also a question. 05:37:16 Because you absolutely may be right that changing a randomly selected item, used in the context-dependent calculation of all randomly selected items, with a fixed position after the fact may reduce the security of the randomly selected items. I'd love to hear about that :) 05:37:35 jberman is working on something experimental that may be somewhat close to what you are saying: 05:37:35 https://github.com/monero-project/research-lab/issues/86#issuecomment-921805298 05:37:42 And I'd trust someone else to do the stat analysis and decide if it's more or less secure than the risk of getting 2 outputs within the last X amount of time :p 05:38:52 I did see that, and yeah, it's basically what I'm discussing. Minimizing overlap between random and actual. I just described an algorithm which wouldn't need to be run twice in my comment 05:39:03 So sounds like if I want to learn more, that's THE issue. Thanks :) 05:41:31 kayabaNerve: So the issue with your idea, as stated, is that...I'm finding it hard to explain. Basically, there is equal probability, in some sense, that a mixin will be "far" from the real spend or "near" the real spend. The mixins are selected independently from the specified distribution baked into the Monero code. The random selection is also independent of the real spend. 
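The independence point made at 05:41:31 can be put in numbers. If each decoy is drawn independently and lands in some recent window with probability p, then the chance that at least one of n decoys lands there is 1 - (1 - p)^n, regardless of where the real spend sits. The sketch below evaluates that expression; the values p = 0.01 and n = 10 are assumptions chosen for illustration, not measurements of Monero's decoy distribution, and the function name is hypothetical.

```python
def prob_any_decoy_in_window(p, n=10):
    """Chance that at least one of n independent decoys falls in a window hit with per-draw probability p."""
    return 1.0 - (1.0 - p) ** n

if __name__ == "__main__":
    # Assumed illustrative values: 1% per-draw chance, 10 decoys.
    print(prob_any_decoy_in_window(0.01))
```

With these assumed values the result is a little under 10%, which is the kind of figure one would need when weighing how suspicious a second recent ring member really is.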
05:42:21 So even though proximity isn't a risk, such proximity isn't an issue, as it'll randomly happen or randomly not happen, therefore not changing the odds of such an occurrence in any way flaggable for CA software 05:42:24 I think you bring up some interesting ideas that are worth turning over in the head, though. I'd never turn down a new idea from a stats layperson, since there could be something I am overlooking. 05:42:41 *is a risk 05:42:50 Is that what you were trying to say? 05:44:15 Yeah, so the real spend and a mixin may be very close. Also, any two mixins (or three, four, etc.) may also be close. An attacker cannot really know, just from the simple information of closeness, that one "close" ring member is a real spend when it is next to another ring member. At least that's what comes to mind 05:44:38 Yeah. I do know enough cryptography to understand that ;) 05:45:19 Yet also, the one comment you linked is precisely about ensuring distribution and limiting proximity. There's a key factor here IMO. The real output isn't random. 05:45:46 That's why it serves as such a flare if you spend an output from 11 blocks ago. Very low chance of randomness, very high chance of human behavior. 05:46:21 Which is why if you have 2 such outputs... it means you either hit the random chance twice (much smaller than hitting it once) OR the static element collided with the random one. 05:46:29 kayabaNerve: The real output _is_ random 05:46:47 It is random by a stochastic process of human behavior 05:47:10 Not when humans are impatient creatures and active users of the network frequently use it, limiting the age of their coins. 05:47:44 I do understand it's not perfectly determinable and it's not absolute. I'm saying humans have a different pattern than a random selection algorithm, naive or intelligent 05:47:55 That's still random. It just has a distribution where the age of the real spends is quite young 05:48:37 I think it depends on the definition of random.
Arguably, only the horrific og naive impl was "random". 05:48:52 kayabaNerve: What I'm saying is that this problem can be fixed with sufficient insight into the problem. I believe I have sufficient insight. That's what I'm working on. 05:49:16 I'd call the new algo here random, ofc, yet I'd hesitate to call human behavior sufficiently random. 05:49:31 But again, not my field, so I would defer to you ;) 05:51:34 It can be considered sort of semantics. Since my field is economics, which is the study of human behavior under conditions of scarcity, it is natural in my discipline to call certain aspects of human behavior random or probabilistic. 05:52:45 Anyways. The end point of my naive suggestion, which I do think is naive and I've spent too much time arguing on given my limited knowledge (there's some theorem about this), is the collision of a random and a human pattern may be more fingerprintable than the existence of one human pattern alone, due to the multiplicative odds in place (1% * 1% is far less than 1%). Hence why I brought up a way to reduce such collisions. My 05:52:45 suggestion is a very naive way to ensure distribution a-la j-berman added a post check before even deciding the full ring in question, by generating an extra random element to be replaced (though a post check would likely be more comprehensive) 05:53:24 I get and agree with that. I just stop at calling it random in summary :p 05:53:56 But yes, I've spent far too long talking about things I don't fully understand. I'll try to read through the issue in question over the next couple days, and appreciate you talking me through this. It helped me :) Thank you. 05:55:06 No problem. We need to look at this issue from all angles since it is so, so important for user privacy. You provided yet another angle :) 05:57:10 https://libera.monerologs.net/monero-dev/20210925#c31867 is the start of the #monero-dev conversation, right? 05:59:19 Yes.
Unfortunately, it sort of popped up in the middle of a heated discussion. meh 06:00:27 I'll keep my thoughts on that out of here :P Just wanted to confirm and provide a more direct link for anyone who reads through and also wants to read the other convo. Thanks for confirming 15:43:29 atomfried[m]: are you looking at optimizing `ge_scalarmult_p3()`? It seems to be the main blocker for a theoretical speedup in Groth/Bootle proofs. 15:44:39 i can have a look but i cannot promise anything. Is there a docu and or tests what this function should actually do? i dont want to fuck this up :D 15:49:37 Ah, this is part of the core crypto library (added 9yrs ago). I'm not sure what docs it's based on... 15:52:40 Maybe this one: https://eprint.iacr.org/2007/286.pdf 16:44:52 "rupee: I can make it available..." <- rupee : Ok, it's been more than an hour lol. In fact isthmus is going to clean up his Python code, neptune plans to make the SQL code and data available, and I will probably combine my R code with isthmus's repository. So it will take a few more days, but hopefully it will be worth the wait :) 16:52:31 Awesome, thanks for the follow up 21:29:57 In Computer Science, what is meant by "heuristic", used as a noun? Feel free to just respond with a link. 21:31:47 A good guess. 21:32:18 An educated guess. 21:32:30 But algorithmic. 21:33:08 Like, a good heuristic for guessing the real spend in an old monero ring is to pick the last member of that ring :D 21:33:53 I see. Thank you. Heuristic is sort of used pejoratively in economics, and rarely used at that. But I see it being used a lot in these papers. 21:37:47 moneromooo: I figure I should ask you this beforehand: It is ok if I quote you in my CCS proposal? I mean, I suppose I don't need to ask permission, but I'm just doing it as a courtesy. 21:37:51 Specifically, this: "[Fixing the mixin selection algorithm] is important. It's the weakest part of monero." 
21:38:44 From https://libera.monerologs.net/monero-dev/20210925#c31927 21:39:53 Sure. 21:41:48 Ok great
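The heuristic from 21:33:08 (guess the most recent ring member) and the 05:29:28 thought experiment behind it can be simulated in a few lines. This is a toy model under stated assumptions, not a reconstruction of any historical Monero behaviour: decoys are drawn uniformly over the whole chain, real spends are uniformly young, ages are in arbitrary units, and `newest_guess_accuracy`, `chain_age`, and `spend_window` are all hypothetical names and values.

```python
import random

def newest_guess_accuracy(chain_age=1_000_000, spend_window=10_000,
                          decoys=10, trials=2_000, seed=1):
    """Fraction of simulated rings in which the youngest member is the real spend."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Assumption: the real output is spent soon after it was received.
        real = rng.uniform(0, spend_window)
        # Assumption: every decoy is equally likely to come from anywhere in chain history.
        ring = [rng.uniform(0, chain_age) for _ in range(decoys)]
        if real < min(ring):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(newest_guess_accuracy())
```

Random guessing in an 11-member ring would be right about 1 time in 11; comparing the returned value against that baseline shows how much the age metadata leaks in this toy setting.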