00:41:39 FROSTLASS was proven by CS, and the FCMP++ upgrade provides a near-traditional Schnorr signature from a multisig PoV.
00:42:59 So focusing on FCMP++ reviews could include the new multisig, but the CLSAG multisig already has a formally proven option available.
00:45:40 Would be nice to have multisig reworked in the core implementation. FROSTLASS seems really cool but is way beyond my understanding :((
08:45:55 I spent the day with Claude Opus 4.8 on max effort digging for an inflation vulnerability in Monero's current consensus code like sech1 did. From Claude:
08:46:04 > After an exhaustive review of the consensus cryptography on release-v0.18 — the underlying papers' mathematics (proofs, definitions, assumptions, and concrete security bounds), the verification code reachable from handle_block_to_main_chain, and 3,000+ executed known‑answer tests cross‑checked against an independen [... too long, see https://mrelay.p2pool.observer/e/9IuI1ooLbkhvZjlt ]
08:46:45 I'll do the same for FCMP++ after the audits are all complete, and I'll also improve the framework and iterate to continue auditing Monero's code (obviously, prioritizing work on FCMP++)
08:47:15 What I did: fed the LLM the CLSAG and BP+ papers and audits (like sech1), the blog post explaining Monero's past detectable inflation bug from 2017, and (even though not directly relevant to Monero's code) Taylor's writeup explaining Zcash's recent hidden inflation vulnerability too.
08:47:46 I had it look for mathematical flaws in the CLSAG and BP+ papers (since that has happened in the past e.g. Zcash's other hidden inflation vulnerability in Sprout). Then I had it go through Monero's code starting from handle_block_to_main_chain, and dig deep into every section, with emphasis on crypto functions in src/crypto [...
too long, see https://mrelay.p2pool.observer/e/xqaO1ooLMGJWNHJi ]
08:47:49 I'm running it right now too :)
08:48:19 I fed it all Monero audit PDFs, and asked to find inflation/double spend bugs in src/ringct and other related files (like files in cryptonote_core)
08:49:21 https://mrelay.p2pool.observer/m/monero.social/DTdtmBDouQELpOSnjqkxiFXy.pdf (claude_monero_audit.pdf)
08:49:35 I had it write that summary also^
08:49:37 jberman did you run Claude code on Monero repository folder? It's much more efficient when Claude can look around all files and run commands
08:50:14 yep, and I downloaded those pdf's into the local repo and had it read from the local
08:50:29 I ran it on release-v0.18
08:50:32 same
08:50:43 pdfs in the local folder
08:52:00 looks like I'll run out of the 5-hour session limit before it finishes...
08:55:21 yeah, looks like your audit is much more in-depth than what I'm doing anyway jberman
08:55:30 My prompts were quite simple
08:58:09 I'll share my session in a sec so you can see my prompts too
09:00:30 are you guys whitelisted?
09:01:41 My results: https://github.com/SChernykh/ringct-bulletproofs-plus-review-claude/blob/main/ringct-bulletproofs-plus-review-claude.md
09:01:53 I'm not whitelisted
09:02:29 I'm not either
09:02:44 allegedly its easy to get whitelisted
09:02:58 easy for real devs/researchers*.
the zcash guy didnt have to kyc or anything
09:03:10 As far as I understand, whitelist is needed if you actually want Claude to write an exploit for you
09:03:24 I had no rejections when I asked it to find bugs and then suggest how to fix them
09:03:32 ^
09:03:49 that's what I understood as well, and same
09:04:09 i still think it would probably avoid telling you about an exploit if it found one
09:04:29 That Claude audit used 97% of the session limit, not bad
09:04:46 ie i think it would be a good idea to get whitelisted, if not too much trouble
09:05:14 (otherwise youre still using a neutered version)
09:05:44 It would tell if it's not allowed to say something
09:05:53 "As an AI model, bla bla bla"
09:06:22 Easy way to test: add an inflation bug and see if it catches it.
09:08:18 If I reintroduce the 2017 inflation bug, for example, Claude will just find it because it knows it
09:08:46 I don't know the math/codebase in that part well enough to introduce something more subtle. jberman can try
09:10:07 If it's clever, it might diff your new code vs previous known code and see the change.
09:12:09 It actually does
09:12:27 It often goes "okay, let's see if local files are byte identical to what's in the repo" when I ask it to find bugs
09:12:37 it finds planted bugs this way
09:13:29 and then when it is byte identical, it goes "oh, this is identical, so it's a real security review" lol
09:13:44 but then it still finds something worth fixing
09:13:58 i doubt a 15$ subscription can find it. 'just' spend at least 100k$ on tokens and let it work for longer
09:14:04 I tried something pretty simple, here's where it's at right now:
09:14:23 promising that the 15$ subscription can't find it
09:14:58 > I've found something critical. Let me read it carefully and verify before concluding — lines 4195–4208 contain the ver_non_input_consensus(extra_block_txs, …) call commented out.
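A minimal Python sketch of the byte-identical check described at 09:12:27, assuming a SHA-1 git repository. The names `git_blob_sha1` and `find_modified` and the `expected` mapping are hypothetical; the mapping could be filled from the object ID and path columns printed by `git ls-tree -r <commit>`. This is an illustration of the idea, not a description of how Claude Code performs the comparison.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def git_blob_sha1(data: bytes) -> str:
    """Return the object ID git assigns to a blob with this content.

    Git hashes the header "blob <size>\\0" followed by the raw bytes.
    Assumption: the repository uses the default SHA-1 object format.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def find_modified(root: Path, expected: Dict[str, str]) -> List[str]:
    """List relative paths that are missing or differ from the expected blob ID.

    `expected` maps a path relative to `root` to the blob ID recorded
    in a trusted commit (hypothetical input supplied by the caller).
    """
    changed = []
    for rel_path, blob_id in sorted(expected.items()):
        file = root / rel_path
        if not file.is_file() or git_blob_sha1(file.read_bytes()) != blob_id:
            changed.append(rel_path)
    return changed
```

An empty result means every listed file matches the trusted commit byte for byte; any returned path is a candidate for a planted or accidental change and is worth diffing first.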
09:15:44 so it says if it finds something critical before doing more work to contextually validate it
09:16:07 yes, it keeps the user updated about what it's doing
09:16:11 now it wants to check git
09:16:21 if you press "ctrl+o" you'll see more of what it's doing
09:16:27 so unless it's lying about checking git history, then..
09:16:34 it does check git history
09:16:40 it runs git commands
09:17:01 in my review I posted, it mentions that it checked commits made after the audit
09:17:11 No I mean, clearly it just identified this mock critical vuln, but it doesn't yet know if it's a mock because it hasn't yet checked git
09:17:46 yeah
09:18:34 which implies it points out a critical vuln if it finds one
09:22:17 ya if I tell it not to check git, and that this is the live code on the network, then it says it's a critical vuln implying it's not neutered from finding a crit. Either way, can get whitelisted. But it doesn't look like it would have made a difference unless a prompt explicitly got rejected by their API
09:22:44 Did anyone of you try to have an init steps first before asking Claude to audit?
09:22:57 that's what Zcash did
09:23:23 what's init steps?
09:23:46 I'll have a new 5-hour session in 1.5 hours available, I can try.
09:23:51 >Enumerate all relevant implementation components, specification claims, invariants, trust assumptions, cryptographic checks, data flows, boundary conditions, and plausible failure modes. For each item, produce concrete audit tasks assignable to specialized agents.
09:24:07 first ask it something like this to create ah audit map, and only then ask it to audit against it
09:24:09 an
09:24:22 and I forgot to feed it CLSAG audit pdf, I'll do it next time
09:24:49 so that the actual code review and it knowing what to check is separate
09:25:17 @jberman: it won't make a difference unless you use phrasing that triggers their cybersec classifier. if you use adversarial words like "im trying to exploit this codebase.
audit this and this and look for bugs that i can use to inflate the supply", it should eventually say that "bla bla you need to register for Cyber"
09:27:44 Never got this, and I did many "check this code for bugs and vulnerabilities, and suggest bug fixes" sessions
09:27:51 Both for Monero and P2Pool
09:30:00 > >Enumerate all relevant implementation components, specification claims, invariants, trust assumptions, cryptographic checks, data flows, boundary conditions, and plausible failure modes. For each item, produce concrete audit tasks assignable to specialized agents.
09:30:00 Ya, I would imagine a more comprehensive audit doing that, and using agents. I started it with feeding it the papers, and gave examples of inflation bugs to consider. Then part way through explicitly said:
09:30:34 > There are plenty of ways an attacker could theoretically forge Monero, not
09:30:34 > just the ways I mentioned (I was just highlighting some examples to give you
09:30:34 > an idea). Another way is through the coinbase tx. Enumerate the ways an
09:30:34 > attacker could do so and then do another deep pass through the CLSAG and BP+
09:30:35 > papers, and code, to make sure neither the math nor the code allow an[... more lines follow, see https://mrelay.p2pool.observer/e/xoOr14oLQXZLVWd6 ]
09:31:47 That's how it ended up producing the result of the enumerated methods. But generally, I'd say there's definitely room for an improved framework
09:34:30 good clauding this morning
12:16:10 How did Claude think of FCMP++?
12:18:23 Haven't used it on FCMP++ yet. Planning to after audits are complete
12:39:44 Coding auditing needs harness
14:32:45 Just stumbled over this, somebody built an exploitable app and checked whether the LLMs would find the exploits: https://kasra.blog/blog/i-spent-1500-seeing-if-llms-could-hack-my-app/
14:33:06 (Just to give some context to our own attempts.)
14:34:32 @yushanren:matrix.org: isn't claude code enough?
15:04:04 isnt Claude code a harness?
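The two-step workflow suggested at 09:23:51 and 09:24:07 (build an audit map first, then audit against it) could be sketched in Python roughly as below. Everything here is illustrative: `ask` is a hypothetical caller-supplied function that sends a prompt to whatever model is in use and returns its text reply, and the "-" bullet convention, `parse_tasks`, and `two_phase_audit` are assumptions made for the sketch, not part of any Claude API.

```python
from typing import Callable, Dict, List

# Prompt text quoted from the 09:23:51 message.
MAP_PROMPT = (
    "Enumerate all relevant implementation components, specification claims, "
    "invariants, trust assumptions, cryptographic checks, data flows, boundary "
    "conditions, and plausible failure modes. For each item, produce concrete "
    "audit tasks assignable to specialized agents."
)


def parse_tasks(audit_map: str) -> List[str]:
    """Extract one task per '-' bullet line (assumed reply format)."""
    tasks = []
    for line in audit_map.splitlines():
        line = line.strip()
        if line.startswith("-"):
            task = line.lstrip("-").strip()
            if task:
                tasks.append(task)
    return tasks


def two_phase_audit(ask: Callable[[str], str], scope: str) -> Dict[str, str]:
    """Phase 1 builds the audit map; phase 2 audits each task separately."""
    audit_map = ask(f"Scope: {scope}\n{MAP_PROMPT}\nAnswer as a '-' bulleted list.")
    findings = {}
    for task in parse_tasks(audit_map):
        # Each task gets its own request, so deciding what to check
        # stays separate from the actual code review.
        findings[task] = ask(f"Scope: {scope}\nAudit task: {task}\nReport findings only.")
    return findings
```

Keeping `ask` as a parameter means the same loop can be driven by a CLI wrapper, an API client, or a stub for dry runs.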
15:08:17 it is
15:10:32 literally anything you would use an LLM with is a harness
15:50:58 sech1 how much usage are you giving Claude? Would it help if I got you a discounted nonprofit MAGIC Grants account? Same goes for Berman or other devs/researchers
15:53:50 https://mrelay.p2pool.observer/m/monero.social/DMMKWUOlsTmeyeduuJmhdAEy.jpeg (IMG_0024.jpeg)
16:20:53 I'm on the cheapest subscription plan right now
16:22:32 The claude max plan is 5x more expensive, not sure if it's worth it
16:23:06 How big is the discount?
17:38:47 The plan is $8 a month for 1.25x as much usage per session as the normal Pro plan (I believe normal cost for Pro is $20?), or $40 for "premium" with 6.25x as much usage per session sech1
17:39:21 Pro is $20
17:39:49 That's a very good discount
17:40:58 I'm almost at the weekly limit now with the Pro plan, and it will reset on Wednesday only. But this week I did some heavy prompting
17:44:23 1.25x usage for $8 will be good enough for now
17:48:40 sech1: please email me at justin⊙mo and I can get that set up for you. I'll make you a volunteer MAGIC Grants email that you'll access the account with. I guess there's no rush if you have another month or so on the subscription
17:51:24 sent
18:47:47 @monerify:matrix.org: It is. But there will be a sound way to prove it is safe. The checklist, the report, mathematical proof, etc.