15:03:05 MRL meeting in this room in two hours.
17:00:29 Meeting time! https://github.com/monero-project/meta/issues/1037
17:00:34 1) Greetings
17:00:57 Hello
17:01:00 <0xfffc:monero.social> Hi everyone
17:01:11 *waves*
17:02:41 Hi
17:02:59 2) Updates. What is everyone working on?
17:03:52 <0xfffc:monero.social> (1) Fixed a bug, (2) working on startup speedup: https://github.com/0xFFFC0000/monero/pull/26
17:04:07 me: Helping with stressnet. Reading papers about gossip networks and analyzing Monero's p2p transaction gossip messages.
17:04:37 howdy
17:05:07 me: completed first pass at growing and trimming the tree (db and tests implemented, tests passing)
17:05:47 Not much to report, between work contracts atm
17:07:29 me: Jamtis-RCT. Specifically, the conversion functions between X25519 and ed25519 points for legacy address support. If anyone knows a way to find the ed25519 (x, y) values from X25519's x value without explicitly finding the y value for X25519 by performing a square root op, please let me know
17:08:07 Hello
17:09:00 3) Stress testing monerod https://github.com/monero-project/monero/issues/9348
17:09:23 ("the tree" = fcmp curve trees merkle tree)
17:09:25 I think we found another problem that could prevent block sync. Sometimes a valid block can be placed on the `m_invalid_blocks` list. It looks like a node can get on a short alt chain. Then it marks one of the main chain's blocks as invalid. Because the block is "invalid", it stops syncing to the main chain and bans nodes that send the "invalid" block.
17:09:46 I guess that nodes are applying the alt chain's rules for validity to the main chain and think it's invalid. Probably it has something to do with the 100-block median block size, the penalty area, and how much miners can pay themselves. Maybe miners pay themselves too much on the main chain's block when the alt chain's rules are applied.
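[A note on the X25519-to-ed25519 question above: the birational map gives the Edwards y directly from the Montgomery u coordinate, y = (u − 1)/(u + 1), so the only square root needed is the one on the Edwards side that recovers x from y. A rough sketch in plain Python field arithmetic follows; this is illustrative only, not Monero's code, and all names are mine.]

```python
p = 2**255 - 19
d = (-121665 * pow(121666, p - 2, p)) % p  # edwards25519 curve constant

def inv(a):
    # Modular inverse via Fermat's little theorem
    return pow(a, p - 2, p)

def mont_u_to_ed_y(u):
    # Birational map Curve25519 -> edwards25519: y = (u - 1)/(u + 1).
    # No square root needed for this step.
    return (u - 1) * inv(u + 1) % p

def ed_x_from_y(y):
    # One square root, on the Edwards side: from -x^2 + y^2 = 1 + d*x^2*y^2,
    # x^2 = (y^2 - 1)/(d*y^2 + 1). For p = 5 mod 8, try x = xx^((p+3)/8)
    # and correct by sqrt(-1) = 2^((p-1)/4) if needed.
    xx = (y * y - 1) * inv(d * y * y + 1) % p
    x = pow(xx, (p + 3) // 8, p)
    if (x * x) % p != xx:
        x = x * pow(2, (p - 1) // 4, p) % p
    if (x * x) % p != xx:
        raise ValueError("no point on the curve has this y")
    return x

y = mont_u_to_ed_y(9)   # X25519 base point has u = 9
x = ed_x_from_y(y)      # ed25519 base point y should come out as 4/5
```

[One Montgomery-side square root is still needed in general to pin down the sign of the Edwards x, since u alone determines y but not which of ±x is meant.]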
17:10:04 The nodes can start syncing again if they shut down and restart or if the node operator inputs `flush_cache bad-blocks` into the monerod console.
17:11:22 In the incident that I analyzed closely, the alt chain was 2 blocks long
17:12:24 I guess the problem is the block size computation because AFAIK this type of thing doesn't happen on the mainnet, but mainnet usually doesn't increase the 100-block median block size.
17:13:05 AFAIK, there is no plan yet to try to debug this
17:15:11 This week I started spamming 144in/2out txs to see if the out-of-memory issue that some people saw on mainnet would happen. I've seen a modest increase in RAM usage, but nothing that would cause a major problem. Most nodes on stressnet have 30 connections or fewer, so the OOM issue may be due to the number of connections more than how many inputs txs have.
17:15:53 But the CPU usage of nodes has increased because verifying inputs requires more CPU per byte of data than verifying outputs.
17:16:51 A low-end node (4GB RAM, 2 CPU threads) recently took 30 seconds per block to catch back up with the chain tip with 4MB blocks mostly filled with 144in/2out txs.
17:18:21 Any more comments on stressnet?
17:20:05 sounds like a nice list of daemon issues has formed for would-be takers to pursue further
17:20:15 Here's the relevant part of the log file when the nodes fail to sync, by the way:
17:20:33 D [: INC] first block hash , last
17:20:34 D block found in main chain
17:20:36 D block <33a5c86ea4a291ddf810df8620784ec8c9fea4014280a0d14f4bb6f6e0c17348> found in alternative chains
17:20:38 D block <4a76385a6ddb03134ce72d88c8e08c26737f43a5fb88ca2f278ef067e39fbaf2> found in m_invalid_blocks
17:20:40 E [: INC] Block is invalid or known without known type, dropping connection
17:20:59 4) Potential measures against a black marble attack. https://github.com/monero-project/research-lab/issues/119
17:21:58 I don't have much to report on this issue today except I found a paper that proposes an alternative to Dandelion++:
17:22:00 Franzoni, F., & Daza, V. (2022). Clover: An anonymous transaction relay protocol for the bitcoin p2p network. https://moneroresearch.info/index.php?action=resource_RESOURCEVIEW_CORE&id=222
17:22:48 I just skimmed the paper. It claims that Clover is as good as D++, simpler, and works well for nodes that have no incoming connections (i.e. their ports are closed).
17:23:39 Hmmmm that's a really interesting issue. I wonder what the value of even having a running "invalid blocks" list is, realistically. Obviously it would speed up failing known repeated bad blocks for honest peers, but the PoW check should be the very very first check performed on block data. This will mathematically limit the number of non-trivial bad blocks that can exist. A dishonest peer, however, can send any number of garbage blocks that don't pass PoW verification.
17:23:52 The last part is interesting. A few months ago we discussed a potential issue with D++ when nodes have no incoming connections. D++ treats incoming and outgoing connections differently for its graph theory to work.
17:24:51 jeffro256: In the log files I found an error message about the "invalid" block pointing to this line of code: https://github.com/monero-project/monero/blob/master/src/cryptonote_core/blockchain.cpp#L1217
17:24:52 Right above that is a comment: "FIXME: Why do we keep invalid blocks around? Possibly in case we hear about them again so we can immediately dismiss them, but needs some looking into."
17:25:56 Waiting for a fix for 4 years :)
17:26:28 I've only skimmed the Clover paper. The D++ paper looks at a lot of different threat models. It is a dense paper. On a skim, the Clover paper looks like it analyzes fewer threat models. Anyway, I will keep the paper and come back to it later.
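[For context on the D++ remark above: in Dandelion++'s stem phase a transaction is relayed to a single outbound peer, and at each hop it "fluffs" (switches to ordinary flood broadcast) with some fixed probability, which is why inbound vs. outbound connections matter to its anonymity analysis. A toy sketch of that stem walk; the fluff probability and topology here are made up for illustration, not Monero's actual parameters.]

```python
import random

FLUFF_PROB = 0.1  # assumed per-hop fluff probability, illustrative only

def stem_walk(origin, outbound_peers, rng):
    """Relay along single outbound peers until the tx 'fluffs'.

    Returns the node that finally flood-broadcasts, plus the hop count.
    Observers see the broadcast come from that node, not the true origin.
    """
    node = origin
    hops = 0
    while rng.random() > FLUFF_PROB:
        node = rng.choice(outbound_peers[node])  # stem: ONE outbound peer
        hops += 1
    return node, hops

# Tiny made-up topology: 10 nodes, each with two outbound peers
peers = {n: [(n + 1) % 10, (n + 3) % 10] for n in range(10)}
rng = random.Random(42)
broadcaster, hops = stem_walk(0, peers, rng)
```

[A node with no incoming connections can still originate stem relays over its outbound links, but it never serves as a stem relay for others, which skews the graph the D++ analysis assumes.]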
17:26:34 Ah, now, that remark is 9 years old.
17:27:35 My naive view is to try to fix the miscalculation of the block validity first.
17:29:35 5) Research Pre-Seraphis Full-Chain Membership Proofs. https://www.getmonero.org/2024/04/27/fcmps.html
17:29:49 kayabanerve, kayabanerve
17:30:32 By the way, I've listed "Uniformity of Monero's hash-to-point function" as a separate agenda item. (It's next on the list.)
17:31:32 I have no updates on either.
17:31:56 But thank you :)
17:34:05 I skimmed a few related papers on the hash-to-point function. One of them proves that a different hash-to-point function is approximately uniformly distributed. I wonder: would it be useful to do some preliminary statistical tests of the uniformity of Monero's function? If it fails one or more tests, that's a good reason for a researcher to look at its potential bias more formally.
17:34:19 Diehard-like tests.
17:35:51 kayabanerve: What do you think? ^
17:37:23 I'm honestly unsure of the methodology for testing. If you have learned it and want to, I'd be happy to comment as I can, but my expertise on the subject only went as far as knowing to hire someone to ensure we're not messing up :p
17:38:44 AFAIK, what I would need would be a large number of the points to test whether they are uniformly distributed or (even better) an easy-to-use function that I could use to generate points at will.
17:39:19 I would have to look into this more, but that's what I think would be required.
17:40:18 And then I would have to understand elliptic curves a little if the test is actually uniformity on the curve instead of just uniformity on the set of integers
17:40:27 Rucknium: you probably know this already, but it's important to note that a random ed25519/X25519 public key (and therefore key images) as we serialize them will NOT look like a uniform random string, since not all (X, Y) combos are valid. And of those (X, Y) combos, we only really use 1/8 of them (the ones without a cofactor component)
17:41:20 If it doesn't fail any tests, that doesn't necessarily mean that it's unbiased. But if it does fail tests, then that could make a researcher more interested in it.
17:42:25 There's a C++ and a Rust impl
17:43:17 They explicitly shouldn't be uniform over the byte array.
17:43:41 Presumably, the field element should be
17:44:18 The original CryptoNote whitepaper claims the hash-to-point function, as used for the original ring signatures, does not need to be perfect: `None of the proofs demands Hp to be an ideal cryptographic hash function. It's main purpose is to get a pseudo-random base for image key I = xHp(xG) in some determined way.`
17:44:43 do our other usages?
17:47:32 The original proof's implementation does allow forgeries if you can predict the HtP IIRC.
17:48:12 ... if you can control its output to a solved-for point?
17:49:54 I'm unsure how to phrase it. I don't think that's the risk, due to it using keccak to start, and I wouldn't be surprised if it's fine even if it offers a notably reduced bit count compared to what's desirable for a hash fn of its length.
17:50:23 But I wouldn't take that informal comment at face value.
17:54:11 Any other comments on this or the agenda items?
18:01:14 I believe I know of a way that a valid block could be added to the invalid blocks list. In `Blockchain::switch_to_alternative_blockchain`, if `handle_block_to_main_chain` returns `false` for any reason, we add the alt block and its children to the invalid blocks list. However, `handle_block_to_main_chain` can fail for temporary non-deterministic reasons, one of them being that the transactions of that block are not found inside of the mempool. This might have happened during the stressnet because the mempool got full. So if the node attempts reorging from a worse alt chain to the main chain, but a transaction is missing, the main chain block permanently gets added to the invalid blocks list
18:02:07 I haven't confirmed it yet, so take it with a grain of salt, but that's maybe a possibility
18:02:40 "one of them being that the transactions of that block are not found inside of the mempool" If that's true, that is a strange design decision.
18:03:01 Transaction propagation on stressnet is not perfect.
18:03:16 Well, the node inherently can't determine the validity of a block if it doesn't have each tx on hand
18:03:42 Doesn't the fluffy block protocol ask for "missing" txs from the node that sent the block?
18:03:59 There are log messages about requesting the missing txs
18:04:20 Yes, usually
18:04:30 I mean, not necessarily in this specific bug, but one of the log categories will give you that log message sometimes
18:05:30 Maybe the sending node doesn't have the tx in its txpool either?
18:05:41 But it would have it in its blockchain
18:06:15 But if a block gets added as an alt block, and the txs were ALREADY in the pool, then the txs aren't forced to stay in the pool like they are with a normal fluffy block directly to the main chain.
18:08:20 We can end the meeting here. And continue the debugging discussion after the meeting
18:08:39 So this sequence of events could happen: 1) TX T gets propagated, 2) alt block B1 gets added as an alt block, 3) full mempool causes TX T to get pushed out, 4) alt block B2 has B1 as previous block and gets added as an alt block, 5) we attempt to reorg to block B2, 6) TX T is not present, 7) blocks B1 and B2 get added as invalid blocks permanently
18:08:51 (B1 contains TX T btw)
18:13:19 I think this was happening even when the tx pool wasn't full, by the way.
18:15:28 Would you be able to send me the full log please? I would like to see which reason is logged as the initial failure for `handle_block_to_main_chain`
18:17:13 Yes. When the bug occurred I just had log level 0. But the bug seems to happen about once every 3 days per node. Is there a log level I should set on nodes to get more info?
18:32:57 3 days is actually the interval after which old txs are dropped from the mempool. That might have something to do with it
18:35:12 For logging consensus verification errors, you can use the log category "verify" at ERROR level
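[The B1/B2 failure sequence described above can be sketched as a toy state model. This is a hypothetical illustration of the hypothesized bug, not Monero's actual code; all structure and names are invented for clarity.]

```python
# Toy model of the suspected reorg failure mode: a transient mempool miss
# during a reorg permanently poisons otherwise-valid blocks.

class Node:
    def __init__(self):
        self.mempool = set()
        self.alt_chain = []           # candidate chain to reorg onto
        self.invalid_blocks = set()   # stand-in for m_invalid_blocks

    def handle_block_to_main_chain(self, block):
        # Fails when any of the block's txs is missing from the mempool --
        # a temporary, non-deterministic condition
        return all(tx in self.mempool for tx in block["txs"])

    def switch_to_alternative_blockchain(self):
        for i, block in enumerate(self.alt_chain):
            if not self.handle_block_to_main_chain(block):
                # The failed block and its children are marked invalid
                # permanently, even though the failure was only transient
                self.invalid_blocks.update(
                    b["hash"] for b in self.alt_chain[i:])
                return False
        return True

node = Node()
node.mempool = {"T"}                                # 1) tx T propagated
node.alt_chain = [{"hash": "B1", "txs": ["T"]},     # 2) B1 contains T
                  {"hash": "B2", "txs": []}]        # 4) B2 built on B1
node.mempool.discard("T")                           # 3) full pool evicts T
ok = node.switch_to_alternative_blockchain()        # 5-7) reorg fails
```

[In the model, `ok` comes back `False` and both B1 and B2 land on the invalid list, matching the observed "valid block in `m_invalid_blocks`" symptom.]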