02:01:06 secparam: I ran multiple simulations of get_outs in my client making sure to isolate the gamma outputs. Here were my results attempting get_outs 100k times 02:01:11 * jberman[m] uploaded an image: (33KiB) < https://libera.ems.host/_matrix/media/r0/download/matrix.org/ulMXVUJtxKPHbbGAyDaCzKpG/get_outs%20100k%20simulations%20results.png > 02:02:03 * jberman[m] < https://libera.ems.host/_matrix/media/r0/download/libera.chat/b7b8487e9ae6c5b1b786e882cadd487d753a2107/message.txt > 02:40:29 Where did you report that bug, jberman? 02:40:29 I don't see an issue or mention in #monero-dev:monero.social and want to make sure its appropriately shared and reviewed. 02:41:56 DM'd moneromooo, luigi and fluffy. Talked about it with moneromooo, and ran it by him before sharing it here. He suggested sharing in -dev too. Should I take it there also? 02:42:37 dev is a good place if you're making these PRs, yup :) 02:42:38 Great! Yeah, that would be a good idea if moneromooo and co are game with it being public. 02:44:00 Sounds good 06:13:29 jberman that line in the code, "average_output_time = DIFFICULTY_TARGET_V2 * blocks_to_consider / outputs_to_consider;", calculates time in seconds so it should be ok with integer division 06:15:31 actually it depends on what range "outputs_to_consider" covers... Need to step through it in the debugger to find out 06:22:56 sech1 it is supposed to be computing the average rate that outputs were added to the chain over up to the most recent 1 year period 06:24:23 "DIFFICULTY_TARGET_V2 * blocks_to_consider" is definitely in seconds. The questionn is, what's the resulting value there (average_output_time) 06:24:29 It does feel a bit embarrassing that I didn't notice this integer division issue either 06:24:30 if it's 1 or 2 then it must be computed as double 06:24:54 but if it's ~164000 (1.9 days in seconds) then it's ok 06:25:27 outputs_to_consider should be the total number of outputs in the last year's worth of blocks 06:26:31 then it should be computed as double 06:27:18 How UkoeHB is reading it is how I read it too^ 06:27:18 Using integer division, you end up rounding the true average of ~1.9 seconds per output over the past year down to 1 second, which is a significant difference from the true average, that then flows into the output selection in pick(). I can’t see why it would be expected to have such a large difference from the true average 06:27:19 Plus, just taking a step back, it wouldn’t make sense for average_output_time to be a double if the intent was to do integer division there 06:28:27 I think this algorithm was built when tx volume was way lower, so it must not have affected test runs 06:29:46 Ya the more outputs, the worse the impact on the calculation downstream 06:30:07 *the more often outputs appear 06:30:26 once it got sub 5 seconds per output on average, it got bad 06:31:01 hmm, won't we get 0 there eventually, if we have a lot of transactions for more than a year? 06:31:46 more than 120 outputs per block on average 06:31:52 Yep, probably would’ve been discovered naturally soon 06:32:04 Through normal use 06:32:19 division by zero on line 1031, classic 06:33:36 Ya seems clients would start failing to construct tx’s 06:37:48 should be easy to fix, right? was planning a new release anyway 06:39:26 Super easy: https://github.com/j-berman/monero/pull/3/files 06:40:52 Waiting on repo rights to submit a PR, but I don’t care if it’s not me and you wanna release in the meantime 06:47:52 this will however change output selection algorithm and we'll have two different versions for quite a while. There will be probably privacy implications 06:48:03 two anonymity "puddles" instead of one 06:48:46 for quite a while = until next hardfork 06:59:38 That’s true, maybe it does makes sense to hold off 07:03:25 wrong output selection hurts privacy already now though 07:04:16 I think the benefits of the fix outweigh it 07:04:43 pools (miner payouts) and exchanges (deposits/withdrawals) will update quickly and then we'll have big enough anonymity set for the new output selection 07:05:38 besides, division by zero bug won't wait 07:44:09 so if I'm reading the code right, this bug makes all decoy outputs older than they should've been (1.9x older), which makes "newest output = real spend" heuristic work better 07:44:15 in this case, it should be fixed asap 07:49:58 hmm the odd thing is secparam[m] 's results which showed most decoys at 20 blocks, which jberman[m] seemed to corroborate 07:53:46 hmm, I read the code wrong. "uint64_t output_index = x / average_output_time;" -> here the bug makes it bigger, but... 07:54:06 "output_index = num_rct_outputs - 1 - output_index;" -> here it becomes smaller than it should've been, so it skews selection to newer outputs? 07:55:27 looks like it 07:55:51 average_output_time is underestimated -> x / average_output_time is overestimated -> num_rct_outputs - 1 - output_index lands you with an output index further back in the chain 07:59:01 are rct_offsets ordered from newer to older? 07:59:35 ya, I think more rigorous analysis taking this into account and comparing to observed distributions over block intervals since this was introduced could provide more clarity. was introduced ~April 2019 08:00:08 the final value of output_index is smaller because of the bug, so it chooses smaller index from rct_offsets 08:02:26 `average_output_time` is smaller (rounded down), which means `output_index` is larger on initial value (division), which means its final value is lower, so it causes the algorithm to select a lower block 08:04:00 that^ 08:06:51 UkoeHB how are rct_offsets ordered? From lower (older) block to higher (newer)? 08:07:35 yea that's how they're ordered 08:07:52 I only found this code: https://github.com/monero-project/monero/blob/master/src/cryptonote_core/blockchain.cpp#L2304 08:08:08 which suggests it's in increasing height order 08:08:32 so after untangling all that, the bug skews to older outputs 08:21:49 if we were to get an on-chain distribution since this was introduced, and we observe something like dual peaks, one at ~10 blocks and one further right, then that would suggest the newest heuristic would work well and the left-most peak is likely from real outputs. But that's not the only cause of twin peaks. The less bad cause would be the point at which the average output time shifted from 2 to 1, that would cause twin peaks as 08:21:49 well 09:17:50 the fix is in PR 7798 and 7799 23:13:16 jberman[m]: can you redo your get_outs simulation post-fix?