18:55:52 Rucknium: was there a specific use-case you had in mind for using the DB outside of `cuprated` (maybe a block explorer?)
18:58:11 as it looks now, cuprate's DB will have a stable byte-layout and use primitive types that C and most languages should be able to represent, it's just a matter of creating the schema in those languages (and if they have an LMDB wrapper)
19:07:50 would a separate binary acting as a DB request/response middle-layer or a C schema so all(?) languages could read the DB directly be better? former would work out of the box but be a separate binary, latter would be more flexible but it would be up to other people to create language mappings
19:17:19 For statistical analysis of blockchain data. Now I have to go through `monerod` for everything: https://github.com/Rucknium/misc-research/blob/main/Monero-Effective-Ring-Size/xmr-ring-gathering.R
19:17:19 IMHO, it's not a big priority to make sure cuprate database can be read by other programs. I was just interested if the planned implementation would allow that.
19:22:11 So much for not posting the BP++ review progress on Reddit. Oh well: https://reddit.com/r/Monero/comments/1b8yukt/the_monero_standard_96_monero_hits_a_new_record/
19:38:28 As before, I would caution any readers not to assume more completeness than exists with that incomplete draft
19:38:51 and as noted by Diego Salazar when he posted it, the findings are incomplete and could change at any time
20:19:01 if this were a flood attack are we actually looking at enough TXs to de-anonymize ring sigs? 100k TX implies they could own 75-80% of the recent outputs
20:19:19 100k per day vs the usual 20-25k
20:19:21 ah, so reading `data.mdb` directly from R would be preferred, no other binaries?
21:19:22 I feel like my hard work at [documenting monerod db](https://github.com/Cuprate/old_book/blob/main/src/monero/database/monerod_database.md) didn't pay off...
21:19:22 The [tables](https://github.com/Cuprate/old_book/blob/main/src/monero/database/tables.md) and [types](https://github.com/Cuprate/old_book/blob/main/src/monero/database/types.md)
21:23:40 and which output spends does it hurt the most, with the current DSA?
21:32:38 "which output spends does it hurt the most?" Assuming that the tx volume is from an adversary that is trying to reduce user privacy: the txs that randomly select a higher number of adversary-created outputs as decoys. The binomial distribution can tell you the rough probability of that.
21:35:52 spending any recent output would have the least privacy?
21:38:23 I don't think so. Could you explain your hypothesis or classifier?
21:39:53 At least, I don't think the adversarial flood would affect that.
21:41:24 Young/old spending privacy is mostly about how well the distribution of the decoy selection algorithm matches the real spend distribution in those segments of the age distribution.
21:42:36 I spent an output yesterday and 13 decoys were from the last 24hrs; if the attacker owns a high % of decoys then the effective ring size would be much lower
21:43:32 The decoys are selected independently from the age of your real spend.
21:45:19 with the current DSA, what percentage of recent outputs are used as decoys? if an attacker owns the majority of them, how large would the effective ring size be? an approximate % would be good to know
21:47:08 Ok let me see. I will try to give you something a little exact
22:07:15 With some quick coding, I see that 59% of the probability mass of the DSA falls on outputs created by the suspected spammer, if 75% of outputs created since March 4 are owned by the spammer.
22:07:29 I will have to double check the script.
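
The binomial reasoning above can be sketched in a few lines. This is a minimal illustration, not the script mentioned in the chat: it assumes each of the 15 decoys in a ring (ring size 16) is drawn independently, and takes the 59% adversary share quoted at 22:07:15 as given. It is written in Python rather than R, and the names `honest_decoy_pmf`, `N_DECOYS` and `P_ADVERSARY` are hypothetical.

```python
from math import comb

N_DECOYS = 15        # ring size 16 = 1 real spend + 15 decoys
P_ADVERSARY = 0.59   # share of the DSA probability mass quoted in the chat (assumption)


def honest_decoy_pmf(n_decoys=N_DECOYS, p_adversary=P_ADVERSARY):
    """Binomial PMF of the number of decoys NOT controlled by the adversary.

    Assumes every decoy is an independent draw that lands on an
    adversary-owned output with probability p_adversary.
    """
    p_honest = 1.0 - p_adversary
    return [
        comb(n_decoys, k) * p_honest**k * p_adversary ** (n_decoys - k)
        for k in range(n_decoys + 1)
    ]


pmf = honest_decoy_pmf()
# Index of the largest probability = most likely number of honest decoys.
most_likely = max(range(len(pmf)), key=pmf.__getitem__)

for k, p in enumerate(pmf):
    print(f"{k:2d} honest decoys: {100 * p:5.1f}%")
print(f"most likely: {most_likely} honest decoys, "
      f"effective ring size {most_likely + 1}")
```

The effective ring size is the number of honest decoys plus the real spend. Changing `P_ADVERSARY` shows how sensitive the result is to the assumed adversary share.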
22:15:37 Trying to be lazy and use wolframalpha, I think this works for calculating how much of the decoy distribution is covered by a given amount of time:
22:15:38 https://www.wolframalpha.com/input?i=gamma+distribution+19.28%2C%281%2F1.61%29%2C++P%28X+%3C++ln%287+*+86400%29%29
22:15:47 The expression takes ln() of the number of seconds in the time period you want calculated. So with 'ln(7*86400)' the result is the percentage of decoys we can expect to be drawn from the last week (70.8357%). Just replace '7' with however many days you want to calculate for if you want something else.
22:17:56 Yes, that's a good approximation. It will be inaccurate because the output flow amount, which is computed from the last year of blockchain data, does not represent real seconds when tx volume increases a lot
22:21:43 Got it. Makes sense.
22:39:55 Here is the R script: https://gist.github.com/Rucknium/ce83a26ac99fb8debc0b725941340767
22:41:53 I'm pretty sure it's correct. I compute the DSA probability mass function for all RingCT outputs. Then I multiply 0.75 by the part of the PMF since the suspected spam started. Then that is summed.
22:42:48 I can't quite get the exact output index correct where the spam is supposed to start. It is off by 0.5% 🤷. Close enough
22:43:09 You can adjust the assumptions if you want
22:47:09 Then if you want to know the probability of each draw of decoys (0, 1, 2, 3, ... decoys not controlled by an adversary), you input in your R console `round(100 * dbinom(0:15, size = 15, prob = 1 - 0.59), 1)`
22:49:01 The most common number of non-adversary-controlled decoys to draw is 6. That means an effective ring size of 7.
22:51:38 https://matrix.monero.social/_matrix/media/v1/download/matrix.org/RDyMgugEzBVNjgljaTIbWKxS
22:52:08 are the "weird cases" just a result of the spam because its transactions sit in the mempool for longer than 10 minutes?
22:52:29 also rip to whoever paid nearly 4 xmr
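
The WolframAlpha expression quoted at 22:15:38 can also be evaluated locally. The sketch below is an assumption-laden illustration, not wallet code: it treats the decoy age in log-seconds as gamma-distributed with shape 19.28 and scale 1/1.61 (the parameters in the quoted query) and evaluates the CDF with a standard power series for the regularized lower incomplete gamma function. The names `regularized_lower_gamma` and `decoy_fraction_within` are hypothetical.

```python
import math

SHAPE = 19.28   # gamma shape, from the WolframAlpha query in the chat
RATE = 1.61     # gamma rate; the query uses scale = 1/1.61


def regularized_lower_gamma(a, x, tol=1e-12, max_terms=10_000):
    """P(a, x) = gamma(a, x) / Gamma(a), computed with the series
    x^a * e^(-x) / Gamma(a) * sum_{n>=0} x^n / (a * (a+1) * ... * (a+n)).
    """
    if x <= 0:
        return 0.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    # Apply the prefactor in log space to avoid overflow.
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def decoy_fraction_within(days):
    """Approximate share of decoy picks younger than `days` days,
    i.e. P(X < ln(days * 86400)) for X ~ Gamma(SHAPE, scale=1/RATE)."""
    log_seconds = math.log(days * 86400)
    return regularized_lower_gamma(SHAPE, RATE * log_seconds)


for d in (1, 7, 30):
    print(f"last {d:2d} day(s): {100 * decoy_fraction_within(d):.2f}% of decoys")
```

For 7 days this should agree closely with the 70.8357% reported in the chat. As noted at 22:17:56, treating the result as wall-clock time is only an approximation, since the output flow used in practice does not track real seconds when tx volume spikes.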