-
UkoeHB
meeting 2hr
-
baro77[m]
Sorry everybody, I'm a rookie and maybe with this post I'm making more noise than necessary, but I wanted to signal this (just seen):
-
baro77[m]
-
baro77[m]
(btw, cross-posting to monero-dev)
-
baro77[m]
correction: not cross-posted, 'cause I have seen @jrg was 1 hour ahead of me ;)
-
endogenic
beat me to it as well though
-
endogenic
I hear that it doesn't affect Monero though
-
baro77[m]
๐
-
cheekyleeks
-
cheekyleeks
Luckily Monero/Ristretto/zkcrypto/libsecp256k1 had it right from the beginning and were not affected. This shows how error-prone implementing Fiat-Shamir is and it's best to use a library like Merlin github.com/zkcrypto/merlin
-
baro77[m]
๐ I showed my rookie-ness ;)
-
UkoeHB
-
UkoeHB
1. greetings
-
UkoeHB
hello
-
endogenic
๐
-
Rucknium[m]
Hi
-
dangerousfreedom
Hello
-
xmr-ack[m]
hey
-
jberman[m]
howdy
-
UkoeHB
-
UkoeHB
short story short: does not affect monero
-
UkoeHB
any questions on that disclosure?
-
xmr-ack[m]
Not being as familiar with bulletproofs, why exactly does it not affect Monero?
-
moneromooo
Monero's was written by Sarang, who was careful.
-
Rucknium[m]
The blog post claims:
-
Rucknium[m]
>Why is this type of vulnerability so widespread? It really comes down to a combination of ambiguous descriptions in academic papers and a general lack of guidance around these protocols.
-
Rucknium[m]
Is it true that there is this shortcoming in the cryptography space?
-
UkoeHB
xmr-ack[m]: The issue was the paper did not have a complete implementation guideline, so anyone going directly off the paper would have made a certain small error in the Fiat-Shamir transform (neglecting to hash some public variables). Monero's implementation followed best practices for FS transforms, which prevented that issue.
-
UkoeHB
Rucknium[m]: yes, best practices for implementing cryptographic protocols are not well-disseminated among general developers
-
UkoeHB
Also, afaik most crypto professionals aren't implementing advanced algorithms, they are dealing with data encryption and encrypted channels. It's mainly in the cryptocurrency and experimental crypto spaces that these big protocols are implemented.
-
dEBRUYNE
UkoeHB: Does it list the projects that are affected somewhere?
-
UkoeHB
yes it's in the blog post
-
UkoeHB
Most crypto professionals aren't rolling their own crypto from scratch *. They are following some highly-scrutizined specification.
-
UkoeHB
Anyway, we can move on
-
UkoeHB
3. updates, what is everyone working on? monerotopia was last week, it was great to meet several people
-
jberman[m]
Aside from monerotopia which was sweet, been working on PR review (my list atm: 8046 [done], 7760 [in progress], 8076 [in progress]), digging into a sporadic bug with tx broadcast (8251, 6929), and submitted an updated proposal for subaddresses in light wallet servers in collab with endogenic here:
monero-project/meta #647
-
UkoeHB
me: I'm continuing work on my seraphis library. I realized getting an accurate minimum fee oracle will be a huge pita, so I'll probably just make a trivial mockup and move on...
-
Rucknium[m]
Working with jberman and mj-xmr on translating some C++ to statistics language. And doing some prototyping of statistical models with some simulated data.
-
dangerousfreedom
Im writing some code and text to understand and explain mlsag and borromean ring signatures. I think I will have something to present by the end of the month.
-
UkoeHB
3. discussion: any topics to discuss? questions/comments?
-
dEBRUYNE
UkoeHB: Thanks
-
xmr-ack[m]
Me: Unfortunately, I couldn't attend Monerotopia :(. But I do have some updates in regards to my work with MAGIC. I have a working implementation of two rudimentary machine learning models and have been training them on a preliminary dataset. I want to clarify that the preliminary dataset has many flaws... (full message at
libera.ems.host/_matrix/media/r0/do…6186c27f11139c90f09926efa9e71009791)
-
xmr-ack[m]
* Me: Unfortunately, I couldn't attend Monerotopia :(. But I do have some updates in regards to my work with MAGIC. I have a working implementation of two rudimentary machine learning models and have been training them on a preliminary dataset. I want to clarify that the preliminary dataset has many flaws and that most of the transactions were collected with an (unrealistic) 20-minute uniform delay. I am currently refining the
-
xmr-ack[m]
process of collecting a new dataset that does reflect real user spending patterns.
-
xmr-ack[m]
That being said, after training on an undersampled dataset removing the possibility of a guess newest heuristic, I am seeing accuracies upwards of three times random guessing classifying the true spend of an arbitrary ring with no outside information.
-
xmr-ack[m]
Metrics:
-
xmr-ack[m]
Random Forest Testing Accuracy: 23.41%
-
xmr-ack[m]
Random Forest Weighted F-1 Score: 19%
-
xmr-ack[m]
Gradient Boosted Classifier Testing Accuracy: 27.40%
-
xmr-ack[m]
Gradient Boosted Classifier Weighted F-1 Score: 26%
-
xmr-ack[m]
Feature Importance (Top 5):
-
xmr-ack[m]
Inputs.1.Time_Delta_From_Newest_Ring_To_Block: 13.07%
-
xmr-ack[m]
Total_Block_Tx_Fees: 10.97%
-
xmr-ack[m]
Inputs.0.Time_Deltas_Between_Ring_Members.9_10: 4.98%
-
-
Rucknium[m]
My commentary (I've seen these results prior to now): This underlines again that when there is a substantial mismatch between the decoy selection algorithm and the real spend age distribution, statistical methods can be used to narrow down which ring member is the real spend.
-
Rucknium[m]
And here there was a large mismatch, since the "real user behavior" in this synthetic data was uniform over just 20 minutes.
-
UkoeHB
Thanks xmr-ack[m]
-
Rucknium[m]
Also the feature importance is interesting, but it is hard to know if there is any real problem with some of these features without some sort of confidence interval or p-value on these estimated feature importance results. Since the feature importance may be nonzero just due to sampling error.
-
UkoeHB
On a different topic, I want to point out that Monero multisig is capped at 16 group members for efficiency reasons during account generation. I was assuming people who want to use multisig would see that limit in the code, but maybe not... (looking at Thorchain). A different approach to key generation (see the FROST paper) is much more efficient when M < N - 1 (enabling larger groups generically), but my focus was on
-
UkoeHB
N-of-N and (N-1)-of-N for efficient escrow. I will probably implement a FROSTy key generation backend for multisig in my seraphis branch (but probably won't port that to master, since current multisig uses round-robin signing but you need aggregation signing for FROST - which I have in my seraphis branch).
-
Rucknium[m]
In other words, if there is some feature that the machine learning algorithm identifies as helping classify the real spend, then it would make sense to investigate further and see if there is some sort of problem with the code or the protocol.
-
UkoeHB
Rucknium[m]: I think most features that can affect distinguishability can be manually identified, although it would be helpful to get a sense of their significance in practice.
-
UkoeHB
isthmus and neptune have done a bunch of research into distinguishable features (numerical counts of different fingerprints), but did not extend that to transaction tracing
-
moneromooo
One problem is that ground truth is kinda biased due to self selection (I forget the exact term).
-
moneromooo
ie, the rings you know the real spend for will not be randomly selected from the set of all rings.
-
moneromooo
So using this as a proxy for ground truth will bias you.
-
moneromooo
I think. Not a statistician. etc.
-
UkoeHB
makes sense to me
-
mj-xmr[m]
The more I think we need to look for the ground truth in what the code generates and simulate the situations, rather than only relying on past data.
-
mj-xmr[m]
(in search of ground truth, I mean)
-
UkoeHB
I want to implement binning in seraphis this week. There are two choices: A) completely deterministic binning (selection distribution is baked into consensus), B) partially deterministic binning (only bin members are deterministically distributed around their bin locus, but each bin's locus is manually specified).
-
jberman[m]
seems a +1 for binning to me too, because the gamma AFAIU is pulled from just a subset of rings in which the real could be identified, rather than all rings. binning mitigates weakness in an incorrect gamma
-
jberman[m]
reasoning for B was to avoid tying the algo to consensus to allow changes without a hard fork right?
-
Rucknium[m]
UkoeHB: To go back to the manual identification vs. machine learning identification: the 5th most important feature, according to these preliminary results, is transaction size. I don't know of any obvious issue with transaction size that would help identify the real spend, so this ML approach can extend our knowledge of transaction uniformity defects beyond what was previously theorized by isthmus, neptune, and others.
-
UkoeHB
The advantages of (A): less data in a tx (maybe 30-80 bytes), less room for implementation variance that can open anonymity puddles.
-
UkoeHB
The advantages of (B): we aren't stuck with a fixed selection distribution, it's a lot easier to implement.
-
moneromooo
I can think of one reason tx size identifies the real spend:
-
UkoeHB
tx size can be decomposed to input/output count + tx extra fields.
-
moneromooo
A common reason for large txes is many input txes, which are typically made to consolidate small outputs, like dust etc. These are usually old and you consolidate once you have to after spending the rest.
-
moneromooo
Another is pool payments, paying to many people. High churn so typically new outputs.
-
moneromooo
First case will have two outputs, first ring member. Second case, more than two outs, last or close to last ring member.
-
-
moneromooo
s/many intput txes/many inputs/
-
Rucknium[m]
UkoeHB: How does deterministic ring member selection affect tx verification time? Also, I have been trying to understand your deterministic ring selection proposal on github, but I feel that I'm missing something. Is there some cryptographic knowledge I need to understand it?
-
moneromooo
BTW, if you write too long lines, your message gets omitted, and replaced by a url.
-
UkoeHB
there's no cryptography other than a hash function
-
UkoeHB
it shouldn't have a noticeable effect on tx verification time
-
Rucknium[m]
Last time we talked about enforcement of decoy selection, there was enthusiasm for enforcement at the node level rather than the consensus level. Could Seraphis's deterministic decoy selection be written in a way that `monerod` checks it in the mempool, but does not declare blocks invalid if there are txs mined that do not conform?
-
UkoeHB
You could do that, but I feel it is a bit of scope creep for nodes
-
jberman[m]
<Rucknium[m]> "UkoeHB: How does deterministic..." <- The mapping probability to cdf to output index was the most difficult part understanding it for me. I wrote up some code that made it click for me a while back that might help, will share once I find it Rucknium
-
jberman[m]
This paper is super helpful:
spar.isi.jhu.edu/~mgreen/mixing.pdf , and a presentation on it was given that helped me too that I can't find right now
-
Rucknium[m]
jberman: Mainly I don't understand how the real spend is made indistinguishable from the deterministically-selected decoys. Trying to understand that.
-
UkoeHB
jberman[m]: damn I lost that presentation...
-
moneromooo
The obvious method is to include an offset, selected to that one of the deterministic picks matches the real spend.
-
moneromooo
selected so that...
-
UkoeHB
Rucknium[m]: you generate a random set, then shift that set so a random member lands on the real spend
-
Rucknium[m]
Ah ok. Great. Might that give you gaps in the beginning or end of the distribution, though?
-
UkoeHB
why?
-
moneromooo
The offset will tend to be large for old spends.
-
Rucknium[m]
And it seems more feasible with Seraphis-sized ring member counts rather than 11 or 16
-
UkoeHB
you transform the random set into a uniform space before rotating, then transform back into the real selection space
-
moneromooo
Though I suppose that, depending on the generation algorithm, you might be able to brute force one with an old spend and small offset...
-
UkoeHB
we are at the end of the meeting, thanks for attending everyone
-
jberman[m]
High level summary of what I did in my code sample that explains how it can mesh with the gamma: assume 16 ring members. Use a random seed to generate 16 real numbers between 0-100. Those values are the starting spots in the gamma CDF that we will offset in a later step. Take the real output, map that into its spot in the CDF. Take 1 spot from the original 16, offset it to match the real's CDF, and then offset all other spots by
-
jberman[m]
the same amount. Then all you need to store is the offset and the random seed, and you can deterministically find all 16 ring members
-
jberman[m]
there are floating point calculations tho, which I don't know how to avoid
-
moneromooo
Maybe boost multiprecision stuff ? 256 bit numbers. Can be used for fixed point.
-
moneromooo
And you don't care about imprecision, as long as it's deterministic.
-
jberman[m]
seems reasonable to me. also need to make sure that every output in the chain can plausibly be referenced as a decoy, just gotta keep that in mind/double check when implementing
-
jberman[m]
i don't know if my code sample guaranteed that/what bit precision in the real might be necessary to guarantee that
-
UkoeHB
jberman[m]: I'll try to implement the fully deterministic version, and if it seems too difficult to get right I'll just give up and go with partially deterministic.
-
UkoeHB
One additional complication is you want a gamma over block heights, rather than over output indices..
-
moneromooo
I strongly suspect that if imprecision means some outputs can't be selected, changing height will change the selectable part and yo'll get full coverage. Though you do bring up an excellent point that it does alter the probability distribution...
-
jberman[m]
<UkoeHB> "One additional complication is..." <- That PoC (MRL #88) was for pure client-side + assuming unlock times exist. I think you're good to stick with output indices in a deterministic approach + if we assume unlock times are gone or hidden
-
jberman[m]
but, what we do today where we first gamma select an output from across the chain, then uniform random select an output from the block that gamma selected output is in.. that would be complicated to replicate I think
-
UkoeHB
jberman[m]: I think we could cut out the part about randomly selecting in the block (not sure why we do that).
-
moneromooo
IIRC, it was to counter some bias due to high variance in the number of outputs per block...
-
moneromooo
Ah, yes: the Miller et al paper had a time distribution. Time is block (if you squint just right). Output based will bias as average output count changes increases over time.
-
jberman[m]
-
moneromooo
That's not what I said above, and I think what I said above is way more significant... Hmm. Maybe I did not understnad what sarang said at the time...
-
UkoeHB
ah, if txs in a block are randomly ordered then the tx members don't directly map to a timing distribution, ok
-
jberman[m]
moneromooo: I think what you were saying above is rationale for why to use the "output lineup" method in general, rather than point to blocks
-
jberman[m]
and the comment in that PR is an explanation for why to do that single last step in the algo where it randomly selects an output from the block
-
jberman[m]
after gamma selecting an output from across the chain using the "output lineup" method
-
jberman[m]
Rucknium UkoeHB here's the presentation on that paper:
alishahc.com/slides/mixing_pres.pdf
-
UkoeHB
jberman[m]: nice find! google totally failed me
-
jberman[m]
found it on the author Alishah Chator's professional website haha