#monero-research-lab

15:03

UkoeHB

meeting 2hr
16:17

baro77[m]

Sorry everybody, I'm a rookie and maybe with this post I'm making more noise than necessary, but I wanted to signal this (just seen):
16:17

baro77[m]

blog.trailofbits.com/2022/04/13/par…ting-girault-bulletproofs-and-plonk
16:17

baro77[m]

(btw, cross-posting to monero-dev)
16:24

baro77[m]

correction: not cross-posted, 'cause I have seen @jrg was 1 hour ahead of me ;)
16:26

endogenic

beat me to it as well though
16:33

endogenic

I hear that it doesn't affect Monero though
16:33

baro77[m]

😅
16:58

cheekyleeks

baro77[m]: nttr.stream/benediktbuenz/status/1514270574557769730
16:58

cheekyleeks

Luckily Monero/Ristretto/zkcrypto/libsecp256k1 had it right from the beginning and were not affected. This shows how error-prone implementing Fiat-Shamir is and it's best to use a library like Merlin github.com/zkcrypto/merlin
16:59

baro77[m]

👍 I showed my rookie-ness ;)
17:00

UkoeHB

meeting time monero-project/meta #688
17:00

UkoeHB

1. greetings
17:00

UkoeHB

hello
17:00

endogenic

👍
17:00

Rucknium[m]

Hi
17:00

dangerousfreedom

Hello
17:01

xmr-ack[m]

hey
17:02

jberman[m]

howdy
17:02

UkoeHB

2. in case anyone missed it, we can get this out of the way blog.trailofbits.com/2022/04/13/par…ting-girault-bulletproofs-and-plonk
17:03

UkoeHB

short story short: does not affect monero
17:03

UkoeHB

any questions on that disclosure?
17:04

xmr-ack[m]

Not being as familiar with bulletproofs, why exactly does it not affect Monero?
17:05

moneromooo

Monero's was written by Sarang, who was careful.
17:08

Rucknium[m]

The blog post claims:
17:08

Rucknium[m]

>Why is this type of vulnerability so widespread? It really comes down to a combination of ambiguous descriptions in academic papers and a general lack of guidance around these protocols.
17:08

Rucknium[m]

Is it true that there is this shortcoming in the cryptography space?
17:09

UkoeHB

xmr-ack[m]: The issue was the paper did not have a complete implementation guideline, so anyone going directly off the paper would have made a certain small error in the Fiat-Shamir transform (neglecting to hash some public variables). Monero's implementation followed best practices for FS transforms, which prevented that issue.
17:11

UkoeHB

Rucknium[m]: yes, best practices for implementing cryptographic protocols are not well-disseminated among general developers
17:13

UkoeHB

Also, afaik most crypto professionals aren't implementing advanced algorithms, they are dealing with data encryption and encrypted channels. It's mainly in the cryptocurrency and experimental crypto spaces that these big protocols are implemented.
17:14

dEBRUYNE

UkoeHB: Does it list the projects that are affected somewhere?
17:14

UkoeHB

yes it's in the blog post
17:15

UkoeHB

Most crypto professionals aren't rolling their own crypto from scratch *. They are following some highly-scrutizined specification.
17:16

UkoeHB

Anyway, we can move on
17:16

UkoeHB

3. updates, what is everyone working on? monerotopia was last week, it was great to meet several people
17:18

jberman[m]

Aside from monerotopia which was sweet, been working on PR review (my list atm: 8046 [done], 7760 [in progress], 8076 [in progress]), digging into a sporadic bug with tx broadcast (8251, 6929), and submitted an updated proposal for subaddresses in light wallet servers in collab with endogenic here: monero-project/meta #647
17:19

UkoeHB

me: I'm continuing work on my seraphis library. I realized getting an accurate minimum fee oracle will be a huge pita, so I'll probably just make a trivial mockup and move on...
17:19

Rucknium[m]

Working with jberman and mj-xmr on translating some C++ to statistics language. And doing some prototyping of statistical models with some simulated data.
17:21

dangerousfreedom

Im writing some code and text to understand and explain mlsag and borromean ring signatures. I think I will have something to present by the end of the month.
17:22

UkoeHB

3. discussion: any topics to discuss? questions/comments?
17:23

dEBRUYNE

UkoeHB: Thanks
17:23

xmr-ack[m]

Me: Unfortunately, I couldn't attend Monerotopia :(. But I do have some updates in regards to my work with MAGIC. I have a working implementation of two rudimentary machine learning models and have been training them on a preliminary dataset. I want to clarify that the preliminary dataset has many flaws... (full message at libera.ems.host/_matrix/media/r0/do…6186c27f11139c90f09926efa9e71009791)
17:24

xmr-ack[m]

* Me: Unfortunately, I couldn't attend Monerotopia :(. But I do have some updates in regards to my work with MAGIC. I have a working implementation of two rudimentary machine learning models and have been training them on a preliminary dataset. I want to clarify that the preliminary dataset has many flaws and that most of the transactions were collected with an (unrealistic) 20-minute uniform delay. I am currently refining the
17:24

xmr-ack[m]

process of collecting a new dataset that does reflect real user spending patterns.
17:24

xmr-ack[m]

That being said, after training on an undersampled dataset removing the possibility of a guess newest heuristic, I am seeing accuracies upwards of three times random guessing classifying the true spend of an arbitrary ring with no outside information.
17:24

xmr-ack[m]

Metrics:
17:24

xmr-ack[m]

Random Forest Testing Accuracy: 23.41%
17:24

xmr-ack[m]

Random Forest Weighted F-1 Score: 19%
17:24

xmr-ack[m]

Gradient Boosted Classifier Testing Accuracy: 27.40%
17:24

xmr-ack[m]

Gradient Boosted Classifier Weighted F-1 Score: 26%
17:24

xmr-ack[m]

Feature Importance (Top 5):
17:24

xmr-ack[m]

Inputs.1.Time_Delta_From_Newest_Ring_To_Block: 13.07%
17:24

xmr-ack[m]

Total_Block_Tx_Fees: 10.97%
17:24

xmr-ack[m]

Inputs.0.Time_Deltas_Between_Ring_Members.9_10: 4.98%
17:25

» xmr-ack[m] uploaded an image: (112KiB) <libera.ems.host/_matrix/media/r0/do…/dqAGJKuAviBOFlLETUwqOVcZ/image.png>
17:27

Rucknium[m]

My commentary (I've seen these results prior to now): This underlines again that when there is a substantial mismatch between the decoy selection algorithm and the real spend age distribution, statistical methods can be used to narrow down which ring member is the real spend.
17:28

Rucknium[m]

And here there was a large mismatch, since the "real user behavior" in this synthetic data was uniform over just 20 minutes.
17:28

UkoeHB

Thanks xmr-ack[m]
17:30

Rucknium[m]

Also the feature importance is interesting, but it is hard to know if there is any real problem with some of these features without some sort of confidence interval or p-value on these estimated feature importance results. Since the feature importance may be nonzero just due to sampling error.
17:31

UkoeHB

On a different topic, I want to point out that Monero multisig is capped at 16 group members for efficiency reasons during account generation. I was assuming people who want to use multisig would see that limit in the code, but maybe not... (looking at Thorchain). A different approach to key generation (see the FROST paper) is much more efficient when M < N - 1 (enabling larger groups generically), but my focus was on
17:31

UkoeHB

N-of-N and (N-1)-of-N for efficient escrow. I will probably implement a FROSTy key generation backend for multisig in my seraphis branch (but probably won't port that to master, since current multisig uses round-robin signing but you need aggregation signing for FROST - which I have in my seraphis branch).
17:32

Rucknium[m]

In other words, if there is some feature that the machine learning algorithm identifies as helping classify the real spend, then it would make sense to investigate further and see if there is some sort of problem with the code or the protocol.
17:33

UkoeHB

Rucknium[m]: I think most features that can affect distinguishability can be manually identified, although it would be helpful to get a sense of their significance in practice.
17:35

UkoeHB

isthmus and neptune have done a bunch of research into distinguishable features (numerical counts of different fingerprints), but did not extend that to transaction tracing
17:35

moneromooo

One problem is that ground truth is kinda biased due to self selection (I forget the exact term).
17:36

moneromooo

ie, the rings you know the real spend for will not be randomly selected from the set of all rings.
17:36

moneromooo

So using this as a proxy for ground truth will bias you.
17:36

moneromooo

I think. Not a statistician. etc.
17:37

UkoeHB

makes sense to me
17:38

mj-xmr[m]

The more I think we need to look for the ground truth in what the code generates and simulate the situations, rather than only relying on past data.
17:39

mj-xmr[m]

(in search of ground truth, I mean)
17:39

UkoeHB

I want to implement binning in seraphis this week. There are two choices: A) completely deterministic binning (selection distribution is baked into consensus), B) partially deterministic binning (only bin members are deterministically distributed around their bin locus, but each bin's locus is manually specified).
17:39

jberman[m]

seems a +1 for binning to me too, because the gamma AFAIU is pulled from just a subset of rings in which the real could be identified, rather than all rings. binning mitigates weakness in an incorrect gamma
17:42

jberman[m]

reasoning for B was to avoid tying the algo to consensus to allow changes without a hard fork right?
17:42

Rucknium[m]

UkoeHB: To go back to the manual identification vs. machine learning identification: the 5th most important feature, according to these preliminary results, is transaction size. I don't know of any obvious issue with transaction size that would help identify the real spend, so this ML approach can extend our knowledge of transaction uniformity defects beyond what was previously theorized by isthmus, neptune, and others.
17:42

UkoeHB

The advantages of (A): less data in a tx (maybe 30-80 bytes), less room for implementation variance that can open anonymity puddles.
17:42

UkoeHB

The advantages of (B): we aren't stuck with a fixed selection distribution, it's a lot easier to implement.
17:43

moneromooo

I can think of one reason tx size identifies the real spend:
17:44

UkoeHB

tx size can be decomposed to input/output count + tx extra fields.
17:44

moneromooo

A common reason for large txes is many input txes, which are typically made to consolidate small outputs, like dust etc. These are usually old and you consolidate once you have to after spending the rest.
17:44

moneromooo

Another is pool payments, paying to many people. High churn so typically new outputs.
17:45

moneromooo

First case will have two outputs, first ring member. Second case, more than two outs, last or close to last ring member.
17:46

» xmr-ack[m] sent a code block: libera.ems.host/_matrix/media/r0/do…0d3faf278d7169c019a2d3614221d8e5a9b
17:46

moneromooo

s/many intput txes/many inputs/
17:46

Rucknium[m]

UkoeHB: How does deterministic ring member selection affect tx verification time? Also, I have been trying to understand your deterministic ring selection proposal on github, but I feel that I'm missing something. Is there some cryptographic knowledge I need to understand it?
17:46

moneromooo

BTW, if you write too long lines, your message gets omitted, and replaced by a url.
17:46

UkoeHB

there's no cryptography other than a hash function
17:48

UkoeHB

it shouldn't have a noticeable effect on tx verification time
17:50

Rucknium[m]

Last time we talked about enforcement of decoy selection, there was enthusiasm for enforcement at the node level rather than the consensus level. Could Seraphis's deterministic decoy selection be written in a way that `monerod` checks it in the mempool, but does not declare blocks invalid if there are txs mined that do not conform?
17:51

UkoeHB

You could do that, but I feel it is a bit of scope creep for nodes
17:52

jberman[m]

<Rucknium[m]> "UkoeHB: How does deterministic..." <- The mapping probability to cdf to output index was the most difficult part understanding it for me. I wrote up some code that made it click for me a while back that might help, will share once I find it Rucknium
17:53

jberman[m]

This paper is super helpful: spar.isi.jhu.edu/~mgreen/mixing.pdf , and a presentation on it was given that helped me too that I can't find right now
17:54

Rucknium[m]

jberman: Mainly I don't understand how the real spend is made indistinguishable from the deterministically-selected decoys. Trying to understand that.
17:56

UkoeHB

jberman[m]: damn I lost that presentation...
17:57

moneromooo

The obvious method is to include an offset, selected to that one of the deterministic picks matches the real spend.
17:58

moneromooo

selected so that...
17:58

UkoeHB

Rucknium[m]: you generate a random set, then shift that set so a random member lands on the real spend
18:00

Rucknium[m]

Ah ok. Great. Might that give you gaps in the beginning or end of the distribution, though?
18:00

UkoeHB

why?
18:00

moneromooo

The offset will tend to be large for old spends.
18:01

Rucknium[m]

And it seems more feasible with Seraphis-sized ring member counts rather than 11 or 16
18:01

UkoeHB

you transform the random set into a uniform space before rotating, then transform back into the real selection space
18:01

moneromooo

Though I suppose that, depending on the generation algorithm, you might be able to brute force one with an old spend and small offset...
18:01

UkoeHB

we are at the end of the meeting, thanks for attending everyone
18:04

jberman[m]

High level summary of what I did in my code sample that explains how it can mesh with the gamma: assume 16 ring members. Use a random seed to generate 16 real numbers between 0-100. Those values are the starting spots in the gamma CDF that we will offset in a later step. Take the real output, map that into its spot in the CDF. Take 1 spot from the original 16, offset it to match the real's CDF, and then offset all other spots by
18:04

jberman[m]

the same amount. Then all you need to store is the offset and the random seed, and you can deterministically find all 16 ring members
18:04

jberman[m]

there are floating point calculations tho, which I don't know how to avoid
18:06

moneromooo

Maybe boost multiprecision stuff ? 256 bit numbers. Can be used for fixed point.
18:07

moneromooo

And you don't care about imprecision, as long as it's deterministic.
18:13

jberman[m]

seems reasonable to me. also need to make sure that every output in the chain can plausibly be referenced as a decoy, just gotta keep that in mind/double check when implementing
18:16

jberman[m]

i don't know if my code sample guaranteed that/what bit precision in the real might be necessary to guarantee that
18:16

UkoeHB

jberman[m]: I'll try to implement the fully deterministic version, and if it seems too difficult to get right I'll just give up and go with partially deterministic.
18:17

UkoeHB

One additional complication is you want a gamma over block heights, rather than over output indices..
18:18

moneromooo

I strongly suspect that if imprecision means some outputs can't be selected, changing height will change the selectable part and yo'll get full coverage. Though you do bring up an excellent point that it does alter the probability distribution...
18:24

jberman[m]

<UkoeHB> "One additional complication is..." <- That PoC (MRL #88) was for pure client-side + assuming unlock times exist. I think you're good to stick with output indices in a deterministic approach + if we assume unlock times are gone or hidden
18:26

jberman[m]

but, what we do today where we first gamma select an output from across the chain, then uniform random select an output from the block that gamma selected output is in.. that would be complicated to replicate I think
18:30

UkoeHB

jberman[m]: I think we could cut out the part about randomly selecting in the block (not sure why we do that).
18:31

moneromooo

IIRC, it was to counter some bias due to high variance in the number of outputs per block...
18:32

moneromooo

Ah, yes: the Miller et al paper had a time distribution. Time is block (if you squint just right). Output based will bias as average output count changes increases over time.
18:35

jberman[m]

monero-project/monero #5389#discussion_r271668970
18:38

moneromooo

That's not what I said above, and I think what I said above is way more significant... Hmm. Maybe I did not understnad what sarang said at the time...
18:40

UkoeHB

ah, if txs in a block are randomly ordered then the tx members don't directly map to a timing distribution, ok
18:43

jberman[m]

moneromooo: I think what you were saying above is rationale for why to use the "output lineup" method in general, rather than point to blocks
18:43

jberman[m]

and the comment in that PR is an explanation for why to do that single last step in the algo where it randomly selects an output from the block
18:44

jberman[m]

after gamma selecting an output from across the chain using the "output lineup" method
18:52

jberman[m]

Rucknium UkoeHB here's the presentation on that paper: alishahc.com/slides/mixing_pres.pdf
18:54

UkoeHB

jberman[m]: nice find! google totally failed me
18:55

jberman[m]

found it on the author Alishah Chator's professional website haha

4 years ago

« 2 days earlier

a day later »

today »