-
br-m
<kayabanerve:matrix.org> tevador: Your encouragement of a Crandall prime enables a batched point decompression due to a batch sqrt algorithm existing.
-
br-m
<kayabanerve:matrix.org> That's quite pleasant to discover :)
-
br-m
<kayabanerve:matrix.org> I haven't seen it discussed in the literature before, so I'm unsure if it's novel.
-
tevador
It's the first time I hear about batched sqrt. I only knew about batched inversion.
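[Editor's note: for reference, the batched inversion tevador mentions is Montgomery's trick, which trades n field inversions for one inversion plus roughly three multiplications per element. A minimal Python sketch (names and values are illustrative, not from any codebase discussed here):]

```python
def batch_invert(xs, p):
    # Montgomery's trick: one inversion + ~3(n-1) muls for n inverses.
    n = len(xs)
    prefix = [1] * n
    acc = 1
    for i, x in enumerate(xs):
        prefix[i] = acc          # prefix[i] = xs[0] * ... * xs[i-1]
        acc = acc * x % p
    inv = pow(acc, -1, p)        # the single inversion, of the full product
    out = [0] * n
    for i in range(n - 1, -1, -1):
        out[i] = inv * prefix[i] % p   # = 1 / xs[i]
        inv = inv * xs[i] % p          # peel xs[i] off the running inverse
    return out

p = 2**255 - 19
xs = [3, 7, 12345]
assert batch_invert(xs, p) == [pow(x, -1, p) for x in xs]
```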
-
tevador
Is there a paper?
-
br-m
<kayabanerve:matrix.org> No, I just did it
-
br-m
<kayabanerve:matrix.org> Basically, instead of adding to the modulus, you subtract from the modulus.
-
br-m
<kayabanerve:matrix.org> 256 squares + as many muls as bits set -> 256 squares + as many muls as bits not set
-
br-m
<syntheticbird> @kayabanerve:matrix.org: weird way to flex but ok
-
br-m
<kayabanerve:matrix.org> But with a Crandall prime, 192 bits are set, and 64 aren't, so the latter is much cheaper
-
br-m
<kayabanerve:matrix.org> @syntheticbird:monero.social: I was hoping tevador would have an idea of someone else who already did this 😅
-
br-m
<kayabanerve:matrix.org> You do need a multiplicative inverse to 'subtract' from the *exponent* (sorry, not "modulus", I meant to say instead of adding up to the exponent, you subtract down to it), yet inverses are famously batchable.
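[Editor's note: a sketch of the trick as described, with all names being illustrative rather than taken from the monero-oxide code. To compute x^e mod p when e has mostly-set bits, square up past e to x^(2^n), then divide by x^(2^n − e). Since 2^n − e is the bitwise complement of e (within n bits) plus one, the divisor costs one multiplication per *unset* bit of e, and the single inversion can be amortized across a batch:]

```python
def pow_by_complement(x, e, p, n):
    # Compute x^e mod p (x invertible, n >= e.bit_length()) using
    # n squarings plus one multiplication per UNSET bit of e.
    acc = x
    divisor = 1                      # collects x^(2^i) for each unset bit i
    for i in range(n):
        if not (e >> i) & 1:
            divisor = divisor * acc % p
        acc = acc * acc % p          # acc = x^(2^(i+1))
    # acc = x^(2^n); since 2^n - e = ~e + 1, x^e = x^(2^n) / (x^(~e) * x)
    divisor = divisor * x % p        # the "+1" term
    return acc * pow(divisor, -1, p) % p   # in practice, a batched inverse

p = 2**255 - 19
e = (p + 3) // 8                     # sqrt-candidate exponent for p = 5 mod 8
assert pow_by_complement(9, e, p, 255) == pow(9, e, p)
```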
-
br-m
<kayabanerve:matrix.org> Helios and Selene of course have 128 bits set and 128 with 50% odds each. Ed25519's exponent is almost entirely set, making this ~50% faster for decompressing Ed25519 points AFAICT. You basically have the squares, two extra squares, four multiplications, and the amortized batch inverse.
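[Editor's note: a rough operation count backing the ~50% figure for Ed25519's sqrt-candidate exponent. The "+4" term for the amortized batch-inverse overhead is taken from the four-multiplications claim in this conversation, not measured:]

```python
p = 2**255 - 19
e = (p + 3) // 8                 # Ed25519's sqrt-candidate exponent: 2^252 - 2
n = e.bit_length()               # 252
ones = bin(e).count("1")         # 251 of the 252 bits are set
square_and_multiply = (n - 1) + (ones - 1)   # squares + set-bit muls: 501
complement_trick = n + (n - ones) + 4        # squares + unset-bit muls + ~4: 257
assert complement_trick < 0.55 * square_and_multiply
```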
-
br-m
<kayabanerve:matrix.org>
github.com/kayabaNerve/monero-oxide…rc%2Fhash_to_point%2Fmod.rs#L55-L98 for a very bad implementation. I'm trying to do something stupid and just producing a lot of legitimate research on the way as a side effect.
-
br-m
<kayabanerve:matrix.org> (so please forgive the code)
-
br-m
<kayabanerve:matrix.org> *my claim of four multiplications is due to the three for the batch inverse, and one to perform the 'subtraction'. One extra square is necessary to overshoot the exponent, allowing us to subtract from it. The second extra square is nonexistent, as while we do need a square for the unset bit and value to invert, it's folded into the existing squares and isn't additional.
-
br-m
<kayabanerve:matrix.org> That only gives you a square root candidate though, and you still need to check if it's valid and, if not, move to the other candidate for primes of the form 5 mod 8, as seen here with Ed25519 (costs not modelled in my above statement, as the point is the improvement to the exponentiation step).
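[Editor's note: a sketch of that fix-up step for p ≡ 5 (mod 8), where the candidate from the exponentiation may be off by a factor of sqrt(−1). Generic Python, not the monero-oxide implementation:]

```python
def sqrt_5mod8(x, p):
    # For p ≡ 5 (mod 8): one exponentiation yields a candidate root.
    r = pow(x, (p + 3) // 8, p)
    if r * r % p == x % p:
        return r
    # 2 is a non-residue whenever p ≡ 5 (mod 8), so this is a sqrt of -1.
    r = r * pow(2, (p - 1) // 4, p) % p
    if r * r % p == x % p:
        return r
    return None                  # x is not a quadratic residue

p = 2**255 - 19
r = sqrt_5mod8(4, p)
assert r is not None and r * r % p == 4
assert sqrt_5mod8(2, p) is None  # 2 is a non-residue mod 2^255 - 19
```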
-
br-m
<kayabanerve:matrix.org> Tbf, it's really any prime with a significant skew in the distribution of how many bits are set that so benefits.
-
br-m
<kayabanerve:matrix.org> I just noted that while primes with few bits set are inherently faster, primes with most bits set are now also faster _in a batch setting_.
-
tevador
kayabanerve: It only saves about 5 multiplications. Still might be worth it, though.
gist.github.com/tevador/e5491b90a47f316e75b6d9086ae57c6f
-
br-m
<kayabanerve:matrix.org> Ah. I wasn't comparing to an addition chain :/ Do you happen to know what chains for Helios/Selene would be and if those would still benefit from such a technique?
-
br-m
<kayabanerve:matrix.org> (Apologies for not just running the tool myself, I stepped away from my computer)
-
br-m
<kayabanerve:matrix.org> Tbf, the existing Helios and Selene code uses this premise for the top 128 bits, so I should've been aware.
-
br-m
<kayabanerve:matrix.org> And an addition chain is probably sufficient to capture the optimizable value in this structure...
-
DataHoarder
in make_carrot_uncontextualized_shared_key_sender given that the following conditions apply, address_view_pubkey has been checked (is in prime order subgroup),
-
DataHoarder
and enote_ephemeral_privkey is produced mod l (as make_carrot_enote_ephemeral_privkey does, // k_e = (H_64(anchor_norm, input_context, K^j_s, pid)) mod l)
-
br-m
<kayabanerve:matrix.org> The one distinction is the addition chain optimizes for repeated patterns of bits, while this optimizes for a skew in the distribution? So these are doing two separate things, even though there's a correlation between a skewed distribution and repeated patterns.
-
DataHoarder
instead of doing s_sr = d_e ConvertPointE(K^j_v) (X25519 unclamped scalar mul after conversion)
-
DataHoarder
would doing s_sr = ConvertPointE(d_e * K^j_v) be equivalent?
-
DataHoarder
the latter is vastly faster on current implementations, given I use a vartime edwards25519 scalar mul there
-
tevador
DataHoarder: you shouldn't use vartime scalar multiplication with a secret exponent.
-
DataHoarder
the exponent is public in this case
-
DataHoarder
(I'm making specializations for P2Pool)
-
tevador
In that case, yes, those should give equivalent results.
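[Editor's note: the equivalence holds because u = (1 + y)/(1 − y) is the birational map from Ed25519 to Curve25519, and it commutes with scalar multiplication. A toy self-check of that claim, assuming a torsion-free point and a nonzero public scalar; names are illustrative and this is not the Carrot/P2Pool code:]

```python
p = 2**255 - 19
d_ed = -121665 * pow(121666, -1, p) % p  # Ed25519's curve constant d

def ed_add(P, Q):
    # complete addition on -x^2 + y^2 = 1 + d*x^2*y^2
    (x1, y1), (x2, y2) = P, Q
    t = d_ed * x1 % p * x2 % p * y1 % p * y2 % p
    x3 = (x1 * y2 + y1 * x2) * pow(1 + t, -1, p) % p
    y3 = (y1 * y2 + x1 * x2) * pow(1 - t, -1, p) % p
    return (x3, y3)

def ed_mul(k, P):
    # plain double-and-add (vartime, fine for a public scalar)
    R = (0, 1)
    for i in reversed(range(k.bit_length())):
        R = ed_add(R, R)
        if (k >> i) & 1:
            R = ed_add(R, P)
    return R

def to_montgomery_u(P):
    # birational map Ed25519 -> Curve25519: u = (1 + y) / (1 - y)
    return (1 + P[1]) * pow(1 - P[1], -1, p) % p

def mont_ladder(k, u):
    # unclamped Montgomery ladder on u-coordinates (RFC 7748 formulas)
    a24 = 121665
    x2, z2, x3, z3 = 1, 0, u, 1
    for i in reversed(range(k.bit_length())):
        bit = (k >> i) & 1
        if bit:
            x2, z2, x3, z3 = x3, z3, x2, z2
        A, B = (x2 + z2) % p, (x2 - z2) % p
        AA, BB = A * A % p, B * B % p
        E = (AA - BB) % p
        C, D = (x3 + z3) % p, (x3 - z3) % p
        DA, CB = D * A % p, C * B % p
        x3, z3 = (DA + CB) ** 2 % p, u * (DA - CB) ** 2 % p
        x2, z2 = AA * BB % p, E * (AA + a24 * E) % p
        if bit:
            x2, z2, x3, z3 = x3, z3, x2, z2
    return x2 * pow(z2, -1, p) % p

# recover the standard base point from y = 4/5 (either sign of x works here)
By = 4 * pow(5, -1, p) % p
xx = (By * By - 1) * pow(d_ed * By * By + 1, -1, p) % p
Bx = pow(xx, (p + 3) // 8, p)
if Bx * Bx % p != xx:
    Bx = Bx * pow(2, (p - 1) // 4, p) % p  # multiply by sqrt(-1)
B = (Bx, By)

d_e = 0x1234567890ABCDEF  # stand-in for the public ephemeral scalar
assert to_montgomery_u(ed_mul(d_e, B)) == mont_ladder(d_e, to_montgomery_u(B))
```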
-
br-m
<kayabanerve:matrix.org> Ouch. I think my poke at Selene was much worse than that chain. Such a shame :/
-
DataHoarder
alright, thanks. I have it tested but wanted some confirmation (will add notes under it)
-
tevador
Btw, for p2pool, is there a reason to use a scalar other than 1?
-
tevador
At least for Carrot, the context ensures the keys will always be unique.
-
DataHoarder
this scalar is generated from make_carrot_enote_ephemeral_privkey which is a hash
-
DataHoarder
which gets verified later on when scanning, afaik
-
DataHoarder
yep, gets recreated under verify_carrot_normal_janus_protection
-
tevador
Janus protection should only be needed when mining to a subaddress, which is not supported?
-
DataHoarder
it gets checked regardless
-
DataHoarder
it'd be great if we could mine to subaddresses (given the additional pubkeys need to be included already), but AFAIK that could not get properly protected, as the amount was not encrypted
-
tevador
Btw, for Jamtis, P2Pool will probably need to fake the CSIDH key because it would be too slow to generate >100 shared secrets per block.
-
DataHoarder
tbh this is not the expensive part for P2Pool; the expensive part is get_output_proposal_parts, because it depends on the amount, which is random
-
tevador
Luckily, if you set the CSIDH pubkey to all zeroes, the shared secret is equal to the address key.
-
DataHoarder
on p2pool we are able to calculate quite a bit of things ahead of time
-
DataHoarder
well if you want jamtis feedback I can look at implementing whatever you have :D
-
tevador
With 700 miner outputs, just CSIDH would take ~20 seconds.
-
DataHoarder
it's worse as well tevador, as whatever needs to be generated for one block needs to be recalculated every 5-10s in P2Pool
-
DataHoarder
with FCMP++/carrot we can cache everything up to get_output_proposal_parts, which itself can't be cached due to it depending on the amount (even while it uses a fixed scalar of 1 for the blinding key)
-
tevador
Correction: CSIDH-1024 would take ~2 minutes.
-
DataHoarder
maybe we could have CSIDH-pow :)
-
br-m
<kayabanerve:matrix.org> Is that not why we have threads? /s
-
br-m
<kayabanerve:matrix.org> DataHoarder: The proof of correctness would be nontrivial.
-
tevador
Actually, there is an isogeny based hash function.
-
br-m
<kayabanerve:matrix.org> ... but PoW in the form of submitting CSIDH ephemeral keys for real-world use?
-
br-m
<kayabanerve:matrix.org> (as to avoid shimming with zeroes)
-
DataHoarder
at least for p2pool we can do the torsion checks plus everything else ahead of time, when we receive the address pubkeys in the sidechain
-
DataHoarder
then it's only good data into the PPLNS window, so no more expensive checks need to be done
-
DataHoarder
generating the outputs is however delaying share verification a lot before broadcast, so we might opt for delaying verification of that (and only verifying addresses/rewards), then broadcast, then verify (but we can't ban on this)
-
DataHoarder
PoW is done previous to any of these steps - so it's not a free DoS even when they send bad outputs
-
tevador
The zero pubkey would only be allowed for coinbase transactions. We don't want lazy wallet developers to nullify post quantum security of the protocol.
-
DataHoarder
scanning coinbase is already a special case, I guess
-
br-m
<kayabanerve:matrix.org> That'd screw with Serai, who may also poke at that if truly so computationally heavy.
-
tevador
^ lazy developer spotted
-
DataHoarder
for carrot it's good enough, besides, as mentioned, the amount :)
-
br-m
<kayabanerve:matrix.org> But my point was, defining PoW as performing ephemeral key generation would presumably require a proof the key was correctly generated, which is presumably non-trivial.
-
br-m
<kayabanerve:matrix.org> No, because I suggested using threading :p
-
br-m
<kayabanerve:matrix.org> But if we did have an enormous amount of TXs requested, I'd consider deferring to 0'd ephemeral keys if the backlog exceeds some time delay.
-
DataHoarder
which changes the one-time address calculation; it's three scalar mults + three point additions
-
DataHoarder
per output, so 700* that on an RPi 5 ends up at around 14ms if I remember
-
br-m
<kayabanerve:matrix.org> The only reason for serai to not use zero is so such TXs don't stick out to those not already aware, which is a goal we have
-
tevador
The CGL hash simply hashes the data to generate the secret isogeny. So it is possible to verify it.
-
tevador
And the bits select the isogeny path, for example -1/+1 depending on the bit.
-
DataHoarder
tbh it wouldn't be "that" bad if there was a faster way to verify the outputs were calculated properly, but afaik the best way is to ... recalculate them, lacking key information other than anchor randomness and receiver pubkeys
-
tevador
For p2pool, the only viable way is the zero key. Or having 64 CPU cores :)
-
DataHoarder
yeah I was looking at numbers "hey that's not that bad"
-
DataHoarder
then ofc I am running tests on 9900X3D
-
tevador
For #151, the remaining issue to discuss is if adding 13 ms to transactions verification time is OK. The rest seems to be clearly in favor of CSIDH-1024.
-
DataHoarder
but if for coinbase 0'd ephemeral keys (or a single shared one?) are acceptable, that'd more or less bring p2pool to exactly the point it is at nowadays, except the cpu time and template switch delay are mostly unchanged (ephemeral keys are cacheable per block, one-time addresses are not due to inclusion of the amount)