-
jberman[m]
Just got caught up with above. It was a mistake on my end to use a db read txn guard in `on_getblocktemplate` which I take responsibility for. In 7936/37, I specifically sought to only use a db read txn guards in RPC functions where it would guarantee thread safety (i.e. RPC functions which strictly read state from the db in its lower levels), which should have excluded `on_getblocktemplate` since `Blockchain::create_block_template` reads
-
jberman[m]
state from memory. I didn't grab the locks blindly. I was careful to try and place them as late as needed to achieve thread safe RPC calls, but made a mistake adding it to this function
-
jberman[m]
AFAICT the proposed fix doesn't guarantee a thread safe response to the RPC caller because the db can update in between these 2 lines:
github.com/monero-project/monero/bl…rpc/core_rpc_server.cpp#L1756-L1776
-
jberman[m]
Personally I just would have removed the db read txn guard from `on_getblocktemplate` and left it at that for now until a better locking approach was found. But I don't see an issue with adding a read txn guard in 8383; seems like it can only help achieve a higher level of thread safety in the response to `create_block_template`
-
jberman[m]
A larger structural change to enable thread safe RPC calls is probably warranted I think. But I don't think the argument to revert 7936/37 is particularly sensible given what's known at this point -- I don't see how the cost definitively outweighs the benefit, but I'm open to more thoughts on it
-
sech1
jberman[m] line 1776 gets blocks id for the next seed height, so if can change only if a reorg deep enough happened. The field itself changes only once every 2048 blocks. Yes, it can happen, but it's very rare. Also, next_seed_hash is not used by anything at the moment. It was supposed to be used to precompute the next RandomX dataset and all miner
-
sech1
implementation just update the current dataset when seed_hash changes.
-
sech1
Yes, it can return incosistent data under rare conditions (at most every 2048 blocks, and only if a reorg happens at this exact moment), but next_seed_height changes 64 blocks before it's actually used, so worst case when it's wrong, miner who uses it (no one does) will calculate RandomX dataset one more time
-
sech1
So I'd give it a very low priority to fix. But the fix would be moving it all inside create_block_template
-
sech1
and adding next_seed_height, next_seed_hash parameters to it
-
sech1
add a comment to the code there in 8383
-
sech1
*added a comment
-
ooo123ooo1234567
expected: notification -> rpc -> new data with correct order of events; before 7937: notification -> rpc -> new data, but with race conditions; after 7937: notification -> rpc -> old data, but without race conditions
-
ooo123ooo1234567
<jberman[m]> "Personally I just would have..." <- it isn't only about thread safety, many things at once are required
-
ooo123ooo1234567
<jberman[m]> "A larger structural change to..." <- 7936/7937 claimed to achieve it, but it didn't and it should be reverted and/or replaced with better one
-
ooo123ooo1234567
the problem is that negative side effect from these changes is visible only for those who tries to rely on this api
-
ooo123ooo1234567
in the worst case it would be like: emergency notification -> rpc -> everything is ok -> very important notification was ignored by external api user
-
ooo123ooo1234567
it isn't ok
-
ooo123ooo1234567
* api user due old data returned in response
-
ooo123ooo1234567
<sech1> "add a comment to the code..." <- "// TODO: move next_height, next_seed_hash under the protection of locks in Blockchain::create_block_template to guarantee consistent data" TODO: revert 7937, close 8383, optionally add another PR
-
jberman[m]
I do see that argument. It would be better to receive inconsistent data in a response on some occasions (for now), than to receive all stale data after notification that data has updated, which still seems possible in other RPC endpoints that have the db read txn guard aside from this one
-
jberman[m]
The former seems pretty bad too. Hard to rationalize which is "better". But I think a good argument can be made that it's best to leave behavior as it's been than to introduce a separate issue
-
ooo123ooo1234567
jberman[m]: it's possible to write PR that adds more methods within core in order to achieve both: correct order of events without race conditions
-
ooo123ooo1234567
I wouldn't approve 7936/7937/8383 during review, but it was approved and now can't be even reverted
-
jberman[m]
Ya I see that, that will just take a bit of time
-
ooo123ooo1234567
jberman[m]: no time - revert, enough time - revert and/or another PR
-
jberman[m]
I'll submit a PR that reverts the changes of 7937 on master (7936 was never merged). I don't see an issue with 8383 regardless, do you?
-
jberman[m]
ooo123ooo1234567: And a complete PR for this in the future would be nice
-
ooo123ooo1234567
jberman[m]: It's useless placebo: placing db lock after txpool and blockchain locks
-
ooo123ooo1234567
if there is some inconsistency within Blockchain class, then identify it and fix with internal locks before they are removed, but not add lmdb read txn blindly
-
ooo123ooo1234567
* internal locks (std::mutex, not lmdb read txn) before they
-
jberman[m]
Eh, ya fair enough. sech1 I think he makes good arguments
-
jberman[m]
worth reading in the logs
-
sech1
I only care about get_block_template working properly.
-
jberman[m]
reverting 7937 should do that. If you remove the db read txn guard entirely, it works as you expect, right?
-
sech1
yes
-
sech1
m_tx_pool lock protects it from running in parallel with reorg or a new block
-
moneromooo
BTW, if anyone didn't see it already, the comment near the top of prepare_handle_incoming_blocks is very relevant.
-
sech1
comment about the order of locks? BTW, do we ever run monerod under thread sanitizer?
-
moneromooo
Yes. And I did once and I was none the wiser.
-
moneromooo
Lots and lots of messages which didn't seem to be obviously wrong, so I decided to not waste my time with it.
-
moneromooo
Someone who's familiar with tsan may have more luck working with it on monero.
-
sech1
I managed to make sense of tsan messages in p2pool (with a bit of filtering), and fixed them
-
sech1
But I'm scared of millions of warnings it will spit out when tested in monerod :D
-
MeowingCat
wallet2 is starting to get blocks only when i do wallet->createTransaction()
-
MeowingCat
what does this mean??
-
ooo123ooo1234567
<moneromooo> "Someone who's familiar with tsan..." <-
uwsg.indiana.edu/hypermail/linux/kernel/0106.2/0405.html, "A Computer is a state machine. Threads are for people who can't program state machines." still correct: taking one more dependency into consideration vs just use one more thread (works until problems with synchronization) vs just use new more memory (works until problems with out-of-memory), not correct:
-
ooo123ooo1234567
welcome vulnerabilities
-
MeowingCat
ohh i found wallet->startRefresh()
-
MeowingCat
can anyone say me does wallet2 saves current state of blockchain synchronization?????
-
MeowingCat
im doing wallet->store() and it is storing something in a .key file buttt i need to know does it save current state???
-
MeowingCat
[1655126920] libunbound[41972:0] error: can't bind socket: Permission denied. for 0.0.0.0
-
MeowingCat
i got this while sending transaction
-
MeowingCat
this doesn't sound gooood if wallet2 needs to bind sockets for mobile networks
-
MeowingCat
because damn mobile networks are working behind damn NATs
-
hyc
note that LMDB read txns do nothing for thread safety. LMDB reads are lockless and never block.
-
hyc
a read txn's only purpose is for data consistency: all reads performed within the same read txn see data consistent with the moment the read txn began.
-
moneromooo
You can disable dns for the wallet. It's really unused now, it was a failsafe at a time it was needed.
-
hyc
if you had an unsafe lock ordering before, sprinkling LMDB readtxns into the code will not affect it, it will still be unsafe.
-
hyc
you could also consider using valgrind/helgrind to look for data races and such. of course valgrind runs like 100x slower than realtime
-
ooo123ooo1234567
<jberman[m]> "I'll submit a PR that reverts..." <- `git revert refs/pull/7937/head` no conflicts, pretty straightforward
-
UkoeHB
hyc: if a read txn is started while a different read is still active, will the second one have a newer view on the db?
-
ooo123ooo1234567
UkoeHB: yes, since events are: read txn -> write txn -> read txn
-
ooo123ooo1234567
otherwise db isn't consistent
-
hyc
a readtxn has the view of the DB that was in effect when the readtxn began
-
hyc
it doesn't matter what any other readtxn is present
-
hyc
or doing or not
-
ooo123ooo1234567
s/refs/pull/7937/head/50410d1f7d04bf60053f2263410c39e81d3ddad1/
-
hyc
as far as consistency goes, there's nothing wrong with readtxn -> writetxn [readtxn continues to return old data]
-
hyc
from a correctness standpoint, every op within the readtxn occurred before the writetxn ever happened.
-
ooo123ooo1234567
s/refs/pull/7937/head/-m 1 50410d1f7d04bf60053f2263410c39e81d3ddad1/
-
UkoeHB
I see, interesting
-
hyc
the only thing that matters as far as clinet-visible correctness is that any single request from a client invokes only a single DB txn, total.
-
hyc
if you're opening and closing many DB txns to process a single client request, then all consistency is lost
-
ooo123ooo1234567
teddit.net/r/Monero/comments/vb7fmh…_is_the_time_to_decentralize_monero, right place to discuss changes in monero without reading code and critical thinking; the most upvoted comment suggests to use private key signing to force p2pool; facepalm
-
ooo123ooo1234567
monero-project/monero #8384, approval msg doesn't contain any conclusion about PR correctness and revert should be merge commit 50410d1f7d04bf60053f2263410c39e81d3ddad1, since whole PR isn't ok
-
ooo123ooo1234567
* should be for merge commit
-
ooo123ooo1234567
"A stronger long term fix to ensure consistent db reads for RPC functions that will still return the latest data is preferred. See one such proposed solution here." the purpose is to have consistent api (rpc / zmq / ... / notifications), not just db reads, since currently db doesn't store full state of daemon
-
ooo123ooo1234567
actually understanding of the right goal would help to not do that mistake
-
jberman[m]
sure, updated
-
ooo123ooo1234567
jberman[m]: wow
-
jberman[m]
what
-
ooo123ooo1234567
jberman[m]: I mean unexpected, no questions
-
ooo123ooo1234567
monero-project/monero #8384#issuecomment-1154219150, conclusion should be placed into approval; if in doubt what to choose for conclusion, here are some suggestions: correct / may be correct / might be correct / certainly improvement / useful change / better than nothing / worse than nothing
-
ooo123ooo1234567
In theory, there could 1 million of random comments, but the ones in approval should be non-empty and not random
-
jeffro256[m]
I wasn't specifically approving though, just confirming that the PR does what it says it does
-
jeffro256[m]
Because deciding whether reverting 7937 is the best course of action is entirely a different issue
-
ooo123ooo1234567
<jeffro256[m]> "Because deciding whether..." <- then such comment isn't very useful since it doesn't take any responsibility
-
ooo123ooo1234567
In theory, sech1 could approve it and it would be enough to add it into merge queue
-
ooo123ooo1234567
<jeffro256[m]> "Because deciding whether..." <- how much time would it require to make this decision ? any arguments against it ?
-
selsta
there is already support to revert it from people involved
-
ooo123ooo1234567
s/it/be/, s/require/needed/
-
hyc
so we know for certain that 7937 is worse than the existing behavior?
-
ooo123ooo1234567
selsta: it's about review process in general, not about that specific commit. Just a study case for problems with bad review process
-
selsta
but if anyone has arguments against it they should be added to the PR, or here in the chat
-
jeffro256[m]
> > <@jeffro256:monero.social> Because deciding whether reverting 7937 is the best course of action is entirely a different issue
-
jeffro256[m]
>
-
jeffro256[m]
> then such comment isn't very useful since it doesn't take any responsibility
-
jeffro256[m]
Okay? I wasn't meaning to take responsibility, just stating that the PR reverts 7937. I'm also not opposing it, because I'm still looking at it
-
jeffro256[m]
It's useful to have multiple independent parties review PRs, even seemingly straightforward ones such as these
-
ooo123ooo1234567
jeffro256[m]: straightforward - approve it (with conclusion), not straightforward - add some arguments against
-
ooo123ooo1234567
<jeffro256[m]> "It's useful to have multiple..." <- many partial reviews can't replace 1 end-to-end, especially in cryptocurrency project where all details are important
-
jeffro256[m]
The changes themselves are straightforward (reverting 7937), the implications and long-term goal is less so (keeping it vs not keeping it, working on preserving order + integrity with regards to data, approving/introducing competing changes, etc). I was stating the fact the the changes themselves are correct (i.e. that PR 8384 reverts PR 7937) but was not yet making any statement as whether or not I generally approve of the concept
-
jeffro256[m]
(hint: I do thus far but I am taking a slightly longer look at it). Reviews often take place over multiple steps. I never said I was done reviewing but you seemed determined to get me to push a "end-to-end" review through as fast as possible, which is usually not the correct course of action
-
ooo123ooo1234567
jeffro256[m]: try to disconnect personality from technical problem; I've just noticed the message that repeats conclusion of another approval and suggested to put it into approval.
-
jeffro256[m]
"... and suggested to put it into approval." That's the thing: ***I don't approve yet***
-
jeffro256[m]
***yet***. That comment was meant to be taking at face value: it reverts the PR, nothing more, nothing less
-
rbrunner
I wonder how to interpret your remark "I second that this PR correctly reverts #7937" but no approval from you on that PR right now differently.
-
jeffro256[m]
Thank you. I did rephrase it to hopefully make it more clear
-
jeffro256[m]
-
jeffro256[m]
Would anyone be willing to run this PR and see if you also get the same deadlock problem?
-
MeowingCat
when i pass a filename to path in Monero::WalletManager::recoveryWallet(path, ...) public address is being a different and wrong address with 111111111111s
-
MeowingCat
when i pass empty string it is true
-
MeowingCat
as wallet path
-
selsta