-
sech1
whoever is running p2pool at 142.202.205.118, check your monerod, it's probably stuck
-
sech1
memory leak check: yesterday it was 3469620/37780/10096 (VIRT/RES/SHR), today 3470972/52032/10096
-
sech1
hyc I had the same problem with zmq_reader - it doesn't wake up until it gets a message, so I send a fake message on shutdown
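A minimal sketch of that "fake message" shutdown pattern (illustrative only - the inproc endpoint and PUSH/PULL socket pairing are assumptions, not p2pool's actual ZMQReader code):

```cpp
// The reader thread blocks in zmq_recv() and can't notice a stop flag,
// so the shutdown path sends one dummy message to wake it up.
#include <zmq.h>
#include <atomic>
#include <thread>

static std::atomic<bool> stop_flag{false};

void reader(void* ctx)
{
    void* pull = zmq_socket(ctx, ZMQ_PULL);
    zmq_bind(pull, "inproc://wakeup");
    char buf[256];
    while (!stop_flag) {
        zmq_recv(pull, buf, sizeof(buf), 0); // blocks until a message (real or fake) arrives
    }
    zmq_close(pull);
}

void stop_reader(void* ctx)
{
    stop_flag = true;
    void* push = zmq_socket(ctx, ZMQ_PUSH);
    zmq_connect(push, "inproc://wakeup");
    zmq_send(push, "", 0, 0); // the fake message: unblocks zmq_recv() so the loop can exit
    zmq_close(push);
}

int main()
{
    void* ctx = zmq_ctx_new();
    std::thread t(reader, ctx);
    // ... normal operation ...
    stop_reader(ctx);
    t.join();
    zmq_ctx_term(ctx);
}
```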
-
sech1
The node running at 142.202.205.118 is constantly 1 Monero block behind for whatever reason, so other nodes reject its p2pool blocks. We have a lot of orphans because of that - and they're all orphans from that node
-
sech1
not constantly, but it's syncing with the Monero chain so slowly that it gets new blocks literally 1-2 minutes later than other nodes
-
Inge
would it being e.g. an rPi with software AES possibly explain such lag?
-
sech1
might be
-
sech1
my node has 0 orphans, so it's not my problem :D
-
sech1
rPi with slow HDD probably :D
-
sech1
I need to think about banning such nodes still. My node sends full unpruned blocks to it every time because it's so far behind. This is wasteful for the bandwidth.
-
sech1
that lagging node is not an rPi, it's crowncloud VPS
-
sech1
probably something like 2 vCPUs and 1 GB RAM
-
sech1
and they do have HDD servers there, not only SSD
-
sech1
31 orphans, and those are only the ones that came from non-lagging Monero blocks:
paste.debian.net/hidden/11ddf392
-
sech1
I bet that node has thousands of orphans
-
sech1
Even my shares have 3% uncles, it was 1% yesterday. Lagging nodes are bad
-
sech1
But it's like the worst possible condition because p2pool runs at 1 second block time and that node is literally on the other side of the globe, relative to mine
-
sech1
Uncle blocks are worth 20% less with default settings
-
CzarekNakamoto[m
<sech1> "I need to think about banning..." <- Banning like just for a hour or something only for you or globally? Also does p2pool require full blockchain to be downloaded or it can be pruned blockchain/public node? Also isn't 1 second too small? If somebody have ping larger than 800ms to another note it will create an insane lag (and if I understand that correctly, it can cause a softwork)
-
CzarekNakamoto[m
> <@sech1:libera.chat> I need to think about banning such nodes still. My node sends full unpruned blocks to it every time because it's so far behind. This is wasteful for the bandwidth.
-
CzarekNakamoto[m
* Banning like just for an hour or something, only for you or globally? Also does p2pool require the full blockchain to be downloaded, or can it be a pruned blockchain/public node? Also isn't 1 second too small? If somebody has a ping larger than 800ms to another node it will create insane lag (and if I understand that correctly, it can cause a softfork)
-
sech1
Please don't edit messages, it spams the IRC side
-
sech1
Banning locally. We run with a 1 second block time as a stress test
-
sech1
Pruned Monero node is ok
-
sech1
High ping by itself shouldn't be a problem, uncle blocks fix this in 99% of cases. But that node is probably a low-tier VPS running on an HDD. Monero is already heavy on the HDD, and if some neighbour on that VPS does something heavy too, it will lag - and it does
-
sech1
Other p2pool nodes don't tolerate it when someone sends a p2pool block built on top of an old Monero block, because it's useless for the Monero side. So they just ignore such blocks, and the node that sent them gets orphans
-
sech1
But they don't ban such nodes (yet). I need to think about when to ban these nodes without causing too many false positives
-
CzarekNakamoto[m
Okay, I think that I get it now, thanks sech1!
-
hyc
sech1: uv_async_send() is the fake message to wake up the event loop
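The libuv counterpart hyc mentions looks roughly like this (a minimal sketch, not p2pool's code; uv_async_send() is thread-safe and wakes the loop even when it is idle):

```cpp
#include <uv.h>

static uv_async_t shutdown_async;

static void on_shutdown(uv_async_t* handle)
{
    // Runs on the loop thread: closing the last handle lets uv_run() return.
    uv_close(reinterpret_cast<uv_handle_t*>(handle), nullptr);
}

int main()
{
    uv_loop_t* loop = uv_default_loop();
    uv_async_init(loop, &shutdown_async, on_shutdown);

    // From any other thread: this wakes the event loop even if it's idle.
    uv_async_send(&shutdown_async);

    uv_run(loop, UV_RUN_DEFAULT); // returns once the async handle is closed
    return uv_loop_close(loop);
}
```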
-
sech1
hyc I also submitted my block cache stuff and restarted p2pool node at 148.251.81.38 using the filled cache
-
sech1
let's see if anything breaks
-
hyc
cool
-
sech1
the cache is just a memory-mapped file that's flushed in the background. When p2pool loads it, it uses the same code path as externally downloaded blocks, so it still does all the checks
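The idea in a rough, POSIX-only sketch (the file name, size, and layout here are placeholders, not p2pool's real cache format):

```cpp
// Blocks are written into a memory-mapped file; msync(MS_ASYNC) asks the
// kernel to flush dirty pages in the background. On startup the file is
// mapped again and each stored block goes through the same validation path
// as a block downloaded from a peer.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

int main()
{
    const size_t cache_size = 16 * 1024 * 1024; // placeholder size
    int fd = open("block_cache.bin", O_RDWR | O_CREAT, 0644);
    ftruncate(fd, cache_size);

    uint8_t* data = static_cast<uint8_t*>(
        mmap(nullptr, cache_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    const char block[] = "serialized block bytes";  // stand-in for a real block
    memcpy(data, block, sizeof(block));             // store it in the cache
    msync(data, cache_size, MS_ASYNC);              // flush to disk without blocking

    munmap(data, cache_size);
    close(fd);
}
```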
-
sech1
but it should work fine with corrupt cache
-
sech1
initial sync with cache will be limited by CPU - mostly how fast it can calculate PoW
-
sech1
10-20 seconds on modern CPU without this much logging
-
sech1
hmm, if cache gets corrupt it will ban whatever client it's syncing from when it gets to a corrupt p2pool block :D
-
hyc
prob should see that it's using the cache and skip that banning step ;)
-
sech1
yeah
-
moneromoooo
Why do blocks with bad PoW get into the cache in the first place ?
-
sech1
they don't
-
sech1
but the cache is on disk and it's used to skip network requests for blocks on startup
-
sech1
and it can get corrupt in uncountable ways :)
-
moneromoooo
Aleph zero corruptions on the disk, aleph zero corruptions... Take one and fix it...
-
sech1
actually the block cache was supposed to have a keccak hash check before using loaded blocks in the first place
-
sech1
But I forgot it :D
-
sech1
with this check random corruptions will be detected, but not maliciously inserted blocks... On the other hand, it's a local file, and if someone can tamper with it, the system is already compromised
-
sech1
now it checks that everything is good, from the binary format to the keccak hash
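A hypothetical sketch of that integrity check - CachedSlot and keccak_256() are placeholder names, not p2pool's actual ones; the point is that a corrupted slot is skipped instead of being treated as a real (bannable) bad block:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

struct CachedSlot {
    uint32_t size;              // length of the serialized block
    uint8_t  hash[32];          // keccak-256 of the serialized block
    uint8_t  data[96 * 1024];   // serialized block bytes
};

// Assumed helper, declared only: any keccak-256 implementation works here.
void keccak_256(const uint8_t* in, size_t len, uint8_t out[32]);

bool load_slot(const CachedSlot& slot, std::vector<uint8_t>& block_out)
{
    if (slot.size == 0 || slot.size > sizeof(slot.data)) {
        return false;                        // broken binary format: skip this slot
    }
    uint8_t h[32];
    keccak_256(slot.data, slot.size, h);
    if (memcmp(h, slot.hash, sizeof(h)) != 0) {
        return false;                        // random corruption detected: skip it
    }
    block_out.assign(slot.data, slot.data + slot.size);
    return true;  // the block then goes through the same checks as a downloaded one
}
```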
-
sech1
oh, that west-coast VPS is no longer running, 0 orphans again
-
sech1
btw we passed 100k blocks mark \o/
-
Inge
so the miner connects to his own local p2pool instance?
-
sech1
yes
-
sech1
it can run on a different machine, it's just a stratum server
-
Inge
and currently testnet for a while longer to iron out wrinkles?
-
sech1
yes
-
sech1
probably quite a while
-
sech1
we still need to test the RandomX epoch change - how nodes handle it, how syncing works for a new node when an epoch change is in the PPLNS window, etc.
-
sech1
it's every 2048 Monero blocks
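For reference, the RandomX seed height advances every 2048 blocks with a 64-block lag. A sketch of the calculation, with the constants quoted from memory (verify against Monero's rx-slow-hash before relying on it):

```cpp
#include <cstdint>
#include <cstdio>

constexpr uint64_t SEEDHASH_EPOCH_BLOCKS = 2048; // epoch length
constexpr uint64_t SEEDHASH_EPOCH_LAG    = 64;   // delay before the new seed is used

uint64_t rx_seedheight(uint64_t height)
{
    if (height <= SEEDHASH_EPOCH_BLOCKS + SEEDHASH_EPOCH_LAG) {
        return 0;
    }
    return (height - SEEDHASH_EPOCH_LAG - 1) & ~(SEEDHASH_EPOCH_BLOCKS - 1);
}

int main()
{
    // Around an epoch boundary the seed height jumps by 2048 blocks:
    printf("seed at height 1796160: %llu\n", (unsigned long long)rx_seedheight(1796160)); // 1794048
    printf("seed at height 1796161: %llu\n", (unsigned long long)rx_seedheight(1796161)); // 1796096
}
```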
-
sech1
next epoch change on testnet is in ~2 days
-
sech1
I run p2pool debug build (with address sanitizer) on my dev pc and it hasn't crashed, so it's quite stable. I'm more worried about sidechain syncing bugs
-
sech1
soft forks, net splits and so on
-
sech1
moneromoooo did you stop your node? I see only 2 wallets (both mine) in PPLNS window
-
sech1
no, I see 159.48.53.164 is still connected, maybe only xmrig stopped there
-
moneromoooo
p2pool running, xmrig temporarily stopped.
-
sech1
ok
-
hyc
oh, my p2pool died yesterday, didn't notice
-
hyc
and I didn't have core dumps enabled, dang
-
hyc
unlimited now
-
sech1
died on which box?
-
sech1
I haven't got any crashes so far.
-
sech1
it's probably OOM killer again
-
moneromoooo
dmesg usually records OOM kills.
-
sech1
My node just banned 142.202.205.118 (the super-lagging west-coast node). It broadcasted a block that lagged so far behind that my node couldn't verify it
-
sech1
it was also mining at Monero height 1795092 while my node was already at 1795123, 1 hour behind
-
sech1
This is why you don't run heavy stuff on a cheap VPS with HDD
-
sech1
actually both my nodes banned it at the same time and for the same reason
-
hyc
yeah, dmesg shows it was an oom kill
-
hyc
too bad it doesn't work on my mac
-
hyc
trying it as a debug build now
-
hyc
sech1: this is pretty consistent on mac
paste.debian.net/1208872
-
sech1
I'll try to figure out why
-
sech1
hyc your node keeps asking for block id 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270
-
sech1
can you grep your logs for it?
-
sech1
because this block id doesn't exist on my node
-
hyc
hm, that's not in the log on my mac
-
sech1
it comes from 84.203.25.127
-
hyc
ah yes it's all over my rockpro64 log
-
sech1
my node mined it while it was syncing, so it has sidechain height 10 or something
-
sech1
then I restarted it so now my node doesn't know this block
-
hyc
NOTICE 2021-08-24 16:20:02.6860 P2PServer sending BLOCK_REQUEST for id = 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270
-
hyc
NOTICE 2021-08-24 16:20:02.7395 P2PServer block 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270 was received before, skipping it
-
sech1
the thing is, this block was broadcasted to your node so it should have it
-
hyc
mostly it's BLOCK_REQUESTS
-
sech1
damn I think now every node wants this block :D
-
hyc
well, this IP address is my router; the actual node could have been either the mac or the rockpro64, depending on which was running at the time
-
sech1
and whatever block references it doesn't get pruned for some reason
-
sech1
just restart both your nodes and let's see if it forgets about this block
-
hyc
ok
-
sech1
I need to tweak the pruning logic to deal with these edge cases
-
hyc
hm the rockpro64 was already restarted due to oom
-
moneromoooo
I have 138 messages in 20 minutes of this: P2PServer got a request for block with id 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270 but couldn't find it
-
moneromoooo
Otherwise seems to work fine.
-
sech1
it's harmless
-
sech1
but it creates some spam on the network, so it's better to fix it
-
hyc
well I'm keeping the rockpro64 off now since it doesn't have enough RAM
-
hyc
so this is where the mac aborts
-
hyc
106 // b.m_lock should already be locked here, but try to lock it for reading anyway
-
hyc
-> 107 const int lock_result = uv_rwlock_tryrdlock(&b.m_lock);
-
hyc
perhaps tryrdlock doesn't like it if you already have a readlock. sounds bizarre tho
-
moneromoooo
Default mutex on some platforms is recursive, on some it's not. Might be related.
-
sech1
it's not a default mutex, it's a read-write lock
-
sech1
but actually this is what might be happening there
-
sech1
hyc can you try to just remove that lock? I think it's not needed and the comment above is correct
-
sech1
it's already locked at this point
-
sech1
yes, I just double checked. It's called only from one place and it's locked there
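An illustrative before/after of the change being discussed (names simplified, not the actual p2pool code): the single call site already holds the read lock, so the redundant try-lock inside the callee - the line where the mac build aborted - is simply removed.

```cpp
#include <uv.h>

struct Block {
    uv_rwlock_t m_lock;
    // ... block data ...
};

void process_block(Block& b)
{
    // Before: const int lock_result = uv_rwlock_tryrdlock(&b.m_lock);
    // After:  rely on the caller's read lock; don't touch b.m_lock here.
    // ... read-only work on b ...
}

void caller(Block& b)
{
    uv_rwlock_rdlock(&b.m_lock);   // the only call site takes the lock first
    process_block(b);
    uv_rwlock_rdunlock(&b.m_lock);
}
```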
-
hyc
ok I'll try that
-
hyc
definitely got past that now
-
hyc
should I PR this too?
-
hyc
ah I see you already did, cool
-
sech1
hyc are you trying to change the p2pool config? If your consensus ID changes, other nodes will disconnect you
-
sech1
I can see in my node's log that you disconnect at the handshake step
-
hyc
I haven't changed anything
-
hyc
but I suppose the mac would have gotten a different node ID than the rockpro64
-
hyc
how would I set that?
-
hyc
status says I have 4 p2p connections
-
sech1
Hmm, I restarted my node and it connected
-
sech1
NOTICE 2021-08-24 18:37:22.7127 P2PServer peer 84.203.25.127:37890 sent HANDSHAKE_CHALLENGE
-
sech1
followed by an immediate disconnect
-
sech1
can you search the log for my IP around that time? This is CEST time zone
-
hyc
logs should always use UTC ...
-
sech1
just search for any warnings with IP 31.208.56.53
-
hyc
I think I found it
-
hyc
NOTICE 2021-08-24 17:37:00.6790 P2PServer new connection from 31.208.56.53:62106
-
hyc
NOTICE 2021-08-24 17:37:00.6790 P2PServer sending HANDSHAKE_CHALLENGE
-
hyc
NOTICE 2021-08-24 17:37:00.6790 P2PServer peer 31.208.56.53:62106 sent HANDSHAKE_CHALLENGE
-
hyc
WARNING 2021-08-24 17:37:00.6790 P2PServer tried to connect to the same peer twice: current connection 31.208.56.53:62099, new connection 31.208.56.53:62106
-
hyc
NOTICE 2021-08-24 17:37:00.6790 P2PServer peer 31.208.56.53:62106 disconnected
-
sech1
aha
-
sech1
not a problem then, my node just thought it was a different peer on the same IP
-
sech1
it would've disconnected if it got HANDSHAKE_CHALLENGE from your node, but your node detected the duplicate first and disconnected
-
hyc
so no issue here then
-
sech1
"gmtime may not be thread-safe". If you have better idea how to get UTC time in a thread-safe manner, you can do a PR (log.cpp, Stream::writeCurrentTime())
-
moneromoooo
There is a gmtime_r IIRC.
-
moneromoooo
Might be a GNU extension though.
-
sech1
not a part of C++ standard
-
sech1
not a big problem though, I already have #ifdef there
-
moneromoooo
Then I guess gettimeofday ?
-
sech1
I'm 99% sure we can just replace localtime_* with gmtime_* in the code and that's it
-
sech1
yeap
-
hyc
yeah they have identical API/semantics
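A sketch of what the thread-safe UTC timestamp could look like (not necessarily the exact code in p2pool's log.cpp): gmtime_s on Windows, gmtime_r everywhere else, behind the same kind of #ifdef already mentioned above.

```cpp
#include <cstddef>
#include <cstdio>
#include <ctime>

void write_current_utc_time(char* buf, size_t buf_size)
{
    const time_t now = time(nullptr);
    struct tm utc;
#ifdef _WIN32
    gmtime_s(&utc, &now);   // Microsoft CRT order: (struct tm*, const time_t*)
#else
    gmtime_r(&now, &utc);   // POSIX order: (const time_t*, struct tm*)
#endif
    snprintf(buf, buf_size, "%04d-%02d-%02d %02d:%02d:%02d",
             utc.tm_year + 1900, utc.tm_mon + 1, utc.tm_mday,
             utc.tm_hour, utc.tm_min, utc.tm_sec);
}
```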
-
sech1
my node is trying to connect to yours twice again for some reason
-
hyc
did it get saved twice in p2p_peers?
-
sech1
I check the list of connected IPs first before making new connection
-
sech1
so it shouldn't happen
-
sech1
hyc I think this is what's happening: my node gets ECONNRESET on connection to your node and then tries to reconnect, but your side of old connection is still connected
-
sech1
WARNING 2021-08-24 17:08:56.1699 P2PServer client: failed to read response, err = ECONNRESET
-
sech1
NOTICE 2021-08-24 17:08:56.1699 P2PServer peer 84.203.25.127:37890 disconnected
-
sech1
and then this loop of handshake-disconnect starts
-
sech1
this is UTC time now
-
sech1
hmm, I probably need to drop connections that haven't been alive for some time (no meaningful packets sent)
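A hypothetical sketch of that idle-connection check (the 60-second timeout and the field names are assumptions, not p2pool's actual values):

```cpp
#include <chrono>
#include <list>

struct Peer {
    std::chrono::steady_clock::time_point last_alive; // updated on every meaningful packet
    bool disconnect_requested = false;
};

// Called periodically from the event loop: flag peers that have been silent too long.
void drop_idle_peers(std::list<Peer>& peers)
{
    const auto now = std::chrono::steady_clock::now();
    const auto timeout = std::chrono::seconds(60);
    for (Peer& p : peers) {
        if (now - p.last_alive > timeout) {
            p.disconnect_requested = true; // let the event loop close the socket
        }
    }
}
```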
-
sech1
hyc there's definitely something wrong with the connection to your node. My node sometimes connects to yours, but then it doesn't receive updates from it for some time, and then receives a bunch of broadcasts and an ECONNRESET shortly after
-
sech1
don't know what changed, but it's fine now
-
sech1
nope, ECONNRESET again
-
aypro
is this normal "2021-08-24 17:44:41.716 W There were 73 blocks in the last 90 minutes, there might be large hash rate changes, or we might be partitioned, cut off from the Monero network or under attack, or your computer's time is off. Or it could be just sheer bad luck." ?
-
selsta
on testnet?
-
sech1
yes, I get the same
-
sech1
it's "large hash rate changes"
-
aypro
yeah testnet
-
sech1
because testnet was at 2 kh/s and now more people are mining it
-
aypro
got it
-
aypro
do you need some hashrate for testing? or just running p2pool and a node
-
sech1
p2pool, node and xmrig mining with 1 CPU thread is enough
-
sech1
hashrate will be needed for mainnet testing
-
sech1
not now
-
hyc
well, this is the M1 mac, which also had connection leak issues in monerod
-
hyc
so it's probably just a flaky build
-
sech1
it doesn't change the fact that other nodes need to handle flaky connections properly
-
hyc
I wonder if I can find newer or older gcc for it
-
CzarekNakamoto[m
i can donate 1 kh/s of mining power to some p2pool node if you provide me an IP, but I can't run p2pool at the moment
-
hyc
1kh is prob excessive
-
hyc
for testnet
-
sech1
p2pool on testnet is already like 75% of the whole network
-
sech1
good that it can't do 51% because it's decentralized :)
-
hyc
lol
-
sech1
I see 6 unique wallets, ATH \o/
-
aypro
nice
-
aypro
i see this from time to time, don't know if it's a problem "P2PServer peer 148.251.81.38:37890 is ahead on mainchain (height 1795263, your height 1795262). Is your monerod stuck or lagging?"
-
aypro
my internet should be good
-
sech1
if it's ahead by 1 block it's not a problem
-
sech1
different nodes receive new Monero block at different times
-
sech1
it can also happen if some peer mines a Monero block and sends p2pool block to you. Then you will see +1 height, because p2pool block almost always gets to you faster than Monero block
-
sech1
I should probably only print this in log if it's height+2 or more
-
hyc
how long does it take for peers to forget my node's consensus ID? should I just turn it off for a while?
-
sech1
this ID is derived from p2pool config, it's the same for everyone
-
sech1
your node IP is forgotten after 10 failed connection attempts IIRC
-
hyc
ok
-
hyc
seems pointless to have so many connection attempts
paste.debian.net/1208914
-
sech1
the reasoning is that maybe the node is just restarting, so 10 attempts
-
hyc
so many redundant connections
-
sech1
from my side it looked like broken connections, I have a lot of ECONNRESET warnings
-
sech1
so it saw only 1 disconnect
-
hyc
interesting. perhaps it should log the local addr:port too
-
sech1
my local addr:port wouldn't be what you see because I'm behind not 1 but 2 NATs :D
-
hyc
ah...
-
sech1
even 1 NAT would change it
-
hyc
and yet you were able to forward ports in?
-
sech1
no
-
sech1
nothing works here for incoming connections
-
hyc
ah. sounds like a common home network
-
hyc
interesting that you have a pending connection attempt. should have gotten ECONNREFUSED right away
-
hyc
gonna assume 10 attempts have passed. starting up again
-
sech1
yeap, your node is not in p2pool_peers.txt
-
hyc
cool. I see no redundant conns in lsof now
-
sech1
I see no new warnings in log so far
-
sech1
4.5 kh/s on sidechain, more than network hashrate
-
sech1
somebody ramped up
-
sech1
hmm
-
sech1
maybe I shouldn't make sidechain diff higher than network diff :D
-
sech1
hmm... but it's not higher, because of the 1 second block time.
-
sech1
so not a problem
-
hyc
my monerod has 56 inbound conns. never knew so many nodes were on testnet
-
sech1
10(out)+61(in) connections here
-
sech1
not here, but on my server
-
hyc
btw, my p2pool currently shows 579MB used
-
hyc
monerod only 327M
-
sech1
is this the new build with block cache?
-
sech1
552452 KB on my server
-
sech1
so the same ballpark
-
hyc
looks like yes, has block cache
-
hyc
327d1455fe4f861df82624d4c5a4d6eb4b15a7a7
-
hyc
it's not the latest, but was pulled today
-
sech1
yes, it's with the cache
-
sech1
cache is 480 MB, but I think not all of it is in physical RAM because of how memory mapped files work
-
hyc
likely
-
hyc
well the mac has 32GB RAM and not much happening on it so it could all be resident
-
sech1
5120 blocks * 4 KB (blocks are small now) = 20 MB most likely
-
sech1
Mac M1 has 4 KB pages, right? Or bigger?
-
hyc
16KB
-
sech1
so 80 MB
-
sech1
cache uses 96 KB per block and most of it is never touched now, so it's probably not in RAM
-
hyc
no big deal either way
-
sech1
moneromoooo is it safe to submit a block template with enough PoW to a different Monero node? I see that p2pool blocks arrive before Monero blocks (mined by p2pool) quite often
-
sech1
so p2pool could broadcast mined Monero blocks more efficiently
-
sech1
for example, this block:
testnet.xmrchain.net/block/1795518 - my node received p2pool broadcast of this block 0.1 seconds before it got ZMQ notification from monerod
-
sech1
or maybe it's monerod that took 0.1 seconds to add the block and then send notification
-
hyc
could be
-
hyc
check timestamps in monerod log for block receipt
-
sech1
I'm not sure it's safe to submit the same block concurrently
-
hyc
if it's not safe that'd be a pretty silly vulnerability
-
sech1
which log level in monerod is needed for block receipts?
-
hyc
eh, level 1 ought to be enough. don't recall
-
sech1
I'll wait for next p2pool block then with level 1
-
sech1
p2pool is faster
-
sech1
p2pool block id 708f76dc5ffa092798664b4cc5821f1b410607de729d7226d659da064b3ba993 which can be found in tx extra for that block, so it's 100% that block
-
sech1
I'll add submitting of broadcasted blocks tomorrow and we'll see if anything breaks
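For context, the monerod call this would use is the standard submit_block JSON-RPC method, which takes the hex-encoded block blob. A rough sketch - http_json_rpc() is a placeholder for whatever HTTP client is actually used, and 28081 is the testnet RPC port:

```cpp
#include <string>

// Assumed helper, declared only: POSTs the body to monerod's /json_rpc endpoint.
std::string http_json_rpc(const std::string& host, int port, const std::string& body);

std::string submit_block(const std::string& block_blob_hex)
{
    const std::string body =
        "{\"jsonrpc\":\"2.0\",\"id\":\"0\",\"method\":\"submit_block\","
        "\"params\":[\"" + block_blob_hex + "\"]}";
    return http_json_rpc("127.0.0.1", 28081, body); // monerod replies with status "OK" on success
}
```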
-
sech1
So p2pool will be able to speed up block propagation in Monero network, wow
-
hyc
cool
-
sech1
but only for blocks it mines
-
moneromoooo
I do not understand that question.
-
sech1
if monerod receives a fluffy block and, at the same time, the same block via RPC, is it safe?
-
sech1
i.e. no crashes
-
moneromoooo
Should be safe.
-
sech1
I assume it should be protected by a lock somewhere
-
sech1
ok
-
moneromoooo
It is.
-
hyc
so a p2pool miner found a block, and broadcasted it to all other p2pool peers
-
hyc
it would also have submitted it to its local monerod
-
hyc
so its local monerod broadcasted the block more slowly than p2pool ?
-
sech1
yes
-
sech1
first, because there were more hops over the Internet
-
sech1
second, because monerod checks PoW in light mode (+10 ms on each hop)
-
sech1
and also does a bunch of other transaction checks before broadcasting it further
-
sech1
you can see in my paste that the block came from a different IP address
-
hyc
sounds ok. so all p2pool miners' monerods can be updated a little bit ahead of the rest of the network
-
sech1
IIRC supportxmr has this fast broadcast system between its nodes
-
sech1
to reduce orphan rate
-
hyc
GPU-friendly PoW will crumple
-
hyc
p2pool 581M now. so, some slow growth happening. will check again tomorrow