06:28:10 <sech1> whoever is running p2pool at 142.202.205.118, check your monerod, it's probably stuck
06:30:07 <sech1> memory leak check: yesterday it was 3469620/37780/10096 (VIRT/RES/SHR), today 3470972/52032/10096
06:30:51 <sech1> hyc I had the same problem with zmq_reader - it doesn't wake up until it gets a message, so I send a fake message on shutdown
06:57:57 <sech1> Node running at 142.202.205.118 is constantly 1 Monero block behind for whatever reason, so other nodes reject its p2pool blocks. We have a lot of orphans because of that, and all of them are from that node
07:02:24 <sech1> not constantly, but it's syncing with Monero chain so slowly, literally 1-2 minutes later than other nodes
07:24:39 <Inge> would e.g. it being an rPi with software AES possibly explain such lag?
07:35:25 <sech1> might be
07:35:41 <sech1> my node has 0 orphans, so it's not my problem :D
07:36:13 <sech1> rPi with slow HDD probably :D
07:37:08 <sech1> I still need to think about banning such nodes. My node sends full unpruned blocks to it every time because it's so far behind. This is a waste of bandwidth.
07:55:30 <sech1> that lagging node is not an rPi, it's crowncloud VPS
07:56:22 <sech1> probably something like 2 vCPUs and 1 GB RAM
07:57:04 <sech1> and they do have HDD servers there, not only SSD
08:01:12 <sech1> 31 orphans, and it's only those that came from non-lagging Monero blocks: https://paste.debian.net/hidden/11ddf392/
08:01:19 <sech1> I bet that node has thousands of orphans
08:01:49 <sech1> Even my shares have 3% uncles, it was 1% yesterday. Lagging nodes are bad
08:02:57 <sech1> But it's like the worst possible condition because p2pool runs at 1 second block time and that node is literally on the other side of the globe, relative to mine
08:04:04 <sech1> Uncle blocks are worth 20% less with default settings
08:10:14 <CzarekNakamoto[m> <sech1> "I need to think about banning..." <- Banning like just for an hour or something, only for you or globally? Also does p2pool require the full blockchain to be downloaded, or can it be a pruned blockchain/public node? Also isn't 1 second too small? If somebody has ping larger than 800ms to another node it will create an insane lag (and if I understand that correctly, it can cause a softfork)
08:15:18 <sech1> Please don't edit messages, it spams the IRC side
08:15:36 <sech1> Banning locally. We run with 1 second block time to run a stress-test
08:15:50 <sech1> Pruned Monero node is ok
08:17:45 <sech1> High ping by itself shouldn't be a problem, uncle blocks fix this in 99% of cases. But that node is probably a low-tier VPS running on an HDD. Monero is already heavy on HDD, and if some neighbour on that VPS does something heavy too, it will lag, and it does
08:20:17 <sech1> Other p2pool nodes don't tolerate it when someone sends a p2pool block built on top of an old Monero block, because it's useless for the Monero side. So they just ignore such blocks and the node that sent them gets orphans
08:20:54 <sech1> But they don't ban such nodes (yet). I need to think about when to ban these nodes without creating too many false positives
09:16:29 <CzarekNakamoto[m> Okay, I think that I get it now, thanks sech1!
10:36:26 <hyc> sech1: uv_async_send() is the fake message to wake up the event loop
11:00:29 <sech1> hyc I also submitted my block cache stuff and restarted p2pool node at 148.251.81.38 using the filled cache
11:00:33 <sech1> let's see if anything breaks
11:01:43 <hyc> cool
11:06:43 <sech1> cache is just a memory mapped file that's flushed in the background. When p2pool loads it, it uses the same code path as external downloaded blocks, so it still does all checks
11:07:08 <sech1> but it should work fine with corrupt cache
11:08:06 <sech1> initial sync with cache will be limited by CPU - mostly how fast it can calculate PoW
11:08:37 <sech1> 10-20 seconds on modern CPU without this much logging
11:11:51 <sech1> hmm, if cache gets corrupt it will ban whatever client it's syncing from when it gets to a corrupt p2pool block :D
11:16:28 <hyc> prob should see that it's using the cache and skip that banning step ;)
11:18:05 <sech1> yeah
11:28:51 <moneromoooo> Why do blocks with bad PoW get into the cache in the first place ?
11:29:27 <sech1> they don't
11:29:43 <sech1> but cache is on the disc and it's used to skip network requests for blocks on startup
11:29:55 <sech1> and it can get corrupt in uncountable ways :)
11:30:52 <moneromoooo> Aleph zero corruptions on the disk, aleph zero corruptions... Take one and fix it...
11:31:33 <sech1> actually block cache was supposed to have a keccak hash check before using the loaded block in the first place
11:31:37 <sech1> But I forgot it :D
11:32:15 <sech1> with this check random corruptions will be detected, but maliciously inserted blocks... On the other hand, it's a local file and if someone does it, system is already compromised
11:34:19 <sech1> lol https://github.com/SChernykh/p2pool/commit/4837b4c863638b9decccb9fb123400a2cf3dd015
11:34:32 <sech1> now it checks that everything is good, from binary format to keccak hash
11:39:31 <sech1> oh, that west-coast VPS is no longer running, 0 orphans again
11:39:52 <sech1> btw we passed 100k blocks mark \o/
11:42:17 <Inge> so the miner connects to his own local p2pool instance?
11:43:32 <sech1> yes
11:43:49 <sech1> it can run on a different machine, it's just a stratum server
11:51:23 <Inge> and currently testnet for a while longer to iron out wrinkles?
11:53:11 <sech1> yes
11:53:22 <sech1> probably quite a while
11:53:48 <sech1> we still need to test RandomX epoch change - how nodes handle it, how syncing for a new node works when epoch change is in PPLNS window etc
11:53:55 <sech1> it's every 2048 Monero blocks
11:54:41 <sech1> next epoch change on testnet is in ~2 days
11:57:59 <sech1> I run p2pool debug build (with address sanitizer) on my dev pc and it hasn't crashed, so it's quite stable. I'm more worried about sidechain syncing bugs
11:58:29 <sech1> soft forks, net splits and so on
12:03:06 <sech1> moneromoooo did you stop your node? I see only 2 wallets (both mine) in PPLNS window
12:05:22 <sech1> no, I see 159.48.53.164 is still connected, maybe only xmrig stopped there
12:06:39 <moneromoooo> p2pool running, xmrig temporarily stopped.
12:06:42 <sech1> ok
14:15:36 <hyc> oh, my p2pool died yesterday, didn't notice
14:16:53 <hyc> and i didn't have cores enabled, dang
14:16:59 <hyc> unlimited now
14:41:36 <sech1> died on which box?
14:42:00 <sech1> I haven't got any crashes so far.
14:42:14 <sech1> it's probably OOM killer again
14:51:33 <moneromoooo> dmesg usually records OOM kills.
14:54:21 <sech1> My node just banned 142.202.205.118 (super-lagging west-coast node). It broadcasted a block that lagged so far it couldn't be verified by my node
14:58:24 <sech1> it was also mining at Monero height 1795092 while my node was already at 1795123, 1 hour behind
14:58:36 <sech1> This is why you don't run heavy stuff on a cheap VPS with HDD
14:59:46 <sech1> actually both my nodes banned it at the same time and for the same reason
15:08:56 <hyc> yeah, dmesg shows it was an oom kill
15:09:32 <hyc> too bad it doesn't work on my mac
15:12:30 <hyc> trying it as a debug build now
15:30:15 <hyc> sech1: this is pretty consistent on mac https://paste.debian.net/1208872/
15:31:16 <sech1> I'll try to figure out why
15:31:25 <sech1> hyc your node keeps asking for block id 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270
15:31:30 <sech1> can you grep your logs for it?
15:31:44 <sech1> because this block id doesn't exist on my node
15:32:22 <hyc> hm, that's not in the log on my mac
15:32:52 <sech1> it comes from 84.203.25.127
15:33:00 <hyc> ah yes it's all over my rockpro64 log
15:33:18 <sech1> my node mined it while it was syncing, so it has sidechain height 10 or something
15:33:26 <sech1> then I restarted it so now my node doesn't know this block
15:33:41 <hyc> NOTICE  2021-08-24 16:20:02.6860 P2PServer sending BLOCK_REQUEST for id = 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270
15:33:41 <hyc> NOTICE  2021-08-24 16:20:02.7395 P2PServer block 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270 was received before, skipping it
15:33:49 <sech1> the thing is, this block was broadcasted to your node so it should have it
15:33:57 <hyc> mostly it's BLOCK_REQUESTS
15:34:25 <sech1> damn I think now every node wants this block :D
15:34:37 <hyc> well, this IP address is my router, the actual node could have been either the mac or the rockpro64 depending on which was running at the time
15:34:44 <sech1> and whatever block references it doesn't get pruned for some reason
15:35:40 <sech1> just restart both your nodes and let's see if it forgets about this block
15:36:04 <hyc> ok
15:36:31 <sech1> I need to tweak pruning logic to deal with these border cases
15:36:33 <hyc> hm the rockpro64 was already restarted due to oom
15:36:39 <moneromoooo> I have 138 messages in 20 minutes of this: P2PServer got a request for block with id 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270 but couldn't find it
15:36:49 <moneromoooo> Otherwise seems to work fine.
15:36:56 <sech1> it's harmless
15:37:07 <sech1> but it creates some spam on the network, so it's better to fix it
15:37:57 <hyc> well I'm keeping the rockpro64 off now since it doesn't have enough RAM
15:41:48 <hyc> so this is where the mac aborts 
15:41:49 <hyc>   106          // b.m_lock should already be locked here, but try to lock it for reading anyway
15:41:49 <hyc> -> 107          const int lock_result = uv_rwlock_tryrdlock(&b.m_lock);
15:42:10 <hyc> perhaps tryrdlock doesn't like it if you already have a readlock. sounds bizarre tho
15:42:50 <moneromoooo> Default mutex on some platforms is recursive, on some it's not. Might be related.
15:47:46 <sech1> it's not a default mutex, it's read-write lock
15:51:34 <sech1> but actually this is what might be happening there
15:52:49 <sech1> hyc can you try to just remove that lock? I think it's not needed and the comment above is correct
15:52:54 <sech1> it's already locked at this point
15:56:02 <sech1> yes, I just double checked. It's called only from one place and it's locked there
15:56:51 <hyc> ok I'll try that
16:04:24 <hyc> definitely got past that now
16:06:35 <hyc> should I PR this too?
16:09:14 <hyc> ah I see you already did, cool
16:36:13 <sech1> hyc are you trying to change p2pool config? If your consensus id changes, other nodes will disconnect you
16:36:27 <sech1> I see in my node log that you disconnect at the handshake step
16:37:15 <hyc> I haven't changed anything
16:37:41 <hyc> but I suppose the mac would have gotten a different node ID than the rockpro64
16:38:02 <hyc> how would I set that?
16:39:09 <hyc> status says I have 4 p2p connections
16:39:43 <sech1> Hmm, I restarted my node and it connected
16:40:30 <sech1> NOTICE  2021-08-24 18:37:22.7127 P2PServer peer 84.203.25.127:37890 sent HANDSHAKE_CHALLENGE
16:40:36 <sech1> followed by an immediate disconnect
16:40:50 <sech1> can you search the log for my IP around that time? This is CEST time zone
16:41:15 <hyc> logs should always use UTC ...
16:42:37 <sech1> just search for any warnings with IP 31.208.56.53
16:43:31 <hyc> I think I found it
16:43:47 <hyc> NOTICE  2021-08-24 17:37:00.6790 P2PServer new connection from 31.208.56.53:62106
16:43:47 <hyc> NOTICE  2021-08-24 17:37:00.6790 P2PServer sending HANDSHAKE_CHALLENGE
16:43:47 <hyc> NOTICE  2021-08-24 17:37:00.6790 P2PServer peer 31.208.56.53:62106 sent HANDSHAKE_CHALLENGE
16:43:47 <hyc> WARNING 2021-08-24 17:37:00.6790 P2PServer tried to connect to the same peer twice: current connection 31.208.56.53:62099, new connection 31.208.56.53:62106
16:43:47 <hyc> NOTICE  2021-08-24 17:37:00.6790 P2PServer peer 31.208.56.53:62106 disconnected
16:43:59 <sech1> aha
16:44:27 <sech1> not a problem then, my node just thought it was a different peer on the same IP
16:44:53 <sech1> it would've disconnected if it got HANDSHAKE_CHALLENGE from your node, but your node found it first and disconnected
16:47:42 <hyc> so no issue here then
16:48:22 <sech1> "gmtime may not be thread-safe". If you have a better idea how to get UTC time in a thread-safe manner, you can do a PR (log.cpp, Stream::writeCurrentTime())
16:49:56 <moneromoooo> There is a gmtime_r IIRC.
16:50:10 <moneromoooo> Might be a GNU extension though.
16:50:53 <sech1> not a part of C++ standard
16:52:13 <sech1> not a big problem though, I already have #ifdef there
16:52:31 <moneromoooo> Then I guess gettimeofday ?
16:53:28 <sech1> I'm 99% sure we can just replace localtime_* with gmtime_* in the code and that's it
16:55:55 <sech1> yeap
16:57:02 <hyc> yeah they have identical API/semantics
17:02:17 <sech1> my node is trying to connect to yours twice again for some reason
17:04:02 <hyc> did it get saved twice in p2p_peers?
17:04:18 <sech1> I check the list of connected IPs first before making new connection
17:04:26 <sech1> so it shouldn't happen
17:11:15 <sech1> hyc I think this is what's happening: my node gets ECONNRESET on the connection to your node and then tries to reconnect, but your side of the old connection is still connected
17:11:48 <sech1> WARNING 2021-08-24 17:08:56.1699 P2PServer client: failed to read response, err = ECONNRESET
17:11:48 <sech1> NOTICE  2021-08-24 17:08:56.1699 P2PServer peer 84.203.25.127:37890 disconnected
17:12:01 <sech1> and then this loop of handshake-disconnect starts
17:12:07 <sech1> this is UTC time now
17:12:53 <sech1> hmm, I probably need to drop connections which weren't alive for some time (no meaningful packets sent)
17:21:57 <sech1> hyc there's definitely something wrong with the connection to your node. My node sometimes connects to yours, but then it doesn't receive updates from it for some time, and then receives a bunch of broadcasts and ECONNRESET shortly after
17:28:55 <sech1> don't know what changed, but it's fine now
17:29:16 <sech1> nope, ECONNRESET again
17:45:55 <aypro> is this normal "2021-08-24 17:44:41.716 W There were 73 blocks in the last 90 minutes, there might be large hash rate changes, or we might be partitioned, cut off from the Monero network or under attack, or your computer's time is off. Or it could be just sheer bad luck." ?
17:46:28 <selsta> on testnet?
17:46:40 <sech1> yes, I get the same
17:46:43 <sech1> it's "large hash rate changes"
17:46:45 <aypro> yeah testnet
17:46:55 <sech1> because testnet was at 2 kh/s and now more people are mining it
17:47:09 <aypro> got it
17:49:40 <aypro> do you need some hashrate for testing ? or just running p2pool and node
17:53:26 <sech1> p2pool, node and xmrig mining with 1 CPU thread is enough
17:53:36 <sech1> hashrate will be needed for mainnet testing
17:53:38 <sech1> not now
17:58:39 <hyc> well, this is the M1 mac, which also had connection leak issues in monerod
17:58:52 <hyc> so it's probably just a flaky build
17:59:06 <sech1> it doesn't change the fact that other nodes need to handle flaky connections properly
17:59:10 <hyc> I wonder if I can find newer or older gcc for it
18:00:40 <CzarekNakamoto[m> i can donate 1khs of mining power to some p2pool if you provide me ip, but can't run p2pool at the moment
18:01:41 <hyc> 1kh is prob excessive
18:01:51 <hyc> for testnet
18:02:14 <sech1> p2pool on testnet is already like 75% of the whole network
18:02:27 <sech1> good that it can't do 51% because it's decentralized :)
18:02:38 <hyc> lol
18:03:14 <sech1> I see 6 unique wallets, ATH \o/
18:04:08 <aypro> nice
18:05:33 <sech1> 6 payouts: https://testnet.xmrchain.net/tx/d45d5bbaa73d1301a1813cfeb15089604093a3fd67e142d4a215964617edf3df
18:05:36 <aypro> i see this from time to time, don't know if it's a problem "P2PServer peer 148.251.81.38:37890 is ahead on mainchain (height 1795263, your height 1795262). Is your monerod stuck or lagging?"
18:06:02 <aypro> my internet should be good
18:06:04 <sech1> if it's ahead by 1 block it's not a problem
18:06:15 <sech1> different nodes receive new Monero block at different times
18:07:18 <sech1> it can also happen if some peer mines a Monero block and sends p2pool block to you. Then you will see +1 height, because p2pool block almost always gets to you faster than Monero block
18:07:43 <sech1> I should probably only print this in log if it's height+2 or more
21:04:39 <hyc> how long does it take for peers to forget my node's consensus ID? should I just turn it off for a while?
21:05:40 <sech1> this ID is derived from p2pool config, it's the same for everyone
21:06:23 <sech1> your node IP is forgotten after 10 failed connection attempts IIRC
21:06:40 <hyc> ok
21:07:40 <hyc> seems pointless to have so many connection attempts https://paste.debian.net/1208914/
21:09:05 <sech1> the reasoning is that maybe the node is just restarting, so 10 attempts
21:09:23 <hyc> https://paste.debian.net/1208915/
21:09:38 <hyc> so many redundant connections
21:10:51 <sech1> from my side it looked like broken connections, I have a lot of ECONNRESET warnings
21:12:50 <sech1> from my side: https://paste.debian.net/hidden/90507fc5/
21:13:12 <sech1> so it saw only 1 disconnect
21:13:42 <hyc> interesting. perhaps it should log the local addr:port too
21:14:59 <sech1> my local addr:port wouldn't be what you see because I'm behind not 1 but 2 NATs :D
21:15:07 <hyc> ah...
21:15:18 <sech1> even 1 NAT would change it
21:15:23 <hyc> and yet you were able to forward ports in?
21:15:29 <sech1> no
21:15:43 <sech1> nothing works here for incoming connections
21:16:05 <hyc> ah. sounds like a common home network
21:18:25 <hyc> interesting that you have a pending connection attempt. should have gotten ECONNREFUSED right away
21:33:45 <hyc> gonna assume 10 attempts have passed. starting up again
21:34:12 <sech1> yeap, your node is not in p2pool_peers.txt
21:34:36 <hyc> cool. I see no redundant conns in lsof now
21:37:26 <sech1> I see no new warnings in log so far
21:38:13 <sech1> 4.5 kh/s on sidechain, more than network hashrate
21:38:16 <sech1> somebody ramped up
21:39:36 <sech1> hmm
21:39:48 <sech1> maybe I shouldn't make sidechain diff higher than network diff :D
21:40:13 <sech1> hmm... but it's not higher because 1 second block time.
21:40:18 <sech1> so not a problem
21:41:47 <hyc> my monerod has 56 inbound conns. never knew so many nodes were on testnet
21:43:27 <sech1> 10(out)+61(in) connections here
21:43:38 <sech1> not here, but on my server
22:46:11 <hyc> btw, my p2pool currently shows 579MB used
22:46:31 <hyc> monerod only 327M
22:47:31 <sech1> is this the new build with block cache?
22:48:15 <sech1> 552452 KB on my server
22:48:29 <sech1> so the same ballpark
22:48:57 <hyc> looks like yes, has block cache
22:49:13 <hyc> 327d1455fe4f861df82624d4c5a4d6eb4b15a7a7 
22:49:21 <hyc> it's not the latest, but was pulled today
22:49:45 <sech1> yes, it's with the cache
22:50:15 <sech1> cache is 480 MB, but I think not all of it is in physical RAM because of how memory mapped files work
22:50:25 <hyc> likely
22:50:44 <hyc> well the mac has 32GB RAM and not much happening on it so it could all be resident
22:50:54 <sech1> 5120 blocks * 4 KB (blocks are small now) = 20 MB most likely
22:51:23 <sech1> Mac M1 has 4 KB pages, right? Or bigger?
22:51:34 <hyc> 16KB
22:51:39 <sech1> so 80 MB
22:52:12 <sech1> cache uses 96 KB per block and most of it is never touched now, so it's probably not in RAM
22:52:59 <hyc> no big deal either way
22:53:51 <sech1> moneromoooo is it safe to submit block template with enough PoW on a different Monero node? I see that p2pool blocks arrive before Monero blocks (mined by p2pool) quite often
22:54:14 <sech1> so p2pool could broadcast mined Monero blocks more efficiently
22:58:38 <sech1> for example block https://testnet.xmrchain.net/block/1795518 - my node received p2pool broadcast of this block 0.1 seconds before it got ZMQ notification from monerod
22:59:33 <sech1> or maybe it's monerod that took 0.1 seconds to add the block and then send notification
22:59:40 <hyc> could be
22:59:48 <hyc> check timestamps in monerod log for block receipt
22:59:51 <sech1> I'm not sure it's safe to submit the same block concurrently
23:00:16 <hyc> if it's not safe that'd be a pretty silly vulnerability
23:01:03 <sech1> which log level in monerod is needed for block receipts?
23:01:19 <hyc> eh, level 1 ought to be enough. don't recall
23:01:35 <sech1> I'll wait for next p2pool block then with level 1
23:06:31 <sech1> confirmed 0.1 second delay: https://paste.debian.net/hidden/09d05249/
23:06:34 <sech1> p2pool is faster
23:07:13 <sech1> this is https://testnet.xmrchain.net/block/1795523
23:07:40 <sech1> p2pool block id 708f76dc5ffa092798664b4cc5821f1b410607de729d7226d659da064b3ba993 which can be found in tx extra for that block, so it's 100% that block
23:09:02 <sech1> I'll add submitting of broadcasted blocks tomorrow and we'll see if anything breaks
23:09:42 <sech1> So p2pool will be able to speed up block propagation in Monero network, wow
23:09:54 <hyc> cool
23:10:04 <sech1> but only for blocks it mines
23:13:19 <moneromoooo> I do not understand that question.
23:14:48 <sech1> if monerod receives fluffy block and at the same time the same block via RPC, is it safe?
23:14:54 <sech1> i.e. no crashes
23:15:27 <moneromoooo> Should be safe.
23:15:31 <sech1> I assume it should be protected by a lock somewhere
23:15:49 <sech1> ok
23:17:13 <moneromoooo> It is.
23:17:41 <hyc> so a p2pool miner found a block, and broadcasted it to all other p2pool peers
23:17:49 <hyc> it would also have submitted it to its local monerod
23:18:05 <hyc> so its local monerod broadcasted the block more slowly than p2pool ?
23:18:12 <sech1> yes
23:18:37 <sech1> first, because there were more hops over the Internet
23:18:53 <sech1> second, because monerod checks PoW in light mode (+10 ms on each hop)
23:19:11 <sech1> and also does a bunch of other transaction checks before broadcasting it further
23:19:38 <sech1> you can see in my paste that the block came from a different IP address
23:20:06 <hyc> sounds ok. so all p2pool miner's monerods can be updated a little bit ahead of rest of network
23:20:46 <sech1> IIRC supportxmr has this fast broadcast system between its nodes
23:20:55 <sech1> to reduce orphan rate
23:28:25 <hyc> clusters of wafer-scale compute ... https://www.wired.com/story/cerebras-chip-cluster-neural-networks-ai/
23:28:44 <hyc> GPU-friendly PoW will crumple
23:46:43 <hyc> p2pool 581M now. so, some slow growth happening. will check again tomorrow