06:28:10 <sech1> whoever is running p2pool at 142.202.205.118, check your monerod, it's probably stuck
06:30:07 <sech1> memory leak check: yesterday it was 3469620/37780/10096 (VIRT/RES/SHR), today 3470972/52032/10096
06:30:51 <sech1> hyc I had the same problem with zmq_reader - it doesn't wake up until it gets a message, so I send a fake message on shutdown
06:57:57 <sech1> Node running at 142.202.205.118 is constantly 1 Monero block behind for whatever reason, so other nodes reject its p2pool blocks. We have a lot of orphans because of that, and all of them are from that node
07:02:24 <sech1> not constantly, but it's syncing with Monero chain so slowly, literally 1-2 minutes later than other nodes
07:24:39 <Inge> would e.g. it being an rPi with software AES possibly explain such lag?
07:35:25 <sech1> might be
07:35:41 <sech1> my node has 0 orphans, so it's not my problem :D
07:36:13 <sech1> rPi with slow HDD probably :D
07:37:08 <sech1> I still need to think about banning such nodes. My node sends full unpruned blocks to it every time because it's so far behind. This is a waste of bandwidth.
07:55:30 <sech1> that lagging node is not an rPi, it's crowncloud VPS
07:56:22 <sech1> probably something like 2 vCPUs and 1 GB RAM
07:57:04 <sech1> and they do have HDD servers there, not only SSD
08:01:12 <sech1> 31 orphans, and it's only those that came from non-lagging Monero blocks: https://paste.debian.net/hidden/11ddf392/
08:01:19 <sech1> I bet that node has thousands of orphans
08:01:49 <sech1> Even my shares have 3% uncles, it was 1% yesterday. Lagging nodes are bad
08:02:57 <sech1> But it's like the worst possible condition because p2pool runs at 1 second block time and that node is literally on the other side of the globe, relative to mine
08:04:04 <sech1> Uncle blocks are worth 20% less with default settings
08:10:14 <CzarekNakamoto[m> <sech1> "I need to think about banning..." <- Banning like just for an hour or something, only for you or globally? Also does p2pool require the full blockchain to be downloaded, or can it be a pruned blockchain/public node? Also isn't 1 second too small? If somebody has ping larger than 800ms to another node it will create an insane lag (and if I understand that correctly, it can cause a softfork)
08:15:18 <sech1> Please don't edit messages, it spams the IRC side
08:15:36 <sech1> Banning locally. We run with 1 second block time to run a stress-test
08:15:50 <sech1> Pruned Monero node is ok
08:17:45 <sech1> High ping by itself shouldn't be a problem, uncle blocks fix this in 99% of cases. But that node is probably a low-tier VPS running on an HDD. Monero is already heavy on HDD, and if some neighbour on that VPS does something heavy too, it will lag, and it does
08:20:17 <sech1> Other p2pool nodes don't tolerate it when someone sends a p2pool block built on top of an old Monero block, because it's useless for the Monero side. So they just ignore such blocks and the node that sent them gets orphans
08:20:54 <sech1> But they don't ban such nodes (yet). I need to think about when to ban these nodes without creating too many false positives
09:16:29 <CzarekNakamoto[m> Okay, I think that I get it now, thanks sech1!
10:36:26 <hyc> sech1: uv_async_send() is the fake message to wake up the event loop
11:00:29 <sech1> hyc I also submitted my block cache stuff and restarted p2pool node at 148.251.81.38 using the filled cache
11:00:33 <sech1> let's see if anything breaks
11:01:43 <hyc> cool
11:06:43 <sech1> cache is just a memory mapped file that's flushed in the background. When p2pool loads it, it uses the same code path as external downloaded blocks, so it still does all checks
11:07:08 <sech1> but it should work fine with corrupt cache
11:08:06 <sech1> initial sync with cache will be limited by CPU - mostly how fast it can calculate PoW
11:08:37 <sech1> 10-20 seconds on modern CPU without this much logging
11:11:51 <sech1> hmm, if cache gets corrupt it will ban whatever client it's syncing from when it gets to a corrupt p2pool block :D
11:16:28 <hyc> prob should see that it's using the cache and skip that banning step ;)
11:18:05 <sech1> yeah
11:28:51 <moneromoooo> Why do blocks with bad PoW get into the cache in the first place ?
11:29:27 <sech1> they don't
11:29:43 <sech1> but cache is on the disc and it's used to skip network requests for blocks on startup
11:29:55 <sech1> and it can get corrupt in uncountable ways :)
11:30:52 <moneromoooo> Aleph zero corruptions on the disk, aleph zero corruptions... Take one and fix it...
11:31:33 <sech1> actually block cache was supposed to have a keccak hash check before using the loaded block in the first place
11:31:37 <sech1> But I forgot it :D
11:32:15 <sech1> with this check random corruptions will be detected, but maliciously inserted blocks... On the other hand, it's a local file and if someone does it, system is already compromised
11:34:19 <sech1> lol https://github.com/SChernykh/p2pool/commit/4837b4c863638b9decccb9fb123400a2cf3dd015
11:34:32 <sech1> now it checks that everything is good, from binary format to keccak hash
11:39:31 <sech1> oh, that west-coast VPS is no longer running, 0 orphans again
11:39:52 <sech1> btw we passed 100k blocks mark \o/
11:42:17 <Inge> so the miner connects to his own local p2pool instance?
11:43:32 <sech1> yes
11:43:49 <sech1> it can run on a different machine, it's just a stratum server
11:51:23 <Inge> and currently testnet for a while longer to iron out wrinkles?
11:53:11 <sech1> yes
11:53:22 <sech1> probably quite a while
11:53:48 <sech1> we still need to test RandomX epoch change - how nodes handle it, how syncing for a new node works when epoch change is in PPLNS window etc
11:53:55 <sech1> it's every 2048 Monero blocks
11:54:41 <sech1> next epoch change on testnet is in ~2 days
11:57:59 <sech1> I run p2pool debug build (with address sanitizer) on my dev pc and it hasn't crashed, so it's quite stable. I'm more worried about sidechain syncing bugs
11:58:29 <sech1> soft forks, net splits and so on
12:03:06 <sech1> moneromoooo did you stop your node? I see only 2 wallets (both mine) in PPLNS window
12:05:22 <sech1> no, I see 159.48.53.164 is still connected, maybe only xmrig stopped there
12:06:39 <moneromoooo> p2pool running, xmrig temporarily stopped.
12:06:42 <sech1> ok
14:15:36 <hyc> oh, my p2pool died yesterday, didn't notice
14:16:53 <hyc> and i didn't have cores enabled, dang
14:16:59 <hyc> unlimited now
14:41:36 <sech1> died on which box?
14:42:00 <sech1> I haven't got any crashes so far.
14:42:14 <sech1> it's probably OOM killer again
14:51:33 <moneromoooo> dmesg usually records OOM kills.
14:54:21 <sech1> My node just banned 142.202.205.118 (super-lagging west-coast node). It broadcasted a block that lagged so far it couldn't be verified by my node
14:58:24 <sech1> it was also mining at Monero height 1795092 while my node was already at 1795123, 1 hour behind
14:58:36 <sech1> This is why you don't run heavy stuff on a cheap VPS with HDD
14:59:46 <sech1> actually both my nodes banned it at the same time and for the same reason
15:08:56 <hyc> yeah, dmesg shows it was an oom kill
15:09:32 <hyc> too bad it doesn't work on my mac
15:12:30 <hyc> trying it as a debug build now
15:30:15 <hyc> sech1: this is pretty consistent on mac https://paste.debian.net/1208872/
15:31:16 <sech1> I'll try to figure out why
15:31:25 <sech1> hyc your node keeps asking for block id 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270
15:31:30 <sech1> can you grep your logs for it?
15:31:44 <sech1> because this block id doesn't exist on my node
15:32:22 <hyc> hm, that's not in the log on my mac
15:32:52 <sech1> it comes from 84.203.25.127
15:33:00 <hyc> ah yes it's all over my rockpro64 log
15:33:18 <sech1> my node mined it while it was syncing, so it has sidechain height 10 or something
15:33:26 <sech1> then I restarted it so now my node doesn't know this block
15:33:41 <hyc> NOTICE  2021-08-24 16:20:02.6860 P2PServer sending BLOCK_REQUEST for id = 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270
15:33:41 <hyc> NOTICE  2021-08-24 16:20:02.7395 P2PServer block 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270 was received before, skipping it
15:33:49 <sech1> the thing is, this block was broadcasted to your node so it should have it
15:33:57 <hyc> mostly it's BLOCK_REQUESTS
15:34:25 <sech1> damn I think now every node wants this block :D
15:34:37 <hyc> well, this IP address is my router, the actual node could have been either the mac or the rockpro64 depending on which was running at the time
15:34:44 <sech1> and whatever block references it doesn't get pruned for some reason
15:35:40 <sech1> just restart both your nodes and let's see if it forgets about this block
15:36:04 <hyc> ok
15:36:31 <sech1> I need to tweak pruning logic to deal with these border cases
15:36:33 <hyc> hm the rockpro64 was already restarted due to oom
15:36:39 <moneromoooo> I have 138 messages in 20 minutes of this: P2PServer got a request for block with id 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270 but couldn't find it
15:36:49 <moneromoooo> Otherwise seems to work fine.
15:36:56 <sech1> it's harmless
15:37:07 <sech1> but it creates some spam on the network, so it's better to fix it
15:37:57 <hyc> well I'm keeping the rockpro64 off now since it doesn't have enough RAM
15:41:48 <hyc> so this is where the mac aborts 
15:41:49 <hyc>   106          // b.m_lock should already be locked here, but try to lock it for reading anyway
15:41:49 <hyc> -> 107          const int lock_result = uv_rwlock_tryrdlock(&b.m_lock);
15:42:10 <hyc> perhaps tryrdlock doesn't like it if you already have a readlock. sounds bizarre tho
15:42:50 <moneromoooo> Default mutex on some platforms is recursive, on some it's not. Might be related.
15:47:46 <sech1> it's not a default mutex, it's read-write lock
15:51:34 <sech1> but actually this is what might be happening there
15:52:49 <sech1> hyc can you try to just remove that lock? I think it's not needed and the comment above is correct
15:52:54 <sech1> it's already locked at this point
15:56:02 <sech1> yes, I just double checked. It's called only from one place and it's locked there
15:56:51 <hyc> ok I'll try that
16:04:24 <hyc> definitely got past that now
16:06:35 <hyc> should I PR this too?
16:09:14 <hyc> ah I see you already did, cool
16:36:13 <sech1> hyc are you trying to change p2pool config? If your consensus id changes, other nodes will disconnect you
16:36:27 <sech1> I see in my node log that you disconnect at the handshake step
16:37:15 <hyc> I haven't changed anything
16:37:41 <hyc> but I suppose the mac would have gotten a different node ID than the rockpro64
16:38:02 <hyc> how would I set that?
16:39:09 <hyc> status says I have 4 p2p connections
16:39:43 <sech1> Hmm, I restarted my node and it connected
16:40:30 <sech1> NOTICE  2021-08-24 18:37:22.7127 P2PServer peer 84.203.25.127:37890 sent HANDSHAKE_CHALLENGE
16:40:36 <sech1> followed by an immediate disconnect
16:40:50 <sech1> can you search the log for my IP around that time? This is CEST time zone
16:41:15 <hyc> logs should always use UTC ...
16:42:37 <sech1> just search for any warnings with IP 31.208.56.53
16:43:31 <hyc> I think I found it
16:43:47 <hyc> NOTICE  2021-08-24 17:37:00.6790 P2PServer new connection from 31.208.56.53:62106
16:43:47 <hyc> NOTICE  2021-08-24 17:37:00.6790 P2PServer sending HANDSHAKE_CHALLENGE
16:43:47 <hyc> NOTICE  2021-08-24 17:37:00.6790 P2PServer peer 31.208.56.53:62106 sent HANDSHAKE_CHALLENGE
16:43:47 <hyc> WARNING 2021-08-24 17:37:00.6790 P2PServer tried to connect to the same peer twice: current connection 31.208.56.53:62099, new connection 31.208.56.53:62106
16:43:47 <hyc> NOTICE  2021-08-24 17:37:00.6790 P2PServer peer 31.208.56.53:62106 disconnected
16:43:59 <sech1> aha
16:44:27 <sech1> not a problem then, my node just thought it was a different peer on the same IP
16:44:53 <sech1> it would've disconnected if it got HANDSHAKE_CHALLENGE from your node, but your node found it first and disconnected
16:47:42 <hyc> so no issue here then
16:48:22 <sech1> "gmtime may not be thread-safe". If you have a better idea how to get UTC time in a thread-safe manner, you can do a PR (log.cpp, Stream::writeCurrentTime())
16:49:56 <moneromoooo> There is a gmtime_r IIRC.
16:50:10 <moneromoooo> Might be a GNU extension though.
16:50:53 <sech1> not a part of C++ standard
16:52:13 <sech1> not a big problem though, I already have #ifdef there
16:52:31 <moneromoooo> Then I guess gettimeofday ?
16:53:28 <sech1> I'm 99% sure we can just replace localtime_* with gmtime_* in the code and that's it
16:55:55 <sech1> yeap
16:57:02 <hyc> yeah they have identical API/semantics
17:02:17 <sech1> my node is trying to connect to yours twice again for some reason
17:04:02 <hyc> did it get saved twice in p2p_peers?
17:04:18 <sech1> I check the list of connected IPs first before making new connection
17:04:26 <sech1> so it shouldn't happen
17:11:15 <sech1> hyc I think this is what's happening: my node gets ECONNRESET on the connection to your node and then tries to reconnect, but your side of the old connection is still connected
17:11:48 <sech1> WARNING 2021-08-24 17:08:56.1699 P2PServer client: failed to read response, err = ECONNRESET
17:11:48 <sech1> NOTICE  2021-08-24 17:08:56.1699 P2PServer peer 84.203.25.127:37890 disconnected
17:12:01 <sech1> and then this loop of handshake-disconnect starts
17:12:07 <sech1> this is UTC time now
17:12:53 <sech1> hmm, I probably need to drop connections which weren't alive for some time (no meaningful packets sent)
17:21:57 <sech1> hyc there's definitely something wrong with the connection to your node. My node sometimes connects to yours, but then it doesn't receive updates from it for some time, and then receives a bunch of broadcasts and ECONNRESET shortly after
17:28:55 <sech1> don't know what changed, but it's fine now
17:29:16 <sech1> nope, ECONNRESET again
17:45:55 <aypro> is this normal "2021-08-24 17:44:41.716 W There were 73 blocks in the last 90 minutes, there might be large hash rate changes, or we might be partitioned, cut off from the Monero network or under attack, or your computer's time is off. Or it could be just sheer bad luck." ?
17:46:28 <selsta> on testnet?
17:46:40 <sech1> yes, I get the same
17:46:43 <sech1> it's "large hash rate changes"
17:46:45 <aypro> yeah testnet
17:46:55 <sech1> because testnet was at 2 kh/s and now more people are mining it
17:47:09 <aypro> got it
17:49:40 <aypro> do you need some hashrate for testing ? or just running p2pool and node
17:53:26 <sech1> p2pool, node and xmrig mining with 1 CPU thread is enough
17:53:36 <sech1> hashrate will be needed for mainnet testing
17:53:38 <sech1> not now
17:58:39 <hyc> well, this is the M1 mac, which also had connection leak issues in monerod
17:58:52 <hyc> so it's probably just a flaky build
17:59:06 <sech1> it doesn't change the fact that other nodes need to handle flaky connections properly
17:59:10 <hyc> I wonder if I can find newer or older gcc for it
18:00:40 <CzarekNakamoto[m> i can donate 1khs of mining power to some p2pool if you provide me ip, but can't run p2pool at the moment
18:01:41 <hyc> 1kh is prob excessive
18:01:51 <hyc> for testnet
18:02:14 <sech1> p2pool on testnet is already like 75% of the whole network
18:02:27 <sech1> good that it can't do 51% because it's decentralized :)
18:02:38 <hyc> lol
18:03:14 <sech1> I see 6 unique wallets, ATH \o/
18:04:08 <aypro> nice
18:05:33 <sech1> 6 payouts: https://testnet.xmrchain.net/tx/d45d5bbaa73d1301a1813cfeb15089604093a3fd67e142d4a215964617edf3df
18:05:36 <aypro> i see this from time to time, don't know if it's a problem "P2PServer peer 148.251.81.38:37890 is ahead on mainchain (height 1795263, your height 1795262). Is your monerod stuck or lagging?"
18:06:02 <aypro> my internet should be good
18:06:04 <sech1> if it's ahead by 1 block it's not a problem
18:06:15 <sech1> different nodes receive new Monero block at different times
18:07:18 <sech1> it can also happen if some peer mines a Monero block and sends p2pool block to you. Then you will see +1 height, because p2pool block almost always gets to you faster than Monero block
18:07:43 <sech1> I should probably only print this in log if it's height+2 or more
21:04:39 <hyc> how long does it take for peers to forget my node's consensus ID? should I just turn it off for a while?
21:05:40 <sech1> this ID is derived from p2pool config, it's the same for everyone
21:06:23 <sech1> your node IP is forgotten after 10 failed connection attempts IIRC
21:06:40 <hyc> ok
21:07:40 <hyc> seems pointless to have so many connection attempts https://paste.debian.net/1208914/
21:09:05 <sech1> the reasoning is that maybe the node is just restarting, so 10 attempts
21:09:23 <hyc> https://paste.debian.net/1208915/
21:09:38 <hyc> so many redundant connections
21:10:51 <sech1> from my side it looked like broken connections, I have a lot of ECONNRESET warnings
21:12:50 <sech1> from my side: https://paste.debian.net/hidden/90507fc5/
21:13:12 <sech1> so it saw only 1 disconnect
21:13:42 <hyc> interesting. perhaps it should log the local addr:port too
21:14:59 <sech1> my local addr:port wouldn't be what you see because I'm behind not 1 but 2 NATs :D
21:15:07 <hyc> ah...
21:15:18 <sech1> even 1 NAT would change it
21:15:23 <hyc> and yet you were able to forward ports in?
21:15:29 <sech1> no
21:15:43 <sech1> nothing works here for incoming connections
21:16:05 <hyc> ah. sounds like a common home network
21:18:25 <hyc> interesting that you have a pending connection attempt. should have gotten ECONNREFUSED right away
21:33:45 <hyc> gonna assume 10 attempts have passed. starting up again
21:34:12 <sech1> yeap, your node is not in p2pool_peers.txt
21:34:36 <hyc> cool. I see no redundant conns in lsof now
21:37:26 <sech1> I see no new warnings in log so far
21:38:13 <sech1> 4.5 kh/s on sidechain, more than network hashrate
21:38:16 <sech1> somebody ramped up
21:39:36 <sech1> hmm
21:39:48 <sech1> maybe I shouldn't make sidechain diff higher than network diff :D
21:40:13 <sech1> hmm... but it's not higher because 1 second block time.
21:40:18 <sech1> so not a problem
21:41:47 <hyc> my monerod has 56 inbound conns. never knew so many nodes were on testnet
21:43:27 <sech1> 10(out)+61(in) connections here
21:43:38 <sech1> not here, but on my server
22:46:11 <hyc> btw, my p2pool currently shows 579MB used
22:46:31 <hyc> monerod only 327M
22:47:31 <sech1> is this the new build with block cache?
22:48:15 <sech1> 552452 KB on my server
22:48:29 <sech1> so the same ballpark
22:48:57 <hyc> looks like yes, has block cache
22:49:13 <hyc> 327d1455fe4f861df82624d4c5a4d6eb4b15a7a7 
22:49:21 <hyc> it's not the latest, but was pulled today
22:49:45 <sech1> yes, it's with the cache
22:50:15 <sech1> cache is 480 MB, but I think not all of it is in physical RAM because of how memory mapped files work
22:50:25 <hyc> likely
22:50:44 <hyc> well the mac has 32GB RAM and not much happening on it so it could all be resident
22:50:54 <sech1> 5120 blocks * 4 KB (blocks are small now) = 20 MB most likely
22:51:23 <sech1> Mac M1 has 4 KB pages, right? Or bigger?
22:51:34 <hyc> 16KB
22:51:39 <sech1> so 80 MB
22:52:12 <sech1> cache uses 96 KB per block and most of it is never touched now, so it's probably not in RAM
22:52:59 <hyc> no big deal either way
22:53:51 <sech1> moneromoooo is it safe to submit block template with enough PoW on a different Monero node? I see that p2pool blocks arrive before Monero blocks (mined by p2pool) quite often
22:54:14 <sech1> so p2pool could broadcast mined Monero blocks more efficiently
22:58:38 <sech1> for example block https://testnet.xmrchain.net/block/1795518 - my node received p2pool broadcast of this block 0.1 seconds before it got ZMQ notification from monerod
22:59:33 <sech1> or maybe it's monerod that took 0.1 seconds to add the block and then send notification
22:59:40 <hyc> could be
22:59:48 <hyc> check timestamps in monerod log for block receipt
22:59:51 <sech1> I'm not sure it's safe to submit the same block concurrently
23:00:16 <hyc> if it's not safe that'd be a pretty silly vulnerability
23:01:03 <sech1> which log level in monerod is needed for block receipts?
23:01:19 <hyc> eh, level 1 ought to be enough. don't recall
23:01:35 <sech1> I'll wait for next p2pool block then with level 1
23:06:31 <sech1> confirmed 0.1 second delay: https://paste.debian.net/hidden/09d05249/
23:06:34 <sech1> p2pool is faster
23:07:13 <sech1> this is https://testnet.xmrchain.net/block/1795523
23:07:40 <sech1> p2pool block id 708f76dc5ffa092798664b4cc5821f1b410607de729d7226d659da064b3ba993 which can be found in tx extra for that block, so it's 100% that block
23:09:02 <sech1> I'll add submitting of broadcasted blocks tomorrow and we'll see if anything breaks
23:09:42 <sech1> So p2pool will be able to speed up block propagation in Monero network, wow
23:09:54 <hyc> cool
23:10:04 <sech1> but only for blocks it mines
23:13:19 <moneromoooo> I do not understand that question.
23:14:48 <sech1> if monerod receives fluffy block and at the same time the same block via RPC, is it safe?
23:14:54 <sech1> i.e. no crashes
23:15:27 <moneromoooo> Should be safe.
23:15:31 <sech1> I assume it should be protected by a lock somewhere
23:15:49 <sech1> ok
23:17:13 <moneromoooo> It is.
23:17:41 <hyc> so a p2pool miner found a block, and broadcasted it to all other p2pool peers
23:17:49 <hyc> it would also have submitted it to its local monerod
23:18:05 <hyc> so its local monerod broadcasted the block more slowly than p2pool ?
23:18:12 <sech1> yes
23:18:37 <sech1> first, because there were more hops over the Internet
23:18:53 <sech1> second, because monerod checks PoW in light mode (+10 ms on each hop)
23:19:11 <sech1> and also does a bunch of other transaction checks before broadcasting it further
23:19:38 <sech1> you can see in my paste that the block came from a different IP address
23:20:06 <hyc> sounds ok. so all p2pool miner's monerods can be updated a little bit ahead of rest of network
23:20:46 <sech1> IIRC supportxmr has this fast broadcast system between its nodes
23:20:55 <sech1> to reduce orphan rate
23:28:25 <hyc> clusters of wafer-scale compute ... https://www.wired.com/story/cerebras-chip-cluster-neural-networks-ai/
23:28:44 <hyc> GPU-friendly PoW will crumple
23:46:43 <hyc> p2pool 581M now. so, some slow growth happening. will check again tomorrow