-
sech1
whoever is running p2pool at 142.202.205.118, check your monerod, it's probably stuck
-
sech1
memory leak check: yesterday it was 3469620/37780/10096 (VIRT/RES/SHR), today 3470972/52032/10096
-
sech1
hyc I had the same problem with zmq_reader - it doesn't wake up until it gets a message, so I send a fake message on shutdown
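A minimal sketch of that "fake message" shutdown pattern (illustrative only - the inproc endpoint and PUSH/PULL socket pairing are assumptions, not p2pool's actual ZMQReader code):

```cpp
// The reader thread blocks in zmq_recv() and can't notice a stop flag,
// so the shutdown path sends one dummy message to wake it up.
#include <zmq.h>
#include <atomic>
#include <thread>

static std::atomic<bool> stop_flag{false};

void reader(void* ctx)
{
    void* pull = zmq_socket(ctx, ZMQ_PULL);
    zmq_bind(pull, "inproc://wakeup");
    char buf[256];
    while (!stop_flag) {
        zmq_recv(pull, buf, sizeof(buf), 0); // blocks until a message (real or fake) arrives
    }
    zmq_close(pull);
}

void stop_reader(void* ctx)
{
    stop_flag = true;
    void* push = zmq_socket(ctx, ZMQ_PUSH);
    zmq_connect(push, "inproc://wakeup");
    zmq_send(push, "", 0, 0); // the fake message: unblocks zmq_recv() so the loop can exit
    zmq_close(push);
}

int main()
{
    void* ctx = zmq_ctx_new();
    std::thread t(reader, ctx);
    // ... normal operation ...
    stop_reader(ctx);
    t.join();
    zmq_ctx_term(ctx);
}
```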
-
sech1
The node running at 142.202.205.118 is constantly 1 Monero block behind for whatever reason, so other nodes reject its p2pool blocks. We have a lot of orphans because of that - and they're all orphans from that node
-
sech1
not constantly, but it's syncing with the Monero chain so slowly that it gets new blocks literally 1-2 minutes later than other nodes
-
Inge
would it being e.g. an rPi with software AES possibly explain such lag?
-
sech1
might be
-
sech1
my node has 0 orphans, so it's not my problem :D
-
sech1
rPi with slow HDD probably :D
-
sech1
I need to think about banning such nodes still. My node sends full unpruned blocks to it every time because it's so far behind. This is wasteful for the bandwidth.
-
sech1
that lagging node is not an rPi, it's crowncloud VPS
-
sech1
probably something like 2 vCPUs and 1 GB RAM
-
sech1
and they do have HDD servers there, not only SSD
-
sech1
31 orphans, and those are only the ones that came from non-lagging Monero blocks:
paste.debian.net/hidden/11ddf392
-
sech1
I bet that node has thousands of orphans
-
sech1
Even my shares have 3% uncles, it was 1% yesterday. Lagging nodes are bad
-
sech1
But it's like the worst possible condition because p2pool runs at 1 second block time and that node is literally on the other side of the globe, relative to mine
-
sech1
Uncle blocks are worth 20% less with default settings
-
CzarekNakamoto[m
<sech1> "I need to think about banning..." <- Banning like just for a hour or something only for you or globally? Also does p2pool require full blockchain to be downloaded or it can be pruned blockchain/public node? Also isn't 1 second too small? If somebody have ping larger than 800ms to another note it will create an insane lag (and if I understand that correctly, it can cause a softwork)
-
CzarekNakamoto[m
> <@sech1:libera.chat> I need to think about banning such nodes still. My node sends full unpruned blocks to it every time because it's so far behind. This is wasteful for the bandwidth.
-
CzarekNakamoto[m
* Banning like just for an hour or something, only for you or globally? Also does p2pool require the full blockchain to be downloaded, or can it be a pruned blockchain/public node? Also isn't 1 second too small? If somebody has a ping larger than 800ms to another node it will create insane lag (and if I understand that correctly, it can cause a softfork)
-
sech1
Please don't edit messages, it spams the IRC side
-
sech1
Banning locally. We run with a 1 second block time as a stress test
-
sech1
Pruned Monero node is ok
-
sech1
High ping by itself shouldn't be a problem, uncle blocks fix this in 99% of cases. But that node is probably a low-tier VPS running on an HDD. Monero is already heavy on the HDD, and if some neighbour on that VPS does something heavy too, it will lag - and it does
-
sech1
Other p2pool nodes don't tolerate it when someone sends a p2pool block built on top of an old Monero block, because it's useless for the Monero side. So they just ignore such blocks, and the node that sent them gets orphans
-
sech1
But they don't ban such nodes (yet). I need to think about when to ban these nodes without causing too many false positives
-
CzarekNakamoto[m
Okay, I think that I get it now, thanks sech1!
-
hyc
sech1: uv_async_send() is the fake message to wake up the event loop
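The libuv counterpart hyc mentions looks roughly like this (a minimal sketch, not p2pool's code; uv_async_send() is thread-safe and wakes the loop even when it is idle):

```cpp
#include <uv.h>

static uv_async_t shutdown_async;

static void on_shutdown(uv_async_t* handle)
{
    // Runs on the loop thread: closing the last handle lets uv_run() return.
    uv_close(reinterpret_cast<uv_handle_t*>(handle), nullptr);
}

int main()
{
    uv_loop_t* loop = uv_default_loop();
    uv_async_init(loop, &shutdown_async, on_shutdown);

    // From any other thread: this wakes the event loop even if it's idle.
    uv_async_send(&shutdown_async);

    uv_run(loop, UV_RUN_DEFAULT); // returns once the async handle is closed
    return uv_loop_close(loop);
}
```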
-
sech1
hyc I also submitted my block cache stuff and restarted p2pool node at 148.251.81.38 using the filled cache
-
sech1
let's see if anything breaks
-
hyc
cool
-
sech1
the cache is just a memory-mapped file that's flushed in the background. When p2pool loads it, it uses the same code path as externally downloaded blocks, so it still does all the checks
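The idea in a rough, POSIX-only sketch (the file name, size, and layout here are placeholders, not p2pool's real cache format):

```cpp
// Blocks are written into a memory-mapped file; msync(MS_ASYNC) asks the
// kernel to flush dirty pages in the background. On startup the file is
// mapped again and each stored block goes through the same validation path
// as a block downloaded from a peer.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

int main()
{
    const size_t cache_size = 16 * 1024 * 1024; // placeholder size
    int fd = open("block_cache.bin", O_RDWR | O_CREAT, 0644);
    ftruncate(fd, cache_size);

    uint8_t* data = static_cast<uint8_t*>(
        mmap(nullptr, cache_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    const char block[] = "serialized block bytes";  // stand-in for a real block
    memcpy(data, block, sizeof(block));             // store it in the cache
    msync(data, cache_size, MS_ASYNC);              // flush to disk without blocking

    munmap(data, cache_size);
    close(fd);
}
```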
-
sech1
but it should work fine with corrupt cache
-
sech1
initial sync with cache will be limited by CPU - mostly how fast it can calculate PoW
-
sech1
10-20 seconds on modern CPU without this much logging
-
sech1
hmm, if cache gets corrupt it will ban whatever client it's syncing from when it gets to a corrupt p2pool block :D
-
hyc
prob should see that it's using the cache and skip that banning step ;)
-
sech1
yeah
-
moneromoooo
Why do blocks with bad PoW get into the cache in the first place ?
-
sech1
they don't
-
sech1
but the cache is on disk and it's used to skip network requests for blocks on startup
-
sech1
and it can get corrupt in uncountable ways :)
-
moneromoooo
Aleph zero corruptions on the disk, aleph zero corruptions... Take one and fix it...
-
sech1
actually the block cache was supposed to have a keccak hash check before using loaded blocks in the first place
-
sech1
But I forgot it :D
-
sech1
with this check random corruptions will be detected, but not maliciously inserted blocks... On the other hand, it's a local file, and if someone can tamper with it, the system is already compromised
-
sech1
now it checks that everything is good, from the binary format to the keccak hash
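A hypothetical sketch of that integrity check - CachedSlot and keccak_256() are placeholder names, not p2pool's actual ones; the point is that a corrupted slot is skipped instead of being treated as a real (bannable) bad block:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

struct CachedSlot {
    uint32_t size;              // length of the serialized block
    uint8_t  hash[32];          // keccak-256 of the serialized block
    uint8_t  data[96 * 1024];   // serialized block bytes
};

// Assumed helper, declared only: any keccak-256 implementation works here.
void keccak_256(const uint8_t* in, size_t len, uint8_t out[32]);

bool load_slot(const CachedSlot& slot, std::vector<uint8_t>& block_out)
{
    if (slot.size == 0 || slot.size > sizeof(slot.data)) {
        return false;                        // broken binary format: skip this slot
    }
    uint8_t h[32];
    keccak_256(slot.data, slot.size, h);
    if (memcmp(h, slot.hash, sizeof(h)) != 0) {
        return false;                        // random corruption detected: skip it
    }
    block_out.assign(slot.data, slot.data + slot.size);
    return true;  // the block then goes through the same checks as a downloaded one
}
```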
-
sech1
oh, that west-coast VPS is no longer running, 0 orphans again
-
sech1
btw we passed 100k blocks mark \o/
-
Inge
so the miner connects to his own local p2pool instance?
-
sech1
yes
-
sech1
it can run on a different machine, it's just a stratum server
-
Inge
and currently testnet for a while longer to iron out wrinkles?
-
sech1
yes
-
sech1
probably quite a while
-
sech1
we still need to test the RandomX epoch change - how nodes handle it, how syncing works for a new node when an epoch change is in the PPLNS window, etc.
-
sech1
it's every 2048 Monero blocks
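For reference, the RandomX seed height advances every 2048 blocks with a 64-block lag. A sketch of the calculation, with the constants quoted from memory (verify against Monero's rx-slow-hash before relying on it):

```cpp
#include <cstdint>
#include <cstdio>

constexpr uint64_t SEEDHASH_EPOCH_BLOCKS = 2048; // epoch length
constexpr uint64_t SEEDHASH_EPOCH_LAG    = 64;   // delay before the new seed is used

uint64_t rx_seedheight(uint64_t height)
{
    if (height <= SEEDHASH_EPOCH_BLOCKS + SEEDHASH_EPOCH_LAG) {
        return 0;
    }
    return (height - SEEDHASH_EPOCH_LAG - 1) & ~(SEEDHASH_EPOCH_BLOCKS - 1);
}

int main()
{
    // Around an epoch boundary the seed height jumps by 2048 blocks:
    printf("seed at height 1796160: %llu\n", (unsigned long long)rx_seedheight(1796160)); // 1794048
    printf("seed at height 1796161: %llu\n", (unsigned long long)rx_seedheight(1796161)); // 1796096
}
```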
-
sech1
next epoch change on testnet is in ~2 days
-
sech1
I run p2pool debug build (with address sanitizer) on my dev pc and it hasn't crashed, so it's quite stable. I'm more worried about sidechain syncing bugs
-
sech1
soft forks, net splits and so on
-
sech1
moneromoooo did you stop your node? I see only 2 wallets (both mine) in PPLNS window
-
sech1
no, I see 159.48.53.164 is still connected, maybe only xmrig stopped there
-
moneromoooo
p2pool running, xmrig temporarily stopped.
-
sech1
ok
-
hyc
oh, my p2pool died yesterday, didn't notice
-
hyc
and I didn't have core dumps enabled, dang
-
hyc
unlimited now
-
sech1
died on which box?
-
sech1
I haven't got any crashes so far.
-
sech1
it's probably OOM killer again
-
moneromoooo
dmesg usually records OOM kills.
-
sech1
My node just banned 142.202.205.118 (the super-lagging west-coast node). It broadcasted a block that lagged so far behind that my node couldn't verify it
-
sech1
it was also mining at Monero height 1795092 while my node was already at 1795123, 1 hour behind
-
sech1
This is why you don't run heavy stuff on a cheap VPS with HDD
-
sech1
actually both my nodes banned it at the same time and for the same reason
-
hyc
yeah, dmesg shows it was an oom kill
-
hyc
too bad it doesn't work on my mac
-
hyc
trying it as a debug build now
-
hyc
sech1: this is pretty consistent on mac
paste.debian.net/1208872
-
sech1
I'll try to figure out why
-
sech1
hyc your node keeps asking for block id 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270
-
sech1
can you grep your logs for it?
-
sech1
because this block id doesn't exist on my node
-
hyc
hm, that's not in the log on my mac
-
sech1
it comes from 84.203.25.127
-
hyc
ah yes it's all over my rockpro64 log
-
sech1
my node mined it while it was syncing, so it has sidechain height 10 or something
-
sech1
then I restarted it so now my node doesn't know this block
-
hyc
NOTICE 2021-08-24 16:20:02.6860 P2PServer sending BLOCK_REQUEST for id = 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270
-
hyc
NOTICE 2021-08-24 16:20:02.7395 P2PServer block 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270 was received before, skipping it
-
sech1
the thing is, this block was broadcasted to your node so it should have it
-
hyc
mostly it's BLOCK_REQUESTS
-
sech1
damn I think now every node wants this block :D
-
hyc
well, this IP address is my router; the actual node could have been either the mac or the rockpro64, depending on which was running at the time
-
sech1
and whatever block references it doesn't get pruned for some reason
-
sech1
just restart both your nodes and let's see if it forgets about this block
-
hyc
ok
-
sech1
I need to tweak the pruning logic to deal with these edge cases
-
hyc
hm the rockpro64 was already restarted due to oom
-
moneromoooo
I have 138 messages in 20 minutes of this: P2PServer got a request for block with id 850a4148f33ba689d4fdf74c8d638a885cb66562dd72c9fd80be988381815270 but couldn't find it
-
moneromoooo
Otherwise seems to work fine.
-
sech1
it's harmless
-
sech1
but it creates some spam on the network, so it's better to fix it
-
hyc
well I'm keeping the rockpro64 off now since it doesn't have enough RAM
-
hyc
so this is where the mac aborts
-
hyc
106 // b.m_lock should already be locked here, but try to lock it for reading anyway
-
hyc
-> 107 const int lock_result = uv_rwlock_tryrdlock(&b.m_lock);
-
hyc
perhaps tryrdlock doesn't like it if you already have a readlock. sounds bizarre tho
-
moneromoooo
Default mutex on some platforms is recursive, on some it's not. Might be related.
-
sech1
it's not a default mutex, it's a read-write lock
-
sech1
but actually this is what might be happening there
-
sech1
hyc can you try to just remove that lock? I think it's not needed and the comment above is correct
-
sech1
it's already locked at this point
-
sech1
yes, I just double checked. It's called only from one place and it's locked there
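An illustrative before/after of the change being discussed (names simplified, not the actual p2pool code): the single call site already holds the read lock, so the redundant try-lock inside the callee - the line where the mac build aborted - is simply removed.

```cpp
#include <uv.h>

struct Block {
    uv_rwlock_t m_lock;
    // ... block data ...
};

void process_block(Block& b)
{
    // Before: const int lock_result = uv_rwlock_tryrdlock(&b.m_lock);
    // After:  rely on the caller's read lock; don't touch b.m_lock here.
    // ... read-only work on b ...
}

void caller(Block& b)
{
    uv_rwlock_rdlock(&b.m_lock);   // the only call site takes the lock first
    process_block(b);
    uv_rwlock_rdunlock(&b.m_lock);
}
```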
-
hyc
ok I'll try that
-
hyc
definitely got past that now
-
hyc
should I PR this too?
-
hyc
ah I see you already did, cool
-
sech1
hyc are you trying to change the p2pool config? If your consensus ID changes, other nodes will disconnect you
-
sech1
I can see in my node's log that you disconnect at the handshake step
-
hyc
I haven't changed anything
-
hyc
but I suppose the mac would have gotten a different node ID than the rockpro64
-
hyc
how would I set that?
-
hyc
status says I have 4 p2p connections
-
sech1
Hmm, I restarted my node and it connected
-
sech1
NOTICE 2021-08-24 18:37:22.7127 P2PServer peer 84.203.25.127:37890 sent HANDSHAKE_CHALLENGE
-
sech1
followed by an immediate disconnect
-
sech1
can you search the log for my IP around that time? This is CEST time zone
-
hyc
logs should always use UTC ...
-
sech1
just search for any warnings with IP 31.208.56.53
-
hyc
I think I found it
-
hyc
NOTICE 2021-08-24 17:37:00.6790 P2PServer new connection from 31.208.56.53:62106
-
hyc
NOTICE 2021-08-24 17:37:00.6790 P2PServer sending HANDSHAKE_CHALLENGE
-
hyc
NOTICE 2021-08-24 17:37:00.6790 P2PServer peer 31.208.56.53:62106 sent HANDSHAKE_CHALLENGE
-
hyc
WARNING 2021-08-24 17:37:00.6790 P2PServer tried to connect to the same peer twice: current connection 31.208.56.53:62099, new connection 31.208.56.53:62106
-
hyc
NOTICE 2021-08-24 17:37:00.6790 P2PServer peer 31.208.56.53:62106 disconnected
-
sech1
aha
-
sech1
not a problem then, my node just thought it was a different peer on the same IP
-
sech1
it would've disconnected if it got HANDSHAKE_CHALLENGE from your node, but your node detected the duplicate first and disconnected
-
hyc
so no issue here then
-
sech1
"gmtime may not be thread-safe". If you have better idea how to get UTC time in a thread-safe manner, you can do a PR (log.cpp, Stream::writeCurrentTime())
-
moneromoooo
There is a gmtime_r IIRC.
-
moneromoooo
Might be a GNU extension though.
-
sech1
not a part of C++ standard
-
sech1
not a big problem though, I already have #ifdef there
-
moneromoooo
Then I guess gettimeofday ?
-
sech1
I'm 99% sure we can just replace localtime_* with gmtime_* in the code and that's it
-
sech1
yeap
-
hyc
yeah they have identical API/semantics
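A sketch of what the thread-safe UTC timestamp could look like (not necessarily the exact code in p2pool's log.cpp): gmtime_s on Windows, gmtime_r everywhere else, behind the same kind of #ifdef already mentioned above.

```cpp
#include <cstddef>
#include <cstdio>
#include <ctime>

void write_current_utc_time(char* buf, size_t buf_size)
{
    const time_t now = time(nullptr);
    struct tm utc;
#ifdef _WIN32
    gmtime_s(&utc, &now);   // Microsoft CRT order: (struct tm*, const time_t*)
#else
    gmtime_r(&now, &utc);   // POSIX order: (const time_t*, struct tm*)
#endif
    snprintf(buf, buf_size, "%04d-%02d-%02d %02d:%02d:%02d",
             utc.tm_year + 1900, utc.tm_mon + 1, utc.tm_mday,
             utc.tm_hour, utc.tm_min, utc.tm_sec);
}
```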
-
sech1
my node is trying to connect to yours twice again for some reason
-
hyc
did it get saved twice in p2p_peers?
-
sech1
I check the list of connected IPs first before making new connection
-
sech1
so it shouldn't happen
-
sech1
hyc I think this is what's happening: my node gets ECONNRESET on connection to your node and then tries to reconnect, but your side of old connection is still connected
-
sech1
WARNING 2021-08-24 17:08:56.1699 P2PServer client: failed to read response, err = ECONNRESET
-
sech1
NOTICE 2021-08-24 17:08:56.1699 P2PServer peer 84.203.25.127:37890 disconnected
-
sech1
and then this loop of handshake-disconnect starts
-
sech1
this is UTC time now
-
sech1
hmm, I probably need to drop connections that haven't been alive for some time (no meaningful packets sent)
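A hypothetical sketch of that idle-connection check (the 60-second timeout and the field names are assumptions, not p2pool's actual values):

```cpp
#include <chrono>
#include <list>

struct Peer {
    std::chrono::steady_clock::time_point last_alive; // updated on every meaningful packet
    bool disconnect_requested = false;
};

// Called periodically from the event loop: flag peers that have been silent too long.
void drop_idle_peers(std::list<Peer>& peers)
{
    const auto now = std::chrono::steady_clock::now();
    const auto timeout = std::chrono::seconds(60);
    for (Peer& p : peers) {
        if (now - p.last_alive > timeout) {
            p.disconnect_requested = true; // let the event loop close the socket
        }
    }
}
```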
-
sech1
hyc there's definitely something wrong with the connection to your node. My node sometimes connects to yours, but then it doesn't receive updates from it for some time, and then receives a bunch of broadcasts and an ECONNRESET shortly after
-
sech1
don't know what changed, but it's fine now
-
sech1
nope, ECONNRESET again
-
aypro
is this normal "2021-08-24 17:44:41.716 W There were 73 blocks in the last 90 minutes, there might be large hash rate changes, or we might be partitioned, cut off from the Monero network or under attack, or your computer's time is off. Or it could be just sheer bad luck." ?
-
selsta
on testnet?
-
sech1
yes, I get the same
-
sech1
it's "large hash rate changes"
-
aypro
yeah testnet
-
sech1
because testnet was at 2 kh/s and now more people are mining it
-
aypro
got it
-
aypro
do you need some hashrate for testing? or just running p2pool and a node
-
sech1
p2pool, node and xmrig mining with 1 CPU thread is enough
-
sech1
hashrate will be needed for mainnet testing
-
sech1
not now
-
hyc
well, this is the M1 mac, which also had connection leak issues in monerod
-
hyc
so it's probably just a flaky build
-
sech1
it doesn't change the fact that other nodes need to handle flaky connections properly
-
hyc
I wonder if I can find newer or older gcc for it
-
CzarekNakamoto[m
i can donate 1 kh/s of mining power to some p2pool node if you provide me an IP, but I can't run p2pool at the moment
-
hyc
1kh is prob excessive
-
hyc
for testnet
-
sech1
p2pool on testnet is already like 75% of the whole network
-
sech1
good that it can't do 51% because it's decentralized :)
-
hyc
lol
-
sech1
I see 6 unique wallets, ATH \o/
-
aypro
nice
-
aypro
i see this from time to time, don't know if it's a problem "P2PServer peer 148.251.81.38:37890 is ahead on mainchain (height 1795263, your height 1795262). Is your monerod stuck or lagging?"
-
aypro
my internet should be good
-
sech1
if it's ahead by 1 block it's not a problem
-
sech1
different nodes receive new Monero block at different times
-
sech1
it can also happen if some peer mines a Monero block and sends p2pool block to you. Then you will see +1 height, because p2pool block almost always gets to you faster than Monero block
-
sech1
I should probably only print this in log if it's height+2 or more
-
hyc
how long does it take for peers to forget my node's consensus ID? should I just turn it off for a while?
-
sech1
this ID is derived from p2pool config, it's the same for everyone
-
sech1
your node IP is forgotten after 10 failed connection attempts IIRC
-
hyc
ok
-
hyc
seems pointless to have so many connection attempts
paste.debian.net/1208914
-
sech1
the reasoning is that maybe the node is just restarting, so 10 attempts
-
hyc
so many redundant connections
-
sech1
from my side it looked like broken connections, I have a lot of ECONNRESET warnings
-
sech1
so it saw only 1 disconnect
-
hyc
interesting. perhaps it should log the local addr:port too
-
sech1
my local addr:port wouldn't be what you see because I'm behind not 1 but 2 NATs :D
-
hyc
ah...
-
sech1
even 1 NAT would change it
-
hyc
and yet you were able to forward ports in?
-
sech1
no
-
sech1
nothing works here for incoming connections
-
hyc
ah. sounds like a common home network
-
hyc
interesting that you have a pending connection attempt. should have gotten ECONNREFUSED right away
-
hyc
gonna assume 10 attempts have passed. starting up again
-
sech1
yeap, your node is not in p2pool_peers.txt
-
hyc
cool. I see no redundant conns in lsof now
-
sech1
I see no new warnings in log so far
-
sech1
4.5 kh/s on sidechain, more than network hashrate
-
sech1
somebody ramped up
-
sech1
hmm
-
sech1
maybe I shouldn't make sidechain diff higher than network diff :D
-
sech1
hmm... but it's not higher, because of the 1 second block time.
-
sech1
so not a problem
-
hyc
my monerod has 56 inbound conns. never knew so many nodes were on testnet
-
sech1
10(out)+61(in) connections here
-
sech1
not here, but on my server
-
hyc
btw, my p2pool currently shows 579MB used
-
hyc
monerod only 327M
-
sech1
is this the new build with block cache?
-
sech1
552452 KB on my server
-
sech1
so the same ballpark
-
hyc
looks like yes, has block cache
-
hyc
327d1455fe4f861df82624d4c5a4d6eb4b15a7a7
-
hyc
it's not the latest, but was pulled today
-
sech1
yes, it's with the cache
-
sech1
cache is 480 MB, but I think not all of it is in physical RAM because of how memory mapped files work
-
hyc
likely
-
hyc
well the mac has 32GB RAM and not much happening on it so it could all be resident
-
sech1
5120 blocks * 4 KB (blocks are small now) = 20 MB most likely
-
sech1
Mac M1 has 4 KB pages, right? Or bigger?
-
hyc
16KB
-
sech1
so 80 MB
-
sech1
cache uses 96 KB per block and most of it is never touched now, so it's probably not in RAM
-
hyc
no big deal either way
-
sech1
moneromoooo is it safe to submit a block template with enough PoW to a different Monero node? I see that p2pool blocks arrive before Monero blocks (mined by p2pool) quite often
-
sech1
so p2pool could broadcast mined Monero blocks more efficiently
-
sech1
for example, this block:
testnet.xmrchain.net/block/1795518 - my node received p2pool broadcast of this block 0.1 seconds before it got ZMQ notification from monerod
-
sech1
or maybe it's monerod that took 0.1 seconds to add the block and then send notification
-
hyc
could be
-
hyc
check timestamps in monerod log for block receipt
-
sech1
I'm not sure it's safe to submit the same block concurrently
-
hyc
if it's not safe that'd be a pretty silly vulnerability
-
sech1
which log level in monerod is needed for block receipts?
-
hyc
eh, level 1 ought to be enough. don't recall
-
sech1
I'll wait for next p2pool block then with level 1
-
sech1
p2pool is faster
-
sech1
p2pool block id 708f76dc5ffa092798664b4cc5821f1b410607de729d7226d659da064b3ba993 which can be found in tx extra for that block, so it's 100% that block
-
sech1
I'll add submitting of broadcasted blocks tomorrow and we'll see if anything breaks
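For context, the monerod call this would use is the standard submit_block JSON-RPC method, which takes the hex-encoded block blob. A rough sketch - http_json_rpc() is a placeholder for whatever HTTP client is actually used, and 28081 is the testnet RPC port:

```cpp
#include <string>

// Assumed helper, declared only: POSTs the body to monerod's /json_rpc endpoint.
std::string http_json_rpc(const std::string& host, int port, const std::string& body);

std::string submit_block(const std::string& block_blob_hex)
{
    const std::string body =
        "{\"jsonrpc\":\"2.0\",\"id\":\"0\",\"method\":\"submit_block\","
        "\"params\":[\"" + block_blob_hex + "\"]}";
    return http_json_rpc("127.0.0.1", 28081, body); // monerod replies with status "OK" on success
}
```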
-
sech1
So p2pool will be able to speed up block propagation in Monero network, wow
-
hyc
cool
-
sech1
but only for blocks it mines
-
moneromoooo
I do not understand that question.
-
sech1
if monerod receives a fluffy block and, at the same time, the same block via RPC, is it safe?
-
sech1
i.e. no crashes
-
moneromoooo
Should be safe.
-
sech1
I assume it should be protected by a lock somewhere
-
sech1
ok
-
moneromoooo
It is.
-
hyc
so a p2pool miner found a block, and broadcasted it to all other p2pool peers
-
hyc
it would also have submitted it to its local monerod
-
hyc
so its local monerod broadcasted the block more slowly than p2pool ?
-
sech1
yes
-
sech1
first, because there were more hops over the Internet
-
sech1
second, because monerod checks PoW in light mode (+10 ms on each hop)
-
sech1
and also does a bunch of other transaction checks before broadcasting it further
-
sech1
you can see in my paste that the block came from a different IP address
-
hyc
sounds ok. so all p2pool miners' monerods can be updated a little bit ahead of the rest of the network
-
sech1
IIRC supportxmr has this fast broadcast system between its nodes
-
sech1
to reduce orphan rate
-
hyc
GPU-friendly PoW will crumple
-
hyc
p2pool 581M now. so, some slow growth happening. will check again tomorrow