00:01:14 selsta: n/m I'm seeing things.....
07:19:29 selsta moneromooo the pruned p2pool node on my old server with HDD crashed 2 times in the last 2 days (Segmentation fault). Running it under gdb with set_log 1 now. It wasn't up for more than 24 hours since I started using it.
07:22:35 the same binary on my new server has an uptime of 18 days (but it's not pruned and it's on an SSD)
07:23:05 both run p2pool and xmrig, so the only difference is pruned/unpruned and HDD/SSD
07:40:28 the monerod binary that I use is a gitian build made from https://github.com/monero-project/monero/commit/2243318000b8fa746a4d874ad62984d6ce25d494
07:42:23 sech1: without backtrace it's difficult to tell
07:42:32 hopefully it crashes again
07:42:42 so this binary doesn't have https://github.com/monero-project/monero/pull/7873
07:42:47 waiting for it to crash again
07:43:31 7873 is quite rare, I would be surprised if it caused a crash twice in 24h
07:45:46 oh, one more difference is my old server runs Ubuntu 16.04 and the new one runs 20.04
11:30:59 .merges
11:30:59 Merge queue empty
11:32:47 .merge+ 8019 8018 8011 8006 8007 8004 8003 8002 7995 7996
11:32:47 Added
12:01:56 .merge+ 8022
12:01:56 Added
15:20:20 sech1 you didn't get a core file?
16:11:56 hyc I didn't get "core dumped" message, just "segmentation fault"
16:13:23 must have corefiles disabled / ulimit set to 0
16:22:07 sech1 do you have the binary that crashed?
16:23:10 `Code: 1f 40 00 f3 0f 1e fa e9 67 ff ff ff ......` dmesg on that machine should contain part of asm (within instruction pointer), it can be used to find corresponding place in binary
16:23:18 and then corresponding function
16:23:57 s/within .../around .../
16:26:14 the binary is from p2pool release
16:27:18 ok, binary is publicly available
16:27:35 `monerod[1963487]: segfault at 0 ip 000055736a032137 sp 00007ffc460f53e0 error 6 in monerod[55736a032000+1000]` what about dmesg? something like this should be there
16:28:20 Oct 23 06:19:17 sech.me kernel: monerod[16106]: segfault at 0 ip 00005565a3dc703f sp 00007feb4caf3940 error 4 in monerod[5565a3942000+13bd000]
16:28:32 I need the next line with code
16:30:39 there is no next line
16:31:01 I'm just running `journalctl --since "24 hours ago"` and the next line is unrelated to this
16:41:26 `std::vector, std::allocator >, std::allocator, std::allocator > > >::vector(std::vector` yes it's that race condition with read/write of the same `std::vector`
16:42:00 '0x48503f' offset within binary
16:42:04 `>>> hex(0x00005565a3dc703f - 0x5565a3942000)`
16:42:50 nice work tracing that
16:46:23 so just recompile the latest release and it should be fine?
16:48:52 likely yes
16:50:04 I'll just wait first until my current gdb+monerod session crashes
16:50:22 same binary works absolutely fine on the other server
16:50:42 so it must be a timing issue. I wonder if running under gdb and set_log 1 changes the timings though...
16:51:53 logging probably
16:51:58 gdb maybe
16:52:47 someone should connect to your node while it's copying txs for relay
16:52:51 once upon a time, uninit'd memory bugs were hard to find under gdb because the process always got zero'd memory
16:52:52 good luck waiting for such an event
16:55:47 It crashed twice in 2 days, so luck is "good" on that server
16:56:41 maybe running on HDD and pruning increases chances of the crash
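The offset arithmetic at 16:42 (subtracting the start of the mapping from the faulting `ip` in the kernel's segfault line) can be scripted. Below is a minimal sketch with a hypothetical helper, `parse_segfault`, that pulls the fields out of an x86 segfault line, computes `ip - base`, and decodes the low bits of the page-fault error code (bit 0: protection violation vs. not-present page, bit 1: write vs. read, bit 2: user mode). Like the log, it assumes the printed mapping start is the load base of the binary. That only holds when the reported mapping begins at file offset 0, which the kernel line itself does not show.

```python
import re

# Kernel segfault report, e.g.
# "monerod[16106]: segfault at 0 ip 00005565a3dc703f sp 00007feb4caf3940 error 4 in monerod[5565a3942000+13bd000]"
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line: str) -> dict:
    """Hypothetical helper: decode one x86 segfault line from dmesg/journalctl."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)   # start of the mapping that contains ip
    size = int(m["size"], 16)   # length of that mapping
    err = int(m["err"], 16)     # page-fault error code, printed in hex
    offset = ip - base          # assumes the mapping starts at file offset 0
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "offset": offset,
        "in_mapping": 0 <= offset < size,
        "protection_violation": bool(err & 0x1),  # 0 = page not present
        "write": bool(err & 0x2),                 # 0 = read access
        "user_mode": bool(err & 0x4),
    }


line = ("Oct 23 06:19:17 sech.me kernel: monerod[16106]: segfault at 0 "
        "ip 00005565a3dc703f sp 00007feb4caf3940 error 4 "
        "in monerod[5565a3942000+13bd000]")
info = parse_segfault(line)
print(hex(info["offset"]))  # 0x48503f
```

For the 06:19:17 crash this reproduces `0x48503f`, and error 4 decodes as a user-mode read of a not-present page (here address 0). If the binary carries symbols, the offset can then be passed to something like `addr2line -f -C -e monerod 0x48503f` to name the function.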
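The approach suggested at 16:23, locating the `Code:` bytes from dmesg inside the publicly available binary, can also be sketched. On x86 the kernel prints the bytes around the instruction pointer and wraps the byte at `ip` in `<..>`. The hypothetical helpers below parse such a line and return every file offset where the sequence occurs, shifted to point at the marked byte. The line must be copied in full: the `......` in the log above is an elision in chat, not kernel output, and would not parse. File offsets still have to be mapped to virtual addresses through the ELF program headers before looking up a function. That step is not shown here.

```python
from __future__ import annotations


def parse_code_line(code: str) -> tuple[bytes, int]:
    """Hypothetical helper: turn an x86 'Code: ...' dmesg line into bytes.

    Returns the byte string and the index of the byte wrapped in <..>
    (the byte at the faulting ip), or 0 if no byte is marked.
    """
    if "Code:" in code:
        code = code.split("Code:", 1)[1]  # drop any journal/kernel prefix
    data = bytearray()
    marker = 0
    for tok in code.split():
        if tok.startswith("<") and tok.endswith(">"):
            marker = len(data)
            tok = tok[1:-1]
        data.append(int(tok, 16))  # raises ValueError on non-hex tokens
    return bytes(data), marker


def find_code_offsets(binary: bytes, pattern: bytes, marker: int = 0) -> list[int]:
    """Return every file offset where `pattern` occurs, shifted to the marked byte."""
    if not pattern:
        raise ValueError("empty pattern")
    offsets = []
    start = binary.find(pattern)
    while start != -1:
        offsets.append(start + marker)
        start = binary.find(pattern, start + 1)
    return offsets


# Usage against the released binary (path and variable name are assumptions):
# from pathlib import Path
# pattern, marker = parse_code_line(full_code_line_from_dmesg)
# print(find_code_offsets(Path("monerod").read_bytes(), pattern, marker))
```

A short prefix like the one quoted at 16:23 may match in several places. The full line (typically a few dozen bytes on each side of `ip`) is what makes the match unique.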
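On the missing core file (15:20 to 16:13): a crash only leaves one behind if the soft `RLIMIT_CORE` is non-zero, which is what `ulimit -c` controls in a shell. Below is a minimal sketch, with hypothetical helper names, that raises the soft limit to the hard limit for the current process and reads `/proc/sys/kernel/core_pattern`. Processes started from it afterwards, such as monerod launched via `subprocess`, inherit the limit. If the pattern starts with `|`, cores are piped to a handler such as apport or systemd-coredump (the latter can be queried with `coredumpctl`) rather than written to the working directory.

```python
import resource


def enable_core_dumps() -> tuple:
    """Hypothetical helper: raise the soft RLIMIT_CORE to the hard limit.

    Same effect as `ulimit -c <hard limit>` in a shell; processes started
    from here afterwards (e.g. monerod via subprocess) inherit it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_pattern():
    """Return the kernel's core file name pattern, or None if unavailable (non-Linux)."""
    try:
        with open("/proc/sys/kernel/core_pattern") as f:
            return f.read().strip()
    except OSError:
        return None


soft, hard = enable_core_dumps()
print("RLIMIT_CORE soft/hard:", soft, hard)
pattern = core_pattern()
if pattern and pattern.startswith("|"):
    print("cores are piped to a handler:", pattern)
else:
    print("core file pattern:", pattern)
```

If the hard limit is itself 0, only root can raise it, and the sketch will simply report `0 0`.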