-
hyc
selsta: n/m I'm seeing things.....
-
sech1
selsta moneromooo the pruned p2pool node on my old server with HDD crashed 2 times in the last 2 days (Segmentation fault). Running it under gdb with set_log 1 now. It wasn't up for more than 24 hours since I started using it.
-
sech1
the same binary on my new server has an uptime of 18 days (but it's not pruned and it's on an SSD)
-
sech1
both run p2pool and xmrig, so the only difference is pruned/unpruned and HDD/SSD
-
sech1
the monerod binary that I use is a gitian build made from monero-project/monero 2243318
-
selsta
sech1: without backtrace it's difficult to tell
-
selsta
hopefully it crashes again
-
sech1
so this binary doesn't have monero-project/monero #7873
-
sech1
waiting for it to crash again
-
selsta
7873 is quite rare, I would be surprised if it caused a crash twice in 24h
-
sech1
oh, one more difference is my old server runs Ubuntu 16.04 and the new one runs 20.04
-
selsta
.merges
-
xmr-pr
Merge queue empty
-
selsta
.merge+ 8019 8018 8011 8006 8007 8004 8003 8002 7995 7996
-
xmr-pr
Added
-
selsta
.merge+ 8022
-
xmr-pr
Added
-
hyc
sech1 you didn't get a core file?
-
sech1
hyc I didn't get "core dumped" message, just "segmentation fault"
-
hyc
must have corefiles disabled / ulimit set to 0
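A minimal sketch of how the core-file limit hyc mentions could be checked from Python on Linux. It reads the soft `RLIMIT_CORE` of the current process, which is what `ulimit -c` reports in a shell. The helper names `describe_core_limit` and `core_dumps_enabled` are made up for illustration. A non-zero limit alone does not guarantee a core file, since the kernel's `core_pattern` setting also decides where a dump goes.

```python
import resource


def describe_core_limit(soft: int) -> str:
    """Turn a soft RLIMIT_CORE value into a short human-readable label."""
    if soft == 0:
        return "disabled"
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return f"limited to {soft} bytes"


def core_dumps_enabled() -> bool:
    """True if the current process may write a core file of non-zero size."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print("core files:", describe_core_limit(soft))
```

A limit of 0 in the shell that launched monerod would match what sech1 saw: "Segmentation fault" without "core dumped".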
-
wfaressuissia
sech1 do you have binary that crashed ?
-
wfaressuissia
`Code: 1f 40 00 f3 0f 1e fa e9 67 ff ff ff ......` dmesg on that machine should contain part of asm (within instruction pointer), it can be used to find corresponding place in binary
-
wfaressuissia
and then corresponding function
-
wfaressuissia
s/within .../around .../
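The idea above (take the opcode bytes from the `Code:` line and look for them in the binary) can be sketched roughly as follows. This is only an illustration: `parse_code_line` and `find_all` are hypothetical helper names, and the sketch assumes the kernel printed space-separated hex bytes, with the byte at the instruction pointer possibly wrapped in angle brackets. Parsing stops at the first token that is not a two-digit hex byte, so an elided tail is simply ignored.

```python
import re

HEX_BYTE = re.compile(r"[0-9a-fA-F]{2}")


def parse_code_line(line: str):
    """Return (opcode_bytes, index_of_ip_byte_or_None) from a 'Code:' line."""
    text = line.split("Code:", 1)[1]
    out = bytearray()
    ip_index = None
    for tok in text.split():
        if tok.startswith("<") and tok.endswith(">"):
            ip_index = len(out)      # this byte sits at the instruction pointer
            tok = tok[1:-1]
        if not HEX_BYTE.fullmatch(tok):
            break                    # stop at anything that is not a hex byte
        out.append(int(tok, 16))
    return bytes(out), ip_index


def find_all(haystack: bytes, needle: bytes):
    """Return every offset at which needle occurs in haystack."""
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits


if __name__ == "__main__":
    code, _ = parse_code_line("Code: 1f 40 00 f3 0f 1e fa e9 67 ff ff ff ......")
    blob = b"\x00" * 16 + code + b"\x90" * 4   # stand-in for the binary's contents
    print([hex(off) for off in find_all(blob, code)])
```

For a real lookup the stand-in `blob` would be replaced by the contents of the crashed binary read in `"rb"` mode. A short byte pattern can match in several places, so more context bytes make the hit more trustworthy.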
-
sech1
the binary is from p2pool release
-
wfaressuissia
ok, binary is publicly available
-
wfaressuissia
`monerod[1963487]: segfault at 0 ip 000055736a032137 sp 00007ffc460f53e0 error 6 in monerod[55736a032000+1000]` what about dmesg? something like this should be there
-
sech1
Oct 23 06:19:17 sech.me kernel: monerod[16106]: segfault at 0 ip 00005565a3dc703f sp 00007feb4caf3940 error 4 in monerod[5565a3942000+13bd000]
-
wfaressuissia
I need the next line with code
-
sech1
there is no next line
-
sech1
I'm just running "journalctl --since "24 hours ago"" and next line is unrelated to this
-
wfaressuissia
`std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector(std::vector<std:`
-
wfaressuissia
yes it's that race condition with read/write of the same `std::vector<std::string>`
-
wfaressuissia
'0x48503f' offset within binary
-
wfaressuissia
`>>> hex(0x00005565a3dc703f - 0x5565a3942000)`
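The subtraction above can be wrapped in a small parser for the kernel's segfault line. This is a sketch under assumptions: `parse_segfault` is a hypothetical helper, the regular expression only covers the x86 message layout seen in this log, and the result is an offset into the mapped region, which equals an offset into the binary only if that mapping starts at the load base. The error code is printed in hex; on x86 its low bits mean: bit 0 set for a protection violation (clear for a page that is not present), bit 1 set for a write, bit 2 set for a user-mode access.

```python
import re

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<name>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line: str) -> dict:
    """Extract the fault address, mapping offset and error bits from a segfault line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "name": m["name"],
        "fault_addr": int(m["addr"], 16),
        "offset": ip - base,           # instruction pointer relative to the mapping
        "present": bool(err & 1),      # False: the page was not present
        "write": bool(err & 2),        # False: the access was a read
        "user": bool(err & 4),         # True: fault happened in user mode
    }


if __name__ == "__main__":
    line = ("Oct 23 06:19:17 sech.me kernel: monerod[16106]: segfault at 0 "
            "ip 00005565a3dc703f sp 00007feb4caf3940 error 4 "
            "in monerod[5565a3942000+13bd000]")
    info = parse_segfault(line)
    print(info["name"], hex(info["offset"]), info)
```

For sech1's line this gives offset `0x48503f` and a user-mode read of address 0, i.e. a null-pointer read, which fits the vector copy-constructor symbol found at that offset.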
-
hyc
nice work tracing that
-
sech1
so just recompile the latest release and it should be fine?
-
wfaressuissia
likely yes
-
sech1
I'll just wait first until my current gdb+monerod session crashes
-
sech1
same binary works absolutely fine on the other server
-
sech1
so it must be a timing issue. I wonder if running under gdb and set_log 1 changes the timings though...
-
hyc
logging probably
-
hyc
gdb maybe
-
wfaressuissia
someone should connect to your node while it's copying txs for relay
-
hyc
once upon a time, uninit'd memory bugs were hard to find under gdb because the process always got zero'd memory
-
wfaressuissia
good luck with waiting for such event
-
sech1
It crashed twice in 2 days, so luck is "good" on that server
-
sech1
maybe running on HDD and pruning increases chances of the crash