#monero-dev

00:01

hyc

selsta: n/m I'm seeing things.....
07:19

sech1

selsta moneromooo the pruned p2pool node on my old server with HDD crashed 2 times in the last 2 days (Segmentation fault). Running it under gdb with set_log 1 now. It wasn't up for more than 24 hours since I started using it.
07:22

sech1

the same binary on my new server has an uptime of 18 days (but it's not pruned and it's on an SSD)
07:23

sech1

both run p2pool and xmrig, so the only difference is pruned/unpruned and HDD/SSD
07:40

sech1

the monerod binary that I use is a gitian build made from monero-project/monero 2243318
07:42

selsta

sech1: without backtrace it's difficult to tell
07:42

selsta

hopefully it crashes again
07:42

sech1

so this binary doesn't have monero-project/monero #7873
07:42

sech1

waiting for it to crash again
07:43

selsta

7873 is quite rare, I would be surprised if it caused a crash twice in 24h
07:45

sech1

oh, one more difference is my old server runs Ubuntu 16.04 and the new one runs 20.04
11:30

selsta

.merges
11:30

xmr-pr

Merge queue empty
11:32

selsta

.merge+ 8019 8018 8011 8006 8007 8004 8003 8002 7995 7996
11:32

xmr-pr

Added
12:01

selsta

.merge+ 8022
12:01

xmr-pr

Added
15:20

hyc

sech1 you didn't get a core file?
16:11

sech1

hyc I didn't get "core dumped" message, just "segmentation fault"
16:13

hyc

must have corefiles disabled / ulimit set to 0
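A minimal sketch of checking and raising the core-file size limit (the `ulimit -c` setting hyc mentions) from Python before launching a process, using the standard `resource` module. It assumes Linux and only raises the soft limit up to the existing hard limit, which needs no privileges. If the hard limit is itself 0, cores stay disabled. Whether a core file is actually written also depends on system settings such as `/proc/sys/kernel/core_pattern`, which this does not touch. The `monerod` launch line is a hypothetical usage example.

```python
import resource


def enable_core_dumps() -> tuple[int, int]:
    """Raise the soft RLIMIT_CORE to the hard limit for this process.

    Child processes started afterwards inherit the new limit. Returns the
    (soft, hard) pair that is in effect after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("core limit:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
    # Hypothetical usage: start the daemon so it inherits the raised limit.
    # import subprocess; subprocess.run(["./monerod"])
```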
16:22

wfaressuissia

sech1 do you have the binary that crashed?
16:23

wfaressuissia

`Code: 1f 40 00 f3 0f 1e fa e9 67 ff ff ff ......` dmesg on that machine should contain part of the asm (within instruction pointer); it can be used to find the corresponding place in the binary
16:23

wfaressuissia

and then corresponding function
16:23

wfaressuissia

s/within .../around .../
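The approach wfaressuissia describes (take the instruction bytes from the dmesg `Code:` line and look for them in the published binary) can be sketched in Python as below. This is an illustrative sketch, not a tool used in the log. It assumes the kernel marks the faulting byte with `<..>` (stripped here along with `(..)`) and that only complete hex bytes are passed; the `......` placeholder seen in the pasted line is not handled. The offsets returned are file offsets, which may need to be mapped to virtual addresses through the ELF program headers before looking up the enclosing function.

```python
from pathlib import Path


def parse_code_bytes(code_line: str) -> bytes:
    """Turn a dmesg 'Code:' line into raw bytes.

    The marker around the faulting instruction's first byte is stripped.
    Every remaining token must be a two-digit hex byte.
    """
    hex_part = code_line.split("Code:", 1)[-1]
    tokens = [t.strip("<>()") for t in hex_part.split()]
    return bytes(int(t, 16) for t in tokens)


def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs in haystack."""
    hits, start = [], 0
    while (idx := haystack.find(needle, start)) != -1:
        hits.append(idx)
        start = idx + 1
    return hits


def locate_in_binary(binary_path: str, code_line: str) -> list[int]:
    """File offsets in binary_path that match the bytes of a 'Code:' line."""
    return find_all(Path(binary_path).read_bytes(), parse_code_bytes(code_line))
```

A unique hit narrows the search to one spot in the binary. Several hits mean the byte run is too short, and more bytes from the `Code:` line should be included.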
16:26

sech1

the binary is from p2pool release
16:27

wfaressuissia

ok, binary is publicly available
16:27

wfaressuissia

`monerod[1963487]: segfault at 0 ip 000055736a032137 sp 00007ffc460f53e0 error 6 in monerod[55736a032000+1000]` what about dmesg? something like this should be there
16:28

sech1

Oct 23 06:19:17 sech.me kernel: monerod[16106]: segfault at 0 ip 00005565a3dc703f sp 00007feb4caf3940 error 4 in monerod[5565a3942000+13bd000]
16:28

wfaressuissia

I need the next line with code
16:30

sech1

there is no next line
16:31

sech1

I'm just running "journalctl --since "24 hours ago"" and next line is unrelated to this
16:41

wfaressuissia

`std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector(std::vector<std:`
16:41

wfaressuissia

yes it's that race condition with read/write of the same `std::vector<std::string>`
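The general remedy for a read/write race on a shared container is to make every writer and every copying reader hold the same lock. The sketch below is a Python analog of that pattern only. It is not monerod's code and not the change in monero-project/monero #7873, and because of CPython's interpreter lock an unguarded Python list would not segfault the way a concurrently modified `std::vector<std::string>` can. The class and function names are hypothetical.

```python
import threading


class SharedStrings:
    """A list of strings guarded by a single lock.

    Writers append under the lock, and readers copy under the same lock,
    so a copy never observes the container mid-modification.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: list[str] = []

    def append(self, item: str) -> None:
        with self._lock:
            self._items.append(item)

    def snapshot(self) -> list[str]:
        # Copy under the lock; callers then work on their private copy.
        with self._lock:
            return list(self._items)


def demo(writers: int = 4, per_writer: int = 1000) -> SharedStrings:
    """Run concurrent writers plus one copying reader, then return the store."""
    shared = SharedStrings()

    def write(wid: int) -> None:
        for i in range(per_writer):
            shared.append(f"tx-{wid}-{i}")

    def read() -> None:
        for _ in range(per_writer):
            shared.snapshot()

    threads = [threading.Thread(target=write, args=(w,)) for w in range(writers)]
    threads.append(threading.Thread(target=read))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Copying under the lock and then working on the private copy keeps the critical section short, which matters when the copy is handed to slower code such as network relay.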
16:42

wfaressuissia

'0x48503f' offset within binary
16:42

wfaressuissia

`>>> hex(0x00005565a3dc703f - 0x5565a3942000)`
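The offset wfaressuissia computed is the instruction pointer minus the base of the mapping that the kernel reports in brackets. The sketch below pulls both values out of a kernel segfault line and repeats that subtraction. The regular expression is an assumption fitted to the two lines quoted in this log, not a specification of the kernel's message format. The result is relative to the start of the reported mapping and may need adjusting by that mapping's file offset before it is looked up, for example with `addr2line` when symbols are available.

```python
import re

# Fitted to lines such as:
# "monerod[16106]: segfault at 0 ip 00005565a3dc703f sp 00007feb4caf3940 error 4 in monerod[5565a3942000+13bd000]"
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+)"
    r" error (?P<err>\d+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line: str) -> tuple[str, int]:
    """Return (object name, ip - mapping base) from a kernel segfault line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("no segfault report found in line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    if not base <= ip < base + size:
        raise ValueError("ip lies outside the reported mapping")
    return m["obj"], ip - base


if __name__ == "__main__":
    line = ("Oct 23 06:19:17 sech.me kernel: monerod[16106]: segfault at 0 "
            "ip 00005565a3dc703f sp 00007feb4caf3940 error 4 "
            "in monerod[5565a3942000+13bd000]")
    obj, off = fault_offset(line)
    print(obj, hex(off))  # monerod 0x48503f
```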
16:42

hyc

nice work tracing that
16:46

sech1

so just recompile the latest release and it should be fine?
16:48

wfaressuissia

likely yes
16:50

sech1

I'll just wait first until my current gdb+monerod session crashes
16:50

sech1

same binary works absolutely fine on the other server
16:50

sech1

so it must be a timing issue. I wonder if running under gdb and set_log 1 changes the timings though...
16:51

hyc

logging probably
16:51

hyc

gdb maybe
16:52

wfaressuissia

someone should connect to your node while it's copying txs for relay
16:52

hyc

once upon a time, uninit'd memory bugs were hard to find under gdb because the process always got zero'd memory
16:52

wfaressuissia

good luck with waiting for such event
16:55

sech1

It crashed twice in 2 days, so luck is "good" on that server
16:56

sech1

maybe running on HDD and pruning increases chances of the crash
