#monero-dev

01:13

selsta

hyc: i found the issue, randomx tests are failing
01:14

selsta

[84] Hash test 2a (compiler) ... Assertion failed: (equalsHex(hash, "639183aae1bf4c9a35884cb46b09cad9175f04efd7684e7262a0ac1c2f0b4e3f")), function operator(), file tests.cpp, line 966.
01:15

selsta

sech1: ^^
01:25

hyc

selsta I've got macos 12.0.1
01:26

selsta

i installed 13.0 so that i can test if everything monero related works, and this randomx issue showed up
01:26

selsta

is there an env var to disable randomx jit?
01:26

hyc

yes
01:27

hyc

MONERO_RANDOMX_UMASK
01:27

selsta

MONERO_RANDOMX_UMASK=8
01:27

selsta

that's what i found
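
For context, a minimal sketch of how a mask like this could disable the JIT. It assumes (this is not confirmed in the log) that the variable is parsed as an integer and its bits are cleared from the RandomX flag set, and that 8 corresponds to `RANDOMX_FLAG_JIT` in randomx.h. The helper `apply_umask` is hypothetical and only illustrates the bit arithmetic.

```python
import os

# Flag values assumed to match RandomX's randomx.h; check your own checkout.
RANDOMX_FLAG_LARGE_PAGES = 1
RANDOMX_FLAG_HARD_AES = 2
RANDOMX_FLAG_FULL_MEM = 4
RANDOMX_FLAG_JIT = 8


def apply_umask(flags, env=None):
    """Clear every flag bit that is set in MONERO_RANDOMX_UMASK (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("MONERO_RANDOMX_UMASK")
    if raw is None:
        return flags  # variable not set: leave the flags untouched
    try:
        mask = int(raw, 0)  # base 0 accepts "8" as well as "0x8"
    except ValueError:
        return flags  # this sketch simply ignores unparsable values
    return flags & ~mask
```

With `MONERO_RANDOMX_UMASK=8`, a flag set of `HARD_AES | JIT` would be reduced to `HARD_AES` alone, which is why the value 8 turns the JIT off under these assumptions.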
01:28

hyc

so jit is broken on macos 13?
01:28

selsta

yep
01:28

selsta

it's beta so maybe macOS itself is broken but so far everything else works
01:28

hyc

try doing make test in the randomx source tree?
01:28

selsta

that's where i got test 84 failing
01:29

selsta

i get a different hash every single time i run ./randomx-tests
01:30

hyc

that's not good....
01:31

selsta

paste.debian.net/hidden/2e47bd45
01:31

selsta

2a sometimes passes but then it fails at 2b, other times it fails at 2a directly, it's inconsistent
01:32

hyc

try editing src/virtual_memory.cpp and make sure USE_PTHREAD_JIT_WP stays undefined
01:40

selsta

same issue
01:42

hyc

I wonder if CPU cache invalidation isn't happening.
01:50

hyc

might try explicitly calling sys_icache_invalidate() instead of __builtin___clear_cache
01:50

hyc

iains/gcc-darwin-arm64 5889b35
01:50

hyc

I feel like we've had this conversation before...
01:50

selsta

checking
01:52

hyc

then again, gcc should automatically be emitting that when __builtin___clear_cache is used
01:52

hyc

but still, worth a try
01:55

hyc

yeah I looked up this same question november 2020
01:55

hyc

nov 29 2020
01:57

hyc

xmrig calls it explicitly instead of the __builtin so it's worth a try anyway
02:09

selsta

didn't help
02:11

hyc

probably will need sech1 to get onto a Mac with OS13 to debug. sounds like an OS bug tho
02:19

selsta

I'll try to get it set up
02:20

selsta

i'll also check if xmrig has the same issue
02:21

hyc

does xmrig --bench=1M return the right hash?
02:21

hyc

you might need to also specify a --seed
02:22

hyc

"right hash" == same hash sum each time
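
The "same hash each time" check can be scripted. This is a rough sketch, not part of xmrig or RandomX: `outputs_agree` is a hypothetical helper that runs a command several times and compares the results. The `extract` parameter is an assumption added so that timestamps or timings in the output can be stripped before comparing.

```python
import subprocess


def outputs_agree(cmd, runs=3, extract=lambda text: text):
    """Run cmd `runs` times; return (all_equal, results).

    Each result is (returncode, extract(stdout)). A crash shows up as a
    differing (typically negative on POSIX) return code.
    """
    results = []
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((proc.returncode, extract(proc.stdout)))
    return len(set(results)) == 1, results
```

For a benchmark run, `extract` would need to pick out only the line that carries the hash, since the rest of the output is likely to vary between runs.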
02:23

selsta

fish: Job 1, './xmrig --bench=1M --seed "test"' terminated by signal SIGSEGV (Address boundary error)
02:23

hyc

got enough RAM?
02:23

selsta

32GB
02:24

hyc

no idea what tripped there
02:29

hyc

hm, bench defaults to seed of all zero, so try without --seed
02:29

selsta

same
02:29

selsta

I'm starting to wonder if I should just install the old OS again
02:29

hyc

and never had this problem on os12?
02:29

selsta

no
02:29

hyc

sounds like it
02:30

selsta

monero wallet sync also crashes
02:30

selsta

use after free or address boundary error
02:31

hyc

can't believe a use after free wouldn't be caught on every other os too
02:31

selsta

monero-wallet-cli(29852,0x16eaaf000) malloc: Incorrect checksum for freed object 0x137464610: probably modified after being freed.
02:31

selsta

Corrupt value: 0x0
02:32

hyc

... or a heap overrun
02:32

hyc

could run wallet sync on x86-64 with valgrind
02:32

hyc

if it always dies in same place, can compare stack traces
02:34

selsta

doesn't always die in the same place
02:34

selsta

it's... super weird
02:35

hyc

but par for the course... along with the random connection drops and other crap
02:37

hyc

hard to imagine how they can screw up a BSD based OS so badly
02:38

selsta

the weird thing is apart from monero and randomx i didn't find any issues yet, web browsers also work fine (i assume they use jit)
02:39

hyc

hm, probably, yeah
02:40

selsta

trying to use the binaries generated with depends now
02:40

selsta

in case it's some compiler error, no idea
02:41

hyc

hm, the depends build will use the OSX 11.0 SDK and its compiler
02:41

selsta

ok same
02:41

hyc

and that binary works on OS 12
02:42

hyc

so I don't see a compiler bug being likely
02:45

selsta

so i'll rollback my laptop and try to install macOS 13 on that macmini we have
05:36

sech1

selsta it really looks like cache invalidation problem. Could you run xmrig under gdb and get a callstack?
08:23

hyc

i don't think gdb really works on macos. prob need to use lldb
09:14

tobtoht[m]

<selsta> "we don't use lmdb for the wallet..." <- only ringdb uses lmdb
15:48

TrasherDK[m]

Is it normal for zmq-pub to miss publishing mem-pool tx ?
15:54

sech1

That's unusual
15:54

TrasherDK[m]

The receiving script: github.com/trasherdk/node-test-snip…aster/monero/src/node-zmq-grampy.js
15:57

sech1

It shouldn't miss publishing, but keep in mind that it doesn't dump everything in mempool when you first connect. It only publishes new transactions
15:59

TrasherDK[m]

On stagenet, sender show txid:6226f5d745bfaa158a70ba7b3e0d8b41e71e40b3cafb9449a0e839a45d3981eb
16:01

TrasherDK[m]

receiver show both mem-pool: 2022-07-07 15:03:42 0.500000000000 6226f5d745bfaa158a70ba7b3e0d8b41e71e40b3cafb9449a0e839a45d3981eb 0000000000000000 0.000000000000 57Es8x:0.500000000000 0
16:02

TrasherDK[m]

and receiving: Height 1130735, txid <6226f5d745bfaa158a70ba7b3e0d8b41e71e40b3cafb9449a0e839a45d3981eb>, 0.500000000000, idx 0/0
16:02

TrasherDK[m]

But nothing received from zmq-pub
16:05

sech1

did you submit this tx to your node? zmq-pub might skip publishing transactions which are in dandelion stem phase
16:06

TrasherDK[m]

Yes. My node and both wallets are on the same host.
16:07

sech1

it definitely filters something: github.com/monero-project/monero/bl…note_core/cryptonote_core.cpp#L1107
16:07

TrasherDK[m]

The Node script is on a different host.
16:07

sech1

relay_category::legacy, not sure which transactions it is
16:08

sech1

yes, it matches only for dandelion fluff transactions
16:09

sech1

so if the node adds a transaction to mempool during stem phase, it's not sent via zmq-pub
16:09

sech1

which makes sense because this is an interface for miners (dandelion transactions should be mined only after stem phase has finished)
16:10

sech1

but it should still publish them after they switch to fluff
16:10

sech1

but it doesn't and it can be considered a bug
16:11

TrasherDK[m]

Nah. It never arrived on the sub end. How can I ensure I get all txs?
16:11

sech1

you can remove "matches_category(tx_relay, relay_category::legacy)" from that line if you don't mind rebuilding monerod
16:12

selsta

sech1: run xmrig benchmark under lldb?
16:12

TrasherDK[m]

The wallet confirm prompt was probably more than a minute after transfer command
16:13

sech1

a minute sounds like Dandelion++ delay
16:14

sech1

selsta if you can, yes
16:15

selsta

TrasherDK[m]: did you disable dns?
16:16

TrasherDK[m]

disable-dns-checkpoints=1
16:16

TrasherDK[m]

enable-dns-blocklist=1
16:17

selsta

I meant wallet --no-dns
16:17

TrasherDK[m]

Probably not. Checking...
16:17

selsta

that was one thing that caused slow tx generation but not sure if it's related to what you are writing
16:37

TrasherDK[m]

Okay, dns disabled and sessions restarted. Let's see how that goes.
16:56

TrasherDK[m]

This one never came on zmq-sub: 8fa62730c2c82cc45703471b085c46e771ebc244c98287497857b4bf82685f75
16:56

TrasherDK[m]

No show: cbd114c486f60f8ad7532a9e0327c9a86a16217799d6e5bb4c9d002131f25d00
16:57

TrasherDK[m]

No show: 6226f5d745bfaa158a70ba7b3e0d8b41e71e40b3cafb9449a0e839a45d3981eb
17:07

selsta

sech1: paste.debian.net/hidden/5f75873c
17:08

selsta

does not seem useful
17:10

sech1

quite useful actually
17:10

sech1

or not
17:10

sech1

one thread is in hashAndFillAes1Rx4 but it's not the thread that crashed
17:11

sech1

it's probably the same cache flushing problem
17:11

sech1

code cache flushing
17:14

sech1

it definitely crashed in JIT generated code, in one of RandomX FP instructions
17:52

kayabanerve[m]

`libcryptonote_basic.a(cryptonote_format_utils.cpp.o): in function cryptonote::get_pruned_transaction_hash: undefined reference to cryptonote::get_transaction_prefix_hash(cryptonote::transaction_prefix const&, crypto::hash&)`
17:52

kayabanerve[m]

How did I get an undefined reference in a lib to something defined in the same lib?
17:52

kayabaNerve

I just need bp_prove and hash_to_curve and I've now linked ~11 libs I don't want just trying to get this to compile. I think this is my last error
17:53

sech1

it can happen if that function is inlined by compiler
17:53

kayabaNerve

... so how do I get it to *not* be inlined?
17:53

kayabaNerve

I just ran make. Didn't do any CMAKE config
17:54

kayabaNerve

Though I am also looking for a CMAKE to disable all optional depends. I have hidapi rn and I do not want it
17:54

sech1

hmm, actually there's no such function
17:54

sech1

"cryptonote::transaction_prefix const&, crypto::hash&" with this specific signature
17:54

sech1

there is one with "(const transaction_prefix& tx, crypto::hash& h, hw::device &hwdev)" signature
17:55

sech1

ah, it's in cryptonote_format_utils_basic.cpp
17:55

kayabaNerve

Yep. Which is still in cn_basic AFAICT
17:55

kayabaNerve

I also have undefined reference *from* device
17:56

sech1

it's in cryptonote_format_utils_basic library
17:57

kayabaNerve

... ah
17:57

sech1

see src/cryptonote_basic/CMakeLists.txt
18:06

kayabanerve[m]

Thanks :)
18:25

jberman[m]

I can repro TrasherDK 's issue when my daemon is in the fluff epoch, and when stem isn't working. Trasher do you see this error in your daemon: `Unable to send transaction(s) via Dandelion++ stem`?
18:30

jberman[m]

In fluff: my tx passes through my node with `tx_relay::local` so it doesn't get pushed out over zmq, then `on_transactions_relayed` in `dandelionpp_notify` will default upgrade it to "fluff" without pushing it out over zmq
18:30

jberman[m]

When stem is working: my tx passes through my node with `tx_relay::stem`, then once my node sees it in the network from another node, the tx gets re-added to my node's tx pool and pushed over zmq because `already_have` in `handle_incoming_txs` is false
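
The two paths described above can be put into a toy model. This is a sketch of the behaviour as reported in this conversation, not monerod's code: `ToyNode` and its methods are hypothetical, and the rule that only a tx already held as fluff counts as "already have" is an assumption made to reproduce the described outcome.

```python
from enum import Enum


class Relay(Enum):
    LOCAL = "local"
    STEM = "stem"
    FLUFF = "fluff"


class ToyNode:
    """Toy model of the reported zmq-pub behaviour (hypothetical, simplified)."""

    def __init__(self):
        self.pool = {}       # txid -> relay method currently recorded
        self.published = []  # txids pushed out to zmq subscribers

    def add_tx(self, txid, method):
        # Assumption: a tx held as local/stem is still treated as new when it
        # arrives again; only a tx already held as fluff is "already have".
        already_have = self.pool.get(txid) is Relay.FLUFF
        if already_have:
            return
        self.pool[txid] = method
        if method is Relay.FLUFF:  # the filter: only fluff txs are published
            self.published.append(txid)

    def on_relayed_in_fluff_epoch(self, txid):
        # The upgrade to fluff happens here without any zmq publish.
        self.pool[txid] = Relay.FLUFF


node = ToyNode()

# Fluff epoch: local tx is upgraded silently, so the later copy is a duplicate.
node.add_tx("tx_a", Relay.LOCAL)
node.on_relayed_in_fluff_epoch("tx_a")
node.add_tx("tx_a", Relay.FLUFF)

# Stem working: the tx comes back from the network as fluff and is published.
node.add_tx("tx_b", Relay.STEM)
node.add_tx("tx_b", Relay.FLUFF)
```

In this model only `tx_b` ends up in `node.published`, which matches the sporadic pattern where some self-submitted transactions never reach the subscriber.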
23:46

TrasherDK[m]

jberman: I have only 2 different log messages:
23:46

TrasherDK[m]

I background mining is enabled,
23:46

TrasherDK[m]

I Found block <6a9e00293490fa5d
23:50

jberman[m]

every time you've ever tried submitting a tx through your own node, you don't see the tx in zmq? It's not a sporadic thing?
23:51

TrasherDK[m]

It's like 2-3 transfers is OK, 1 is not, then 2-3 OK and so on.
23:52

TrasherDK[m]

I'm at the default log_level, maybe a higher one would help?
23:55

jberman[m]

can you try compiling with `const bool fluffing = false;` and seeing if they come through 100% of the time? github.com/monero-project/monero/bl…note_protocol/levin_notify.cpp#L701
23:56

jberman[m]

`const bool fluffing = false;` the txs should always show up in zmq.. `const bool fluffing = true;` the txs should never show up
23:57

TrasherDK[m]

I'll give it a try.
