01:13:06 hyc: i found the issue, randomx tests are failing
01:14:13 [84] Hash test 2a (compiler) ... Assertion failed: (equalsHex(hash, "639183aae1bf4c9a35884cb46b09cad9175f04efd7684e7262a0ac1c2f0b4e3f")), function operator(), file tests.cpp, line 966.
01:15:02 sech1: ^^
01:25:20 selsta I've got macos 12.0.1
01:26:01 i installed 13.0 so that i can test if everything monero related works, and this randomx issue showed up
01:26:34 is there an env var to disable randomx jit?
01:26:41 yes
01:27:39 MONERO_RANDOMX_UMASK
01:27:40 MONERO_RANDOMX_UMASK=8
01:27:46 that's what i found
01:28:11 so jit is broken on macos 13?
01:28:15 yep
01:28:36 it's beta so maybe macOS itself is broken but so far everything else works
01:28:36 try doing make test in the randomx source tree?
01:28:50 that's where i got test 84 failing
01:29:38 i get a different hash every single time i run ./randomx-tests
01:30:02 that's not good....
01:31:09 https://paste.debian.net/hidden/2e47bd45/
01:31:32 2a sometimes passes but then it fails at 2b, other times it fails at 2a directly, it's inconsistent
01:32:57 try editing src/virtual_memory.cpp and make sure USE_PTHREAD_JIT_WP stays undefined
01:40:41 same issue
01:42:03 I wonder if CPU cache invalidation isn't happening.
01:50:00 might try explicitly calling sys_icache_invalidate() instead of __builtin___clear_cache
01:50:04 https://github.com/iains/gcc-darwin-arm64/commit/5889b350455d43a18ef7ef139a216eb87a1cdbb4
01:50:12 I feel like we've had this conversation before...
01:50:20 checking
01:52:10 then again, gcc should automatically be emitting that when __builtin___clear_cache is used
01:52:18 but still, worth a try
01:55:23 yeah I looked up this same question november 2020
01:55:35 nov 29 2020
01:57:13 xmrig calls it explicitly instead of the __builtin so it's worth a try anyway
02:09:33 didn't help
02:11:27 probably will need sech1 to get onto a Mac with OS13 to debug. sounds like an OS bug tho
02:19:47 I'll try to get it set up
02:20:56 i'll also check if xmrig has the same issue
02:21:12 does xmrig --bench=1M return the right hash?
02:21:21 you might need to also specify a --seed
02:22:30 "right hash" == same hash sum each time
02:23:17 fish: Job 1, './xmrig --bench=1M --seed "test"' terminated by signal SIGSEGV (Address boundary error)
02:23:37 got enough RAM?
02:23:56 32GB
02:24:18 no idea what tripped there
02:29:13 hm, bench defaults to seed of all zero, so try without --seed
02:29:30 same
02:29:47 I'm starting to wonder if I should just install the old OS again
02:29:48 and never had this problem on os12?
02:29:50 no
02:29:56 sounds like it
02:30:06 monero wallet sync also crashes
02:30:32 use after free or address boundary error
02:31:17 can't believe a use after free wouldn't be caught on every other os too
02:31:39 monero-wallet-cli(29852,0x16eaaf000) malloc: Incorrect checksum for freed object 0x137464610: probably modified after being freed.
02:31:56 Corrupt value: 0x0
02:32:01 ... or a heap overrun
02:32:28 could run wallet sync on x86-64 with valgrind
02:32:50 if it always dies in same place, can compare stack traces
02:34:21 doesn't always die in the same place
02:34:44 it's... super weird
02:35:16 but par for the course... along with the random connection drops and other crap
02:37:51 hard to imagine how they can screw up a BSD based OS so badly
02:38:36 the weird thing is apart from monero and randomx i didn't find any issues yet, web browsers also work fine (i assume they use jit)
02:39:40 hm, probably, yeah
02:40:42 trying to use the binaries generated with depends now
02:40:47 in case it's some compiler error, no idea
02:41:31 hm, the depends build will use the OSX 11.0 SDK and its compiler
02:41:53 ok same
02:41:55 and that binary works on OS 12
02:42:06 so I don't see a compiler bug being likely
02:45:50 so i'll rollback my laptop and try to install macOS 13 on that macmini we have
05:36:17 selsta it really looks like cache invalidation problem. Could you run xmrig under gdb and get a callstack?
08:23:31 i don't think gdb really works on macos. prob need to use lldb
09:14:36 "we don't use lmdb for the wallet..." <- only ringdb uses lmdb
15:48:55 Is it normal for zmq-pub to miss publishing mem-pool tx?
15:54:43 That's unusual
15:54:56 The receiving script: https://github.com/trasherdk/node-test-snippets/blob/master/monero/src/node-zmq-grampy.js
15:57:43 It shouldn't miss publishing, but keep in mind that it doesn't dump everything in mempool when you first connect. It only publishes new transactions
15:59:13 On stagenet, sender show txid:6226f5d745bfaa158a70ba7b3e0d8b41e71e40b3cafb9449a0e839a45d3981eb
16:01:37 receiver show both mem-pool: 2022-07-07 15:03:42 0.500000000000 6226f5d745bfaa158a70ba7b3e0d8b41e71e40b3cafb9449a0e839a45d3981eb 0000000000000000 0.000000000000 57Es8x:0.500000000000 0
16:02:03 and receiving: Height 1130735, txid <6226f5d745bfaa158a70ba7b3e0d8b41e71e40b3cafb9449a0e839a45d3981eb>, 0.500000000000, idx 0/0
16:02:42 But nothing received from zmq-pub
16:05:29 did you submit this tx to your node? zmq-pub might skip publishing transactions which are in dandelion stem phase
16:06:33 Yes. My node and both wallets are on the same host.
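The `MONERO_RANDOMX_UMASK=8` workaround found at 01:27 works because 8 is the JIT bit in RandomX's flag set: masking it out leaves RandomX on its interpreter, which sidesteps the macOS 13 JIT/code-cache problem at the cost of hashing speed. Below is a minimal Python sketch of that masking, assuming the `randomx_flags` values from `randomx.h` and assuming monerod clears every bit set in the variable from the flags it requests. `read_umask` and `effective_flags` are hypothetical helper names, not Monero functions, and accepting decimal or `0x`-hex input is an assumption of this sketch.

```python
import os

# RandomX flag bits, assumed to match the randomx_flags enum in randomx.h.
RANDOMX_FLAG_DEFAULT = 0
RANDOMX_FLAG_LARGE_PAGES = 1
RANDOMX_FLAG_HARD_AES = 2
RANDOMX_FLAG_FULL_MEM = 4
RANDOMX_FLAG_JIT = 8
RANDOMX_FLAG_SECURE = 16


def read_umask(env=os.environ) -> int:
    """Hypothetical parser for MONERO_RANDOMX_UMASK.

    A missing, malformed or negative value is treated as 0 (mask nothing).
    Accepting decimal or 0x-prefixed hex is an assumption of this sketch.
    """
    raw = env.get("MONERO_RANDOMX_UMASK")
    if raw is None:
        return 0
    try:
        value = int(raw, 0)
    except ValueError:
        return 0
    return value if value >= 0 else 0


def effective_flags(requested: int, umask: int) -> int:
    """Clear every bit that is set in the mask from the requested flags."""
    return requested & ~umask


requested = RANDOMX_FLAG_HARD_AES | RANDOMX_FLAG_FULL_MEM | RANDOMX_FLAG_JIT
flags = effective_flags(requested, read_umask({"MONERO_RANDOMX_UMASK": "8"}))
print(bool(flags & RANDOMX_FLAG_JIT))  # False: the JIT bit has been cleared
```

In practice this corresponds to launching the daemon as `MONERO_RANDOMX_UMASK=8 ./monerod`; the other requested features (hardware AES, full dataset) are left untouched, which is why a mask is used rather than a simple on/off switch.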
16:07:32 it definitely filters something: https://github.com/monero-project/monero/blob/master/src/cryptonote_core/cryptonote_core.cpp#L1107
16:07:45 The Node script is on a different host.
16:07:47 relay_category::legacy, not sure which transactions it is
16:08:49 yes, it matches only for dandelion fluff transactions
16:09:05 so if the node adds transaction to mempool during stem phase, it's not sent vid zmq-pub
16:09:09 *via
16:09:51 which makes sense because this is an interface for miners (dandelion transactions should be mined only after stem phase has finished)
16:10:14 but it should still publish them after they switch to fluff
16:10:30 but it doesn't and it can be considered a bug
16:11:13 Nah. It never arrived on the sub end. How can I ensure I get all tx's?
16:11:50 you can remove "matches_category(tx_relay, relay_category::legacy)" from that line if you don't mind rebuilding monerod
16:12:39 sech1: run xmrig benchmark under lldb?
16:12:46 The wallet confirm prompt was probably more than a minute after transfer command
16:13:29 a minute sounds like Dandelion++ delay
16:14:01 selsta if you can, yes
16:15:09 TrasherDK[m]: did you disable dns?
16:16:34 disable-dns-checkpoints=1
16:16:34 enable-dns-blocklist=1
16:17:06 I meant wallet --no-dns
16:17:36 Probably not. Checking...
16:17:45 that was one thing that caused slow tx generation but not sure if it's related to what you are writong
16:17:48 writing
16:37:10 Okay, dns disabled and sessions restarted. Let's see how that goes.
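The filter discussed at 16:07 to 16:11 decides zmq-pub visibility from how a transaction reached the pool. The Python model below sketches that decision. The `RelayMethod` members mirror `relay_method` in `src/cryptonote_basic/enums.h` as I recall it, and the treatment of `NONE` and `BLOCK` is my reading of `matches_category` in `enums.cpp`, so both should be checked against the source. `should_publish` is a hypothetical helper, and it ignores every other condition on the linked line (e.g. whether the tx was actually added to the pool).

```python
from enum import Enum, auto


class RelayMethod(Enum):
    """How a tx reached the pool (assumed to mirror relay_method in enums.h)."""
    NONE = auto()     # received via RPC with do_not_relay set
    LOCAL = auto()    # submitted via RPC, e.g. by a wallet on the same host
    FORWARD = auto()  # received over i2p/tor, delayed before public broadcast
    STEM = auto()     # Dandelion++ stem phase
    FLUFF = auto()    # Dandelion++ fluff phase
    BLOCK = auto()    # seen in a block


def matches_legacy(method: RelayMethod) -> bool:
    """Model of matches_category(method, relay_category::legacy).

    Publicly broadcast txs (fluff, block) match, plus NONE, which the legacy
    category keeps for historical RPC behaviour. LOCAL, FORWARD and STEM do not.
    """
    return method in (RelayMethod.FLUFF, RelayMethod.BLOCK, RelayMethod.NONE)


def should_publish(method: RelayMethod, filter_enabled: bool = True) -> bool:
    """Would a newly added tx be published on zmq-pub?

    filter_enabled=False models deleting the matches_category() clause,
    the rebuild-monerod workaround suggested at 16:11:50.
    """
    return matches_legacy(method) if filter_enabled else True


for m in RelayMethod:
    print(f"{m.name:8} published={should_publish(m)}")
```

The model makes the reported symptom concrete: a transaction that enters the node as `LOCAL` or `STEM` is never announced by this check, and only the 16:10:14 expectation ("publish them after they switch to fluff") would close that gap.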
16:56:17 This one never came on zmq-sub: 8fa62730c2c82cc45703471b085c46e771ebc244c98287497857b4bf82685f75
16:56:56 No show: cbd114c486f60f8ad7532a9e0327c9a86a16217799d6e5bb4c9d002131f25d00
16:57:29 No show: 6226f5d745bfaa158a70ba7b3e0d8b41e71e40b3cafb9449a0e839a45d3981eb
17:07:24 sech1: https://paste.debian.net/hidden/5f75873c/
17:08:53 does not seem useful
17:10:16 quite useful actually
17:10:33 or not
17:10:41 one thread is in hashAndFillAes1Rx4 but it's not the thread that crashed
17:11:08 it's probably the same cache flushing problem
17:11:14 code cache flushing
17:14:08 it definitely crashed in JIT generated code, in one of RandomX FP instructions
17:52:08 `libcryptonote_basic.a(cryptonote_format_utils.cpp.o): in function cryptonote::get_pruned_transaction_hash: undefined reference to cryptonote::get_transaction_prefix_hash(cryptonote::transaction_prefix const&, crypto::hash&)`
17:52:17 How did I get an undefined reference in a lib to something defined in the same lib?
17:52:57 I just need bp_prove and hash_to_curve and I've now linked ~11 libs I don't want just trying to get this to compile. I think this is my last error
17:53:21 it can happen if that function is inlined by compiler
17:53:37 ... so how do I get it to *not* be inlined?
17:53:44 I just ran make. Didn't do any CMAKE config
17:54:01 Though I am also looking for a CMAKE to disable all optional depends. I have hidapi rn and I do not want it
17:54:18 hmm, actually there's no such function
17:54:27 "cryptonote::transaction_prefix const&, crypto::hash&" with this specific signature
17:54:46 there is one with "(const transaction_prefix& tx, crypto::hash& h, hw::device &hwdev)" signature
17:55:19 ah, it's in cryptonote_format_utils_basic.cpp
17:55:36 Yep. Which is still in cn_basic AFAICT
17:55:50 I also have undefined reference *from* device
17:56:22 it's in cryptonote_format_utils_basic library
17:57:00 ... ah
17:57:22 see src/cryptonote_basic/CMakeLists.txt
18:06:23 Thanks :)
18:25:29 I can repro TrasherDK's issue when my daemon is in the fluff epoch, and when stem isn't working. Trasher do you see this error in your daemon: `Unable to send transaction(s) via Dandelion++ stem`?
18:30:42 In fluff: my tx passes through my node with `tx_relay::local` so it doesn't get pushed out over zmq, then `on_transactions_relayed` in `dandelionpp_notify` will default upgrade it to "fluff" without pushing it out over zmq
18:30:57 When stem is working: my tx passes through my node with `tx_relay::stem`, then once my node sees it in the network from another node, the tx gets re-added to my node's tx pool and pushed over zmq because `already_have` in `handle_incoming_txs` is false
23:46:38 jberman: I have only 2 different log messages:
23:46:38 I background mining is enabled,
23:46:39 I Found block <6a9e00293490fa5d
23:50:24 every time you've ever tried submitting a tx through your own node, you don't see the tx in zmq? It's not a sporadic thing?
23:51:47 It's like 2-3 transfers is OK, 1 is not, then 2-3 OK and so on.
23:52:45 I'm at default log_level, maybe a higher would help?
23:55:00 can you try compiling with `const bool fluffing = false;` and seeing if they come through 100% of the time? https://github.com/monero-project/monero/blob/8f48f464957c875af3183cd9d35769592e1aef48/src/cryptonote_protocol/levin_notify.cpp#L701
23:56:08 `const bool fluffing = false;` the txs should always show up in zmq.. `const bool fluffing = true;` the txs should never show up
23:57:25 I'll give it a try.
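jberman's explanation at 18:30 describes a small state machine: a transaction submitted through the local node reaches zmq-pub only if it later arrives from a peer while the pool still holds it under a non-public relay method. The toy Python model below is a sketch under stated assumptions, not monerod's code. `ToyPool`, `submit_from_wallet` and the reduced three-member `RelayMethod` enum are hypothetical names, the `fluffing` parameter merely stands in for the `const bool fluffing` constant in `levin_notify.cpp`, and the `already_have` rule is inferred from the chat rather than copied from `handle_incoming_txs`.

```python
from enum import Enum, auto


class RelayMethod(Enum):
    """Reduced, hypothetical subset of monerod's relay methods."""
    LOCAL = auto()
    STEM = auto()
    FLUFF = auto()


def matches_legacy(method: RelayMethod) -> bool:
    # Of these three methods, only fluff counts as publicly broadcast.
    return method is RelayMethod.FLUFF


class ToyPool:
    """Hypothetical, heavily simplified model of the pool -> zmq-pub path."""

    def __init__(self) -> None:
        self.txs = {}        # txid -> RelayMethod currently recorded
        self.published = []  # txids that were sent out on zmq-pub

    def handle_incoming(self, txid: str, method: RelayMethod) -> None:
        prev = self.txs.get(txid)
        # Assumption (from the 18:30:57 message): a tx held only as local/stem
        # does not count as "already have" when it arrives publicly.
        already_have = prev is not None and matches_legacy(prev)
        if already_have:
            return
        self.txs[txid] = method
        if matches_legacy(method):
            self.published.append(txid)

    def on_transactions_relayed(self, txid: str) -> None:
        # Fluff-epoch path: upgraded in place, with no zmq notification.
        self.txs[txid] = RelayMethod.FLUFF


def submit_from_wallet(pool: ToyPool, txid: str, fluffing: bool) -> None:
    """Simulate the two cases described at 18:30 for a locally submitted tx."""
    if fluffing:
        pool.handle_incoming(txid, RelayMethod.LOCAL)
        pool.on_transactions_relayed(txid)
    else:
        pool.handle_incoming(txid, RelayMethod.STEM)
    # Later the tx comes back from another peer during its fluff phase.
    pool.handle_incoming(txid, RelayMethod.FLUFF)


pool = ToyPool()
submit_from_wallet(pool, "tx_stem", fluffing=False)
submit_from_wallet(pool, "tx_fluff", fluffing=True)
print(pool.published)  # ['tx_stem']
```

The model reproduces the prediction at 23:56:08 (always published when not fluffing, never when fluffing), and the "2-3 OK, 1 not" pattern reported at 23:51:47 would fit a node alternating between stem and fluff epochs. Publishing inside `on_transactions_relayed` would be one way to close the gap, but that is a design idea, not something confirmed in the log.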