17:01:56 Meeting in 1 hour
17:04:25 hurray
18:00:04 Meeting time. Hello! https://github.com/monero-project/meta/issues/1294
18:01:05 hey
18:01:37 *waves*
18:02:37 Ok, let's hear the reports from last week
18:03:51 still TODOs and bug fixes, fixed an issue where the wallet would crash with SIGABRT, which took a while because it was hidden behind the `SCOPED_WALLET_UNLOCK*` macros
18:03:51 finally implemented methods for `--restore-multisig-wallet` (not tested yet) and `--generate-from-json` (simple tests succeeded)
18:03:53 AFAICT the JSON file format is mostly undocumented (found it's mentioned here https://monero.stackexchange.com/a/11045 )
18:03:55 Tried to explain all the fields in a comment, as far as I figured them out from the source code (in `wallet2::generate_from_json()`).
18:04:26 Submitted PRs to address OOMs from batch verifying FCMP++s, reviewed ofrnxmr's and 0xfffc's PRs to also address OOMs during sync (and the dynamic span PR), fixed a deadlock in the daemon. Continuing by addressing vtnerd's comment on the deadlock PR, continuing investigation on one of my OOM PRs that modifies a glibc setting in response to jeffro256 and @ComputeryPony's comments, then continuing PR review toward tx re-relay (started on that last week as well)
18:04:51 Howdy
18:05:20 Sounds like a very busy week, jberman :)
18:05:43 definitely was a long week
18:06:01 jberman the binary I've been running since last Friday is still without OOM issues
18:06:25 Working on performance issues around popping blocks and mempool handling. Trying to determine the cause of some consistent consensus warnings that have cropped up. Working on extending Carrot device support in the integration. Reviewing @j-berman's proposed stressnet changes
18:06:40 Are those OOM problems something introduced recently, over the course of implementing FCMP++, or are they more or less a side effect of the much larger tx sizes?
18:06:49 the stressnet is also stress testing the devs
18:06:58 Indeed
18:07:44 I think the multithreaded batch verifying introduced in v1.3 exacerbated the issue
18:09:06 But 128-input FCMP++ verifying currently takes ~1.2 GB, which is higher than I originally anticipated / has been there since the change to bump to 128 inputs. kayabanerve dropped it to ~800 MB with a nice change recently
18:10:12 Runaway spans during IBD causing OOM is an upstream issue and is also exacerbated by much larger blocks
18:10:22 Oh, that's almost in the region of memory earlier Zcash versions needed, more or less disqualifying it for use on mobile devices back then ...
18:11:29 This is on the verifying side btw. I haven't looked closely at the memory footprint on the prove side
18:13:13 Still crazy somehow that this should need so much data that it amounts to 800 MB. I mean, that's what, 5 MB per input?
18:13:47 More like 6
18:14:49 This boomer remembers computers with less RAM in total than that :)
18:15:40 When I looked closer at some of the fn's, it's doing lots of allocations and holding them in place in ways that seem optimizable, but done for ease of implementation. When first implemented we were talking about capping at 8 inputs max, so it makes sense it was implemented this way
18:16:09 Yeah, we are certainly at the very beginning of a long journey.
18:17:42 If that's it about the reports, a bit of info about the CCS PR we discussed last week: https://repo.getmonero.org/monero-project/ccs-proposals/-/merge_requests/621
18:18:09 It was, among many other PRs, a subject of last Saturday's community meeting
18:18:35 I did not read the log closely, only skimmed it, but I think it was looked at favorably
18:18:39 My gut says that since we know the total size of a batch beforehand, we can do all the allocations at once, which would be more performant on just about *any* allocator scheme, but particularly more performant in the case of glibc pertaining to arena usage. How easy would it be to refactor the code to do this? This wouldn't need a custom allocator impl
18:19:37 that's basically kayabanerve's idea I linked, ya?
18:21:10 https://github.com/monero-oxide/monero-oxide/issues/136
18:22:24 But that's more a question of more speed than a question of much lower RAM usage?
18:23:01 personally I think we have a solid, simple short-term solution here with setting max arenas to n threads, which is a commonly used pattern that is sane. I think it makes sense to move forward with it
18:23:05 Or do we run into problems with heap fragmentation?
18:23:41 What is an "arena"? One big chunk of memory to allocate into?
18:24:05 Oh yeah I didn't see this. Exactly this
18:25:04 IMO I think it can be okay as a short-term solution for the alpha stressnet, but we absolutely shouldn't rely on it and should fix the underlying issue before the beta stressnet
18:25:32 you can find some info here https://www.man7.org/linux/man-pages/man3/mallopt.3.html
18:25:48 under `M_ARENA_MAX`
18:26:05 More or less. The glibc allocator allocates a big chunk all at once expecting certain allocations to have a similar lifetime, then de-allocates the chunk all at once when it's no longer used
18:26:22 Personally I think the underlying issue is with glibc for even doing this (not releasing memory back to the OS even after it's freed), so it makes sense to tune the allocator not to do that weird behavior in this case
18:26:27 I see, thanks to both
18:27:41 Seems to me a bit optimistic to assume you can reliably free up all allocated pieces in such an arena, down to the very last one, so that the memory can go back to the OS ...
18:28:52 Or is the arena somehow a parameter when allocating, so you have fine-grained control over how you use it?
18:29:20 The memory appears to be getting freed properly; the issue appears to be that glibc is holding on to already freed memory because it's keeping already allocated memory cached in these arenas
18:29:53 And there can be more arenas than threads
18:30:42 So SNeedlewoods's 2-thread machine is OOM'ing because glibc isn't releasing already freed memory back to the OS
18:31:42 Well, seems to me a big chunk can only go back if you free the very last bit that you allocated in there, no? Allocate 1000, free only 999, and the big memory chunk stays
18:32:09 changing the allocation behavior on the Rust side isn't even guaranteed to fix other allocators' weird behaviors we might not know about too...
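For context on the `M_ARENA_MAX` knob discussed above, here is a minimal sketch, assuming a glibc/Linux build and a hypothetical call site early in startup, of capping the arena count to the thread count from C++. `tune_glibc_arenas` is an illustrative name, not the actual PR; the man page linked above documents the same knob as the `MALLOC_ARENA_MAX` environment variable.

```cpp
// Sketch only: limit glibc malloc arenas so freed verification memory is not
// kept cached in many per-thread arenas that never shrink back to the OS.
#ifdef __linux__
#include <malloc.h>   // mallopt, M_ARENA_MAX (glibc extension)
#endif
#include <algorithm>
#include <thread>

static void tune_glibc_arenas()
{
#ifdef M_ARENA_MAX
  const unsigned n_threads = std::max(1u, std::thread::hardware_concurrency());
  mallopt(M_ARENA_MAX, static_cast<int>(n_threads));
#endif
}
```

Calling this before worker threads spawn matters, since glibc creates additional arenas lazily as threads contend for the existing ones; the cap only limits arenas that don't exist yet.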
18:32:44 hence why I think it makes sense not to let glibc's allocator do something that doesn't make a lot of sense in our case imo
18:33:44 yea, that's the behavior of arenas as I understand it as well
18:34:15 It's quite unfortunate to have to worry, and speculate, about the inner workings of allocators, IMHO, but maybe very hard to avoid in our case
18:34:31 Hmm, it might be time to dust off https://github.com/monero-project/monero/pull/9207
18:34:32 But in general, tons and tons of small allocations is the bane of almost every allocation scheme. I feel like trying to limit the number of allocations will improve performance on basically all real-world systems
18:34:37 Statically linking glibc would at least guarantee consistent behavior across Linux distributions (even on musl libc based systems).
18:36:04 Does everything have to be fully dynamic? Can't we pack some of those pieces simply into vectors? Maybe a dumb question, but who knows :)
18:36:10 I guess my main argument is not to block beta on this (nor would I even argue mainnet should be blocked on this), since I think it's a reasonable step to tune glibc in this way
18:36:44 I'm not arguing that it's not worthwhile to pursue the change to refactor the lib's memory usage
18:37:29 Well, you would need to address the pieces somehow
18:37:30 Should all verification code be audited before deployment on mainnet? (And you could audit a code refactor after the hard fork, of course.)
18:38:07 of course
18:38:13 the verification code has been audited already too fwiw
18:39:54 So you would not want to use a different allocation method on beta stressnet if you did not plan to get it audited before mainnet.
18:40:52 Ok, the adventure of getting FCMP++ to fly continues in full stride. Some problems with high weight for takeoff, lol
18:42:22 Do we have any different subject still to discuss today?
18:43:02 I think it would depend on how much harder it would be to review the changes after the rework commit
18:43:43 I think kayabanerve would have the best idea of the amount of work involved
18:43:49 Like just b/c we have an audit doesn't mean *all* of the code is frozen in time in perpetuity, but ideally we want to be able to confidently say that changes after the audit don't affect security
18:44:36 Could we agree that it should be a blocker for mainnet?
18:45:51 If we still have OOM issues from verifying FCMP++s and/or notice perf regressions, yes absolutely :)
18:46:58 Maybe testing on some mobile devices has the potential of showing some problems that don't occur on bigger machines
18:47:17 Later, when everything is more mature, I mean
18:48:59 Alright, let me close the meeting proper, while discussions may continue. Thanks for attending everybody, read you again next week!
18:49:34 thanks everybody
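On the "we know the total size of a batch beforehand" and "pack some of those pieces simply into vectors" points raised earlier, a rough sketch of what a single up-front reservation could look like. The names (`Scalar`, `collect_batch_scalars`, `elements_per_input`) are made up for illustration and do not correspond to the actual verifier code.

```cpp
// Sketch only: when the element count of a verification batch is known up
// front, one reservation can replace many small allocations.
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Scalar = std::array<std::uint8_t, 32>;  // 32-byte scalars, as mentioned in the log

std::vector<Scalar> collect_batch_scalars(std::size_t elements_per_input,
                                          std::size_t n_inputs)
{
  std::vector<Scalar> scalars;
  scalars.reserve(elements_per_input * n_inputs);  // single allocation for the whole batch
  // ... push_back each input's scalars as they are produced ...
  return scalars;
}
```

Fewer, larger allocations would also sidestep the arena-caching behavior discussed above, regardless of which libc the node is built against.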
19:10:14 My hot take is a 1.2 GB verification cost is understandable and it's y'all's fault.
19:10:15 Thank you for coming to my TED talk.
19:11:33 Jokey being a dick aside, it isn't like this for 2-in or even 8-in. For 128 inputs, we hit approximately 200k elements in the Constant Reference String. 200k elements, when scaled by 32-byte scalars, is ~6 MB.
19:11:49 So the question becomes do we have 100 vectors of length 200k in the FCMP?
19:12:19 Yeah, probably. Each FCMP input creates 7 vector commitments and each of those contributes to a polynomial. 128 * 7 = ~1000.
19:12:56 Now, obviously our memory usage is less than 6 GB, so these aren't simultaneously held and some are sparse, yet the fundamental acknowledgement is that this amount of memory is _understandable_ given the number of inputs.
19:13:50 Now, can we decrease the memory? Maybe.
19:13:51 Can we decrease the amount of allocations? Definitely.
19:13:53 The issue is that it's going through each allocation and deciding how to optimize it. That would be a couple weeks of work across the stack.
19:14:17 The 33% I saved was an obvious candidate just sitting there, an oversight. It's not that more can't exist. It's that I just saw the one.
19:14:54 And then after spending weeks editing multiple libraries, it still won't be as efficient as possible. For that, we'd look at fundamentally rearchitecting, or even removing, intermediate representations.
19:15:19 We could also review an impl of the verifier which doesn't prioritize matching the paper but prioritizes a minimal memory footprint.
19:15:33 Now that the lib is largely out of active development, we can do all of this.
19:16:17 It's just weeks of many changes to critical code, prior to mainnet, when this is understandable and I don't want to threaten timelines any further. The only reason to dick around is if the memory is such an issue that it's mandated.
19:16:42 And these changes likely decrease auditability :/ That intermediate representation exists for a reason
19:17:57 So I'd rather do only what's necessary now, and kick improving further till after deployment. I believe 800 MB to verify the largest proof is fine.
19:18:16 Hitting the allocator in the shins was unexpected and I would like to blame the Linux kernel and glibc /s
19:18:48 That was unexpected and did need to be handled, which is why it's great jberman was able to tune it within recommended parameters to correct the issue.
19:20:29 I think either we keep the current allocator tuning, or we spend the two hundred lines of code to write a static allocator which only allocates once at boot and we just use a sufficiently aggressive locking system to declare it safe.
19:21:32 (1 GB per thread; allocations increment the position from 0 when batch verify starts, and the position resets when it ends)
19:23:01 Or, we can follow the FCMP++ research lead's advice and reject 128-in transactions
19:23:24 I concur with that opinion, personally
19:24:23 But again, on a legitimate note, the performance cost is a cost of the functionality gained. Here, the functionality isn't FCMPs. It's the absurd input count.
19:25:49 I understand I've lost that argument, so now that we have the problem of functioning despite absurd params, it's our job to function. Tuning the allocator is the best method at this time imo
19:27:35 I'm personally much more strongly in favor of the former for mainnet until/unless we receive an OOM report on a system that doesn't use glibc's allocator
19:27:57 Also, yes, my apologies for the oversight and not considering allocation frequency during implementation, solely the amount of memory
19:30:19 And I do agree, it would be very nice to have wider testing on more devices for beta. What would be really cool is if there was a dedicated QA tester who has every device we want to support with minimum specs
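A rough illustration of the "allocate once at boot, increment a position when batch verify starts, reset when it ends" idea floated above; the sizes, the one-instance-per-thread assumption, and the `BumpArena` name are all hypothetical, not a reviewed design.

```cpp
// Sketch only: a fixed, boot-time buffer with a bump pointer. One instance per
// verifier thread avoids locking; reset() runs when batch verification ends.
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

class BumpArena
{
public:
  explicit BumpArena(std::size_t bytes) : buf_(bytes), pos_(0) {}

  // align must be a power of two
  void* allocate(std::size_t size, std::size_t align)
  {
    const std::size_t p = (pos_ + align - 1) & ~(align - 1);  // align the cursor
    if (p + size > buf_.size())
      throw std::bad_alloc();      // arena exhausted, e.g. batch too large
    pos_ = p + size;
    return buf_.data() + p;
  }

  void reset() { pos_ = 0; }       // memory is reused, never returned mid-run

private:
  std::vector<std::uint8_t> buf_;  // reserved once, e.g. ~1 GB per verifier thread
  std::size_t pos_;
};
```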
19:30:21 Slightly on-topic. Mayhem was testing the dynspan & other ibd fixes, and with spans under 100 MB he started to hit OOMs after the hardfork. This is the 2 GB system, so I assume this is due to the input verification
19:31:43 Also, the issues are reduced by a better impl but eliminated with a properly defined allocation strategy (not trying to simultaneously verify 1024 inputs with just 4 GB of RAM, releasing memory which has been freed, etc)
19:32:05 I think 2 GB is below target recommendations though. I got him to add prep-block-threads=1 (not sure if helpful), and if that fails he will add 1-2 GB of swap
19:32:20 That one is on topic and related. His machine has 1 vCPU, which I would think means it's single-threaded and may not be using multiple arenas
19:32:50 log level 2 logs would tell us if he's OOM'ing on batch verification
19:33:00 of FCMP++ txs
19:37:48 if he OOMs again with log level 2 and it's from batch verification and it's a Linux machine (I think it is), he can try starting monerod with this prefix `MALLOC_ARENA_MAX=1 ./monerod` and see if that prevents the OOM. That is functionally the solution from 228 that we're discussing here