-
sech1
cache generation time on HDD makes it no go
-
tevador
it's too slow even on an SSD
-
sech1
I had another idea: pull random data from the blockchain (some random block number + the corresponding tx_pruned from that block) at the beginning of each RandomX hash and attach this data to the final Blake hash. This way CPU and I/O can run in parallel, and an HDD should be fast enough to be done in 10-15 ms
-
sech1
*random block number -> data from block and tx_pruned tables
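A minimal sketch of that idea in C++ (`read_pruned_blob`, `blake2b_256` and the selector derivation are illustrative placeholders, not real monerod or RandomX interfaces): a blob is picked pseudorandomly from the pruned tables at the start of each hash attempt and committed to in the final Blake step.

```cpp
// Sketch only: read_pruned_blob, blake2b_256 and the selector derivation are
// illustrative placeholders, not real monerod or RandomX APIs.
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Placeholder for a real Blake2b-256 implementation.
Bytes blake2b_256(const Bytes&) { return Bytes(32, 0); }

// Placeholder for "read one pruned blob from the blocks / tx_pruned tables".
Bytes read_pruned_blob(uint64_t /*height*/) { return Bytes(500, 0); }

// PoW = Blake2b(randomx_result || blob). The blob is chosen pseudorandomly from
// the hashing blob at the start of the hash, so the db read can overlap with
// the RandomX computation.
Bytes finalize_with_chain_data(const Bytes& hashing_blob,
                               const Bytes& randomx_result,
                               uint64_t chain_height)
{
    uint64_t selector = 0;
    std::memcpy(&selector, blake2b_256(hashing_blob).data(), sizeof(selector));
    Bytes blob = read_pruned_blob(selector % std::max<uint64_t>(chain_height, 1));

    Bytes buf = randomx_result;
    buf.insert(buf.end(), blob.begin(), blob.end());
    return blake2b_256(buf);
}
```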
-
tevador
I tried this, it's too slow. It's better to select a random block and some random txs separately.
-
sech1
miners will have to run fast NVMe drives or straight up loads of RAM
-
sech1
whatever, just select some random chunk of data from either table
-
sech1
so it could be done in one I/O operation
-
merope
wouldn't that penalize miners on older hardware though, thus giving the upper hand to large farms/servers/high-end hardware?
-
tevador
miners would have to run a RAM disk or some db structure other than LMDB
-
sech1
miners mine in fast mode, so they have 1-2ms to read the data on each thread
-
tevador
or LMDB with lots of RAM so that it never has to hit the disk
-
sech1
Actually, a typical gaming PC is 10-20 kH/s, so any NVMe capable of 10-20k IOPS will work in theory
-
tevador
selecting 1 block blob means several random disk accesses with LMDB
-
sech1
with proper buffering of completed hashes
-
sech1
or rather queueing of completed hashes until their db data is available
-
merope
Wait, what about pool miners? Or even people running multiple miners to a single node/p2pool?
-
tevador
they would need their own copies of the blockchain
-
merope
Wouldn't the network traffic/latency kill the performance?
-
merope
Right, but if you have to pull random data for each hash, then you'll have to wait for the node to respond
-
merope
So you're adding internet latency in there, no?
-
sech1
even "slow" NVMe drives can do way above 100k IOPS in 100% read mode:
techpowerup.com/review/kingston-nv1-1-tb/4.html
-
sech1
100% read mode is what will happen during mining, right?
-
tevador
there will be some writes as new blocks are added to the db
-
sech1
99.9% read then :D
-
tevador
probably
-
sech1
read at the beginning of each hash -> calculate the hash normally -> queue the final Blake data until the db read is finished
-
sech1
if I/O is done asynchronously, it shouldn't even affect the hashrate that much
-
tevador
the problem is that miners could skip hashes that hit blockchain data they don't have
-
sech1
of course db in RAM will be much faster anyway
-
sech1
then we can do db read in the end
-
sech1
more queueing, still the same hashrate
-
tevador
you'd want to force at least 1 RandomX program before selecting the block, maybe more
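A sketch of the queueing scheme being discussed (assumed names and trivial stubs, not xmrig code): the blob selector is derived from the RandomX result, the db read runs asynchronously, and the final Blake step waits in a queue until the blob arrives, so I/O overlaps with the next nonce's CPU work.

```cpp
// Sketch only: run_randomx, read_pruned_blob and blake2b_256 are trivial stubs
// standing in for the real implementations.
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <deque>
#include <future>
#include <vector>

using Bytes = std::vector<uint8_t>;

Bytes run_randomx(const Bytes&, uint64_t) { return Bytes(32, 1); }   // stub
Bytes read_pruned_blob(uint64_t)          { return Bytes(500, 2); }  // stub
Bytes blake2b_256(const Bytes&)           { return Bytes(32, 3); }   // stub

struct PendingHash {
    uint64_t nonce;
    Bytes randomx_result;      // output of the RandomX program(s)
    std::future<Bytes> blob;   // in-flight db read for the selected blob
};

void mine(const Bytes& job, uint64_t start_nonce, uint64_t count, uint64_t chain_height) {
    std::deque<PendingHash> pending;

    for (uint64_t nonce = start_nonce; nonce < start_nonce + count; ++nonce) {
        // At least one full RandomX program has already run when the blob is
        // selected, so nonces can't be skipped cheaply based on missing data.
        Bytes rx = run_randomx(job, nonce);
        uint64_t selector = rx[0] % std::max<uint64_t>(chain_height, 1);

        PendingHash ph{nonce, std::move(rx),
                       std::async(std::launch::async, read_pruned_blob, selector)};
        pending.push_back(std::move(ph));

        // Finalize every hash whose db read has completed; the CPU keeps
        // hashing while slower reads are still in flight.
        while (!pending.empty() &&
               pending.front().blob.wait_for(std::chrono::seconds(0)) ==
                   std::future_status::ready) {
            PendingHash p = std::move(pending.front());
            pending.pop_front();

            Bytes buf  = p.randomx_result;
            Bytes blob = p.blob.get();
            buf.insert(buf.end(), blob.begin(), blob.end());
            Bytes pow = blake2b_256(buf);   // compare against target, submit, etc.
            (void)pow;
        }
    }
    // (anything still left in `pending` would be drained the same way here)
}
```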
-
sech1
what's the size of the `blocks` and `tx_pruned` tables now?
-
tevador
around 10 GB together, give or take
-
sech1
ah, so it should be easy to just load them into RAM in xmrig
-
sech1
no need for fancy I/O
-
tevador
LMDB basically does this, but it works even if you don't have enough RAM
-
sech1
but fragmentation
-
tevador
miners could easily have a dedicated db with just the two tables
-
sech1
probably not even a db, just two files growing
-
sech1
because new blocks are usually added to the end
-
tevador
you'd need to store at least the offsets where each blob starts, some primitive database
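Something like this would already be enough (a sketch with arbitrary file names): blob bytes go into one append-only file and 8-byte start offsets into another, so blob i can be fetched with one small index read plus one data read.

```cpp
// Sketch of the "two growing files" layout; file names and layout are arbitrary.
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Append one pruned blob: record its start offset, then the bytes themselves.
void append_blob(const std::string& dir, const Bytes& blob) {
    const std::string data_path = dir + "/blobs.dat";
    uint64_t offset = std::filesystem::exists(data_path)
                          ? std::filesystem::file_size(data_path) : 0;

    std::ofstream idx(dir + "/offsets.dat", std::ios::binary | std::ios::app);
    idx.write(reinterpret_cast<const char*>(&offset), sizeof(offset));

    std::ofstream data(data_path, std::ios::binary | std::ios::app);
    data.write(reinterpret_cast<const char*>(blob.data()),
               static_cast<std::streamsize>(blob.size()));
}

// Read blob i: one seek in the index, one seek in the data file.
Bytes read_blob(const std::string& dir, uint64_t i) {
    std::ifstream idx(dir + "/offsets.dat", std::ios::binary);
    std::ifstream data(dir + "/blobs.dat", std::ios::binary);

    uint64_t begin = 0, end = 0;
    idx.seekg(static_cast<std::streamoff>(i * sizeof(uint64_t)));
    idx.read(reinterpret_cast<char*>(&begin), sizeof(begin));
    if (!idx.read(reinterpret_cast<char*>(&end), sizeof(end))) {
        // Last blob: it runs to the end of the data file.
        end = std::filesystem::file_size(dir + "/blobs.dat");
    }

    Bytes blob(end - begin);
    data.seekg(static_cast<std::streamoff>(begin));
    data.read(reinterpret_cast<char*>(blob.data()),
              static_cast<std::streamsize>(blob.size()));
    return blob;
}
```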
-
sech1
but then, pools would just need to send new pruned blobs to miners, less than 300 KB per block, right?
-
sech1
plus the initial download
-
sech1
but it's still a lot of bandwidth for pools
-
sech1
your proposal required 256 MB every 64 blocks, so 4 MB per block, which is more than with this approach
-
sech1
actually, even with the original proposal, pools could just send updates every 2 minutes (pruned blobs)
-
sech1
*every block
-
sech1
initial download can be done via a torrent that updates every day or week
-
tevador
yes, but miners would need to keep the 10 GB database
-
tevador
that's a botnet killer
-
sech1
yeah, but the point was to cripple pools
-
sech1
the whole idea is that each miner must have the blockchain data. It turns out it's only 10 GB, and pools can send updates every block, which is not as much bandwidth as initially thought
-
tevador
I wonder who would provide the initial download to new miners
-
sech1
initial download via torrent
-
sech1
pool can just seed from their servers
-
tevador
with a pruned node, you can just run it and it will sync by itself
-
sech1
or pruned node, yes
-
sech1
so pool doesn't even need to provide it
-
tevador
if every miner ran a node, that would be a big win
-
merope
<sech1> "pool can just seed from their..." <- wouldn't that just open up the pool to an easy ddos?
-
sech1
miners will eventually end up with a "leech" node that just gets data from real nodes and updates these two tables
-
sech1
fewer resources, less disk space used
-
merope
And not just the pool, but also every miner who contributes would end up getting caught in it (assuming that their upload bandwidth is much smaller than a typical server)
-
sech1
so something like xmrig doing the hashing and "monero-db-miner-sync" updating the files in real time
-
sech1
the most likely setup in the end will be 1 real node per miner (they'll run their own node for reliability) + all PCs in their network running "leech" nodes to sync
-
sech1
so pools will still be possible, but miners will have to run nodes
-
hyc
your test was with a cold disk cache, so it's not a fair comparison to a RAMdisk setup
-
hyc
keep in mind that a RAMdisk setup requires the application to manage shuttling cold/hot data in and out of the RAMdisk
-
sech1
I'm not even sure it would kill botnets
-
tevador
who runs their node from a RAM disk?
-
hyc
whereas LMDB just needs to access blocks and let the FS cache handle things
-
sech1
the amount of data being sent every 2 minutes is not that big, a botnet can handle it with clever data distribution
-
hyc
my point is, LMDB will always be more efficient than any other solution
-
tevador
except when most of the database file is dead weight
-
hyc
on an active node you can assume most of the interior of the Btree is cached in RAM
-
hyc
therefore most data seeks will only incur 1 IOP
-
tevador
I measured an average "read amplification" factor of 8 with a standard pruned db file
-
hyc
that sounds like Linux default readahead
-
tevador
e.g. 2 GB actually read from the disk to read 256 MB of blob data
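One way such a number can be obtained on Linux (a sketch; the device name and the benchmark body are assumptions, and the page cache should be dropped beforehand for a cold-cache run): compare the device-level sectors-read counter against the logical bytes the benchmark requested.

```cpp
// Sketch only: replace run_blob_read_benchmark with the real workload and
// "nvme0n1" with the device under test.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

// /sys/block/<dev>/stat: field 3 is "sectors read"; these counters use 512-byte sectors.
uint64_t device_bytes_read(const std::string& dev) {
    std::ifstream f("/sys/block/" + dev + "/stat");
    uint64_t reads = 0, merged = 0, sectors = 0;
    f >> reads >> merged >> sectors;
    return sectors * 512;
}

// Placeholder: do the actual workload here (e.g. ~500,000 random blob reads)
// and return the number of logical bytes it asked for.
uint64_t run_blob_read_benchmark() { return 256ull << 20; }

int main() {
    const std::string dev = "nvme0n1";             // adjust to the tested drive
    const uint64_t before  = device_bytes_read(dev);
    const uint64_t logical = run_blob_read_benchmark();
    const uint64_t after   = device_bytes_read(dev);

    std::cout << "read amplification: "
              << double(after - before) / double(logical) << "\n";
}
```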
-
hyc
it always reads 64K for any 4K access
-
tevador
perhaps, but with a defragmented db, the amplification factor went down to ~2
-
tevador
so I'm assuming you read a page from the disk and most of the data in it is from other tables
-
hyc
except that we don't interleave data from separate tables onto a single page
-
tevador
logical page maybe, but the actual hardware page may be larger
-
hyc
and txn blobs will always be much smaller than 4K anyway
-
hyc
large-sector devices still only use 4KB per sector instead of 512B
-
tevador
SSDs nowadays use 8 or 16 KB pages
-
hyc
that's a pretty strange result
-
hyc
I don't think the OS cares, it will access VM-page sized blocks
-
tevador
SSDs cannot read a smaller unit than a page
-
tevador
and they typically cannot erase a smaller unit than 128 pages
-
hyc
that's not true. they can't *erase* a smaller unit than a page
-
tevador
they erase blocks, which are groups of pages
-
hyc
will have to hunt down a data sheet later
-
hyc
anyway, the read amplification you measured only makes sense if you used a cold cache, so internal btree pages needed to be read
-
tevador
the whole measurement is for about 500 000 reads (average blob size ~500 bytes)
-
hyc
and then used a warm cache for the "defragmented" DB
-
tevador
no, the defragmented db was also cold
-
hyc
if you know your SSD page size is 16KB you might try rerunning a test with LMDB pagesize set to 16K
-
tevador
I guess that would require rebuilding the db file
-
hyc
yes. mdb_dump / mdb_load will do it
-
hyc
of course, that will also defragment the data
-
hyc
it's true that the data is interleaved since it's written on the fly as blocks or txs arrive. but there is no interleaving within pages. so the fragmentation should only result in random seeks, nothing more
-
hyc
it shouldn't result in excess reads per request
-
tevador
So my OS page size is 4K, but the SSD page size is 16K. That explains the read amplification.
-
hyc
that's kinda bad. does x86 even support 16K VM pagesize?
-
hyc
HDDs will still be fine with 4K. I think even SMR drives are still 4K
-
hyc
what model SSD was that?