- Type: Improvement
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Engines
- 5
- 나비 (nabi) - 2024-04-16, Nick - 2024-04-30, Megabat - 2024-05-14
The definition of WT_COUNTER_SLOTS is currently 23; we should explore some combination of:
- increasing this number
- using a different value for connection stats vs data source stats
- making these value(s) configurable via wiredtiger_open and/or reconfigure
- dynamically configuring the value(s).
The current definition and a large comment are in stat.h, partially quoted here:
```c
 * For now, we use a fixed number of slots. Ideally, we would approximate the largest number of
 * cores we expect on any machine where WiredTiger might be run, however, we don't want to waste
 * that much memory on smaller machines. As of 2015, machines with more than 24 CPUs are relatively
 * rare.
 *
 * Default hash table size; use a prime number of buckets rather than assuming a good hash
 * (Reference Sedgewick, Algorithms in C, "Hash Functions").
 */
#define WT_COUNTER_SLOTS 23
```
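For context, here is a minimal sketch of the slot-striping idea behind WT_COUNTER_SLOTS (illustrative only; the names, padding, and use of atomics here are my assumptions, not WiredTiger's actual implementation). Each thread hashes to one of the slots, so concurrent increments mostly land on different cache lines, and reading a statistic aggregates across all slots:

```c
#include <pthread.h>
#include <stdint.h>

#define WT_COUNTER_SLOTS 23

/* Pad each slot to a cache line so increments on different slots
 * don't falsely share a line. */
struct counter_slot {
    int64_t v;
    char pad[64 - sizeof(int64_t)];
};

static struct counter_slot counters[WT_COUNTER_SLOTS];

/* Each thread picks "its" slot by hashing its thread id; with enough
 * slots relative to active threads, increments rarely collide. */
static void
counter_incr(void)
{
    uintptr_t id = (uintptr_t)pthread_self();
    __atomic_fetch_add(&counters[id % WT_COUNTER_SLOTS].v, 1, __ATOMIC_RELAXED);
}

/* Reading aggregates all slots; this is the extra work that grows
 * linearly with the slot count. */
static int64_t
counter_read(void)
{
    int64_t sum = 0;
    for (int i = 0; i < WT_COUNTER_SLOTS; i++)
        sum += __atomic_load_n(&counters[i].v, __ATOMIC_RELAXED);
    return (sum);
}
```

With only 23 slots, any workload running many more than 23 concurrent threads guarantees that some threads share a slot.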
Current CPUs can have cores in the hundreds, and core counts will only grow as time goes on.
Background: In a meeting with daotang.yang@mongodb.com and louis.williams@mongodb.com on SERVER-85527, we were exploring why some new changes related to binding precompiled cursors did better in a single-threaded readonly test, but got progressively worse as more cores were added. The change in question adds another WT API call to every operation, but in the single-threaded case that overhead is more than made up for by not having to do an internal string compilation at the beginning of the begin_transaction call. When scaling, the effect of adding the API call becomes more dominant. Our hypothesis is that contention increases because more statistics are incremented by WT over the course of a single operation.
I think the contention we see in this case is in connection statistics, and I expect that contention was already happening before this server change. It would be straightforward to increase the number of connection statistics slots to a larger prime (in the hundreds) independently of the data source number. We could compare measurements on YCSB 100read with a large number of cores.
Increasing the slots for the connection by a factor of 10 would increase the connection stats footprint from ~100K bytes to ~1M bytes. Increasing the slots for data sources, in contrast, can have a huge impact, depending on the number of dhandles in the system: a modest 1000 dhandles with a current footprint of 20M would go to 200M, and there are installations with 100K dhandles.
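To make that arithmetic concrete, a back-of-the-envelope sketch (the per-structure stat counts below are illustrative assumptions chosen to roughly reproduce the ~100K and ~20M figures above, not counts taken from stat.h):

```c
#include <stdio.h>

int
main(void)
{
    /* Assumptions: 8-byte counters, ~550 connection stats, ~110
     * data-source stats; both stat counts are illustrative. */
    const long bytes = 8, conn_stats = 550, dsrc_stats = 110, dhandles = 1000;

    /* Connection stats: one copy of the stats structure per slot. */
    printf("conn, 23 slots:  ~%ld KB\n", 23 * conn_stats * bytes / 1024);
    printf("conn, 197 slots: ~%ld KB\n", 197 * conn_stats * bytes / 1024);

    /* Data-source stats: one striped structure per dhandle, so the
     * slot count multiplies with the dhandle count. */
    printf("dsrc, 23 slots, %ld dhandles:  ~%ld MB\n", dhandles,
        dhandles * 23 * dsrc_stats * bytes / (1024 * 1024));
    printf("dsrc, 230 slots, %ld dhandles: ~%ld MB\n", dhandles,
        dhandles * 230 * dsrc_stats * bytes / (1024 * 1024));
    return (0);
}
```

The per-dhandle multiplier is why the data source slot count deserves a separate, more conservative setting than the connection slot count.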
Another effect of increasing the number of slots is more work when the stats are aggregated, though I'm not aware of aggregation having any significant cost currently.
To be specific, I'm proposing as a first step:
```c
#define WT_STAT_CONN_COUNTER_SLOTS 197
#define WT_STAT_DSRC_COUNTER_SLOTS 23
```
using those constants as appropriate and measuring. 197 is not magic by any means, but it is prime. As for 23, we might even consider a lower number, since access to those counters is already spread across the number of active dhandles.
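A simplified sketch of how the split might be wired in (the struct shapes below are illustrative, not WiredTiger's actual stats layout):

```c
#include <stdint.h>

#define WT_STAT_CONN_COUNTER_SLOTS 197 /* one shared structure: spend memory to cut contention */
#define WT_STAT_DSRC_COUNTER_SLOTS 23  /* per-dhandle: keep the multiplier small */

/* Illustrative stats structures; the real ones hold hundreds of counters. */
struct conn_stats { int64_t read_ops, write_ops; };
struct dsrc_stats { int64_t cursor_next, cursor_prev; };

/* Connection stats exist once, so a large slot count is cheap. */
static struct conn_stats conn_slots[WT_STAT_CONN_COUNTER_SLOTS];

/* Data-source stats exist per dhandle, so the slot count multiplies. */
struct dhandle {
    /* ... */
    struct dsrc_stats dsrc_slots[WT_STAT_DSRC_COUNTER_SLOTS];
};
```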
Even better would be to make these values configurable and dynamic. For example, if we see more than 500 active sessions in a system, we bump up the CONN number; if we see a huge number of active dhandles, perhaps we bump the DSRC number down.
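A minimal sketch of that dynamic policy (the 500-session threshold comes from the paragraph above; the dhandle threshold, function names, and prime choices are hypothetical, and a real implementation would also have to resize and re-aggregate live slot arrays safely):

```c
#include <stdint.h>

/* Hypothetical sizing policy, re-evaluated on reconfigure. Primes keep
 * the bucket count friendly to a weak hash, per the stat.h comment. */
static uint32_t
stat_conn_slots(uint32_t active_sessions)
{
    /* More active sessions means more concurrent increments on the
     * single connection-wide structure, so add slots. */
    return (active_sessions > 500 ? 197 : 23);
}

static uint32_t
stat_dsrc_slots(uint32_t active_dhandles)
{
    /* A huge dhandle count already spreads counter traffic and
     * multiplies the footprint, so shrink the per-dhandle stripes. */
    return (active_dhandles > 10000 ? 11 : 23);
}
```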