Details
- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Backwards Compatibility: Fully Compatible
- Sprint: Sharding NYC 2023-04-17
Description
Currently, the procedure for calculating the read and write distribution metrics in the analyzeShardKey command works as follows:
1. The shard running the analyzeShardKey command generates the split points for the shard key being analyzed, and then persists them in a temporary config collection named config.analyzeShardKey.splitPoints.<collUuid>.<tempCollUuid>.
2. The shard sends an aggregate command with the $_analyzeShardKeyReadWriteDistribution stage to all shards that own chunks for the collection (a hedged sketch of this request follows this list). The stage has the following spec:
   {
     key: <object>,
     splitPointsNss: <namespace>,
     splitPointsAfterClusterTime: <timestamp>,
     splitPointsShardId: <shardId> // Only set when running on a sharded cluster
   }
3. The shards running the $_analyzeShardKeyReadWriteDistribution stage then fetch all split point documents from the collection 'splitPointsNss', build a synthetic routing table from them, and calculate the metrics based on that routing table.
4. The shard from step 1 combines the metrics and drops the split point collection.
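For illustration only, below is a minimal mongo shell sketch of the step-2 request, assuming hypothetical namespace, shard key, timestamp, and shard id values (the $_analyzeShardKeyReadWriteDistribution stage is internal and is not meant to be issued directly by users):

// Hypothetical per-command temporary split-points collection from step 1;
// <collUuid> and <tempCollUuid> are placeholders, not real values.
const splitPointsNss = "config.analyzeShardKey.splitPoints.<collUuid>.<tempCollUuid>";

// Shape of the aggregate command the coordinating shard sends to each shard
// that owns chunks for the collection, mirroring the spec above.
const readWriteDistributionCmd = {
  aggregate: "testColl",
  pipeline: [{
    $_analyzeShardKeyReadWriteDistribution: {
      key: {customerId: 1},                            // shard key being analyzed
      splitPointsNss: splitPointsNss,                  // per-command temporary collection
      splitPointsAfterClusterTime: Timestamp(1681747200, 1),
      splitPointsShardId: "shard0"                     // only set on sharded clusters
    }
  }],
  cursor: {}
};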
Making each command have its own split point collection is clean in a way, but it has some notable downsides:
- If an interrupt (e.g. due to shutdown) occurs in step 4, the temporary collection never gets dropped. This can be seen in BFG-1859782, BFG-1859158, BFG-1858992 and more.
- It is expensive to keep creating and dropping collections. The command currently also needs to drop the collection before it can return.
One could try to set up a periodic job to drop collections with the "config.analyzeShardKey.splitPoints.*" prefix. However, it is hard to differentiate between dangling collections and collections that are actually being used by some in-progress analyzeShardKey command.
Given this, the analyzeShardKey command should instead use only one config collection for storing split points, and rely on a TTL index to automatically clean up the documents for a command that has already returned. That is, the stage should have the following spec, where 'splitPointsFilter' is a filter that matches only the split point documents generated by a particular command:
{
  key: <object>,
  splitPointsFilter: <object>,
  splitPointsAfterClusterTime: <timestamp>,
  splitPointsShardId: <shardId>
}
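As a hedged sketch of how this could fit together: the collection name, the 'commandId' tag, and the 'expireAt' TTL field below are illustrative assumptions rather than a confirmed design; they only show a single shared config collection with a TTL index and a per-command 'splitPointsFilter'.

// One shared config collection for all analyzeShardKey commands
// (collection and field names are illustrative).
const splitPointsColl = db.getSiblingDB("config").getCollection("analyzeShardKey.splitPoints");

// TTL index so split point documents are removed automatically once their
// 'expireAt' time passes, instead of requiring a collection drop.
splitPointsColl.createIndex({expireAt: 1}, {expireAfterSeconds: 0});

// Each analyzeShardKey command tags the split point documents it writes with
// its own id and an expiration time a little beyond its expected runtime.
const commandId = UUID();
splitPointsColl.insertOne({
  _id: UUID(),
  commandId: commandId,
  splitPoint: {customerId: 1000},
  expireAt: new Date(Date.now() + 5 * 60 * 1000)
});

// The stage then receives a filter that matches only this command's documents.
const stageSpec = {
  key: {customerId: 1},
  splitPointsFilter: {commandId: commandId},
  splitPointsAfterClusterTime: Timestamp(1681747200, 1),
  splitPointsShardId: "shard0" // only set when running on a sharded cluster
};

With a layout along these lines, a command interrupted before cleanup leaves behind only documents that the TTL monitor will eventually delete, rather than a dangling collection.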
Users are unlikely to run many analyzeShardKey commands back to back, so there shouldn't be many documents to filter out during the read.
Issue Links
- split from SERVER-75532: Investigate the high variability of the runtime of analyze_shard_key.js in suites with chunk migration and/or stepdown/kill/terminate (Closed)