Details
- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Backwards Compatibility: Fully Compatible
- Sprint: Sharding NYC 2023-04-17
Description
Currently, the procedure for calculating the read and write distribution metrics in the analyzeShardKey command works as follows:
1. The shard running the analyzeShardKey command generates the split points for the shard key being analyzed, and then persists them in a temporary config collection named config.analyzeShardKey.splitPoints.<collUuid>.<tempCollUuid>.
2. The shard sends an aggregate command with the $_analyzeShardKeyReadWriteDistribution stage to all shards that own chunks for the collection (a hedged sketch of this request follows this list). The stage has the following spec:
   {
     key: <object>,
     splitPointsNss: <namespace>,
     splitPointsAfterClusterTime: <timestamp>,
     splitPointsShardId: <shardId> // Only set when running on a sharded cluster
   }
3. The shards running the $_analyzeShardKeyReadWriteDistribution stage then fetch all split point documents from the collection 'splitPointsNss', build a synthetic routing table from them, and calculate the metrics based on that routing table.
4. The shard from step 1 combines the metrics and drops the split point collection.
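For illustration only, below is a minimal mongo shell sketch of the step-2 request, assuming hypothetical namespace, shard key, timestamp, and shard id values (the $_analyzeShardKeyReadWriteDistribution stage is internal and is not meant to be issued directly by users):

// Hypothetical per-command temporary split-points collection from step 1;
// <collUuid> and <tempCollUuid> are placeholders, not real values.
const splitPointsNss = "config.analyzeShardKey.splitPoints.<collUuid>.<tempCollUuid>";

// Shape of the aggregate command the coordinating shard sends to each shard
// that owns chunks for the collection, mirroring the spec above.
const readWriteDistributionCmd = {
  aggregate: "testColl",
  pipeline: [{
    $_analyzeShardKeyReadWriteDistribution: {
      key: {customerId: 1},                            // shard key being analyzed
      splitPointsNss: splitPointsNss,                  // per-command temporary collection
      splitPointsAfterClusterTime: Timestamp(1681747200, 1),
      splitPointsShardId: "shard0"                     // only set on sharded clusters
    }
  }],
  cursor: {}
};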
Making each command have its own split point collection is clean in a way, but it has some notable downsides:
- If an interrupt (e.g. due to shutdown) occurs in step 4, the temporary collection never gets dropped. This can be seen in BFG-1859782, BFG-1859158, BFG-1858992 and more.
- It is expensive to keep creating and dropping collections. The command currently also needs to drop the collection before it can return.
One could try to set up a periodic job to drop collections with the "config.analyzeShardKey.splitPoints.*" prefix. However, it is hard to differentiate between dangling collections and collections that are actually being used by some in-progress analyzeShardKey command.
Given this, the analyzeShardKey command should instead use only one config collection for storing split points, and rely on a TTL index to automatically clean up the documents for a command that has already returned. That is, the stage should have the following spec, where 'splitPointsFilter' is a filter that matches only the split point documents generated by a particular command:
{
  key: <object>,
  splitPointsFilter: <object>,
  splitPointsAfterClusterTime: <timestamp>,
  splitPointsShardId: <shardId>
}
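As a hedged sketch of how this could fit together: the collection name, the 'commandId' tag, and the 'expireAt' TTL field below are illustrative assumptions rather than a confirmed design; they only show a single shared config collection with a TTL index and a per-command 'splitPointsFilter'.

// One shared config collection for all analyzeShardKey commands
// (collection and field names are illustrative).
const splitPointsColl = db.getSiblingDB("config").getCollection("analyzeShardKey.splitPoints");

// TTL index so split point documents are removed automatically once their
// 'expireAt' time passes, instead of requiring a collection drop.
splitPointsColl.createIndex({expireAt: 1}, {expireAfterSeconds: 0});

// Each analyzeShardKey command tags the split point documents it writes with
// its own id and an expiration time a little beyond its expected runtime.
const commandId = UUID();
splitPointsColl.insertOne({
  _id: UUID(),
  commandId: commandId,
  splitPoint: {customerId: 1000},
  expireAt: new Date(Date.now() + 5 * 60 * 1000)
});

// The stage then receives a filter that matches only this command's documents.
const stageSpec = {
  key: {customerId: 1},
  splitPointsFilter: {commandId: commandId},
  splitPointsAfterClusterTime: Timestamp(1681747200, 1),
  splitPointsShardId: "shard0" // only set when running on a sharded cluster
};

With a layout along these lines, a command interrupted before cleanup leaves behind only documents that the TTL monitor will eventually delete, rather than a dangling collection.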
Users are unlikely to run many analyzeShardKey commands back to back, so there shouldn't be many documents to filter out during the read.
Issue Links
- split from SERVER-75532: Investigate the high variability of the runtime of analyze_shard_key.js in suites with chunk migration and/or stepdown/kill/terminate (Closed)