Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Tools and Replicator
Context:
To ensure optimal index build performance, mongosync should programmatically set the --createIndexesDegreeOfParallelism option when the user does not provide it. The value should be derived from the Atlas default mappings, which depend on the number of vCPUs on the destination cluster's hosts. However, mongosync cannot currently do this for migrations to sharded clusters, because the numCores of each shard's hosts is not accessible via the mongos.
Problem:
While mongosync can obtain the necessary host information (vCPUs) for replica sets by running the hostInfo command against the mongod instances, this approach fails for sharded destination clusters. Running hostInfo on the mongos is not sufficient, since it only returns information about the host the mongos runs on, not about each shard. It is also not guaranteed that mongosync can reach the mongod node(s) of each shard in a sharded cluster.
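As a rough illustration of the replica-set case, the sketch below pulls the core count out of a hostInfo reply. It assumes the reply carries the count at system.numCores, as the hostInfo command documents. The helper name num_cores_from_host_info and the sample reply are hypothetical and are not taken from mongosync.

```python
from typing import Any, Mapping


def num_cores_from_host_info(reply: Mapping[str, Any]) -> int:
    """Return system.numCores from a hostInfo reply (hypothetical helper).

    With a driver such as PyMongo, the reply could be obtained with
    client.admin.command("hostInfo") while connected to a mongod.
    """
    system = reply.get("system")
    if not isinstance(system, Mapping) or "numCores" not in system:
        raise ValueError("hostInfo reply has no system.numCores field")
    return int(system["numCores"])


# Trimmed, made-up reply for illustration; real replies carry more fields.
sample_reply = {"system": {"hostname": "example-host", "numCores": 8}, "ok": 1.0}
cores = num_cores_from_host_info(sample_reply)
```

Against a mongos, the same call would only describe the mongos host, which is the gap this ticket describes.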
Solution:
For mongosync to be able to optimize its index builds, it needs a way to obtain, via the mongos, the numCores of the host machines running the destination cluster's mongod nodes.
Other notes:
- Since shards can be on different tiers to optimize for specific workloads via independentShardScaling, returning the lowest number of vCPUs across all shards would suffice.
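
A minimal sketch of how the lowest per-shard core count could feed the default, assuming per-shard numCores values were somehow available. The function names and the tier table are hypothetical. In particular, the table values are placeholders and are NOT the Atlas default mappings, which this ticket does not list.

```python
from typing import Mapping, Optional, Sequence, Tuple

# PLACEHOLDER tiers as (minimum cores, degree of parallelism).
# These numbers are invented for illustration, not Atlas defaults.
PLACEHOLDER_TIERS: Sequence[Tuple[int, int]] = [(1, 1), (4, 2), (16, 4)]


def lowest_shard_cores(cores_by_shard: Mapping[str, int]) -> int:
    """Return the smallest core count reported by any shard."""
    if not cores_by_shard:
        raise ValueError("no shard core counts reported")
    return min(cores_by_shard.values())


def pick_parallelism(
    user_value: Optional[int],
    cores: int,
    tiers: Sequence[Tuple[int, int]] = PLACEHOLDER_TIERS,
) -> int:
    """Honor the user's setting if given; otherwise map cores to a degree."""
    if user_value is not None:
        return user_value
    ordered = sorted(tiers)
    degree = ordered[0][1]  # fall back to the smallest tier
    for min_cores, tier_degree in ordered:
        if cores >= min_cores:
            degree = tier_degree
    return degree


# Shards on different tiers: the smallest shard bounds the choice.
shard_cores = {"shard0": 16, "shard1": 4, "shard2": 8}
degree = pick_parallelism(None, lowest_shard_cores(shard_cores))
```

Using the minimum keeps the chosen degree within what the smallest shard can handle, at the cost of leaving larger shards underused.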