Consider Adding Ability to Throttle Resharding Cloner

    • Cluster Scalability
    • 2

      Based on HELP-84458, it appears that the resharding cloner can easily generate more write load than replication can keep up with on certain hardware configurations. Today, the resharding cloner performs its writes locally and does not wait for majority acknowledgment before proceeding to the next write. As a result, the first point at which the recipient waits for replication is after all documents have been cloned, just before it builds indexes (after SERVER-103566).

      This is good for overall throughput, but it maximizes the strain that resharding puts on replication and the rest of the system. We can already throttle resharding to some extent with parameters such as reshardingCollectionClonerBatchSizeCount and reshardingCollectionClonerWriteThreadCount, but these are not directly aware of the current replication lag. Until SPM-2935 and SPM-4263 can address this more completely, we may want to consider adding an option that forces the resharding cloner to await replication periodically as a stopgap solution.
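
To make the proposal concrete, here is a rough, self-contained simulation of the idea, not server code. Everything in it is hypothetical: `SimulatedRecipient`, `clone_with_periodic_await`, and the `await_every_n_batches` knob are illustrative names that do not exist in the server, and the model pessimistically assumes secondaries make no progress between explicit waits.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SimulatedRecipient:
    """Toy stand-in for a recipient shard (hypothetical, not a server class)."""
    applied_on_primary: int = 0   # documents written locally on the primary
    majority_committed: int = 0   # documents known to be majority-replicated
    max_lag_seen: int = 0         # worst primary-vs-majority gap observed

    def write_local(self, doc_count: int) -> None:
        # Local write: the primary advances without waiting for secondaries.
        self.applied_on_primary += doc_count
        lag = self.applied_on_primary - self.majority_committed
        self.max_lag_seen = max(self.max_lag_seen, lag)

    def await_majority(self) -> None:
        # Conceptually blocks until a majority has caught up with the primary.
        self.majority_committed = self.applied_on_primary


def clone_with_periodic_await(
    recipient: SimulatedRecipient,
    batches: Sequence[List[dict]],
    await_every_n_batches: int,
) -> int:
    """Write every batch locally and wait for majority every N batches.

    await_every_n_batches == 0 disables the periodic wait, which models the
    behavior described in this ticket. Returns the number of periodic waits.
    """
    periodic_waits = 0
    for index, batch in enumerate(batches, start=1):
        recipient.write_local(len(batch))
        if await_every_n_batches > 0 and index % await_every_n_batches == 0:
            recipient.await_majority()
            periodic_waits += 1
    # Final wait that already happens before index builds.
    recipient.await_majority()
    return periodic_waits


def make_batches(batch_count: int, batch_size: int) -> List[List[dict]]:
    """Build dummy batches of documents for the simulation."""
    return [[{"_id": b * batch_size + i} for i in range(batch_size)]
            for b in range(batch_count)]


if __name__ == "__main__":
    unthrottled = SimulatedRecipient()
    clone_with_periodic_await(unthrottled, make_batches(10, 100), 0)
    throttled = SimulatedRecipient()
    clone_with_periodic_await(throttled, make_batches(10, 100), 2)
    print("max lag without throttle:", unthrottled.max_lag_seen)
    print("max lag with throttle:", throttled.max_lag_seen)
```

In this model the worst-case backlog is bounded by `await_every_n_batches` times the batch size, at the cost of extra waits that reduce cloning throughput. A real implementation would have to weigh that tradeoff and choose how the interval is configured.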

            Assignee:
            Unassigned
            Reporter:
            Brett Nawrocki
            Votes:
            0
            Watchers:
            3
