Resharding cloner can't store progress until it gets a response from the query it sent

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability

      The resharding cloner currently relies on the resumeToken returned in the getMore response to track which documents it still needs to fetch. This is a problem when a getMore runs for too long and is repeatedly interrupted, because each interruption makes the cloner start over from its last 'checkpoint'. One edge case where this happens is when the documents the recipient owns sit at the tail end of the btree. That can easily occur when the only chunk it owns was just recently migrated to the donor shard, which gives that set of documents very high recordIds. The cloner uses $natural order, so it has to scan almost the entire collection before it sees any documents that belong to the recipient (documents that the cloner query would return).
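
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's implementation: the collection is modeled as a list of booleans in $natural order (True means the recipient owns the document), an interrupted getMore is modeled as a per-call scan budget, and the names `get_more`, `clone`, `Interrupted`, and `checkpoint_scan_position` are all hypothetical. The last flag stands in for one possible improvement, where the scan position is recorded even when no response is returned.

```python
from typing import List, Tuple


class Interrupted(Exception):
    """Hypothetical: raised when a scan runs out of budget before responding."""

    def __init__(self, scan_position: int) -> None:
        super().__init__(f"interrupted at position {scan_position}")
        self.scan_position = scan_position


def get_more(owned: List[bool], start: int, budget: int) -> Tuple[List[int], int]:
    """Scan in natural order from `start`, examining at most `budget` records.

    Returns (batch, resume_position) as soon as one owned document is found.
    Raises Interrupted if the budget is used up first.
    """
    scanned = 0
    pos = start
    while pos < len(owned):
        if scanned >= budget:
            # Every record in [start, pos) was examined and none matched.
            raise Interrupted(pos)
        scanned += 1
        if owned[pos]:
            return [pos], pos + 1
        pos += 1
    return [], pos  # reached the end of the collection


def clone(owned: List[bool], budget: int, max_attempts: int,
          checkpoint_scan_position: bool) -> List[int]:
    """Toy cloner loop. The checkpoint only advances on a response, unless
    `checkpoint_scan_position` is set (the assumed improvement)."""
    checkpoint = 0
    cloned: List[int] = []
    for _ in range(max_attempts):
        if checkpoint >= len(owned):
            break
        try:
            batch, checkpoint = get_more(owned, checkpoint, budget)
            cloned.extend(batch)
        except Interrupted as exc:
            if checkpoint_scan_position:
                # Safe to skip ahead: nothing before this position matched.
                checkpoint = exc.scan_position
            # Otherwise the next attempt restarts from the old checkpoint.
    return cloned


if __name__ == "__main__":
    # 1000 documents owned by other shards, then 5 owned by the recipient.
    collection = [False] * 1000 + [True] * 5
    print(clone(collection, budget=100, max_attempts=50,
                checkpoint_scan_position=False))
    print(clone(collection, budget=100, max_attempts=50,
                checkpoint_scan_position=True))
```

In this model the first call never clones anything, because every attempt rescans the same first 100 records and is interrupted again. The second call reaches the tail after ten interrupted attempts and clones all five documents. How a real scan position would be surfaced and persisted is left open here.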

            Assignee:
            Unassigned
            Reporter:
            Randolph Tan
            Votes:
            0
            Watchers:
            1

              Created:
              Updated: