Consider improving dbHash performance by using natural order scan instead of _id scan


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Execution

      Today, dbHash uses an _id scan to iterate through the collection. This is because dbHash uses an order-dependent hash, and since recordIds may differ across nodes, we use _id to maintain the correct ordering of documents.

      However, this carries a performance cost: fetching documents through the index scan incurs random I/O. To improve this, we could consider adding a mode to dbHash that uses a natural order scan with a hashing approach that is not order-dependent.
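
      A minimal sketch of what an order-independent hash could look like, in Python for illustration only. It assumes each document is available as serialized bytes, hashes each document separately, and combines the per-document digests with addition modulo 2**256, which is commutative, so the scan order does not affect the result. The function names, the choice of SHA-256, and the combining step are all assumptions made for this sketch, not dbHash's actual implementation or a settled design.

```python
import hashlib

# Assumption for this sketch: per-document digests are combined modulo 2**256.
MODULUS = 1 << 256


def doc_digest(doc_bytes: bytes) -> int:
    """Hash one serialized document and return the digest as an integer."""
    return int.from_bytes(hashlib.sha256(doc_bytes).digest(), "big")


def order_independent_hash(docs) -> str:
    """Combine per-document digests with modular addition.

    Addition is commutative and associative, so any scan order
    (natural order, _id order, ...) yields the same result.
    """
    total = 0
    for doc_bytes in docs:
        total = (total + doc_digest(doc_bytes)) % MODULUS
    return format(total, "064x")


def order_dependent_hash(docs) -> str:
    """Feed documents into one running hash, so the result depends on order."""
    h = hashlib.sha256()
    for doc_bytes in docs:
        h.update(doc_bytes)
    return h.hexdigest()


# Hypothetical documents, as two nodes might return them in different orders.
node_a = [b'{"_id": 1, "x": "a"}', b'{"_id": 2, "x": "b"}', b'{"_id": 3, "x": "c"}']
node_b = [node_a[2], node_a[0], node_a[1]]
```

      With these inputs, order_independent_hash(node_a) equals order_independent_hash(node_b), while order_dependent_hash gives different values for the two orderings. Modular addition is used here rather than XOR because XOR would cancel out pairs of identical documents.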

      If we do this, the new mode must be an option that can be switched on or off, since we must preserve the order-dependent hash for older versions. For example, if a customer migrating data to a newer MongoDB version wanted to verify data consistency against an older version that this ticket is not backported to, they would need to use the existing dbHash method with the _id scan.

            Assignee:
            Unassigned
            Reporter:
            Xuerui Fa
            Votes:
            0
            Watchers:
            6

              Created:
              Updated: