Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Storage Execution
Today, dbHash iterates through each collection with an _id index scan. dbHash computes an order-dependent hash, and because RecordIds may differ across nodes, it relies on _id order to hash documents in the same order on every node.
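A minimal sketch of why ordering matters, assuming canonical JSON as a stand-in for BSON and MD5 as the streaming hash. The helper names (doc_bytes, ordered_hash) and the two-node data are hypothetical illustrations, not dbHash's actual implementation. The same logical documents stored in a different natural order produce different digests, while sorting by _id first makes the digests agree.

```python
import hashlib
import json


def doc_bytes(doc):
    # Hypothetical stand-in for BSON serialization: canonical JSON with sorted keys.
    return json.dumps(doc, sort_keys=True).encode("utf-8")


def ordered_hash(docs):
    """Order-dependent hash: feed every document into a single MD5 stream."""
    h = hashlib.md5()
    for doc in docs:
        h.update(doc_bytes(doc))
    return h.hexdigest()


# Same logical data on two nodes; the natural (storage) order differs,
# as it can when RecordIds differ across nodes.
node_a = [{"_id": 1, "x": "a"}, {"_id": 2, "x": "b"}]
node_b = [{"_id": 2, "x": "b"}, {"_id": 1, "x": "a"}]

natural_a = ordered_hash(node_a)
natural_b = ordered_hash(node_b)

# Sorting by _id mimics iterating via the _id index.
by_id_a = ordered_hash(sorted(node_a, key=lambda d: d["_id"]))
by_id_b = ordered_hash(sorted(node_b, key=lambda d: d["_id"]))

print(natural_a == natural_b)  # False
print(by_id_a == by_id_b)      # True
```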
However, this has a performance cost: the index scan fetches each document separately, which results in random I/O. To improve this, we could add a mode to dbHash that uses a natural-order scan combined with a hashing approach that is not order-dependent.
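One possible order-independent approach, sketched under assumptions: hash each document individually, then combine the per-document digests with addition modulo 2**64. Because addition is commutative, the result does not depend on scan order. This additive multiset hash is a swapped-in illustration, not a design this ticket specifies; the helper names, the 64-bit truncation of MD5, and the JSON stand-in for BSON are all hypothetical.

```python
import hashlib
import json

MASK = (1 << 64) - 1  # keep the running total within 64 bits


def doc_bytes(doc):
    # Hypothetical stand-in for BSON serialization: canonical JSON with sorted keys.
    return json.dumps(doc, sort_keys=True).encode("utf-8")


def doc_digest(doc):
    # First 8 bytes of the document's MD5, read as an unsigned 64-bit integer.
    return int.from_bytes(hashlib.md5(doc_bytes(doc)).digest()[:8], "big")


def unordered_hash(docs):
    """Order-independent hash: sum per-document digests modulo 2**64."""
    total = 0
    for doc in docs:
        total = (total + doc_digest(doc)) & MASK
    return total


node_a = [{"_id": 1, "x": "a"}, {"_id": 2, "x": "b"}]
node_b = [{"_id": 2, "x": "b"}, {"_id": 1, "x": "a"}]  # same data, different natural order
node_c = [{"_id": 1, "x": "a"}, {"_id": 2, "x": "c"}]  # one document differs

print(unordered_hash(node_a) == unordered_hash(node_b))  # True
print(unordered_hash(node_a) == unordered_hash(node_c))  # False
```

Summing is used here rather than XOR because XOR cancels pairs of identical entries. Either way, the result is not comparable to the existing order-dependent hash, so the two modes cannot be mixed when comparing nodes.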
If we do this, the new mode must be an option that can be switched on or off, because we must preserve the order-dependent hash for compatibility with older versions. For example, a customer who migrates data to a newer MongoDB version and wants to verify data consistency against an older version that this ticket is not backported to would need to use the existing dbHash method with the _id scan.