[SERVER-74681] Allow dbhash to accept an _id range Created: 07/Mar/23  Updated: 24/Jul/23  Resolved: 24/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Backlog - Replication Team
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
Related
related to SERVER-74718: dbcheck doesn't check MinKey (Open)
Assigned Teams:
Replication
Backport Requested:
v6.3, v6.2, v6.0, v5.0, v4.4
Participants:
Linked BF Score: 148

 Description   

Currently dbhash hashes an entire database. This lets us know if there's a divergence between nodes but not where the divergence is. We should allow it to accept an _id range so we can more quickly pinpoint the divergences. This also allows us to run dbhash faster by hashing multiple ranges in parallel.
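As a rough illustration of how a ranged hash could pinpoint a divergence, the sketch below bisects an integer _id range by comparing per-range hashes from two nodes. Everything here is an assumption made for illustration: `hashRange` is a hypothetical in-memory stand-in for a ranged dbHash call, `findDivergentId` is a made-up helper, and the (minKey, maxKey] convention is borrowed from the dbcheck discussion in the comments. It is not the server implementation.

```javascript
const crypto = require("crypto");

// Hypothetical stand-in for a ranged dbHash call on one node: hashes every
// document whose _id falls in (minKey, maxKey]. `docs` is an array of
// documents sorted by _id; a real call would go to the server instead.
function hashRange(docs, minKey, maxKey) {
  const h = crypto.createHash("md5");
  for (const doc of docs) {
    if (doc._id > minKey && doc._id <= maxKey) {
      h.update(JSON.stringify(doc));
    }
  }
  return h.digest("hex");
}

// Bisect the integer _id range (minKey, maxKey] until it holds a single _id
// whose hash differs between the two nodes. Returns null if the range matches.
function findDivergentId(nodeA, nodeB, minKey, maxKey) {
  if (hashRange(nodeA, minKey, maxKey) === hashRange(nodeB, minKey, maxKey)) {
    return null;
  }
  let lo = minKey;
  let hi = maxKey;
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (hashRange(nodeA, lo, mid) !== hashRange(nodeB, lo, mid)) {
      hi = mid; // the lower half (lo, mid] already differs
    } else {
      lo = mid; // the difference must be in (mid, hi]
    }
  }
  return hi;
}

// Example data: the two nodes disagree only on the document with _id 6.
const nodeA = [1, 2, 3, 4, 5, 6, 7, 8].map((i) => ({ _id: i, v: i }));
const nodeB = nodeA.map((d) => (d._id === 6 ? { _id: 6, v: -1 } : d));
console.log(findDivergentId(nodeA, nodeB, 0, 8));
```

Each bisection step needs only two ranged hashes, so the number of round trips grows with the logarithm of the range size rather than with the number of documents.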



 Comments   
Comment by Judah Schvimer [ 24/Jul/23 ]

We're investing in dbcheck over dbhash, so closing as Won't Do.

Comment by Didier Nadeau [ 27/Mar/23 ]

The commit was reverted due to BF-28078. This wasn't deemed urgent to recommit by daniel.gottlieb@mongodb.com so I'm sending it to our backlog.

Comment by Didier Nadeau [ 27/Mar/23 ]

judah.schvimer@mongodb.com to check priority of this ticket.

Comment by Githook User [ 21/Mar/23 ]

Author:

{'name': 'Sviatlana Zuiko', 'email': 'sviatlana.zuiko@mongodb.com', 'username': 'szuiko'}

Message: Revert "SERVER-74681 Allow dbhash to accept an _id range"

This reverts commit 9662f3c9edfd52ce4c9b0cad619bed57562a3daa.
Branch: master
https://github.com/mongodb/mongo/commit/40c9e32b147278115b62c10727d3defb9b22fa81

Comment by Githook User [ 16/Mar/23 ]

Author:

{'name': 'Sophia Tan', 'email': 'sophia_tll@hotmail.com', 'username': 'sophiatll'}

Message: SERVER-74681 Allow dbhash to accept an _id range
Branch: master
https://github.com/mongodb/mongo/commit/9662f3c9edfd52ce4c9b0cad619bed57562a3daa

Comment by Daniel Gottlieb (Inactive) [ 07/Mar/23 ]

It's more idiomatic for ranges in these "batch everything starting from the beginning" commands to be expressed as (minKey, maxKey] (minKey exclusive, maxKey inclusive). This is what dbcheck does.

And omitting a minKey would be "start at the very beginning". This schema is a pattern for chunking up a big thing to ensure continuity between a prior command and the next command.

I could be convinced the dbCheck interpretation isn't exactly right for dbHash (exercise left to the reader), but I'd rather be consistent above all else here.
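A minimal sketch of the chunking pattern described in this comment, assuming numeric _id values held in a sorted array. `planRanges` and `inRange` are hypothetical helpers, not server code. Each batch's maxKey becomes the next batch's minKey, and because minKey is exclusive and maxKey inclusive, consecutive batches neither overlap nor leave gaps.

```javascript
// Split sorted _id values into consecutive (minKey, maxKey] ranges holding at
// most batchSize ids each. An undefined minKey means "start at the very
// beginning", mirroring the omitted-minKey case described above.
function planRanges(sortedIds, batchSize) {
  const ranges = [];
  let minKey; // undefined for the first batch
  for (let i = 0; i < sortedIds.length; i += batchSize) {
    const last = Math.min(i + batchSize, sortedIds.length) - 1;
    const maxKey = sortedIds[last];
    ranges.push({ minKey, maxKey });
    minKey = maxKey; // continuity: the next batch starts just after this key
  }
  return ranges;
}

// Membership test for a (minKey, maxKey] range: minKey exclusive, maxKey inclusive.
function inRange(id, range) {
  const aboveMin = range.minKey === undefined || id > range.minKey;
  return aboveMin && id <= range.maxKey;
}

const ids = [1, 2, 3, 4, 5];
const ranges = planRanges(ids, 2);
console.log(ranges);
```

With both ends inclusive, the boundary key would be hashed by two adjacent batches; with both ends exclusive, it would be skipped by both.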

Comment by Judah Schvimer [ 07/Mar/23 ]

Currently dbhash accepts

db.runCommand(
   {
     dbHash: 1,
     collections: [ <collection1>, ... ]
   }
)

Now it should accept:

db.runCommand(
   {
     dbHash: 1,
     collections: [ <collection1>, ... ],
     minKey: <minimum _id>,
     maxKey: <maximum _id>
   }
)

where minKey and maxKey are only accepted if collections specifies one collection.
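The single-collection constraint could be checked along these lines. This is only a sketch of the rule as stated in this comment: `validateDbHashCommand` is a hypothetical helper operating on a plain command object, and the error message is invented.

```javascript
// Reject minKey/maxKey unless `collections` names exactly one collection.
// Hypothetical client-side check; the server's actual validation may differ.
function validateDbHashCommand(cmd) {
  const hasRange = cmd.minKey !== undefined || cmd.maxKey !== undefined;
  const collections = cmd.collections ?? [];
  if (hasRange && collections.length !== 1) {
    throw new Error("minKey and maxKey require exactly one collection");
  }
  return cmd;
}

// Accepted: a range together with a single collection.
validateDbHashCommand({ dbHash: 1, collections: ["orders"], minKey: 0, maxKey: 100 });

// Accepted: no range, any number of collections (the existing behavior).
validateDbHashCommand({ dbHash: 1, collections: ["orders", "users"] });
```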

minKey should definitely be inclusive. daniel.gottlieb@mongodb.com, should maxKey be inclusive or exclusive? I'm trying to mirror it off of dbcheck and it looks inclusive, though I'd think exclusive makes more sense.
