[SERVER-24173] dbhash should check and/or return indexes in its reply Created: 16/May/16  Updated: 06/Dec/22  Resolved: 10/Jun/19

Status: Closed
Project: Core Server
Component/s: Diagnostics, Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Daniel Coupal
Assignee: Backlog - Storage Execution Team
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-24205 Replicated dbhash command Backlog
Assigned Teams:
Storage Execution

 Description   

dbHash should, in addition to checking collection data, also check the indexes. Even if the indexes aren't taken into consideration when performing the hash, we should still return the indexes in the return document so that the user can verify that each member of a replica set has the same indexes.

(Of course, it's possible for replica set members to have different indexes; e.g. when performing a rolling index upgrade. But in the "normal" use case, where the user doesn't circumvent the replication system, it's expected that all the indexes will be the same.)
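
As an illustration of the check described above, here is a minimal sketch that compares index listings gathered from each replica set member and reports which index names a member lacks. It assumes the listings have already been collected (for example, with the listIndexes command) as documents with "name" and "key" fields; the host names and the helper names index_ids and missing_indexes are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# An index is identified here by its name plus its ordered key pattern.
IndexId = Tuple[str, Tuple[Tuple[str, object], ...]]


def index_ids(specs: List[dict]) -> Set[IndexId]:
    """Reduce listIndexes-style documents to comparable (name, key) pairs."""
    return {(s["name"], tuple(s["key"].items())) for s in specs}


def missing_indexes(by_member: Dict[str, List[dict]]) -> Dict[str, Set[str]]:
    """For each member, list index names present elsewhere but absent there."""
    ids = {host: index_ids(specs) for host, specs in by_member.items()}
    union = set().union(*ids.values())
    return {
        host: {name for name, _ in union - have}
        for host, have in ids.items()
        if union - have
    }


# Hypothetical listings for one collection on two members.
members = {
    "node-a:27017": [
        {"v": 2, "key": {"_id": 1}, "name": "_id_"},
        {"v": 2, "key": {"email": 1}, "name": "email_1"},
    ],
    "node-b:27017": [
        {"v": 2, "key": {"_id": 1}, "name": "_id_"},
    ],
}
print(missing_indexes(members))  # {'node-b:27017': {'email_1'}}
```

A non-empty result is not necessarily an error: during a rolling index change, members are expected to differ temporarily.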

Original Description

dbhash provides a way to verify that replicas have the same data.
How about having a similar check for the indexes?
Walk an index, calculating a hash value on the index keys.

An overall value for all indexes of a given collection would be a quick check that all indexes are built identically and that the collection has the same number of indexes on every member.

In rare situations, some users want a different number of indexes, so for that purpose and for the purpose of identifying which indexes differ, we may need to show:

  • hash per index
  • hash for all indexes on a collection
  • hash for all indexes on a database
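
The three levels listed above could be sketched roughly as follows. This is only an illustration of the proposal, not how the server computes anything: SHA-256, the byte encoding of index keys, and the function names hash_index, combine, and hash_database are all assumptions made for the example.

```python
import hashlib
import json
from typing import Dict, Iterable


def hash_index(keys: Iterable[bytes]) -> str:
    """Hash one index by walking its keys in index order."""
    h = hashlib.sha256()
    for key in keys:
        # Length-prefix each key so (b"ab", b"c") and (b"a", b"bc") differ.
        h.update(len(key).to_bytes(4, "big"))
        h.update(key)
    return h.hexdigest()


def combine(named: Dict[str, str]) -> str:
    """Fold child hashes into one parent hash, sorted by name for stability."""
    payload = json.dumps(sorted(named.items())).encode()
    return hashlib.sha256(payload).hexdigest()


def hash_database(db: Dict[str, Dict[str, Iterable[bytes]]]) -> dict:
    """Return per-index, per-collection, and database-wide hashes.

    `db` maps collection name -> index name -> index keys in index order.
    """
    per_index = {
        coll: {name: hash_index(keys) for name, keys in indexes.items()}
        for coll, indexes in db.items()
    }
    per_collection = {coll: combine(h) for coll, h in per_index.items()}
    return {
        "indexes": per_index,
        "collections": per_collection,
        "database": combine(per_collection),
    }
```

Comparing the "database" value first and descending only on a mismatch would show which collection, and then which index, differs between two members.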


 Comments   
Comment by Geert Bosch [ 14/Jun/19 ]

This last comment turns out to be incorrect, my apologies. I'm hesitant to add (much) more information to the output of this command, as, unlike listIndexes and listCollections, it does not use a cursor for its output. While this may already be an issue when not specifying the set of collections to obtain the hash from, it would become much more of an issue if we add index information. So, it is probably best to use the regular listIndexes or listCollections in addition to the dbHash command.

As dbHash is primarily intended for testing purposes and not for use on running production systems, we don't intend to implement the requested enhancements.
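
For readers who want to follow the suggestion of using listIndexes alongside dbHash, here is a minimal client-side sketch. It assumes the dbHash reply (with its "collections" field mapping collection names to hashes) and the per-collection index listings have already been fetched from a member, for instance through a driver. The helper names index_fingerprint and member_report, the choice of SHA-256, and the decision to fingerprint only "name", "key", and "unique" are assumptions for illustration.

```python
import hashlib
import json
from typing import Dict, List


def index_fingerprint(specs: List[dict]) -> str:
    """Hash index specs in a canonical order (sorted by index name)."""
    # Field order inside "key" matters for compound indexes, so it is kept
    # as an ordered list of pairs rather than a sorted mapping.
    canonical = [
        {
            "name": s["name"],
            "key": list(s["key"].items()),
            "unique": s.get("unique", False),
        }
        for s in sorted(specs, key=lambda s: s["name"])
    ]
    payload = json.dumps(canonical, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def member_report(dbhash_reply: dict, indexes: Dict[str, List[dict]]) -> Dict[str, dict]:
    """Pair each collection's dbHash value with a fingerprint of its indexes."""
    return {
        name: {
            "data": data_hash,
            "indexes": index_fingerprint(indexes.get(name, [])),
        }
        for name, data_hash in dbhash_reply["collections"].items()
    }
```

Two members can then be compared by checking their reports for equality; a collection whose "data" values match but whose "indexes" values differ has the same documents but different index definitions.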

Comment by Eric Milkie [ 10/Jun/19 ]

The dbhash calculation includes the index specifications for each collection. Therefore, if nodes have different indexes, this would be reflected in differing dbhash values.
Consistency of index data can be verified using the validate command.

Comment by Kyle Suarez [ 23/May/16 ]

daniel.coupal, I believe that robert.guo's work on the validate command mostly accomplishes what you were initially suggesting, especially in terms of identifying whether or not you're affected by SERVER-22970. As such, the Integration team is going to close this ticket as resolved.

However, if you still think we should have a command that makes it easier to run dbhash, then I think that work is better tracked in your other request, SERVER-24205.

Generated at Thu Feb 08 04:05:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.