[SERVER-74793] dbCheck behaves differently on primaries and secondaries w.r.t extra _id index entries Created: 13/Mar/23  Updated: 16/Nov/23  Resolved: 21/Apr/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0, 7.0.0-rc2

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-73470 Log index inconsistencies detected by... Closed
Related
related to SERVER-82305 Have dbCheck ignore prepare conflicts... Closed
is related to SERVER-76232 Do not crash when index inconsistenci... Closed
Assigned Teams:
Storage Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.0, v6.3, v6.0, v5.0, v4.4
Sprint: Execution Team 2023-05-01
Participants:
Linked BF Score: 8

 Description   

When a collection is in an inconsistent state such that:

  • There exists an _id -> RecordId index entry
  • But the associated record store entry is missing

A primary processing a dbCheck command will write an "extra index key" error into the health log (complete with a backtrace):

{ "_id" : ObjectId("640f5a7a51d8ea8c8a8acdd8"), "namespace" : "test.bla", "timestamp" : ISODate("2023-03-13T17:16:41.046Z"), "severity" : "error", "msg" : "Erroneous index key found with reference to non-existent record id", "scope" : "index", "operation" : "Index scan", "data" : { "recordId" : "1", "indexKeyData" : [ { "key" : { "_id" : ObjectId("640f4932ad87ee6ac9de160c") }, "pattern" : { "_id" : 1 } } ], "backtrace" : [ { ... ] } } }

The primary then aborts the rest of the dbCheck and replicates a "dbCheckStop" oplog entry.

A secondary processing a dbCheck oplog entry will not notice the extra _id index entry. It will, however, log that its dbCheck batch failed if the record store document should exist:

{ "_id" : ObjectId("640f5de1d688051409c37488"), "namespace" : "test.bla", "timestamp" : ISODate("2023-03-13T17:31:13.619Z"), "severity" : "error", "msg" : "dbCheck batch inconsistent", "scope" : "cluster", "operation" : "dbCheckBatch", "data" : { "success" : true, "count" : NumberLong(0), "bytes" : NumberLong(0), "md5" : { "expected" : "ca673557f7697edb1dee246a460173b3", "found" : "d41d8cd98f00b204e9800998ecf8427e" }, "minKey" : { "$minKey" : 1 }, "maxKey" : { "$maxKey" : 1 }, "readTimestamp" : Timestamp(1678728673, 1), "optime" : { "ts" : Timestamp(1678728673, 2), "t" : NumberLong(4) } } }
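One detail worth noting in the entry above: the "found" digest is the MD5 of empty input, which lines up with "count" : 0 and "bytes" : 0. In other words, the secondary appears to have hashed no documents for this batch. The check below is a small sketch using Python's standard hashlib; that dbCheck feeds nothing into the hash for an empty batch is an assumption inferred from the log entry, not something confirmed here.

```python
import hashlib

# "found" digest copied from the secondary's health log entry.
found_md5 = "d41d8cd98f00b204e9800998ecf8427e"

# MD5 computed over zero bytes of input.
empty_md5 = hashlib.md5(b"").hexdigest()

# True when the secondary's digest equals the digest of empty input.
is_empty_batch = found_md5 == empty_md5
print(is_empty_batch)
```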

It would be better if secondaries also logged an extra _id index entry error so we could distinguish between index inconsistency (a storage problem) and data inconsistency (a replication problem).
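As a rough illustration of the distinction, the sketch below labels health log entries using only the fields visible in the two entries above ("severity", "scope", "operation"). The helper name classify_health_log_entry and the mapping from "scope" values to problem kinds are assumptions made for this example, not a documented contract of the health log.

```python
def classify_health_log_entry(entry):
    """Hypothetical helper: guess the kind of problem an entry points to."""
    if entry.get("severity") != "error":
        return "ok"
    scope = entry.get("scope")
    if scope == "index":
        # Extra index key with no matching record: a storage-level problem.
        return "index inconsistency (storage)"
    if scope == "cluster" and entry.get("operation") == "dbCheckBatch":
        # Batch hash mismatch between nodes: looks like a replication problem.
        return "data inconsistency (replication)"
    return "unknown"


# Trimmed copies of the two entries quoted in this ticket.
primary_entry = {
    "severity": "error",
    "scope": "index",
    "operation": "Index scan",
    "msg": "Erroneous index key found with reference to non-existent record id",
}
secondary_entry = {
    "severity": "error",
    "scope": "cluster",
    "operation": "dbCheckBatch",
    "msg": "dbCheck batch inconsistent",
}

primary_label = classify_health_log_entry(primary_entry)
secondary_label = classify_health_log_entry(secondary_entry)
print(primary_label)
print(secondary_label)
```

With the behavior described in this ticket, the secondary's entry is labeled as a replication problem even though the root cause may be the same extra index entry, which is the ambiguity the request aims to remove.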



 Comments   
Comment by Githook User [ 01/Nov/23 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: Revert "SERVER-74793 dbCheck should not fail when detecting corruption"

This reverts commit 3170c79ed4aea301c170b10c09cec85336a20777.
Branch: v6.0
https://github.com/mongodb/mongo/commit/5549efee609d3e96f4e485ad7f0d3c7161b2a135

Comment by Githook User [ 09/Oct/23 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-74793 dbCheck should not fail when detecting corruption

(cherry picked from commit 9a03e1930b2ffe150f21ac9836a70d799e353015)
(cherry picked from commit 619955a77b426ced3a5ec324a8f66f623d62b252)
Branch: v6.0
https://github.com/mongodb/mongo/commit/3170c79ed4aea301c170b10c09cec85336a20777

Comment by Githook User [ 18/May/23 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-74793 dbCheck should not fail when detecting corruption

(cherry picked from commit 9a03e1930b2ffe150f21ac9836a70d799e353015)
Branch: v7.0
https://github.com/mongodb/mongo/commit/619955a77b426ced3a5ec324a8f66f623d62b252

Comment by Githook User [ 19/Apr/23 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-74793 dbCheck should not fail when detecting corruption
Branch: master
https://github.com/mongodb/mongo/commit/9a03e1930b2ffe150f21ac9836a70d799e353015

Comment by Louis Williams [ 14/Mar/23 ]

It seems like the primary's behavior could be improved as well. We shouldn't stop dbCheck if we find one bad key, but rather continue checking in batches.
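The suggestion can be sketched with a toy model: treat the _id index as a list of (key, record id) pairs and the record store as a dictionary, record every index entry whose record is missing, and keep going to the next batch instead of stopping. Everything here (the function name check_in_batches, the data layout, the batch size) is hypothetical and only illustrates the control flow, not the server's implementation.

```python
def check_in_batches(index_entries, record_store, batch_size=2):
    """Toy model: scan index entries in batches, collecting errors without aborting.

    index_entries: list of (key, record_id) pairs standing in for the _id index.
    record_store:  dict mapping record_id -> document.
    """
    errors = []
    batches_checked = 0
    for start in range(0, len(index_entries), batch_size):
        batch = index_entries[start:start + batch_size]
        for key, record_id in batch:
            if record_id not in record_store:
                # Extra index key: log it and continue instead of stopping.
                errors.append({"scope": "index", "recordId": record_id, "key": key})
        batches_checked += 1
    return batches_checked, errors


# Records 1 and 4 are missing from the record store but still indexed.
index_entries = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
record_store = {2: {"_id": "b"}, 3: {"_id": "c"}, 5: {"_id": "e"}}

batches_checked, errors = check_in_batches(index_entries, record_store)
print(batches_checked, [e["recordId"] for e in errors])
```

All batches are visited even though the first one already contains a bad key, so later inconsistencies are still reported.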
