[SERVER-42915] New style repair's catalog corrections are often false positives, aggressively marking repl nodes as corrupted
Created: 20/Aug/19  Updated: 29/Oct/23  Resolved: 09/Sep/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.0.13, 4.2.1, 4.3.1

Type: Bug
Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive)
Assignee: Daniel Gottlieb (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2, v4.0
Sprint: Execution Team 2019-09-09
Participants:

 Description   

It's a legal and expected state for MongoDB to be shut down (cleanly or via a crash) while there are WT tables on disk that no collection in the checkpointed MDB catalog references.

Historically, this could happen because open cursors would prevent table drops from succeeding. More recently, stable checkpoints can leave nodes in this state even after a clean shutdown.

Non-repair MongoDB startup cleans up these leftover tables. However, a node running new-style repair creates a new, "anonymous" collection in the local database to reference these tables, so that users can inspect the data and decide what to do with it. In addition to re-linking the table into a collection, new-style repair writes a document signaling that the node had corruption, which prevents the node from rejoining the replica set as a normal member. Our powercycle testing gives us confidence that these repairs are likely false positives: the leftover tables are an expected result of the shutdown, not evidence of corruption.
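To make the two startup paths concrete, here is a minimal Python model of the behavior described above, as it stood before this ticket's fix. It is a sketch, not MongoDB's implementation: the function and field names, the table names such as `collection-1`, and the `local.orphan.<table>` naming for the re-linked collection are all hypothetical. The catalog is modeled as a map from collection namespace to the table it references.

```python
from dataclasses import dataclass, field


@dataclass
class StartupResult:
    # Orphaned tables removed by a normal (non-repair) startup.
    dropped: list[str] = field(default_factory=list)
    # Orphaned table -> collection created to reference it (repair only).
    relinked: dict[str, str] = field(default_factory=dict)
    # True if repair wrote the "node had corruption" document.
    marked_corrupt: bool = False


def reconcile(tables_on_disk: set[str], catalog: dict[str, str], repair: bool) -> StartupResult:
    """Model how startup treats tables that no catalog entry references.

    `catalog` maps a collection namespace to the table (ident) it uses.
    All names here are hypothetical; this is not MongoDB's API.
    """
    referenced = set(catalog.values())
    orphans = sorted(tables_on_disk - referenced)
    result = StartupResult()
    for table in orphans:
        if repair:
            # New-style repair keeps the data by re-linking the table into an
            # "anonymous" collection in `local` (naming scheme is hypothetical)...
            result.relinked[table] = f"local.orphan.{table}"
            # ...and flags the node, keeping it out of the replica set.
            result.marked_corrupt = True
        else:
            # Normal startup treats the orphan as expected leftovers and drops it.
            result.dropped.append(table)
    return result
```

With the same on-disk state, `reconcile(tables, catalog, repair=False)` only drops the orphans, while `repair=True` re-links them and sets `marked_corrupt`, which is the false-positive path this ticket describes.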

To get out of this state, a user must either resync the node or manually remove the document. Resyncing is expensive, and it's not clear when it's safe to override the corruption document.

Options for improving the user experience include:

  • Require MongoDB to always maintain a bijection (one-to-one correspondence) between collections and tables.
  • Relax repair's requirement that a bijection exist between collections and tables.
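Both options hinge on the same invariant. The following Python sketch shows one way to check whether the collection-to-table mapping is a bijection and to report how it fails; every name in it is hypothetical, and it only illustrates the invariant rather than MongoDB's actual repair code. The first option would aim to keep all three violation lists empty at all times; the second would let repair encounter a non-empty `orphaned_tables` list without treating it as corruption.

```python
from collections import Counter


def bijection_violations(tables_on_disk: set[str], catalog: dict[str, str]) -> dict[str, list[str]]:
    """List every way the collection <-> table mapping fails to be one-to-one.

    `catalog` maps a collection namespace to the table it references.
    Names are hypothetical; this models the invariant, not MongoDB's code.
    """
    uses = Counter(catalog.values())
    return {
        # Tables on disk that no collection references (this ticket's case).
        "orphaned_tables": sorted(tables_on_disk - set(uses)),
        # Collections whose referenced table is not on disk.
        "missing_tables": sorted(ns for ns, t in catalog.items() if t not in tables_on_disk),
        # Tables referenced by more than one collection.
        "shared_tables": sorted(t for t, n in uses.items() if n > 1),
    }


def is_bijection(tables_on_disk: set[str], catalog: dict[str, str]) -> bool:
    # A bijection holds only when no category of violation is present.
    return not any(bijection_violations(tables_on_disk, catalog).values())
```

For example, a clean shutdown that leaves behind one unreferenced table makes `is_bijection` return `False` solely because of `orphaned_tables`, the state the description calls legal and expected.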


 Comments   
Comment by Githook User [ 26/Sep/19 ]

Author:

{'username': 'dgottlieb', 'email': 'daniel.gottlieb@mongodb.com', 'name': 'Daniel Gottlieb'}

Message: SERVER-42915: Do not require a resync when repair encounters orphaned collection objects.

(cherry picked from commit f4e387fa1b7e369ce067650bdda9c8676683b929)
Branch: v4.0
https://github.com/mongodb/mongo/commit/e1370d816a25a14f32eeb5990302c640eb872730

Comment by Githook User [ 16/Sep/19 ]

Author:

{'username': 'dgottlieb', 'email': 'daniel.gottlieb@mongodb.com', 'name': 'Daniel Gottlieb'}

Message: SERVER-42915: Do not require a resync when repair encounters orphaned collection objects.

(cherry picked from commit f4e387fa1b7e369ce067650bdda9c8676683b929)
Branch: v4.2
https://github.com/mongodb/mongo/commit/febbdd2456ab8e4526b8f1fa34f50989791216a1

Comment by Githook User [ 09/Sep/19 ]

Author:

{'name': 'Daniel Gottlieb', 'username': 'dgottlieb', 'email': 'daniel.gottlieb@mongodb.com'}

Message: SERVER-42915: Do not require a resync when repair encounters orphaned collection objects.
Branch: master
https://github.com/mongodb/mongo/commit/f4e387fa1b7e369ce067650bdda9c8676683b929

Generated at Thu Feb 08 05:01:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.