Core Server: SERVER-42915

New-style repair's catalog corrections are often false positives, aggressively marking replica set nodes as corrupted


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.13, 4.2.1, 4.3.1
    • Component/s: Storage
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.2, v4.0
    • Sprint:
      Execution Team 2019-09-09

      Description

      It is a legal and expected state for MongoDB to be shut down (cleanly or via a crash) while there are WiredTiger (WT) tables on disk that no collection in the checkpointed MDB catalog references.

      Historically this could happen because open cursors would prevent table drops from succeeding. More recently, stable checkpoints can leave nodes in this state after a clean shutdown.

      A normal (non-repair) MongoDB startup cleans up these leftover tables. However, a node running new-style repair instead creates a new, "anonymous" collection in the local database that references each such table, so that users can inspect the data and decide what to do with it. In addition to re-linking the table into a collection, new-style repair writes a document signaling that the node had corruption, which keeps the node from rejoining the replica set as a dutiful member. Our powercycle testing gives us confidence that these repairs are likely false positives.
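      To make the two startup paths concrete, here is a minimal Python sketch under stated assumptions. It models on-disk tables and catalog entries as plain sets of ident strings. The function `reconcile`, the `StartupResult` type, and the `local.orphan.<ident>` naming are hypothetical illustrations, not MongoDB's actual implementation (which is C++ and far more involved).

```python
from dataclasses import dataclass, field


@dataclass
class StartupResult:
    # Hypothetical record of what startup did with orphaned tables.
    dropped: list = field(default_factory=list)
    relinked: dict = field(default_factory=dict)
    marked_corrupt: bool = False


def reconcile(disk_tables: set, catalog_idents: set, repair: bool) -> StartupResult:
    """Hypothetical model of startup reconciliation; not MongoDB's real code."""
    result = StartupResult()
    # Tables present on disk that no checkpointed catalog entry references.
    orphans = sorted(disk_tables - catalog_idents)
    for ident in orphans:
        if repair:
            # Repair re-links the table into a new collection in the local
            # database (the "local.orphan.<ident>" name is an assumption)...
            result.relinked[f"local.orphan.{ident}"] = ident
            # ...and records that the node had corruption.
            result.marked_corrupt = True
        else:
            # Normal startup simply drops the leftover table.
            result.dropped.append(ident)
    return result
```

      With identical inputs, `repair=False` only drops the orphaned tables, while `repair=True` re-links them and sets the corruption flag. That flag is what keeps the node out of the replica set, even when the orphan was left behind legitimately.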

      To resolve this state, a user must resync or manually remove the document. Resyncing is expensive. It's not clear when it's safe to override the corruption document.

      Options for improving the user experience include:

      • Requiring MongoDB to always maintain a bijection (one-to-one relationship) between collections and tables.
      • Relaxing repair's requirement that a bijection exists between collections and tables.
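      Both options revolve around whether the catalog-to-table mapping is a bijection. The following minimal sketch assumes the catalog can be modeled as a dict from a hypothetical collection name to the table ident it references. Under that assumption, the mapping is a bijection when no two collections share a table and the set of referenced tables equals the set of tables on disk. The function name `is_bijection` is illustrative only.

```python
def is_bijection(catalog: dict, disk_tables: set) -> bool:
    """Return True if collections and on-disk tables pair up one to one.

    `catalog` maps a (hypothetical) collection name to the table ident it
    references; a dict already guarantees one table per collection.
    """
    idents = list(catalog.values())
    # Injective: no two collections may reference the same table.
    if len(set(idents)) != len(idents):
        return False
    # Every referenced table exists on disk and every disk table is referenced,
    # so an orphaned table (the case in this issue) breaks the bijection.
    return set(idents) == disk_tables
```

      An orphaned table left behind by a clean shutdown makes `is_bijection` return False. The first option would forbid that state, while the second would let repair tolerate it instead of flagging corruption.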
