Core Server / SERVER-33258

Replication must create necessary steady-state internal tables before coming out of initial sync

    • Backwards Compatibility: Fully Compatible
    • Sprint: Repl 2018-02-26

      KVStorageEngine implementations persist their catalog as "yet another" record store, named `_mdb_catalog`. For storage engines that support `recoverToStableTimestamp`, this table is not journaled. A catalog entry therefore becomes durable in only two ways. Either a stable checkpoint is taken after the entry is written, or the entry is re-created at startup when replication recovery replays the corresponding create-collection oplog entry.
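      The persistence rule above can be modeled with a toy catalog. This is a minimal sketch that treats the catalog as a set of collection names. `ToyCatalog` and its methods are hypothetical and do not represent MongoDB's `KVStorageEngine` or `_mdb_catalog` API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical model of a non-journaled catalog (not MongoDB code)."""

    in_memory: set = field(default_factory=set)     # entries visible while running
    checkpointed: set = field(default_factory=set)  # entries captured by the last stable checkpoint

    def create_collection(self, name: str) -> None:
        # Creating a collection only updates the in-memory catalog.
        self.in_memory.add(name)

    def take_stable_checkpoint(self) -> None:
        # Only a stable checkpoint makes catalog entries durable.
        self.checkpointed = set(self.in_memory)

    def restart(self, replayed_creates=()) -> set:
        # After a crash, the catalog holds only the checkpointed entries plus
        # collections re-created by replaying create-collection oplog entries.
        self.in_memory = set(self.checkpointed) | set(replayed_creates)
        return self.in_memory


cat = ToyCatalog()
cat.create_collection("test.a")
cat.take_stable_checkpoint()
cat.create_collection("test.b")                   # replicated: has a create oplog entry
cat.create_collection("oplogTruncateAfterPoint")  # internal: no create oplog entry
print(sorted(cat.restart(replayed_creates=["test.b"])))  # ['test.a', 'test.b']
```

      In this sketch, a collection created after the last checkpoint survives a restart only if replay re-creates it. An unreplicated internal collection does not survive.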

      Naturally, replication does not replicate its own internal collections. This can lead to the following sequence:

      1. Exit initial sync at time T. T is also the stable timestamp.
      2. Node becomes a secondary.
      3. Create the `oplogTruncateAfterPoint` collection.
      4. Begin processing a batch, performing a write to `oplogTruncateAfterPoint`.
      5. The node crashes. The `oplogTruncateAfterPoint` document is required to correctly recover.
      6. Node restarts.
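      7. `oplogTruncateAfterPoint` was created after the last stable checkpoint. As an internal collection, it also has no create-collection oplog entry to replay. The `_mdb_catalog` therefore has no entry for it. MongoDB sees a storage engine table without a corresponding MongoDB collection and drops the table.
      8. Replication recovery runs and assumes there was no `oplogTruncateAfterPoint`, resulting in data corruption.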

      Explicitly creating `oplogTruncateAfterPoint` before coming out of initial sync is sufficient to guarantee that if a node starts up and decides it has completed initial sync, then the `oplogTruncateAfterPoint` collection will exist.
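      The crash sequence above, and the effect of creating `oplogTruncateAfterPoint` before leaving initial sync, can be replayed in a toy simulation. This is a sketch under simplifying assumptions:

      * storage-engine tables and their documents survive the crash;
      * no stable checkpoint is taken between steps 3 and 5;
      * at startup, tables without a catalog entry are dropped.

      `run_scenario`, the `truncateAfter` field, and the value 42 are hypothetical placeholders, not MongoDB's actual schema or code.

```python
TRUNCATE = "oplogTruncateAfterPoint"  # used only as a label in this model


def run_scenario(create_before_initial_sync_exit: bool):
    """Return the truncate point replication recovery sees, or None if it was lost."""
    tables = {}      # storage-engine tables (assumed to survive the crash with their data)
    catalog = set()  # in-memory _mdb_catalog entries

    def create(name):
        tables[name] = []
        catalog.add(name)

    create("test.user_data")  # data cloned during initial sync
    if create_before_initial_sync_exit:
        create(TRUNCATE)  # the fix: create it while still in initial sync

    # 1. Exit initial sync at T; the stable checkpoint at T persists the catalog.
    checkpointed_catalog = set(catalog)

    # 2-3. As a secondary, create the collection if it does not exist yet.
    if TRUNCATE not in catalog:
        create(TRUNCATE)

    # 4. Processing a batch writes the truncate point.
    tables[TRUNCATE].append({"truncateAfter": 42})

    # 5-6. Crash and restart: the catalog reverts to the last stable checkpoint.
    # Internal collections are not replicated, so no create oplog entry is replayed.
    catalog = set(checkpointed_catalog)

    # 7. Drop storage-engine tables that have no catalog entry.
    for name in list(tables):
        if name not in catalog:
            del tables[name]

    # 8. Replication recovery reads the truncate point, if the collection still exists.
    docs = tables.get(TRUNCATE, [])
    return docs[0]["truncateAfter"] if docs else None


print(run_scenario(False))  # None: the write was lost with the orphaned table
print(run_scenario(True))   # 42
```

      In the model, creating the collection before the checkpoint at T is what keeps its catalog entry durable. The orphan-table cleanup in step 7 then leaves it alone.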

            daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)