SERVER-11908 (Core Server)

Failure to rollback usePowerOf2Sizes should not cause fatal error


    Details

    • Type: Task
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 2.4.6, 2.4.8
    • Fix Version/s: 2.4.9, 2.5.5
    • Component/s: Replication
    • Labels:
      None

      Description

      Issue Status as of December 30th, 2013

      ISSUE SUMMARY
      This issue occurs only if a replica set member enters the ROLLBACK state and the operations being rolled back include a collMod command that modifies usePowerOf2Sizes. Under these conditions the member shuts down and enters the FATAL state in the replica set.

      USER IMPACT
      If a replica set member encounters this command in the section of the oplog it is rolling back, the server shuts down and the following messages appear in the server log:

      Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet error can't rollback this command yet: { collMod: "files", usePowerOf2Sizes: true }
      Tue Sep 24 05:47:09.383 [rsBackgroundSync] replSet cmdname=collMod
      Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet replica set fatal exception
      Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet error fatal, stopping replication
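To check whether a member has hit this issue, one could scan its log for the messages above. The Python sketch below is a hypothetical helper (find_powerof2_rollback_failures is not part of MongoDB); it matches only the message text quoted in this issue and reports the blocking collMod documents when the log also shows replication being stopped.

```python
import re
from typing import Iterable, List

# Message text taken from the log excerpt quoted in this issue.
CANT_ROLLBACK = re.compile(r"replSet error can't rollback this command yet: (\{.*\})")
FATAL_STOP = "replSet error fatal, stopping replication"


def find_powerof2_rollback_failures(lines: Iterable[str]) -> List[str]:
    """Return collMod/usePowerOf2Sizes documents that blocked a rollback.

    Returns an empty list unless the log also shows replication being stopped.
    """
    blocked: List[str] = []
    stopped = False
    for line in lines:
        m = CANT_ROLLBACK.search(line)
        if m and "collMod" in m.group(1) and "usePowerOf2Sizes" in m.group(1):
            blocked.append(m.group(1))
        if FATAL_STOP in line:
            stopped = True
    return blocked if stopped else []


sample = [
    "Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet error can't rollback this command yet: { collMod: \"files\", usePowerOf2Sizes: true }",
    "Tue Sep 24 05:47:09.383 [rsBackgroundSync] replSet cmdname=collMod",
    "Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet replica set fatal exception",
    "Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet error fatal, stopping replication",
]
print(find_powerof2_rollback_failures(sample))
# prints ['{ collMod: "files", usePowerOf2Sizes: true }']
```

Requiring the "stopping replication" line avoids flagging a log where the message appears but the member kept running.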

      The member cannot be restarted successfully until this situation is cleared; until then it remains in the FATAL state.

      This issue is present in all versions of MongoDB up to and including v2.4.8.

      SOLUTION
      Instead of shutting down, the member skips the collMod operation and logs a warning: "replSet not rolling back change of usePowerOf2Sizes"
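The behavior change can be modeled roughly as follows. This is a minimal Python sketch, not the server's rollback code: handle_rollback_command and FatalRollbackError are hypothetical names, and recognizing the safe case as "a collMod whose only option is usePowerOf2Sizes" is an assumption. Commands the sketch does not model are treated as fatal, which mirrors the pre-fix failure path rather than the full set of commands the server can roll back.

```python
import logging
from typing import Any, Dict

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("rollback_sketch")


class FatalRollbackError(Exception):
    """Hypothetical stand-in for the fatal exception that stops replication."""


def handle_rollback_command(cmd: Dict[str, Any]) -> str:
    """Decide how to treat a command oplog entry found while rolling back."""
    name = next(iter(cmd))  # a command document's first key is the command name
    # Assumption: a collMod whose only option is usePowerOf2Sizes is the safe case.
    if name == "collMod" and set(cmd) == {"collMod", "usePowerOf2Sizes"}:
        # Post-fix behavior: keep the member running and log a warning.
        logger.warning("replSet not rolling back change of usePowerOf2Sizes")
        return "skipped"
    # Anything else is outside this sketch and keeps the pre-fix fatal path.
    raise FatalRollbackError(f"replSet error can't rollback this command yet: {cmd}")


print(handle_rollback_command({"collMod": "files", "usePowerOf2Sizes": True}))  # skipped
```

Keeping the skip narrow means rollback still stops loudly on operations it cannot undo, while a member whose allocation setting was not reverted keeps functioning, as the original report requested.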

      WORKAROUNDS
      The best workaround is to re-sync the replica set member. See the documentation on re-syncing a replica set member.

      PATCHES
      Production release v2.4.9 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.

      Original Description

      If a replica set member attempts a rollback of a period that contained

      { collMod: "files", usePowerOf2Sizes: true }

      this causes a fatal error. The replica set member is thereafter left in the FATAL state.

      While it seems reasonable that usePowerOf2Sizes cannot be rolled back, this is probably not the best user experience. I would prefer that my replica set member continue to function, even if the disk space allocation algorithm is different from what I asked for.

      Instead, this op could be skipped (with a loud warning)?

      Full log snippet demonstrating the problem:

      Tue Sep 24 05:47:06.335 [rsBackgroundSync] replSet syncing to: brs7.ny1.10gen.cc:27010
      Tue Sep 24 05:47:09.374 [rsBackgroundSync] replSet we are ahead of the sync source, will try to roll back
      Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet rollback 0
      Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet ROLLBACK
      Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet rollback 1
      Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
      Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet info rollback our last optime:   Sep 24 05:47:05:39
      Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet info rollback their last optime: Sep 24 05:46:48:ab
      Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet info rollback diff in end of log times: 17 seconds
      Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet WARNING ignoring op on rollback no _id TODO : backupstore.system.indexes { ts: Timestamp 1380001625000|3, h: 2707240384590046610, v: 2, op: "i", ns: "backupstore.system.indexes", o: { name: "_id_1_filename_1", ns: "backupstore.files", key: { _id: 1, filename: 1 } } }
      Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet error can't rollback this command yet: { collMod: "files", usePowerOf2Sizes: true }
      Tue Sep 24 05:47:09.383 [rsBackgroundSync] replSet cmdname=collMod
      Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet replica set fatal exception
      Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet error fatal, stopping replication
      Tue Sep 24 05:47:09.755 [conn476406] end connection 10.10.0.135:59317 (6 connections now open)
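The rollback log reports the gap between the two members' last optimes. As a worked check, the Python sketch below parses the two optime lines and recomputes the 17-second difference. It assumes the field after the final colon is a hexadecimal increment (values such as "ab" suggest this) and supplies a placeholder year, since the logged optime omits one; parse_optime is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import Tuple

# Optime text copied from the "our last optime" / "their last optime" lines above.
OURS = "replSet info rollback our last optime:   Sep 24 05:47:05:39"
THEIRS = "replSet info rollback their last optime: Sep 24 05:46:48:ab"

# "Mon DD HH:MM:SS" followed by ":<increment>" at the end of the line.
OPTIME = re.compile(r"optime:\s+(\w{3} \d{1,2} \d{2}:\d{2}:\d{2}):([0-9a-f]+)$")

PLACEHOLDER_YEAR = 2000  # assumption: the log omits the year; both optimes share it


def parse_optime(line: str) -> Tuple[datetime, int]:
    """Return (wall-clock seconds as a datetime, increment) for a logged optime."""
    m = OPTIME.search(line)
    if m is None:
        raise ValueError(f"no optime found in: {line!r}")
    when = datetime.strptime(f"{PLACEHOLDER_YEAR} {m.group(1)}", "%Y %b %d %H:%M:%S")
    increment = int(m.group(2), 16)  # assumption: increment is printed in hex
    return when, increment


ours, ours_inc = parse_optime(OURS)
theirs, theirs_inc = parse_optime(THEIRS)
diff = int((ours - theirs).total_seconds())
print(diff)  # 17, matching "diff in end of log times: 17 seconds"
```

Only the seconds component enters the difference; the increment merely orders operations within the same second.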

    People

    • Assignee: Eliot Horowitz
    • Reporter: Cailin Nelson
    • Votes: 0
    • Watchers: 9

                Dates

                Created:
                Updated:
                Resolved: