  Documentation / DOCS-10168

Change oplog size instructions do not account for Rollback scenarios

Details

    Description

      I am following the instructions for resizing the oplog at https://docs.mongodb.com/manual/tutorial/change-oplog-size/.

      However, the instructions do not account for the fact that a rollback can occur while you are resizing the oplog on the primary. Instead of copying just the last saved entry, shouldn't we be copying the entire previous oplog?
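
A minimal sketch of the concern, assuming a simplified model in which rollback looks for a common point by walking the local and remote oplogs backwards from their newest entries. The function name `find_common_point`, the tuple representation of optimes, and the sample data are all hypothetical; this is not the server's actual implementation.

```python
from typing import List, Optional, Tuple

# An optime is modelled as (seconds, increment); this is a simplification.
OpTime = Tuple[int, int]


def find_common_point(local: List[OpTime], remote: List[OpTime]) -> Optional[OpTime]:
    """Walk both oplogs backwards from their newest entries.

    Returns the newest optime present in both logs, or None when either
    log is exhausted before a match is found (the unrecoverable case).
    """
    i, j = len(local) - 1, len(remote) - 1
    while i >= 0 and j >= 0:
        if local[i] == remote[j]:
            return local[i]
        if local[i] > remote[j]:
            i -= 1  # local entry is newer: step back in the local oplog
        else:
            j -= 1  # remote entry is newer: step back in the remote oplog
    return None


# Hypothetical oplog of the sync source.
remote = [(100, 1), (100, 2), (100, 13), (111, 2)]

# Local oplog after a resize that kept only the last entry, an entry the
# sync source never received.
resized_local = [(100, 14)]

# Local oplog if the earlier entries had been copied as well.
full_local = [(100, 1), (100, 2), (100, 13), (100, 14)]

print(find_common_point(resized_local, remote))  # None
print(find_common_point(full_local, remote))     # (100, 13)
```

In this model, a local oplog that holds a single unreplicated entry is exhausted before any match is found, whereas the full oplog still contains an entry shared with the sync source.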

      Once we hit this error, the only way out appears to be to resync the cluster:

      2017-04-22T06:02:26.334+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: (term: -1, timestamp: Apr 22 05:46:50:e). source's GTE: (term: 832, timestamp: Apr 22 05:47:01:2) hashes: (-8212210602019233945/-4988990565338983357)
      2017-04-22T06:02:26.334+0000 I REPL     [rsBackgroundSync] beginning rollback
      2017-04-22T06:02:26.334+0000 I REPL     [rsBackgroundSync] rollback 0
      2017-04-22T06:02:26.334+0000 I REPL     [ReplicationExecutor] transition to ROLLBACK
      2017-04-22T06:02:26.334+0000 I NETWORK  [conn2] end connection 54.215.75.22:45878 (10 connections now open)
      2017-04-22T06:02:26.334+0000 I NETWORK  [conn4] end connection 54.215.75.22:45880 (10 connections now open)
      2017-04-22T06:02:26.334+0000 I NETWORK  [conn5] end connection 54.219.33.176:46482 (10 connections now open)
      2017-04-22T06:02:26.334+0000 I NETWORK  [conn7] end connection 10.33.131.5:43168 (10 connections now open)
      2017-04-22T06:02:26.334+0000 I NETWORK  [conn10] end connection 54.170.243.118:43286 (10 connections now open)
      2017-04-22T06:02:26.334+0000 I NETWORK  [conn12] end connection 10.123.166.151:40596 (8 connections now open)
      2017-04-22T06:02:26.334+0000 I NETWORK  [conn6] end connection 10.30.222.199:45690 (10 connections now open)
      2017-04-22T06:02:26.334+0000 I NETWORK  [conn1] end connection 54.170.243.118:43284 (9 connections now open)
      2017-04-22T06:02:26.334+0000 I NETWORK  [conn8] end connection 54.151.49.85:42197 (8 connections now open)
      2017-04-22T06:02:26.334+0000 I NETWORK  [conn3] end connection 10.164.16.253:42084 (10 connections now open)
      2017-04-22T06:02:26.334+0000 I REPL     [rsBackgroundSync] rollback 1
      2017-04-22T06:02:26.334+0000 I NETWORK  [conn9] end connection 52.44.59.57:53018 (8 connections now open)
      2017-04-22T06:02:26.362+0000 I NETWORK  [initandlisten] connection accepted from 54.78.134.204:57990 #13 (1 connection now open)
      2017-04-22T06:02:26.707+0000 I ACCESS   [conn13] Successfully authenticated as principal __system on local
      2017-04-22T06:02:26.862+0000 I REPL     [rsBackgroundSync] rollback 2 FindCommonPoint
      2017-04-22T06:02:26.927+0000 I REPL     [rsBackgroundSync] rollback our last optime:   Apr 22 05:46:50:e
      2017-04-22T06:02:26.927+0000 I REPL     [rsBackgroundSync] rollback their last optime: Apr 22 06:02:26:4
      2017-04-22T06:02:26.927+0000 I REPL     [rsBackgroundSync] rollback diff in end of log times: -936 seconds
      2017-04-22T06:02:27.514+0000 I NETWORK  [initandlisten] connection accepted from 54.215.75.22:45882 #14 (2 connections now open)
      2017-04-22T06:02:27.574+0000 I NETWORK  [initandlisten] connection accepted from 54.170.243.118:43288 #15 (3 connections now open)
      2017-04-22T06:02:27.837+0000 I ACCESS   [conn14] Successfully authenticated as principal __system on local
      2017-04-22T06:02:27.899+0000 W REPL     [rsBackgroundSync] ignoring op on rollback no ns TODO : { _id: ObjectId('58faefc2517306110e6961a5'), ts: Timestamp 1492840010000|14, h: -8212210602019233945 }
      2017-04-22T06:02:27.900+0000 F REPL     [rsBackgroundSync] rollback error RS101 reached beginning of local oplog
      2017-04-22T06:02:27.900+0000 I REPL     [rsBackgroundSync]     scanned: 4551
      2017-04-22T06:02:27.900+0000 I REPL     [rsBackgroundSync]   theirTime: Apr 22 05:46:50 58faee4a:d
      2017-04-22T06:02:27.900+0000 I REPL     [rsBackgroundSync]   ourTime:   Apr 22 05:46:50 58faee4a:e
      2017-04-22T06:02:27.900+0000 E REPL     [rsBackgroundSync] NoMatchingDocument: RS101 reached beginning of local oplog [2]
      2017-04-22T06:02:27.900+0000 I REPL     [rsBackgroundSync] rollback finished
      2017-04-22T06:02:27.900+0000 I -        [rsBackgroundSync] Fatal assertion 28723 UnrecoverableRollbackError: need to rollback, but unable to determine common point between local and remote oplog: NoMatchingDocument: RS101 reached beginning of local oplog [2] @ 18752
      2017-04-22T06:02:27.900+0000 I -        [rsBackgroundSync]
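
The optimes in the log above can be decoded with a short sketch. It assumes that the `58faee4a:d` form is hexadecimal seconds since the Unix epoch followed by a hexadecimal increment, which is consistent with `Timestamp 1492840010000|14` in the same log. The helper name `decode_optime` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Tuple


def decode_optime(hex_seconds: str, hex_increment: str) -> Tuple[datetime, int]:
    """Decode an optime printed as <hex seconds>:<hex increment>."""
    seconds = int(hex_seconds, 16)      # seconds since the Unix epoch
    increment = int(hex_increment, 16)  # ordinal of the op within that second
    return datetime.fromtimestamp(seconds, tz=timezone.utc), increment


ours = decode_optime("58faee4a", "e")    # "ourTime" in the log
theirs = decode_optime("58faee4a", "d")  # "theirTime" in the log

print(ours[0].isoformat(), ours[1])      # 2017-04-22T05:46:50+00:00 14
print(theirs[0].isoformat(), theirs[1])  # 2017-04-22T05:46:50+00:00 13
```

Under that assumption, the two optimes fall in the same second and differ only in the increment: the local entry (14) is one operation ahead of the entry seen on the sync source (13).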
      

          People

            Assignee: Unassigned
            Reporter: Dharshan Rangegowda (dharshanr@scalegrid.net)

            Dates

              Created:
              Updated:
              1 year, 14 weeks, 2 days ago