[DOCS-10168] Change oplog size instructions do not account for Rollback scenarios Created: 22/Apr/17  Updated: 30/Oct/23

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: Server_Docs_20231030

Type: Bug Priority: Major - P3
Reporter: Dharshan Rangegowda Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 1 year, 14 weeks, 2 days ago
Epic Link: DOCSP-1769

 Description   

I am following the instructions here to resize the oplog - https://docs.mongodb.com/manual/tutorial/change-oplog-size/.

However, the instructions do not account for the fact that a rollback error can occur when you resize the oplog on the primary. Instead of copying only the last saved entry, shouldn't we copy the entire previous oplog?

Once we hit this error, the only way out appears to be to resync the cluster:

2017-04-22T06:02:26.334+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: (term: -1, timestamp: Apr 22 05:46:50:e). source's GTE: (term: 832, timestamp: Apr 22 05:47:01:2) hashes: (-8212210602019233945/-4988990565338983357)
2017-04-22T06:02:26.334+0000 I REPL     [rsBackgroundSync] beginning rollback
2017-04-22T06:02:26.334+0000 I REPL     [rsBackgroundSync] rollback 0
2017-04-22T06:02:26.334+0000 I REPL     [ReplicationExecutor] transition to ROLLBACK
2017-04-22T06:02:26.334+0000 I NETWORK  [conn2] end connection 54.215.75.22:45878 (10 connections now open)
2017-04-22T06:02:26.334+0000 I NETWORK  [conn4] end connection 54.215.75.22:45880 (10 connections now open)
2017-04-22T06:02:26.334+0000 I NETWORK  [conn5] end connection 54.219.33.176:46482 (10 connections now open)
2017-04-22T06:02:26.334+0000 I NETWORK  [conn7] end connection 10.33.131.5:43168 (10 connections now open)
2017-04-22T06:02:26.334+0000 I NETWORK  [conn10] end connection 54.170.243.118:43286 (10 connections now open)
2017-04-22T06:02:26.334+0000 I NETWORK  [conn12] end connection 10.123.166.151:40596 (8 connections now open)
2017-04-22T06:02:26.334+0000 I NETWORK  [conn6] end connection 10.30.222.199:45690 (10 connections now open)
2017-04-22T06:02:26.334+0000 I NETWORK  [conn1] end connection 54.170.243.118:43284 (9 connections now open)
2017-04-22T06:02:26.334+0000 I NETWORK  [conn8] end connection 54.151.49.85:42197 (8 connections now open)
2017-04-22T06:02:26.334+0000 I NETWORK  [conn3] end connection 10.164.16.253:42084 (10 connections now open)
2017-04-22T06:02:26.334+0000 I REPL     [rsBackgroundSync] rollback 1
2017-04-22T06:02:26.334+0000 I NETWORK  [conn9] end connection 52.44.59.57:53018 (8 connections now open)
2017-04-22T06:02:26.362+0000 I NETWORK  [initandlisten] connection accepted from 54.78.134.204:57990 #13 (1 connection now open)
2017-04-22T06:02:26.707+0000 I ACCESS   [conn13] Successfully authenticated as principal __system on local
2017-04-22T06:02:26.862+0000 I REPL     [rsBackgroundSync] rollback 2 FindCommonPoint
2017-04-22T06:02:26.927+0000 I REPL     [rsBackgroundSync] rollback our last optime:   Apr 22 05:46:50:e
2017-04-22T06:02:26.927+0000 I REPL     [rsBackgroundSync] rollback their last optime: Apr 22 06:02:26:4
2017-04-22T06:02:26.927+0000 I REPL     [rsBackgroundSync] rollback diff in end of log times: -936 seconds
2017-04-22T06:02:27.514+0000 I NETWORK  [initandlisten] connection accepted from 54.215.75.22:45882 #14 (2 connections now open)
2017-04-22T06:02:27.574+0000 I NETWORK  [initandlisten] connection accepted from 54.170.243.118:43288 #15 (3 connections now open)
2017-04-22T06:02:27.837+0000 I ACCESS   [conn14] Successfully authenticated as principal __system on local
2017-04-22T06:02:27.899+0000 W REPL     [rsBackgroundSync] ignoring op on rollback no ns TODO : { _id: ObjectId('58faefc2517306110e6961a5'), ts: Timestamp 1492840010000|14, h: -8212210602019233945 }
2017-04-22T06:02:27.900+0000 F REPL     [rsBackgroundSync] rollback error RS101 reached beginning of local oplog
2017-04-22T06:02:27.900+0000 I REPL     [rsBackgroundSync]     scanned: 4551
2017-04-22T06:02:27.900+0000 I REPL     [rsBackgroundSync]   theirTime: Apr 22 05:46:50 58faee4a:d
2017-04-22T06:02:27.900+0000 I REPL     [rsBackgroundSync]   ourTime:   Apr 22 05:46:50 58faee4a:e
2017-04-22T06:02:27.900+0000 E REPL     [rsBackgroundSync] NoMatchingDocument: RS101 reached beginning of local oplog [2]
2017-04-22T06:02:27.900+0000 I REPL     [rsBackgroundSync] rollback finished
2017-04-22T06:02:27.900+0000 I -        [rsBackgroundSync] Fatal assertion 28723 UnrecoverableRollbackError: need to rollback, but unable to determine common point between local and remote oplog: NoMatchingDocument: RS101 reached beginning of local oplog [2] @ 18752
2017-04-22T06:02:27.900+0000 I -        [rsBackgroundSync]
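
To illustrate why keeping only the last oplog entry can lead to the "reached beginning of local oplog" failure in the log above, here is a minimal Python sketch of a common-point search. It is a simplified toy model, not MongoDB's actual rollback code: the function name `find_common_point`, the `(timestamp, hash)` entry shape, and the sample values are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical model: an oplog is a list of (timestamp, hash) pairs, oldest first.
OplogEntry = Tuple[int, int]


def find_common_point(local: List[OplogEntry],
                      remote: List[OplogEntry]) -> Optional[OplogEntry]:
    """Walk both oplogs from newest to oldest and return the newest entry
    present in both, or None if the start of either oplog is reached first."""
    i, j = len(local) - 1, len(remote) - 1
    while i >= 0 and j >= 0:
        if local[i] == remote[j]:
            return local[i]
        if local[i][0] > remote[j][0]:
            i -= 1  # local entry is newer: step back locally
        elif local[i][0] < remote[j][0]:
            j -= 1  # remote entry is newer: step back remotely
        else:
            # Same timestamp but different hash: the histories diverged here,
            # so step back on both sides.
            i -= 1
            j -= 1
    return None  # ran out of entries without finding a shared one


# Shared history, then the two members diverge at timestamp 4.
shared = [(1, 10), (2, 20), (3, 30)]
remote_oplog = shared + [(4, 40), (5, 50)]
full_local_oplog = shared + [(4, 41)]
truncated_local_oplog = [(4, 41)]  # only the last entry was copied over

print(find_common_point(full_local_oplog, remote_oplog))
print(find_common_point(truncated_local_oplog, remote_oplog))
```

In this toy model, the full local oplog still contains an entry shared with the remote, so a common point exists. The truncated oplog holds only the diverged entry, so the search runs off the start of the local oplog and finds nothing, which is the situation the fatal assertion describes.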



 Comments   
Comment by Education Bot [ 31/Oct/22 ]

Hello! This ticket has been closed due to inactivity. If you believe this ticket is still important, please reopen it and leave a comment to explain why. Thank you!

Comment by Kelsey Schubert [ 24/Apr/17 ]

Hi dharshanr@scalegrid.net,

Since this ticket regards our documentation, I've moved it to the DOCS project for consideration.

Thank you,
Thomas
