[SERVER-7200] use oplog as op buffer on secondaries Created: 28/Sep/12  Updated: 06/Feb/18  Resolved: 23/Aug/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.2.0
Fix Version/s: 3.2.11, 3.3.12

Type: Improvement Priority: Major - P3
Reporter: Eric Milkie Assignee: Mathias Stearn
Resolution: Done Votes: 5
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-19867 Do not read directly from collections... Closed
is depended on by SERVER-7275 node can't roll back if behind minValid Closed
is depended on by SERVER-8476 slaveDelay with Ghostsync Closed
is depended on by SERVER-25160 Drain and catchup modes shouldn't con... Closed
is depended on by SERVER-22061 DataReplicator: Support resume per co... Closed
Duplicate
is duplicated by SERVER-26570 MongoDB Slave Crashed During Replication Closed
is duplicated by SERVER-27142 aborting after fassert() failure Closed
is duplicated by SERVER-23841 Mongod always complain "Fatal asserti... Closed
is duplicated by SERVER-25026 Secondary abort immediately following... Closed
is duplicated by SERVER-26271 UnrecoverableRollbackError after elec... Closed
is duplicated by SERVER-24725 Hidden Member switched sync source mu... Closed
is duplicated by SERVER-10751 Keep fetching oplog during index crea... Closed
Related
related to SERVER-13222 Secondary keeps getting into an incon... Closed
related to SERVER-24223 Add hash to minvalid OpTime boundaries Closed
related to SERVER-28688 Deadlock between shutdown and stepdown Closed
related to SERVER-22774 Update oplog fetcher using logic from... Closed
related to SERVER-23499 implement a blocking queue that works... Closed
is related to SERVER-5853 Decouple replication to oplog from ap... Closed
is related to SERVER-25071 Ensure replication batch finishes bef... Closed
is related to SERVER-25697 Rename BatchBoundaries::start to be "... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2016-08-29
Linked BF Score: 0

 Description   

Currently, we use an in-memory queue to buffer operations between the network reader and the op writer. Instead, we could simply use the local.oplog collection as the buffer.

An added advantage: if we crash in the middle of a batch, we'll know which operations in the batch were being applied.
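The idea can be illustrated with a minimal in-memory sketch. It assumes every op is an idempotent key/value "set". The names `Node`, `applied_through`, `fetch`, `apply_batch`, `pending` and `replay` are hypothetical and are not the server's classes. The `oplog` list stands in for the oplog collection, and `applied_through` stands in for whatever persisted marker records how far application has progressed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-ins: `oplog` plays the role of the oplog collection,
    # `data` plays the role of the user collections.
    oplog: list = field(default_factory=list)
    data: dict = field(default_factory=dict)
    # Hypothetical persisted marker: index of the last oplog entry known to be
    # fully applied (-1 means nothing has been applied yet).
    applied_through: int = -1

    def fetch(self, ops):
        """Network reader: persist fetched ops to the oplog before applying."""
        self.oplog.extend(ops)

    def pending(self):
        """Ops already in the oplog but not yet known to be applied."""
        return self.oplog[self.applied_through + 1:]

    def apply_batch(self, max_ops, crash_after=None):
        """Op writer: apply the next batch straight out of the oplog."""
        start = self.applied_through + 1
        batch = self.oplog[start:start + max_ops]
        for i, (key, value) in enumerate(batch):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("simulated crash mid-batch")
            self.data[key] = value  # idempotent "set" op, safe to replay
        # Advance the marker only after the whole batch has been applied.
        self.applied_through = start + len(batch) - 1

    def replay(self):
        """Startup recovery: re-apply everything past the marker."""
        self.apply_batch(max_ops=len(self.pending()))
```

Because the marker only advances once a whole batch is done, suppose a crash occurs after applying `a` and `b` from a three-op batch. All three ops are still reported by `pending()`, and `replay()` re-applies them. This relies on the ops being idempotent.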



 Comments   
Comment by pravin dwiwedi [ 14/Feb/17 ]

Thanks, Thomas, for your quick response.
Resyncing is, I think, a safe and simple solution to the problem.

However, I am not able to start the affected node (mongod). As soon as I start it, it throws the exception below and shuts down.
How can I start the node? Do I need to delete the rollback and diagnostic directories and the lock files in order to start mongod?

Regards
Pravin Dwiwedi

Comment by Kelsey Schubert [ 14/Feb/17 ]

Hi 2k.pravin@gmail.com,

Once a node has entered this state, the only way to resolve the issue is to resync the affected node by performing an initial sync. Upgrading would prevent the node from reaching this inconsistent state, but would not correct a node already affected by this issue.

Kind regards,
Thomas

Comment by pravin dwiwedi [ 14/Feb/17 ]

Hi Team, thanks for the fixes.

I am facing the same problem while running MongoDB 3.2.7.

Whenever I restart the server, it throws an exception and mongod stops:

2017-02-14T11:00:08.580+0000 I REPL [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: (term: 36, timestamp: Feb 13 22:53:33:3). source's GTE: (term: 37, timestamp: Feb 13 22:53:36:1) hashes: (3451672896402616640/2249844032076922861)
2017-02-14T11:00:08.580+0000 I - [rsBackgroundSync] Fatal assertion 18750 UnrecoverableRollbackError: need to rollback, but in inconsistent state. minvalid: (term: 37, timestamp: Feb 13 22:53:39:3) > our last optime: (term: 36, timestamp: Feb 13 22:53:33:3)
2017-02-14T11:00:08.580+0000 I - [rsBackgroundSync]

***aborting after fassert() failure

Is there any workaround to fix this issue without upgrading MongoDB?

Regards
Pravin Dwiwedi
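The fatal assertion in the log above fires because minValid is ahead of the last optime in the node's own oplog. The following is a minimal sketch of that comparison. It assumes optimes order by term and then by timestamp; both orderings agree for the values in this log. It also simplifies each timestamp to `(hour, minute, second, increment)` on the same day. `OpTime` and `can_roll_back` are hypothetical names, not the server's API.

```python
from typing import NamedTuple, Tuple


class OpTime(NamedTuple):
    # Hypothetical simplified optime: NamedTuple comparison is lexicographic,
    # so optimes order by term first, then by timestamp.
    term: int
    ts: Tuple[int, int, int, int]  # (hour, minute, second, increment), same day assumed


def can_roll_back(last_optime: OpTime, min_valid: OpTime) -> bool:
    # As the "inconsistent state" error suggests: if minValid is ahead of our
    # last optime, the data may reflect writes the local oplog does not
    # describe, so rollback cannot tell what to undo.
    return not (min_valid > last_optime)


last = OpTime(36, (22, 53, 33, 3))        # "our last optime" from the log
min_valid = OpTime(37, (22, 53, 39, 3))   # "minvalid" from the log
```

With the values from the log, `can_roll_back(last, min_valid)` is `False`. This matches the advice in the reply above that an affected node has to be resynced.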

Comment by Githook User [ 20/Jan/17 ]

Author: Mathias Stearn (RedBeard0531) <redbeard0531@gmail.com>

Message: SERVER-27050 Ensure upstream node doesn't roll back after checking MinValid

(cherry picked from commit 0b76764eac7651ddba4c82c504aa7e8d785087c2)

SERVER-25860 Allow replication rollback to drop system collections

(cherry picked from commit 99e19b1ded425a1d859a9bc52fd5c2712e71f83a)

SERVER-25860 Remove redundant operations during rollback

(cherry picked from commit 2dec7e9a15af8e0fc4d8e68ed40e3abe90b3a3b3)

SERVER-25862 Add a test of replaying a batch at startup with update and delete of same object

This is a special case of SERVER-7200 that interacts with plans for
SERVER-25862.

(cherry picked from commit 8e7231a38341d68fb2cdc60509687397e9a17741)

SERVER-27282 clean up RS rollback error handling

(cherry picked from commit ef1f1739d6cbff9fb4ddbcc77d467f183c0ab9f2)

(all cherry picked from v3.4 commit f4cab348460c90fcd506b2d46bf8c830b7b87379)
Branch: v3.2
https://github.com/mongodb/mongo/commit/fb5a39c59d661021c99ba3548e4e5be2e2fb50f5

Comment by Githook User [ 13/Jan/17 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-27050 Ensure upstream node doesn't roll back after checking MinValid

(cherry picked from commit 0b76764eac7651ddba4c82c504aa7e8d785087c2)

SERVER-25860 Allow replication rollback to drop system collections

(cherry picked from commit 99e19b1ded425a1d859a9bc52fd5c2712e71f83a)

SERVER-25860 Remove redundant operations during rollback

(cherry picked from commit 2dec7e9a15af8e0fc4d8e68ed40e3abe90b3a3b3)

SERVER-25862 Add a test of replaying a batch at startup with update and delete of same object

This is a special case of SERVER-7200 that interacts with plans for
SERVER-25862.

(cherry picked from commit 8e7231a38341d68fb2cdc60509687397e9a17741)

SERVER-27282 clean up RS rollback error handling

(cherry picked from commit ef1f1739d6cbff9fb4ddbcc77d467f183c0ab9f2)
Branch: v3.4
https://github.com/mongodb/mongo/commit/f4cab348460c90fcd506b2d46bf8c830b7b87379

Comment by Githook User [ 03/Jan/17 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-25862 Add a test of replaying a batch at startup with update and delete of same object

This is a special case of SERVER-7200 that interacts with plans for
SERVER-25862.
Branch: master
https://github.com/mongodb/mongo/commit/8e7231a38341d68fb2cdc60509687397e9a17741

Comment by Githook User [ 17/Oct/16 ]

Author: Mathias Stearn (RedBeard0531) <redbeard0531@gmail.com>

Message: SERVER-7200 Write oplog entries on secondaries before applying

Manual backport of 34c6c691a038eac1ac3ee16e1eedc54aab964774 along with fixes
and tests from:

b5d2b06f8a08171fd96ef8d128c4f7ecedcb8f93
dc83fb0433fcae6e72f035df7458473b59223eb5
fec839b99f4b9e08016112fe8b9492e327af91b8
bf86770c8a5de97b30bc008ad59e34de99065c60
Branch: v3.2
https://github.com/mongodb/mongo/commit/5db0a55a264ee326bff5598249639ef479628f37

Comment by Githook User [ 17/Oct/16 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-7200 Limit secondary apply batches to 10% of the oplog size

(cherry picked from commit b06901cd83b2a985aa50f9a699f3d63dcd28476d)
Branch: v3.2
https://github.com/mongodb/mongo/commit/e207e1a4809742a5cd0bb456202c82ff82548a44

Comment by Githook User [ 16/Sep/16 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-26016 Add a basic test of oplog replay on startup

Fixes a small but important bug in SERVER-7200.
Branch: master
https://github.com/mongodb/mongo/commit/dc83fb0433fcae6e72f035df7458473b59223eb5

Comment by Githook User [ 23/Aug/16 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-7200 Limit secondary apply batches to 10% of the oplog size
Branch: master
https://github.com/mongodb/mongo/commit/b06901cd83b2a985aa50f9a699f3d63dcd28476d
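The commit above caps each secondary apply batch at 10% of the oplog size. The following is a minimal sketch of such a cap, and it is not the server's batching code. It assumes oplog entries are raw `bytes` and that a batch always takes at least one entry, so an oversized entry cannot stall progress. `next_batch` is a hypothetical name.

```python
from typing import List


def next_batch(pending: List[bytes], oplog_size_bytes: int) -> List[bytes]:
    # Hypothetical batcher: stop adding entries once the batch would exceed
    # 10% of the oplog size, but always take at least one entry.
    limit = oplog_size_bytes // 10
    batch: List[bytes] = []
    total = 0
    for entry in pending:
        if batch and total + len(entry) > limit:
            break
        batch.append(entry)
        total += len(entry)
    return batch
```

For a 1000-byte oplog and 40-byte entries, the limit is 100 bytes. The sketch therefore returns two entries (80 bytes), because a third would bring the batch to 120 bytes.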

Comment by Githook User [ 23/Aug/16 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-7200 Write oplog entries on secondaries before applying
Branch: master
https://github.com/mongodb/mongo/commit/34c6c691a038eac1ac3ee16e1eedc54aab964774

Comment by Githook User [ 23/Aug/16 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-7200 stop consulting shouldShutdown in OpQueueBatcher
Branch: master
https://github.com/mongodb/mongo/commit/3bcafe4fe23c9521fe028a176fffabdc79d434e9

Comment by Githook User [ 23/Aug/16 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-7200 Delete unused Status variable
Branch: master
https://github.com/mongodb/mongo/commit/43c0c3ab90eb14c4b52324feeb81c3b3eef57403

Comment by Mathias Stearn [ 17/Aug/16 ]

ramon.fernandez, I'm moving this to 3.3.12. We need to either get this in or disable multi-threaded oplog writing on secondaries before going into RCs; otherwise, it will make upgrade/downgrade between 3.4 RCs a pain. This isn't an issue for upgrade/downgrade to/from 3.2, because it writes the oplog in a single thread.

Generated at Thu Feb 08 03:13:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.