[SERVER-12516] Multi-updates may fail to detect replica set primary step-down, leading to inconsistency.
Created: 28/Jan/14  Updated: 11/Jul/16  Resolved: 19/Feb/14

Status: Closed
Project: Core Server
Component/s: Replication, Write Ops
Affects Version/s: 2.4.9, 2.5.5
Fix Version/s: 2.6.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Andy Schwerin
Assignee: Andy Schwerin
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-12749 write commands fail to check replica ... Closed
Operating System: ALL
Steps To Reproduce:

Start up a 2-node replica set. Connect a shell to the primary, using either legacy write operations or write commands, and run:

for (i = 0; i < 100 * 1000; ++i) { db.foo.insert({_id: i, a: 1}) }
db.getLastError();
db.foo.update({}, {$inc: { a: 5 }}, false, true);  // multi-update

From another shell, immediately run:

db.adminCommand({replSetStepDown: 30, force: true})

The log on the primary will show a stack trace and the following message:

Assertion: 13312:replSet error : logOp() but not primary?


 Description   

If the primary steps down in the middle of a multi-update, the operation may continue to update documents until it next attempts to log an op to the oplog. At that point logOp() fails, but the database is already inconsistent: it contains the last update, which never appears in the oplog and so will not replicate. That update also won't be rolled back when the new primary takes writes, because there is no trace of it in the oplog.
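
The ordering problem described above can be illustrated with a small toy model in plain JavaScript. This is only a sketch, not the server's code: `ToyNode`, `multiUpdate`, and the `onYield` callback are hypothetical names, and the real update path is considerably more involved. The model assumes only what the description states, namely that the document is modified before the oplog write is attempted.

```javascript
// Toy model only: ToyNode, multiUpdate and onYield are hypothetical names,
// not MongoDB server APIs.
class ToyNode {
  constructor(docs) {
    this.isPrimary = true;
    this.docs = docs; // stand-in for the collection
    this.oplog = [];  // stand-in for the oplog
  }

  // Refuses to log once the node is no longer primary.
  logOp(entry) {
    if (!this.isPrimary) {
      throw new Error("13312: replSet error : logOp() but not primary?");
    }
    this.oplog.push(entry);
  }
}

// Applies {$inc: {a: inc}} to every document. onYield runs between documents
// and models the point at which a step-down can slip in unnoticed.
function multiUpdate(node, inc, onYield) {
  for (const doc of node.docs) {
    doc.a += inc;                      // the data is changed first...
    node.logOp({ _id: doc._id, inc }); // ...and only then logged
    onYield(doc);
  }
}

const node = new ToyNode([
  { _id: 0, a: 1 },
  { _id: 1, a: 1 },
  { _id: 2, a: 1 },
]);

let error = null;
try {
  // Step down during the yield that follows the first document.
  multiUpdate(node, 5, (doc) => {
    if (doc._id === 0) node.isPrimary = false;
  });
} catch (e) {
  error = e;
}

console.log(node.docs.map((d) => d.a), node.oplog.length);
```

In this model the document with `_id: 1` ends up modified while the oplog holds a single entry (for `_id: 0`), which is the unlogged, unreplicated write the description refers to.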

A minimal option would be to make the current massert() on this condition an fassert(), to eliminate corruption.

Later, it will be necessary to audit all insert, update and remove paths (legacy and write command) to ensure that they validate primary-ness after recovering from yields.
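
As a rough illustration of validating primary-ness after a yield, the sketch below re-checks the node's state each time the operation resumes and stops before touching another document. It is a toy model in plain JavaScript; `checkedMultiUpdate`, `state`, and `yieldFn` are hypothetical names, not the functions changed in the actual patch.

```javascript
// Toy model only: checkedMultiUpdate, state and yieldFn are hypothetical,
// not MongoDB server APIs.
function checkedMultiUpdate(state, inc, yieldFn) {
  let updated = 0;
  for (const doc of state.docs) {
    doc.a += inc;
    state.oplog.push({ _id: doc._id, inc });
    updated++;

    yieldFn(doc); // locks are released here, so a step-down may happen

    // On yield recovery, confirm the node is still entitled to write.
    if (!state.isPrimary) {
      return { updated, aborted: true };
    }
  }
  return { updated, aborted: false };
}

const state = {
  isPrimary: true,
  docs: [
    { _id: 0, a: 1 },
    { _id: 1, a: 1 },
    { _id: 2, a: 1 },
  ],
  oplog: [],
};

// Step down during the yield that follows the first document.
const result = checkedMultiUpdate(state, 5, (doc) => {
  if (doc._id === 0) state.isPrimary = false;
});

console.log(result, state.docs.map((d) => d.a), state.oplog.length);
```

Here the operation stops after the first document, so every modified document has a matching oplog entry and nothing is written after the step-down.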



 Comments   
Comment by Githook User [ 19/Feb/14 ]

Author: Andy Schwerin (andy10gen) <schwerin@10gen.com>

Message: SERVER-12516 Update assert code.
Branch: master
https://github.com/mongodb/mongo/commit/1954504cacb580e06ab20742edf1f3a3c72f75fb

Comment by Githook User [ 19/Feb/14 ]

Author: Andy Schwerin (andy10gen) <schwerin@10gen.com>

Message: SERVER-12516 Check for change in primary state when recovering from yield in CRUD operations.

Without this patch, a replica set member running a long-running write operation
that yields might not notice, on yield recovery, that the node is no longer
primary, and so is no longer entitled to perform writes.
Branch: master
https://github.com/mongodb/mongo/commit/548693879eddfe3051a7303245dcfedde3a0ac61
