[SERVER-27053] Possibility to confirm w:majority write that has been rolled back
Created: 15/Nov/16  Updated: 13/Apr/17  Resolved: 17/Nov/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.2.12, 3.4.0-rc4, 3.5.1

Type: Bug
Priority: Blocker - P1
Reporter: Mathias Stearn
Assignee: Spencer T Brody
Resolution: Fixed
Votes: 0
Labels: code-and-test

Issue Links:
Backports
Related
related to SERVER-27149 Sync source selection doesn't conside... Closed
related to SERVER-27123 Only update commit point via spanning... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Repl 2016-11-21
Participants:

 Description   

I think the following sequence of events will cause us to acknowledge a w:majority write that has been rolled back. It requires that the write come from mongos, so that the flag to not drop the connection on stepdown has been set.

  1. mongos sends a write to a shard with w:majority
  2. write gets applied locally
  3. a majority of secondaries vote for a new primary and it wins the election without the original primary knowing
  4. the nodes that elected the new primary confirm the w:majority write to the old primary. The updatePosition command from those secondaries indicates a new term, so the old primary steps down. When stepping down we cancel all user operations and kill all non-internal connections, but the connection that issued this write came from a mongos, so it isn't closed
  5. the original primary and all the secondaries go into rollback and revert the write, and successfully replicate the new op that the new primary writes on election
  6. the original primary is re-elected
  7. the thread that issued the write on the original primary gets into awaitReplication(), sees that it is in state primary as expected and that the write it's waiting for has already been confirmed on a majority, and returns success

If awaitReplication_inlock() checked for interrupt before checking if the write was already satisfied, then we'd be okay since during stepdown we cancelled all running operations. But we don't ever check for interrupt in awaitReplication if the writeConcern is already satisfied by the time we reach awaitReplication().
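
The sequence above can be replayed in a small toy model. This is only a sketch in Python, not the server's C++ code: `ToyPrimary`, `OpTime`, `majority_confirmed_index`, and both `await_replication_*` methods are hypothetical names invented for illustration, and the model assumes the stale confirmation from step 4 is simply still visible after the node is re-elected. The "fixed" variant follows the idea in the fix's commit message (don't acknowledge writes if the term has changed); the actual change in the server may be structured differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class OpTime:
    term: int   # election term in which the write was performed
    index: int  # position of the write in the (toy) oplog


class ToyPrimary:
    """Hypothetical stand-in for a replica set node; not MongoDB's real classes."""

    def __init__(self, term: int) -> None:
        self.term = term
        self.is_primary = True
        # Highest index this node believes a majority has confirmed.
        self.majority_confirmed_index = 0

    def await_replication_buggy(self, op: OpTime) -> bool:
        # Only looks at the current state and the recorded confirmation,
        # so it cannot tell that an election happened in between.
        return self.is_primary and self.majority_confirmed_index >= op.index

    def await_replication_fixed(self, op: OpTime) -> bool:
        # Refuse to acknowledge if the term changed since the write was done.
        if self.term != op.term:
            return False
        return self.is_primary and self.majority_confirmed_index >= op.index


def replay_scenario() -> Tuple[bool, bool]:
    node = ToyPrimary(term=1)
    write = OpTime(term=node.term, index=5)  # steps 1-2: write applied locally
    node.majority_confirmed_index = 5        # step 4: secondaries confirm the write
    node.is_primary = False                  # step 4: new term seen, node steps down
    # Step 5: the write is rolled back. The toy model keeps the stale
    # confirmation, which is the simplifying assumption named above.
    node.term = 3                            # step 6: re-elected in a later term
    node.is_primary = True
    # Step 7: the waiting thread finally reaches awaitReplication().
    return node.await_replication_buggy(write), node.await_replication_fixed(write)


if __name__ == "__main__":
    buggy_ack, fixed_ack = replay_scenario()
    print("buggy acknowledges:", buggy_ack)
    print("fixed acknowledges:", fixed_ack)
```

In this model the buggy check acknowledges the rolled-back write while the term-aware check refuses it. Comparing terms does not depend on the waiting thread noticing an interrupt, which is why it also covers the case where the write concern already looks satisfied on entry.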



 Comments   
Comment by Githook User [ 17/Nov/16 ]

Author: Spencer T Brody (stbrody) <spencer@mongodb.com>

Message: SERVER-27053 Don't acknowledge writes if the term has changed.
Branch: master
https://github.com/mongodb/mongo/commit/8347e322cd46e8ee847e1730a7e94ea8e3981c53

Comment by Githook User [ 17/Nov/16 ]

Author: Spencer T Brody (stbrody) <spencer@mongodb.com>

Message: SERVER-27053 Don't acknowledge writes if the term has changed.

(cherry picked from commit a557fd981d235f84d4a0865dc0bb6b5385fc7a21)
Branch: v3.4
https://github.com/mongodb/mongo/commit/c7ebfd0fd292e45256e9799a2a96ed6054ecc357

Comment by Spencer T Brody [ 30/Dec/16 ]

The bug in the description of this ticket doesn't apply to 3.2. Since 3.2 closes all connections on stepdown, there's no chance for the primary to step back up and then confirm a write from its old term. Therefore I am cancelling the request to backport this ticket to 3.2.

The fix in the commit on this ticket, however, also fixed one bug not in the description: a heartbeat that indicates that the stale primary needs to step down also causes it to advance its commit point and acknowledge a write that can in fact be rolled back. The fix for 3.2 for that bug comes from backporting SERVER-27123. When we backport SERVER-27123 we should also grab the write_concern_after_stepdown.js test from the commit on this ticket and include it with the backport.

Comment by Githook User [ 25/Jan/17 ]

Author: Benety Goh (benety) <benety@mongodb.com>

Message: SERVER-27123 Only update the commit point as a secondary from oplog queries against your sync source
(cherry picked from commit 87f49488f1b5c872daa71fd2fd9b5d744409a817)

SERVER-27680 Merge stopOplogFetcher and pauseRsBgSyncProducer failpoint into single stopReplProducer failpoint
(cherry picked from commit 21948042b6da5fb5bf15897f9808a70551f5af09)

SERVER-27053 Don't acknowledge writes if the term has changed.
(cherry picked from commit 8347e322cd46e8ee847e1730a7e94ea8e3981c53)
Branch: v3.2
https://github.com/mongodb/mongo/commit/4a6efad4d422b9a06ff0b7e98bfc9b7cc63b5864

Generated at Sat Sep 23 15:07:18 UTC 2017 using JIRA 7.2.10#72012-sha1:2651463a07e52d81c0fcf01da710ca333fcb42bc.