Type: Bug
Resolution: Done
Priority: Blocker - P1
Fix Version/s: 3.2.12, 3.4.0-rc4, 3.5.1
Affects Version/s: None
Component/s: Replication
Labels:
- code-and-test

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Completed:

3.4.0-rc4
Sprint:
Repl 2016-11-21
Linked BF Score:
0
Confidence Status:
None
Work Order:
0
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

I think the following sequence of events will cause us to acknowledge a w:majority write that has been rolled back. It requires that the write comes from mongos, so that the flag to not drop the connection on stepdown has been set.

1. mongos sends a write to a shard with w:majority
2. the write gets applied locally
3. a majority of secondaries vote for a new primary and it wins the election without the original primary knowing
4. the nodes that elected the new primary confirm the w:majority write to the old primary. The updatePosition command from those secondaries indicates a new term, so the old primary steps down. When stepping down we cancel all user operations and kill all non-internal connections, but the connection that issued this write came from a mongos so it isn't closed
5. the original primary and all the secondaries go into rollback and revert the write, and successfully replicate the new op that the new primary writes on election
6. the original primary is re-elected
7. the thread that issued the write on the original primary gets into awaitReplication(), sees that it is in state primary as expected and that the write it's waiting for has already been confirmed on a majority, and returns success

If awaitReplication_inlock() checked for interrupt before checking if the write was already satisfied, then we'd be okay since during stepdown we cancelled all running operations. But we don't ever check for interrupt in awaitReplication if the writeConcern is already satisfied by the time we reach awaitReplication().
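
To make the ordering concrete, here is a toy Python model of the sequence above. It is only a sketch and not the server's code: `OpState`, `NodeState`, `Interrupted`, `await_replication_buggy`, `await_replication_checked`, and `replay_scenario` are all hypothetical names invented for illustration, the real wait is replaced by returning the string "waiting", and rollback is not modeled beyond a comment. It assumes that stepdown marks the operation as killed and that this mark survives re-election.

```python
from dataclasses import dataclass


class Interrupted(Exception):
    """Hypothetical error for an operation that was killed, e.g. by stepdown."""


@dataclass
class OpState:
    # Hypothetical state of the thread that issued the write.
    killed: bool = False              # set when stepdown cancels user operations
    majority_confirmed: bool = False  # set once a majority has confirmed the write


@dataclass
class NodeState:
    # Hypothetical view of the node's replication state.
    is_primary: bool = True


def await_replication_buggy(node: NodeState, op: OpState) -> str:
    """Ordering described in the report: a satisfied write returns early."""
    if not node.is_primary:
        raise Interrupted("not primary")
    if op.majority_confirmed:
        return "ok"  # early return; op.killed is never consulted
    if op.killed:
        raise Interrupted("operation was killed")
    return "waiting"  # stand-in for blocking until the write concern is met


def await_replication_checked(node: NodeState, op: OpState) -> str:
    """Proposed ordering: check for interrupt before anything else."""
    if op.killed:
        raise Interrupted("operation was killed")
    if not node.is_primary:
        raise Interrupted("not primary")
    if op.majority_confirmed:
        return "ok"
    return "waiting"  # stand-in for blocking until the write concern is met


def replay_scenario(await_fn) -> str:
    """Replay the steps from the report against one of the two orderings."""
    node = NodeState(is_primary=True)
    op = OpState()
    # The write is applied locally, then confirmed by a majority of
    # secondaries whose updatePosition carries a newer term.
    op.majority_confirmed = True
    # Stepdown: user operations are cancelled, but the mongos connection
    # (and therefore the writing thread) stays alive.
    node.is_primary = False
    op.killed = True
    # Rollback reverts the write here (not modeled); the node is re-elected.
    node.is_primary = True
    # Only now does the writing thread reach awaitReplication().
    try:
        return await_fn(node, op)
    except Interrupted:
        return "interrupted"
```

In this model, `replay_scenario(await_replication_buggy)` returns "ok" for a write that was rolled back, while `replay_scenario(await_replication_checked)` returns "interrupted". The only difference between the two functions is where the interrupt check sits relative to the already-satisfied check.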

related to

SERVER-27149 Sync source selection doesn't consider terms (Closed)

SERVER-27123 Only update commit point via spanning tree (Closed)

Assignee: Spencer Brody (Inactive)
Reporter: Mathias Stearn
Participants: Githook User, Mathias Stearn, Spencer Brody
Votes: 0
Watchers: 15

Created: Nov 15 2016 09:40:43 PM UTC
Updated: Apr 13 2017 08:39:12 PM UTC
Resolved: Nov 17 2016 09:32:51 PM UTC
