[SERVER-16569] do not update the replication progress map when receiving a heartbeat from primary Created: 15/Dec/14  Updated: 18/Dec/14  Resolved: 17/Dec/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.8.0-rc3

Type: Bug
Priority: Major - P3
Reporter: Ian Whalen (Inactive)
Assignee: Matt Dannenberg
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

TEST HISTORY

EARLIEST KNOWN FAILURE

 m31002| 2014-12-07T13:06:08.412-0500 [initandlisten] connection accepted from 10.168.209.60:47814 #21 (7 connections now open)
 m31002| 2014-12-07T13:06:08.412-0500 [initandlisten] connection accepted from 10.168.209.60:47813 #22 (8 connections now open)
 m31002| 2014-12-07T13:06:08.413-0500 [conn1] command write_concern_few_arbiters.$cmd command: insert { insert: "foo", documents: [ { _id: ObjectId('5484970fa3cd07836941245c'), x: 13.0 } ], ordered: true, writeConcern: { w: "majority", wtimeout: 9000.0 } } keyUpdates:0 numYields:0 locks(micros) w:4 reslen:80 1003ms
assert: no write error: { "nInserted" : 1 }
Error: no write error: { "nInserted" : 1 }
    at Error (<anonymous>)
    at doassert (src/mongo/shell/assert.js:11:14)
    at Function.assert.writeError (src/mongo/shell/assert.js:421:9)
    at /data/mci/shell/src/jstests/multiVersion/w_majority_change.js:144:12
    at /data/mci/shell/src/jstests/multiVersion/w_majority_change.js:153:2



 Comments   
Comment by Githook User [ 17/Dec/14 ]

Author: matt dannenberg (username: dannenberg, email: matt.dannenberg@10gen.com)

Message: SERVER-16569 do not update the replication progress map when receiving a heartbeat from primary
Branch: master
https://github.com/mongodb/mongo/commit/4f329663c1908d90a73c69e8223520e5aecc8608

Comment by Matt Dannenberg [ 16/Dec/14 ]

The trouble here is that the 2.6 primary erroneously reports a "majority" writeConcern as satisfied when only three of seven nodes are up ("majority" requires four).

This is caused by the primary being notified of its own replication progress. In 2.6, a node simply subtracts one from the writeConcern quantity to account for itself. In 2.8, nodes track their own progress alongside that of all other nodes. Additionally, in 2.8, nodes update replication progress via heartbeats as well as via the replSetUpdatePosition command.

So, when a 2.8 node receives a heartbeat from the 2.6 primary, the 2.8 node updates its replication progress map to reflect the primary's progress and then forwards this progress to the primary (or along a chained path where it will eventually reach the primary). When the 2.6 primary receives this progress update, it sees that three nodes (a majority minus one, to account for itself) have replicated the op. It does not know that one of those three is itself, so it is counted twice.

The solution is to not update the replication progress map when receiving a heartbeat from the primary.
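
The double counting can be illustrated with a small model. This is only a sketch, not the server code: the function names, the member ids, and the idea of representing the progress map as a set of member ids are all assumptions made for illustration. It assumes a seven-member set in which the primary (id 0) and two 2.8 secondaries (ids 1 and 2) are up.

```python
def majority(num_members):
    """Number of members needed for a "majority" writeConcern."""
    return num_members // 2 + 1


def forwarded_progress(secondaries_caught_up, primary_id, record_primary_heartbeat):
    """Hypothetical model of what a 2.8 secondary forwards upstream.

    Returns the set of member ids reported as having replicated the op.
    If record_primary_heartbeat is True (behaviour before the fix), the
    primary's own progress, learned from its heartbeat, is included.
    """
    progress = set(secondaries_caught_up)
    if record_primary_heartbeat:
        progress.add(primary_id)
    return progress


def primary_26_satisfied(reported_ids, num_members):
    """Hypothetical model of the 2.6 check: subtract one for the primary
    itself, then count every reported entry without checking whether one
    of them is the primary."""
    needed_from_others = majority(num_members) - 1
    return len(reported_ids) >= needed_from_others


NUM_MEMBERS = 7
PRIMARY_ID = 0
SECONDARIES_UP = [1, 2]  # only three of seven nodes are up in total

before_fix = forwarded_progress(SECONDARIES_UP, PRIMARY_ID, record_primary_heartbeat=True)
after_fix = forwarded_progress(SECONDARIES_UP, PRIMARY_ID, record_primary_heartbeat=False)

print(primary_26_satisfied(before_fix, NUM_MEMBERS))  # primary counted twice
print(primary_26_satisfied(after_fix, NUM_MEMBERS))   # only real secondaries counted
```

In this model the pre-fix report contains three entries, which matches the "majority minus one" threshold of three even though only three nodes in total hold the write. Once the primary's heartbeat no longer updates the map, the report contains two entries and the writeConcern is correctly left unsatisfied.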
