[SERVER-37846] writeConcern can be satisfied with an arbiter if the write was committed Created: 31/Oct/18 Updated: 29/Oct/23 Resolved: 24/Jan/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.4.17, 3.6.8, 4.0.3, 4.1.4 |
| Fix Version/s: | 3.6.15, 4.0.7, 4.1.8, 3.4.24 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Samyukta Lanka | Assignee: | Vesselina Ratcheva (Inactive) |
| Resolution: | Fixed | Votes: | 3 |
| Labels: | neweng |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.0, v3.6, v3.4 |
| Steps To Reproduce: |
| Sprint: | Repl 2018-12-17, Repl 2019-01-14, Repl 2019-01-28 |
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
There is an issue when using a PSSA (primary, secondary, secondary, arbiter) architecture where one secondary is hidden with 0 votes and 0 priority. It occurs when the node with 0 votes goes down and the following write is issued:
This is expected to fail because there are not enough data-bearing nodes to satisfy the writeConcern. However, the write actually succeeds:
In this architecture, only two nodes need to receive the write for it to be considered replicated to a majority (because only voting nodes are considered when determining the majority). Once both the primary and the remaining secondary apply the write, it is committed and the arbiter is sent the new lastCommittedOpTime. To determine whether the writeConcern is satisfied, the topology coordinator checks every node in the replica set to see if enough of them have replicated the write. That check includes the arbiter, which reports the lastCommittedOpTime it was just sent as its lastAppliedOpTime. So even though the write was replicated to only 2 nodes, the topology coordinator concludes that it was replicated to 3 nodes and reports the writeConcern as satisfied. |
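The counting problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's topology coordinator code: the `Member` class, the integer optimes, the helper names, and the implied write concern of 3 nodes are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    votes: int
    is_arbiter: bool
    up: bool
    last_applied: int  # optime the node reports, simplified to an integer


def majority_committed(members, optime):
    """True if a majority of voting members hold the write at `optime`."""
    voters = [m for m in members if m.votes > 0]
    needed = len(voters) // 2 + 1
    # Arbiters hold no data, so only data-bearing voters can hold the write.
    have = sum(
        1 for m in voters
        if m.up and not m.is_arbiter and m.last_applied >= optime
    )
    return have >= needed


def naive_ack_count(members, optime):
    """Counts every reachable node whose reported optime covers the write."""
    return sum(1 for m in members if m.up and m.last_applied >= optime)


def simulate():
    write_optime = 5
    primary = Member("primary", votes=1, is_arbiter=False, up=True, last_applied=5)
    secondary = Member("secondary", votes=1, is_arbiter=False, up=True, last_applied=5)
    hidden = Member("hidden", votes=0, is_arbiter=False, up=False, last_applied=4)
    arbiter = Member("arbiter", votes=1, is_arbiter=True, up=True, last_applied=4)
    members = [primary, secondary, hidden, arbiter]

    committed = majority_committed(members, write_optime)
    if committed:
        # The arbiter is sent the new commit point and then reports it
        # as its own last applied optime.
        arbiter.last_applied = write_optime

    counted = naive_ack_count(members, write_optime)
    data_bearing = sum(
        1 for m in members
        if m.up and not m.is_arbiter and m.last_applied >= write_optime
    )
    return committed, counted, data_bearing
```

In this model, `simulate()` reports the write as committed, counts three nodes as having the write, and finds only two data-bearing nodes that actually hold it, which is the mismatch the description points out.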
| Comments |
| Comment by Githook User [ 24/Sep/19 ] |
|
Author: Vesselina Ratcheva (vessy-mongodb) <vesselina.ratcheva@mongodb.com> Message: (cherry picked from commit b023cfd4db379092f7642dd825d79652d905f847) |
| Comment by Githook User [ 24/Sep/19 ] |
|
Author: Vesselina Ratcheva (vessy-mongodb) <vesselina.ratcheva@mongodb.com> Message: (cherry picked from commit b023cfd4db379092f7642dd825d79652d905f847) |
| Comment by Githook User [ 27/Feb/19 ] |
|
Author: Vesselina Ratcheva (vessy-mongodb) <vesselina.ratcheva@10gen.com> Message: (cherry picked from commit b023cfd4db379092f7642dd825d79652d905f847) |
| Comment by Githook User [ 24/Jan/19 ] |
|
Author: Vesselina Ratcheva (vessy-mongodb) <vesselina.ratcheva@10gen.com> Message: |
| Comment by Samyukta Lanka [ 31/Oct/18 ] |
|
One potential solution is to ignore arbiters here when determining if enough nodes have replicated the write. |
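A minimal sketch of what ignoring arbiters in that check could look like, using a toy model rather than the server's code. The ticket does not show the committed change, so the `Node` tuple, the `skip_arbiters` flag, the integer optimes, and the value `w=3` are all hypothetical and serve only to illustrate the suggestion.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    is_arbiter: bool
    reported_optime: int  # simplified to an integer for illustration


def nodes_satisfying(nodes, optime, skip_arbiters=True):
    """Names of nodes counted as having replicated the write at `optime`."""
    return [
        n.name for n in nodes
        if n.reported_optime >= optime
        and not (skip_arbiters and n.is_arbiter)
    ]


def write_concern_satisfied(nodes, optime, w, skip_arbiters=True):
    """True if at least `w` counted nodes have replicated the write."""
    return len(nodes_satisfying(nodes, optime, skip_arbiters)) >= w


# State after the write commits: the arbiter reports the commit point,
# and the hidden node is down and still behind.
NODES = [
    Node("primary", False, 5),
    Node("secondary", False, 5),
    Node("arbiter", True, 5),
    Node("hidden", False, 4),
]
```

With `skip_arbiters=False` the model reproduces the reported behavior (three nodes counted, so a write concern of 3 passes); with the default `skip_arbiters=True` only the primary and secondary are counted, so a write concern of 3 is not satisfied while a write concern of 2 still is.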