[SERVER-15254] replica set members should be able to replicate off members that don't build indexes Created: 13/Sep/14 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Zardosht Kasheff | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Replication
|
| Participants: |
| Description |
|
Currently, if member 'A' does not build indexes, then other members that do build indexes cannot replicate oplog data off of A. Here is why this is problematic. Suppose member 'A' finds itself further ahead than all other members because it was the only member to replicate some data off the primary before the primary disappeared. Because other members cannot sync from A, they cannot catch up to A. Also, because A is ahead of everyone else, A will veto every possible election of a new primary. You are stuck in a situation where no primary can be elected. Perhaps Member::syncable() should distinguish between the case where we wish to replicate oplog data and the case where we wish to do an initial sync. It makes sense for members that don't build indexes to be ineligible as the source of an initial sync. However, for the reasons above, they should be allowed to be the source of oplog data during normal replication. |
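A minimal sketch of the distinction proposed in the description, written in Python rather than the server's C++ for brevity. The `SyncPurpose` enum, the `Member` fields, and the `syncable(purpose)` signature are all hypothetical names chosen for illustration; they are not the actual server API.

```python
from dataclasses import dataclass
from enum import Enum


class SyncPurpose(Enum):
    """Why a member wants a sync source (hypothetical, for illustration)."""
    INITIAL_SYNC = "initial_sync"
    OPLOG = "oplog"


@dataclass
class Member:
    name: str
    healthy: bool = True        # assumed: member is reachable
    readable: bool = True       # assumed: member is in a state that can serve data
    build_indexes: bool = True  # mirrors the buildIndexes member setting

    def syncable(self, purpose: SyncPurpose) -> bool:
        """Return True if this member may be used as a source for `purpose`."""
        if not (self.healthy and self.readable):
            return False
        if purpose is SyncPurpose.INITIAL_SYNC:
            # A member without indexes cannot seed a full copy of the data.
            return self.build_indexes
        # Oplog entries are the same regardless of which indexes exist locally.
        return True


a = Member("A", build_indexes=False)
print(a.syncable(SyncPurpose.OPLOG))         # True
print(a.syncable(SyncPurpose.INITIAL_SYNC))  # False
```

Under this sketch, member 'A' stays ineligible as an initial sync source but can still serve oplog data, so the other members can catch up to it.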
| Comments |
| Comment by Charlie Page [ 18/Sep/14 ] |
|
The root of the issue is that a node which cannot be elected can acknowledge a write concern. Only electable, data-bearing nodes should count toward a write concern: for the write concern to be meaningful, a node that acknowledges a write must be able to fully represent that write in a consensus operation. w:majority can still fail if the majority of the partition contains only unelectable servers which are the most current. A single buildIndexes = false node can prevent a primary from being elected or force rollbacks (depending on what decision is made) if no electable node is at least as current. |
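The rule suggested in this comment can be sketched as follows. This illustrates the proposal only, not how the server counts acknowledgements; `Node`, `electable`, and `write_concern_met` are hypothetical names, and the electability test (not an arbiter, builds indexes, priority above zero) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    priority: float = 1.0
    build_indexes: bool = True
    arbiter: bool = False

    @property
    def electable(self) -> bool:
        # Assumed definition: a data-bearing node that could become primary.
        return (not self.arbiter) and self.build_indexes and self.priority > 0


def write_concern_met(acked, w: int) -> bool:
    """Count only acknowledgements from electable nodes toward w."""
    return sum(1 for node in acked if node.electable) >= w


primary = Node("P")
no_index = Node("A", priority=0, build_indexes=False)
secondary = Node("B")

print(write_concern_met([primary, no_index], w=2))   # False
print(write_concern_met([primary, secondary], w=2))  # True
```

With this counting, an acknowledgement from the buildIndexes = false node alone would not satisfy w:2, so the write is not reported as safe until an electable secondary also has it.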
| Comment by Zardosht Kasheff [ 15/Sep/14 ] |
|
Hello Eric, But I think removing veto powers is problematic. Yes, majority write concern would still work, but practically any other replica set write concern that users may want will have problems. Suppose a write is written with REPLICA_SAFE. The user expects that as long as new elections involve all secondaries (and only the old primary does not participate, for some strange reason), the write survives. Without veto power, the only secondary to have acknowledged the write may not be able to stop the election of a member that does not have the write. A similar example is having three data centers with three members each, and using a write concern that states "make sure the write makes it to two out of three data centers". Assuming two data centers (6 members) participate in an election, and only one member has the write, an election may roll back the write. This issue is related to |
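The three data center example above can be worked through with a toy election model. This is a simplified sketch, not the server's election protocol: optimes are plain integers, a voter says yes to any candidate at least as fresh as itself, and `election_succeeds` is a hypothetical helper.

```python
def election_succeeds(candidate_optime, voter_optimes, total_members, veto_enabled):
    """Toy model: does the candidate win an election among reachable voters?"""
    # With veto power, any reachable voter that is ahead of the candidate
    # blocks the election outright.
    if veto_enabled and any(o > candidate_optime for o in voter_optimes):
        return False
    # The candidate votes for itself; voters that are not ahead vote yes.
    yes_votes = 1 + sum(1 for o in voter_optimes if o <= candidate_optime)
    return yes_votes > total_members // 2


# 9 members in total; 6 are reachable. One reachable member has the write
# (optime 10), the other five, including the candidate, are at optime 9.
other_voters = [10, 9, 9, 9, 9]

print(election_succeeds(9, other_voters, 9, veto_enabled=False))  # True
print(election_succeeds(9, other_voters, 9, veto_enabled=True))   # False
```

Without the veto, the stale candidate collects 5 of 9 votes and wins, so the write held by the single fresher member is rolled back; with the veto, that member blocks the election.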
| Comment by Eric Milkie [ 15/Sep/14 ] |
|
Hi Zardosht. |
| Comment by Zardosht Kasheff [ 13/Sep/14 ] |
|
A note I forgot to add. I did not read the code to see how rollback would be affected, but there may be some subtlety there. |