[SERVER-62246] writeConcern all Created: 23/Dec/21 Updated: 07/Apr/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Alan Zheng |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Sprint: | Replication 2022-01-24, Replication 2022-02-07, Repl 2022-02-21, Repl 2022-03-07, Repl 2022-03-21, Repl 2022-04-04 |
| Participants: |
| Description |
|
If I want a write to be acknowledged only when all voting nodes have confirmed it, I have to manually specify the number of nodes in my writeConcern (e.g. "w: 3" in a 3-node set). It would be helpful to support a "w: all" writeConcern that provides this behavior independent of replica set topology. We already do something similar with the "votingMembers" commitQuorum setting, so I would imagine we can do something similar for writeConcern as well. |
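As a rough sketch of the manual workaround described above, the number passed as `w` can be derived from a replica set config document instead of being hard-coded. The helper names are hypothetical. `members`, `votes`, and `arbiterOnly` are fields of the replica set configuration document, and skipping arbiters is an assumption made here because they hold no data and cannot acknowledge writes. The computed value goes stale after any reconfig, which is the gap a built-in "w: all" would close.

```python
from typing import Any, Dict


def voting_data_member_count(repl_config: Dict[str, Any]) -> int:
    """Count data-bearing members that have a non-zero `votes` value.

    `votes` defaults to 1 when the field is absent from a member entry.
    Arbiters are skipped (assumption: they cannot acknowledge a write).
    """
    return sum(
        1
        for member in repl_config.get("members", [])
        if member.get("votes", 1) > 0 and not member.get("arbiterOnly", False)
    )


def all_voters_write_concern(repl_config: Dict[str, Any]) -> Dict[str, int]:
    """Build the numeric write concern a client has to spell out today."""
    return {"w": voting_data_member_count(repl_config)}


# Hypothetical config: three voting data nodes, one non-voting node, one arbiter.
example_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "a.example:27017", "votes": 1},
        {"_id": 1, "host": "b.example:27017", "votes": 1},
        {"_id": 2, "host": "c.example:27017"},  # `votes` omitted, defaults to 1
        {"_id": 3, "host": "d.example:27017", "votes": 0},
        {"_id": 4, "host": "e.example:27017", "votes": 1, "arbiterOnly": True},
    ],
}

write_concern = all_voters_write_concern(example_config)
```

The config document would typically come from the `replSetGetConfig` command; fetching it is left out so the sketch stays driver-independent.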
| Comments |
| Comment by Judah Schvimer [ 06/Apr/22 ] |
|
w:all wouldn't prevent a node from falling too far behind unless the high-volume write clients used w:all to throttle themselves. Generally, if one node is falling very far behind, it's underprovisioned, overloaded, or degraded in some way. In those cases you probably don't want to slow down your writes; you'd likely prefer to rebalance any imbalanced read load, or do maintenance to fix any degradation or imbalanced provisioning within the set. |
| Comment by Louis Williams [ 06/Apr/22 ] |
|
milkie, good point! To provide more background, this came up while developing dbCheck. It's a potentially very expensive operation, and we wanted a way to prevent any secondary from falling too far behind. The dbCheck operation is not required for availability, so if a node crashes, it would be reasonable to stop until the crashed node re-joins the set. I expect users of w:all would be willing to make the same tradeoff. |
| Comment by Eric Milkie [ 06/Apr/22 ] |
|
It doesn't sound like having all writes use w:"all" would be a viable solution to the lack of majority-write availability when a node goes down, since that's exactly the main disadvantage of w:"all" – your writes stop working as soon as any [voting] node fails. |
| Comment by Louis Williams [ 06/Apr/22 ] |
|
The original motivation was more of a stability/availability one: to ensure that no secondaries lag significantly behind the majority of the set. In a 3-node set, if two nodes are significantly ahead of the third, a failure of one of my good nodes could result in an extended period with a lack of majority write availability. |
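To make the lag concern concrete, here is a small sketch that finds how far the slowest secondary trails the primary, given a status document shaped like `replSetGetStatus` output. The function name is hypothetical; `members`, `stateStr`, and `optimeDate` are fields of that output. It only measures lag and says nothing about how w:all itself would be implemented.

```python
from datetime import datetime, timedelta
from typing import Any, Dict


def max_secondary_lag(status: Dict[str, Any]) -> timedelta:
    """Return the largest gap between the primary's and any secondary's optimeDate."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no primary in status document")
    lags = [
        primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    ]
    # With no secondaries there is nothing to lag behind.
    return max(lags, default=timedelta(0))


# Hypothetical 3-node set: one secondary keeps up, the other trails by 90 seconds.
now = datetime(2022, 4, 6, 12, 0, 0)
example_status = {
    "members": [
        {"name": "a.example:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "b.example:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=2)},
        {"name": "c.example:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=90)},
    ]
}

worst_lag = max_secondary_lag(example_status)
```

In the scenario above, losing either of the two up-to-date nodes leaves the set without majority write availability until the lagging node has caught up by roughly `worst_lag`.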
| Comment by Andy Schwerin [ 05/Apr/22 ] |
|
If we're going to introduce an "all" write concern, I think we should be very thoughtful about its intended meaning. Let's suppose as proposed in the description that it means "all voters". So, why do we want to wait for all voters to confirm a write? The only reason I know of so far is to ensure that reads sent to secondaries after the write is confirmed will definitely see the result of the write at "majority" or "local" read concern. With some extra work to ensure read-preference primary always ensures read-your-writes behavior for those read concerns, if we only consider voting nodes we could build this table:
So if there are only voting nodes, or there is an "any voter" read preference, a write concern of "all" is a way to wait on writes to ensure that all such readers will see the result of your writes. Is that the purpose of this write concern? |
| Comment by Judah Schvimer [ 28/Mar/22 ] |
|
We also need to decide if w:all should wait for journaling on all nodes or just the primary, and the behavior with "writeConcernMajorityJournalDefault". |
| Comment by Elizabeth Roytburd [ 04/Jan/22 ] |
|
alan.zheng to look at this, thank you! |
| Comment by Judah Schvimer [ 23/Dec/21 ] |
|
One tricky part here will be defining and ensuring the semantics of w:all in the face of concurrent reconfigs. We also have to decide if w:all refers only to voting nodes or not. IIRC, non-majority write concerns count non-voting nodes. I suspect w:all should be "all nodes" not "all voting nodes", but since commitQuorum wanted "all voting nodes" we may want to surface both options. Users also may expect w:all to be "stronger" than w:majority. We should consider if w:all requires its optime to be majority committed in addition to being on all nodes. One last consideration will be, what do we do if a user has already defined a custom write concern with the same name? |
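The two candidate meanings ("all nodes" vs. "all voting nodes") and the name-collision question can be sketched as follows. This is only an illustration of the open questions, not a proposed server implementation: `MemberState`, `all_acknowledged`, and `custom_mode_exists` are hypothetical names, optimes are simplified to integers, and members are assumed to be data-bearing. `settings.getLastErrorModes` is the replica set config field where custom write concerns are defined.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class MemberState:
    host: str
    votes: int
    applied_optime: int  # simplified: real optimes are (timestamp, term) pairs


def all_acknowledged(
    members: Iterable[MemberState], write_optime: int, voters_only: bool
) -> bool:
    """Check whether every member we wait on has applied the write.

    voters_only=True models "all voting nodes"; False models "all nodes".
    """
    waited_on = [m for m in members if m.votes > 0 or not voters_only]
    return all(m.applied_optime >= write_optime for m in waited_on)


def custom_mode_exists(repl_config: Dict[str, Any], name: str) -> bool:
    """Report whether a custom write concern with this name is already defined."""
    modes = repl_config.get("settings", {}).get("getLastErrorModes", {})
    return name in modes


# Hypothetical state: both voters have the write, the non-voting node does not.
example_members = [
    MemberState("a.example:27017", votes=1, applied_optime=10),
    MemberState("b.example:27017", votes=1, applied_optime=10),
    MemberState("c.example:27017", votes=0, applied_optime=7),
]

satisfied_for_voters = all_acknowledged(example_members, 10, voters_only=True)
satisfied_for_all_nodes = all_acknowledged(example_members, 10, voters_only=False)
```

With this state the two interpretations disagree, which is why the choice needs to be made explicitly. The sketch leaves out the majority-commit requirement and concurrent reconfigs, both of which depend on server internals that a toy model cannot capture.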