[DRIVERS-2141] Prohibit retryable writes for write commands targeting unreplicated local collection Created: 19/Jul/19 Updated: 31/Mar/22 |
|
| Status: | Backlog |
| Project: | Drivers |
| Component/s: | Retryability |
| Fix Version/s: | None |
| Type: | Spec Change | Priority: | Major - P3 |
| Reporter: | Jeremy Mikola | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Driver Changes: | Needed | ||||||||||||||||
| Description |
|
Attempting to write to the local database results in the following server error from db/repl/oplog.cpp:
I expect this error relates to this notable restriction in the MongoDB manual. While the restriction refers to multi-document transactions, I assume it overlaps with the common tooling used for retryable writes (e.g. txnNumber).
I caught this error when upgrading PHPC to a version of libmongoc that had enabled retryable writes by default. The particular test was replicaset/manager-selectserver-001.phpt, which attempts to insert some documents into a local.example collection. I don't recall any particular reason it uses the local database, and this could easily be changed to use a replicated database, but it did highlight a potential conflict I think may have been overlooked.
It's possible this conflict could introduce unexpected errors in applications that write to collections in the local database. This is certainly an edge case, but the same could be said for MMAPv1 users in SPEC-1345.
I can think of two approaches off the top of my head:
Note: I'm only referring to "local" in this ticket as it's the only unreplicated collection that I'm aware of. If others are possible, that may complicate the suggestion to implement checks. |
| Comments |
| Comment by Jeremy Mikola [ 05/Sep/19 ] |
|
Thanks kaloian.manassiev, that clears up any outstanding questions about server behavior. I'm moving this back to Open to shuffle it back to the drivers backlog. |
| Comment by Kaloian Manassiev [ 16/Aug/19 ] |
|
jmikola, apologies for the delayed reply here, somehow it slipped out of my attention. First, your observation is correct that both transactions and retryable writes to non-replicated collections are prohibited, and the check that you linked uses the presence of txnNumber to cover both. The reason to block retryable writes to non-replicated collections is that we use the oplog in order to provide retryability. In addition to local, there is another unreplicated collection (config.transactions), but that one is not supposed to be written by customers anyways, so I don't think it is worth mentioning in the documents. Your proposal to update the drivers spec to indicate that retryable writes must not be performed against local seems the most prudent thing to do (and I can't think of anything else we can do anyways). Is there anything else here that you needed from the server team? |
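For illustration only, the driver-side check discussed in this ticket (do not attach txnNumber to write commands that target the local database) might look roughly like the sketch below. This is a minimal sketch, not any driver's actual implementation: the names UNREPLICATED_DATABASES, should_attach_txn_number, and build_insert_command are hypothetical, the command shape is simplified (session lsid handling is omitted), and it assumes "local" is the only unreplicated database that applications write to.

```python
from typing import Any, Dict, List

# Assumption: "local" is the only unreplicated database that users write to.
# config.transactions is also unreplicated but is not meant to be written by
# applications, so it is not handled here.
UNREPLICATED_DATABASES = frozenset({"local"})


def should_attach_txn_number(db_name: str, retry_writes_enabled: bool) -> bool:
    """Return True if a write to db_name may be sent as a retryable write."""
    return retry_writes_enabled and db_name not in UNREPLICATED_DATABASES


def build_insert_command(
    db_name: str,
    coll_name: str,
    documents: List[Dict[str, Any]],
    retry_writes_enabled: bool,
    txn_number: int,
) -> Dict[str, Any]:
    """Build a simplified insert command (hypothetical helper).

    txnNumber is only added when the target database is replicated, since the
    server rejects txnNumber on writes to unreplicated collections.
    """
    command: Dict[str, Any] = {
        "insert": coll_name,
        "documents": documents,
        "$db": db_name,
    }
    if should_attach_txn_number(db_name, retry_writes_enabled):
        command["txnNumber"] = txn_number
    return command
```

With this approach, a write to local.example would be sent as an ordinary (non-retryable) write even when retryable writes are enabled on the client, while writes to replicated databases would be unaffected.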
| Comment by Jeremy Mikola [ 22/Jul/19 ] |
Thanks, ratika.gandhi. Seems like a better option to track both tasks independently, as the drivers team will still need to triage this SPEC ticket. |
| Comment by Ratika Gandhi [ 22/Jul/19 ] |
|
Jeremy, I have created
|
| Comment by Esha Maharishi (Inactive) [ 22/Jul/19 ] |
|
jmikola, if you don't mind, I'm assigning this to the sharding backlog so that it shows up at our triage meeting. |
| Comment by Esha Maharishi (Inactive) [ 22/Jul/19 ] |
|
jmikola, there are other internal collections that are not replicated in the normal way, like config.transactions. |
| Comment by Jeremy Mikola [ 22/Jul/19 ] |
|
esha.maharishi: While you're asking, perhaps you can clarify the following outstanding question:
If "local" really is the only always-unreplicated database to worry about, I think blacklisting it may be viable; however, I wouldn't want to consider that if it's possible for other databases to be unreplicated. AFAIK, users can't configure a database to be unreplicated, so I may just need to confirm that "local" is the only one used by the server internally. |
| Comment by Esha Maharishi (Inactive) [ 22/Jul/19 ] |
|
jmikola, sharding is the right team, but I'm not sure what we specifically want to do - both of the options you suggested seem plausible to me. I will bring it up to the sharding team in #server-sharding. |
| Comment by Jeremy Mikola [ 19/Jul/19 ] |
|
esha.maharishi: I asked in #server and was told that the sharding team is responsible for the mongo client's driver-like behavior. In this case, determining if a txnNumber should be added to an outgoing write command. Is this something you can chime in on? |
| Comment by David Golden [ 19/Jul/19 ] |
|
I think we need to skip retrying writes to "local". |