[DRIVERS-2141] Prohibit retryable writes for write commands targeting unreplicated local collection Created: 19/Jul/19  Updated: 31/Mar/22

Status: Backlog
Project: Drivers
Component/s: Retryability
Fix Version/s: None

Type: Spec Change Priority: Major - P3
Reporter: Jeremy Mikola Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-42317 Investigate - Prohibit retryable writ... Closed
Related
related to PHPC-1412 Fix test failures when retryable writ... Closed
Driver Changes: Needed

 Description   

Attempting to write to the local database results in the following server error from db/repl/oplog.cpp:

retryable writes is not supported for unreplicated ns: %s

I expect this error relates to this notable restriction in the MongoDB manual. While the restriction refers to multi-document transactions, I assume it overlaps with the common tooling used for retryable writes (e.g. txnNumber).

I caught this error when upgrading PHPC to a version of libmongoc that had enabled retryable writes by default. The particular test was replicaset/manager-selectserver-001.phpt, which attempts to insert some documents into a local.example collection. I don't recall any particular reason it uses the local database, and this could easily be changed to use a replicated database, but it did highlight a potential conflict I think may have been overlooked.

It's possible this conflict could introduce unexpected errors in applications that write to collections in the local database. This is certainly an edge case, but the same could be said for MMAPv1 users in SPEC-1345.

I can think of two approaches off the top of my head:

  • Document this conflict in drivers. We can also suggest that the DOCS team update their notable restriction.
  • Amend the spec to prohibit retryable writes for write commands targeting the local database. This should be possible without inspecting the command document, since we could inspect the global $db argument (for OP_MSG). This would require code changes in drivers.

Note: I'm only referring to "local" in this ticket as it's the only unreplicated collection that I'm aware of. If others are possible, that may complicate the suggestion to implement checks.
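
The second approach could look roughly like the following. This is a minimal sketch, not spec language or any driver's actual implementation: the names UNREPLICATED_DATABASES, supports_retryable_write, and prepare_write_command are hypothetical, the command is modeled as a plain dict, and txnNumber is a plain int (a real driver would also send lsid and use a 64-bit integer type). It assumes "local" is the only database that needs the check.

```python
from typing import Any, Dict

# Assumption: "local" is the only unreplicated database a driver must special-case.
UNREPLICATED_DATABASES = frozenset({"local"})


def supports_retryable_write(command: Dict[str, Any]) -> bool:
    """Return False when the command's global $db argument names an unreplicated database."""
    return command.get("$db") not in UNREPLICATED_DATABASES


def prepare_write_command(
    command: Dict[str, Any], txn_number: int, retry_writes: bool = True
) -> Dict[str, Any]:
    """Return a copy of the command, adding txnNumber only when a retry is allowed."""
    prepared = dict(command)
    if retry_writes and supports_retryable_write(prepared):
        # Hypothetical simplification: a real driver also attaches lsid here.
        prepared["txnNumber"] = txn_number
    return prepared


local_insert = {"insert": "example", "$db": "local", "documents": [{"x": 1}]}
test_insert = {"insert": "example", "$db": "test", "documents": [{"x": 1}]}

local_prepared = prepare_write_command(local_insert, txn_number=1)
test_prepared = prepare_write_command(test_insert, txn_number=1)
```

Because the check reads only $db, the command document itself is never inspected. With this sketch, a write to local.example is sent without txnNumber, so the server would treat it as an ordinary, non-retryable write.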



 Comments   
Comment by Jeremy Mikola [ 05/Sep/19 ]

Thanks kaloian.manassiev, that clears up any outstanding questions about server behavior. I'm moving this back to Open to shuffle it back to the drivers backlog.

Comment by Kaloian Manassiev [ 16/Aug/19 ]

jmikola, apologies for the delayed reply here, somehow it slipped out of my attention.

First, your observation is correct that both transactions and retryable writes to non-replicated collections are prohibited and the check that you linked uses the presence of txnNumber to cover both. The reason to block retryable writes to non-replicated collections is because we use the oplog in order to provide retryability.

In addition to local, there is another unreplicated collection (config.transactions), but that one is not supposed to be written by customers anyways, so I don't think it is worth mentioning in the documents.

Your proposal to update the drivers spec to indicate that retryable writes must not be performed against local seems the most prudent thing to do (and I can't think of anything else we can do anyway).

Is there anything else here that you needed from the server team?

Comment by Jeremy Mikola [ 22/Jul/19 ]

Jeremy, I have created SERVER-42317 to triage the request in the Sharding board

Thanks, ratika.gandhi. Seems like a better option to track both tasks independently, as the drivers team will still need to triage this SPEC ticket.

Comment by Ratika Gandhi [ 22/Jul/19 ]

Jeremy, I have created SERVER-42317 to triage the request in the Sharding board. Hope this is okay. 

 

Comment by Esha Maharishi (Inactive) [ 22/Jul/19 ]

jmikola, if you don't mind, I'm assigning this to the sharding backlog so that it shows up at our triage meeting.

Comment by Esha Maharishi (Inactive) [ 22/Jul/19 ]

jmikola, there are other internal collections that are not replicated in the normal way, like config.transactions.

Comment by Jeremy Mikola [ 22/Jul/19 ]

esha.maharishi: While you're asking, perhaps you can clarify the following outstanding question:

I'm only referring to "local" in this ticket as it's the only unreplicated collection that I'm aware of. If others are possible, that may complicate the suggestion to implement checks.

If "local" really is the only always-unreplicated database to worry about, I think blacklisting it may be viable; however, I wouldn't want to consider that if it's possible for other databases to be unreplicated. AFAIK, users can't configure a database to be unreplicated, so I may just need to confirm that "local" is the only one used by the server internally.

Comment by Esha Maharishi (Inactive) [ 22/Jul/19 ]

jmikola, sharding is the right team, but I'm not sure what we specifically want to do - both of the options you suggested seem plausible to me. I will bring it up to the sharding team in #server-sharding.

Comment by Jeremy Mikola [ 19/Jul/19 ]

esha.maharishi: I asked in #server and was told that the sharding team is responsible for the mongo client's driver-like behavior, which in this case means determining whether a txnNumber should be added to an outgoing write command. Is this something you can chime in on?

Comment by David Golden [ 19/Jul/19 ]

I think we need to skip retrying writes to "local".
