[SERVER-78035] Cluster-wide read/write concerns are incorrectly applied for reads/writes to non-replicated namespaces Created: 13/Jun/23  Updated: 21/Jun/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Kaitlin Mahar Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File cw_rc_wc_unreplicated_ns.js    
Issue Links:
Related
Assigned Teams:
Replication
Operating System: ALL
Participants:

 Description   

I think this issue likely dates back to 4.4, when cluster-wide read/write concern support was added. See SERVER-45692 for earlier discussion of this. At that time, we chose not to try to skip applying the default read/write concern (RC/WC) in this case, because it is difficult to know which namespaces a command will touch. Instead, we required an explicit RC/WC for all internal operations. However, this leaves external commands that interact with non-replicated collections exposed to this bug.

The impact of this:

  • For writes, I don't believe there is any impact beyond a confusing debug log, "applying default writeConcern", emitted for the operation. ServiceEntryPointMongod::Hooks::waitForWriteConcern ultimately checks whether the namespace is unreplicated and no-ops if so.
  • For reads, I think in most cases there is also little impact beyond confusing logs. If a cluster-wide majority RC is set and a majority-committed snapshot is already available, we successfully "wait" for read concern. However, in certain situations (for example, if no majority-committed snapshot exists yet and a majority of nodes are down) we could hang indefinitely waiting for one to become available. The only other allowed cluster-wide read concern levels are "available" and "local", and, if I understand them correctly, neither would cause us to actually wait for anything.
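To make the two bullets above concrete, here is a toy Python model of the current behavior as this ticket describes it. It is a sketch under stated assumptions, not server code: the `Node` class, its fields, and `is_replicated` are hypothetical; the unreplicated check is simplified to "anything in the `local` database"; and a read that would wait forever is reported as `"blocked"` instead of hanging.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def is_replicated(ns: str) -> bool:
    # Hypothetical simplification: treat only the "local" database as
    # unreplicated. The server's real namespace check covers more cases.
    return ns.split(".", 1)[0] != "local"


@dataclass
class Node:
    """Hypothetical model of one mongod's current RC/WC handling."""
    default_rc: Optional[str] = None   # cluster-wide default read concern level
    default_wc: Optional[dict] = None  # cluster-wide default write concern
    has_majority_snapshot: bool = True
    defaults_applied: List[str] = field(default_factory=list)

    def read(self, ns: str, rc: Optional[str] = None) -> str:
        # Current behavior: the default is applied without consulting ns.
        if rc is None and self.default_rc is not None:
            self.defaults_applied.append("readConcern")
            rc = self.default_rc
        if rc == "majority" and not self.has_majority_snapshot:
            return "blocked"  # the real server would keep waiting here
        # Majority with a snapshot, "local", and "available" all return.
        return "ok"

    def write(self, ns: str, wc: Optional[dict] = None) -> str:
        # The default is applied here too, regardless of ns.
        if wc is None and self.default_wc is not None:
            self.defaults_applied.append("writeConcern")
            wc = self.default_wc
        # Stand-in for the unreplicated-namespace no-op that the ticket
        # attributes to ServiceEntryPointMongod::Hooks::waitForWriteConcern.
        if not is_replicated(ns):
            return "no-op"
        return "waited"


node = Node(default_rc="majority", default_wc={"w": "majority"},
            has_majority_snapshot=False)
print(node.write("local.scratch"))             # no-op, yet the default was applied
print(node.read("local.scratch"))              # blocked
print(node.read("local.scratch", rc="local"))  # ok: explicit RC skips the default
print(node.defaults_applied)
```

The model shows why writes are harmless (the namespace check happens after the default is applied) while reads are not (nothing checks the namespace before waiting).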

The ideal behavior seems like it should be:

  • If no read/write concern is supplied for the operation, we do not apply the cluster-wide default.
  • If a read/write concern is supplied for the operation, we ignore it (and also do not apply the cluster-wide default). Arguably, the most correct behavior would be to return an error. However, drivers support client-wide read and write concerns that are applied to every operation (without knowing or checking whether a namespace is replicated), so if we started erroring, I think we might break applications that rely on us ignoring (or effectively ignoring) the value when they upgrade.
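The proposed rule fits in one resolution function. The Python sketch below is hypothetical (`is_replicated` and `resolve_concern` are not server APIs) and assumes, for simplicity, that only the `local` database is unreplicated. Concerns are plain dicts in their wire shape, such as `{"level": "majority"}` or `{"w": "majority"}`, and a result of `None` means "apply nothing and wait for nothing".

```python
from typing import Optional


def is_replicated(ns: str) -> bool:
    # Hypothetical simplification: only the "local" database is unreplicated.
    return ns.split(".", 1)[0] != "local"


def resolve_concern(ns: str, explicit: Optional[dict],
                    cluster_default: Optional[dict]) -> Optional[dict]:
    """Return the read or write concern to honor, or None for none at all."""
    if not is_replicated(ns):
        # Proposed: silently ignore an explicit value instead of erroring,
        # so client-wide RC/WC set by drivers keeps working, and never
        # apply the cluster-wide default.
        return None
    if explicit is not None:
        return explicit
    return cluster_default


majority = {"level": "majority"}
print(resolve_concern("local.startup_log", None, majority))      # None
print(resolve_concern("local.startup_log", majority, majority))  # None
print(resolve_concern("test.coll", None, majority))  # {'level': 'majority'}
```

Checking the namespace first is what makes both the explicit value and the default irrelevant. The hard part mentioned earlier in the ticket, knowing which namespaces a command will touch, is assumed away here by passing a single `ns`.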

There is a straightforward workaround, which is to explicitly specify the correct read/write concern for any operations that are failing due to the incorrectly applied default. When an explicit read/write concern is specified, we do not apply the default.
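In practice, the workaround means attaching `readConcern` or `writeConcern` to the command document itself. A minimal Python sketch, assuming `"local"` and `w: 1` are the appropriate explicit values for an unreplicated namespace; the helper names `with_read_concern` and `with_write_concern` are hypothetical:

```python
from typing import Any, Dict

Command = Dict[str, Any]


def with_read_concern(cmd: Command, level: str = "local") -> Command:
    # Copy rather than mutate. Dicts keep insertion order, so the command
    # name (which must be the first field) stays first.
    out = dict(cmd)
    out["readConcern"] = {"level": level}
    return out


def with_write_concern(cmd: Command, w: Any = 1) -> Command:
    out = dict(cmd)
    out["writeConcern"] = {"w": w}
    return out


find_cmd = with_read_concern({"find": "startup_log", "filter": {}})
insert_cmd = with_write_concern({"insert": "scratch", "documents": [{"x": 1}]})
print(find_cmd)
print(insert_cmd)
```

With PyMongo, a document like `find_cmd` could be sent via `client.local.command(find_cmd)`; in the mongo shell, the equivalent is `db.getSiblingDB("local").runCommand(...)`. Because the concern is explicit, the cluster-wide default is not applied.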



 Comments   
Comment by Kaitlin Mahar [ 13/Jun/23 ]

Attached a basic test whose log output shows this behavior.

Generated at Thu Feb 08 06:37:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.