[SERVER-45692] Ensure all internal inter-node commands which accept read/write concern explicitly specify it Created: 22/Jan/20  Updated: 29/Oct/23  Resolved: 25/Mar/20

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: None
Fix Version/s: 4.4.0-rc0, 4.3.6, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Kevin Pulo Assignee: Kevin Pulo
Resolution: Fixed Votes: 0
Labels: PM-900-Fallout
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-44539 Re-enable "missing RWC" logging Backlog
Problem/Incident
causes SERVER-47043 Read concern set to default when it s... Closed
Related
related to SERVER-53293 DBClientCursor not properly applying ... Closed
related to SERVER-62664 Use explicit read concern in Sessions... Closed
is related to SERVER-46249 Disallow createIndexes and createColl... Closed
is related to SERVER-46638 Investigate RWC default implications ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Sharding 2020-01-27, Sharding 2020-02-10, Sharding 2020-02-24, Sharding 2020-03-09
Participants:
Linked BF Score: 46

 Description   

The approach will be to add a uassert/fassert (or possibly fassert when enableTestCommands, else uassert) if mongod receives an incoming command, on an internalClient connection, that accepts RWC, but doesn't have it explicitly specified. The reason is that without it being explicitly specified, this internal command would have the user-defined CWRWC default applied to it, which may not give the desired semantics. This will also ensure that future inter-node communication includes explicit RWC as appropriate (because otherwise tests will fail).

The original approach was to avoid applying RWC defaults to non-replicated namespaces. However, doing this reliably at command dispatch time would require a deep parsing and understanding of all the namespaces that will be touched by the command, which is currently not available and would be hard (perhaps impossible) to add.


Original summary: RWC defaults shouldn't be applied to namespaces that aren't replicated

Original description:
Currently RWC defaults are applied to non-replicated namespaces like the oplog and system.profile, even though this doesn't make sense. In the case of default readConcern majority, this can inhibit initialisation of the oplog fetcher on secondaries during startup, thereby leading to a distributed deadlock where all reads using the default readConcern of majority will not be able to complete (on primaries and secondaries).



 Comments   
Comment by Githook User [ 26/Mar/20 ]

Author:

{'name': 'Kevin Pulo', 'username': 'devkev', 'email': 'kevin.pulo@mongodb.com'}

Message: SERVER-45692 add explicit RWC to inter-node commands (even if merely kImplicitDefault)

(cherry picked from commit 747ff353cbc819d032fa727d4bd7ffad16ea0437)

SERVER-45692 uassert if mongod receives a command on an internalClient connection, which should have explicit RWC, but doesn't

(cherry picked from commit 165b01c12f72840a25f16e1fe9b9b4df4bffa3ea)

SERVER-47043 Read concern set to default when it should be local

(cherry picked from commit ad5a5fd53f50a0290338ea701489f3c18f1fa308)
Branch: v4.4
https://github.com/mongodb/mongo/commit/8cf9dacffcec2666fbfbbcb3f8ef4a4402db4485

Comment by Githook User [ 24/Mar/20 ]

Author:

{'email': 'kaloian.manassiev@mongodb.com', 'name': 'Kaloian Manassiev', 'username': 'kaloianm'}

Message: Revert "SERVER-45692 add explicit RWC to inter-node commands (even if merely kImplicitDefault)"

This reverts commit 81c6113198d2f5debf3da38a42bf61d7a079de2e.

We discovered that this commit introduces SERVER-47043, so it can't be
backported without it. Leaving all the backports for after 4.4.0-RC0 is
cut.
Branch: v4.4
https://github.com/mongodb/mongo/commit/aef2f4f5c62fe07842bbc861dddcb0ad2f9a9e16

Comment by Githook User [ 24/Mar/20 ]

Author:

{'email': 'kevin.pulo@mongodb.com', 'name': 'Kevin Pulo', 'username': 'devkev'}

Message: SERVER-45692 add explicit RWC to inter-node commands (even if merely kImplicitDefault)

(cherry picked from commit 747ff353cbc819d032fa727d4bd7ffad16ea0437)
Branch: v4.4
https://github.com/mongodb/mongo/commit/81c6113198d2f5debf3da38a42bf61d7a079de2e

Comment by Githook User [ 05/Mar/20 ]

Author:

{'name': 'Kevin Pulo', 'username': 'devkev', 'email': 'kevin.pulo@mongodb.com'}

Message: SERVER-45692 uassert if mongod receives a command on an internalClient connection, which should have explicit RWC, but doesn't
Branch: master
https://github.com/mongodb/mongo/commit/165b01c12f72840a25f16e1fe9b9b4df4bffa3ea

Comment by Githook User [ 05/Mar/20 ]

Author:

{'username': 'devkev', 'name': 'Kevin Pulo', 'email': 'kevin.pulo@mongodb.com'}

Message: SERVER-45692 add explicit RWC to inter-node commands (even if merely kImplicitDefault)
Branch: master
https://github.com/mongodb/mongo/commit/747ff353cbc819d032fa727d4bd7ffad16ea0437

Comment by Githook User [ 28/Jan/20 ]

Author:

{'email': 'kevin.pulo@mongodb.com', 'username': 'devkev', 'name': 'Kevin Pulo'}

Message: SERVER-45692 make SyncSourceResolver pass readConcern local when querying remote oplogs
Branch: master
https://github.com/mongodb/mongo/commit/823a131ca93354c42b767767269f5a9e70691e78

Comment by Judah Schvimer [ 24/Jan/20 ]

I think if the "db" of a command is "local" we could ignore the CWRWC defaults. My understanding is that that would never look at replicated data, and so could be a best effort attempt, in addition to explicitly specifying a read concern when we know we need to (e.g. oplog fetching).

Further, I agree that we should have explicit tests for rollback, sync source selection, and initial sync with CRWRC. Even just targeted integration tests would suffice.

Comment by Kaloian Manassiev [ 24/Jan/20 ]

I don't think SERVER-44539 would have caught this in the replica set case, but I do agree that it shows it is important, but I have some concerns that it might be an ongoing effort before we can truly enable invariants (especially with backwards compatibility implications in mind).

CC jack.mulrow
 

Comment by Kevin Pulo [ 24/Jan/20 ]

kaloian.manassiev, this discussion reminded me that one of the purposes of SERVER-44539 was supposed to be flushing out the locations of these internal commands. So maybe that ticket is more pressing after all?

In the meantime I will use this ticket to fix this case that we know about (and which is causing test failures), by adding the explicit readConcern. We will likely discover more down the track, but we should at least fix what we know about.

In terms of testing replication (and sharding, for that matter) itself in the presence of custom CWRWC defaults, there is no coverage of that at this stage. There are the cwrwc passthrough suites, but they only run jsCore (ie. they are more focussed on user operations). We can look into defining cwrwc passthroughs of the replsets/sharding suites, but I'm concerned about the default RWC getting in the way of the tests themselves (ie. the tests might have unstated assumptions that RC is local and WC is w:1).

Is it possible to ignore CWRWC for oplog queries in general

This has much the same problem as checking if it's replicated. ie. the command dispatch code could call NamespaceString::isOplog() instead of NamespaceString::isReplicated(), but

  1. I don't know for sure that the nss is fully-formed at the time (eg. it might have just the db, not the db and collection), and
  2. it won't help if the command somehow reads from the oplog without specifying a namespace of "local.oplog.rs" at the top level (eg. by using $lookup).
Comment by Samyukta Lanka [ 23/Jan/20 ]

kaloian.manassiev, yes I'm asking about coverage of replication paths that do remote queries with CWRWC set to something other than local. I think with this ticket we should add some coverage of the places we do remote queries on the oplog, but I wanted to check if we had something for other remote queries, like collection cloning during initial sync, to make sure there aren't edge cases that could stall.

Comment by Kaloian Manassiev [ 23/Jan/20 ]

samy.lanka, just for background - the CWRWC feature is about automatically applying read/write concerns to incoming requests and happens very early, in the entry point. Functionally it is no different from a user including read/write concern with an operation. Because of this, the cwrwc_rc_majority_passthrough suite is no different in terms of what it covers from read_concern_majority_passthrough.

Kev can answer more about the targeted coverage we have for CWRWC if you want, but given the above explanation, I think your questions about coverage of rollback or initial sync are unrelated to the feature. Is that correct?

Unless you are asking about targeted coverage of the replication paths which do remote queries, with CWRWC set to something other than local, in order to ensure that stalls like the one in BF-15941 do not occur, which would be ideal

Comment by Samyukta Lanka [ 22/Jan/20 ]

I agree with Judah that we should look into all the known places we do oplog reads to make sure the solution works for all those cases. One question I have is how much test coverage do we have for CWRWC right now? I don't think we're getting much test coverage of rollbacks or of a meaningful initial sync (the one at startup only replicates a couple of collections), which I think would help us be more confident in the solution for this ticket. Perhaps adding a couple integration tests with this fix could help with that.

Also, is my understanding correct that for CWRWC, we're doing collection cloning reads for initial sync with read concern?

Comment by Judah Schvimer [ 22/Jan/20 ]

That's reasonable. I can't think of any reads one node would do on another of unreplicated data other than the oplog. Is it possible to ignore CWRWC for oplog queries in general, rather than changing each individually as we see them and hoping we remember to give future oplog queries an explicit RWC?

Comment by Kaloian Manassiev [ 22/Jan/20 ]

Does the CWRWC get applied when a node reads its own unreplicated state in DBDirectClient?

No, calls under DBDirectClient are excluded.

Ignoring it for unreplicated collections sounds good in theory, however in practice, determining that an operation only accessed unreplicated data is difficult (or at least we don't have means to do that). So the only option we have is to "approximate" it, by parsing the operation a-priori and checking whether there's "local" somewhere in it. I think this is a very error-prone approach and also causes layer violation. If we had "something" on the operation context which gets set if the operation accessed only unreplicated data, which we can check at the end of the command, then that's a different question, but I'd rather not build something like this now, given that we need to do the same for read concern as well

Comment by Judah Schvimer [ 22/Jan/20 ]

My personal inclination would be to exempt unreplicated collections from CWRWC. Does the CWRWC get applied when a node reads its own unreplicated state in DBDirectClient? I could see that being a problem for internal replication state if it was reading stale data or blocking for some period of time. I also agree with Kev's POV that read and write concern don't really make sense for unreplicated data. We can't error because that's backwards breaking, but we certainly could ignore it. Fixing this across the board would also prevent us from adding new bugs like this in the future.

I understand Kal's concern though, and if we don't see any other similar problems creeping up, and think the oplog fetcher is the only problematic read, then this is a fine solution. There are multiple places we do oplog reads though (like in rollback and initial sync and the sync source resolver and ordinary oplog fetching), so we should explicitly audit that those are all being fixed.

Comment by Kaloian Manassiev [ 22/Jan/20 ]

As we spoke this morning, I believe that the proposed approach is wrong, because it requires the commands processing code to parse the command and extract the namespace, which is not always possible (for example aggregation can touch more than one namespace and we have no infrastructure to extract that). Instead, I would like to us to do adjust the oplog fetcher to explicitly pass RC local.

Let me know if you guys think this could be problematic in any way, but I believe that we discussed this during the design review and it didn't raise any eyebrows.

CC samy.lanka

Generated at Thu Feb 08 05:09:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.