[SERVER-31613] Commands that don't currently take any LockManager locks must not accept afterClusterTime
Created: 18/Oct/17  Updated: 30/Oct/23  Resolved: 27/Oct/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.6.0-rc2

Type: Question Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Misha Tyulenev
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to CDRIVER-2418 mongoc_cmd_parts_assemble() may add r... Closed
related to SERVER-31743 Identify all commands that can not ac... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2017-11-13

 Description   

Commands such as currentOp and ping are explicitly intended not to take any LockManager locks. However, they may end up acquiring LockManager locks as part of acquiring a MODE_IS lock on the oplog when afterClusterTime is specified in the readConcern object. I think it would be worth considering having the server reject these commands when afterClusterTime is specified, to preserve this intent.
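The proposed rejection can be sketched as a check on the incoming command document. This is a minimal Python sketch, not the server's actual C++ implementation. The command-name set, the `InvalidOptionsError` class, and the `validate_read_concern` function are all hypothetical. `afterClusterTime` is modelled as a plain integer rather than a BSON Timestamp.

```python
# Hypothetical list: currentOp and ping come from the issue description,
# serverStatus from a later comment. Not the server's actual list.
COMMANDS_REJECTING_AFTER_CLUSTER_TIME = frozenset({"currentOp", "ping", "serverStatus"})


class InvalidOptionsError(Exception):
    """Hypothetical stand-in for the error the server would return."""


def validate_read_concern(command):
    """Raise if a lock-free command carries readConcern.afterClusterTime.

    Assumes the command name is the first key of the command document.
    """
    name = next(iter(command))
    read_concern = command.get("readConcern", {})
    if name in COMMANDS_REJECTING_AFTER_CLUSTER_TIME and "afterClusterTime" in read_concern:
        raise InvalidOptionsError(
            f"{name} does not accept afterClusterTime in readConcern"
        )


# Accepted: find may wait for the requested cluster time.
validate_read_concern({"find": "coll", "readConcern": {"afterClusterTime": 5}})
# Accepted: ping without afterClusterTime.
validate_read_concern({"ping": 1})
try:
    validate_read_concern({"ping": 1, "readConcern": {"afterClusterTime": 5}})
except InvalidOptionsError as err:
    print(err)  # ping does not accept afterClusterTime in readConcern
```

Rejecting the command, rather than silently ignoring the field, gives the client an explicit error when it attaches afterClusterTime to a command that is meant to stay lock-free.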



 Comments   
Comment by Githook User [ 27/Oct/17 ]

Author:

Misha Tyulenev <misha@mongodb.com> (username: mikety)

Message: SERVER-31613 disallow afterClusterTime on some commands
Branch: master
https://github.com/mongodb/mongo/commit/54f044558c0845adcc6b51c2351e8382a3e9bbd7

Comment by Andy Schwerin [ 19/Oct/17 ]

Ok. Thanks for the explanation, max.hirschhorn. I'm convinced.

Comment by Misha Tyulenev [ 18/Oct/17 ]

I think blacklisting, on the server, the commands that cannot have a readConcern is the right way to address the issue.

Comment by Max Hirschhorn [ 18/Oct/17 ]

> I'm amenable to forbidding readConcern on a handful of operations, but why would you even send one? If the argument is "drivers might just send one on every command", then rejecting readConcern is just going to push the problem to the drivers.

Misha's current definition of the commands that should accept an afterClusterTime field is all commands except getMore, so I think the answer is "yes, drivers will send one on (nearly) every command". In my changes for SERVER-31296, where I have a patch that changes the mongo shell to inject afterClusterTime into all commands except getMore, tests such as jstests/core/evald.js hang. They expect to be able to run the "currentOp" command to find an "eval" command holding the global X lock, in order to test killing it. While we could work around this issue specifically within those tests, the fact that the server allowed this to happen suggested to me that the current definition of which commands should accept an afterClusterTime field is not fully thought out.
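The "all commands except getMore" rule can be written down as a small transformation on the command document. This is a hedged Python sketch: `inject_after_cluster_time` is a hypothetical helper, not the mongo shell's code. It assumes a command is a dict whose first key is the command name. The integer `operation_time` stands in for the cluster time the client last observed.

```python
def inject_after_cluster_time(command, operation_time):
    """Return a copy of `command` with readConcern.afterClusterTime added.

    Hypothetical helper mirroring the rule "every command except getMore".
    `operation_time` is a placeholder integer for the last cluster time the
    client observed (a BSON Timestamp in practice); None means none yet.
    """
    name = next(iter(command))  # the command name is the first key
    if name == "getMore" or operation_time is None:
        return dict(command)
    result = dict(command)
    read_concern = dict(result.get("readConcern", {}))  # copy; don't mutate input
    read_concern["afterClusterTime"] = operation_time
    result["readConcern"] = read_concern
    return result


print(inject_after_cluster_time({"currentOp": 1}, 42))
# {'currentOp': 1, 'readConcern': {'afterClusterTime': 42}}
print(inject_after_cluster_time({"getMore": 7, "collection": "coll"}, 42))
# {'getMore': 7, 'collection': 'coll'}
```

Under this rule, the currentOp issued by a test like jstests/core/evald.js carries afterClusterTime. It therefore needs a MODE_IS lock on the oplog and blocks behind the eval command's global X lock.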

> If you ignore the readConcern silently on some commands, I'm concerned that we'll screw up and start ignoring it on a command we shouldn't, and only find out when we've returned wrong results to a savvy client application.

No one is asking that the readConcern object be ignored. If drivers are going to need to maintain a whitelist or blacklist of commands, then the onus is already on them. I'd rather the server reject the command so that (1) it is unambiguous whether a command is expected to have a happens-after relationship with earlier operations or may run concurrently with them, and (2) drivers will know if they messed up, because they'll get back an error from the server.

> If the drivers are smart enough not to send a readConcern on commands that don't need to be kept causally consistent with previous writes, then what's the issue?

I'm not smart enough to know which commands not to send a readConcern on, and given the lack of detail in the drivers' specification for read and write concern, I wouldn't expect any drivers to be smart enough either.

Comment by Andy Schwerin [ 18/Oct/17 ]

I'm amenable to forbidding readConcern on a handful of operations, but why would you even send one? If the argument is "drivers might just send one on every command", then rejecting readConcern is just going to push the problem to the drivers. If you ignore the readConcern silently on some commands, I'm concerned that we'll screw up and start ignoring it on a command we shouldn't, and only find out when we've returned wrong results to a savvy client application. If the drivers are smart enough not to send a readConcern on commands that don't need to be kept causally consistent with previous writes, then what's the issue?

Comment by Kaloian Manassiev [ 18/Oct/17 ]

Yes, this would be a major support disruption. This would also make it really difficult to investigate deadlocks.

We should also add serverStatus to this list.

cc misha.tyulenev

Comment by Eric Milkie [ 18/Oct/17 ]

Wouldn't it be really disruptive for the currentOp command to acquire a MODE_IS lock? If you had a server where a MODE_X Global lock was enqueued but not yet granted, which could happen during stepdown, you wouldn't be able to run currentOp; it would just hang.
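The hang described above follows from a lock queue that grants requests fairly. A new request waits if anything is already queued ahead of it, even when its mode is compatible with every mode currently granted. The toy model below is a hypothetical simplification, not MongoDB's LockManager. It tracks a single resource (the Global lock), uses the standard intent-lock compatibility matrix, applies strict FIFO fairness, and never releases locks.

```python
from collections import deque

# Standard multi-granularity compatibility matrix: for each mode, the set of
# modes that may be held on the same resource at the same time.
COMPATIBLE_WITH = {
    "MODE_IS": {"MODE_IS", "MODE_IX", "MODE_S"},
    "MODE_IX": {"MODE_IS", "MODE_IX"},
    "MODE_S": {"MODE_IS", "MODE_S"},
    "MODE_X": set(),
}


class ToyFifoLock:
    """Hypothetical single-resource lock with strict FIFO fairness."""

    def __init__(self):
        self.granted = []       # (owner, mode) pairs currently holding the lock
        self.waiting = deque()  # (owner, mode) pairs queued, oldest first

    def request(self, owner, mode):
        """Grant `mode` to `owner` if possible; otherwise enqueue it.

        Returns True if granted immediately, False if the request must wait.
        """
        compatible = all(m in COMPATIBLE_WITH[mode] for _, m in self.granted)
        if compatible and not self.waiting:
            self.granted.append((owner, mode))
            return True
        self.waiting.append((owner, mode))
        return False


lock = ToyFifoLock()
print(lock.request("writer", "MODE_IX"))     # True: nothing held or queued
print(lock.request("stepdown", "MODE_X"))    # False: conflicts with MODE_IX
print(lock.request("currentOp", "MODE_IS"))  # False: compatible, but queued behind MODE_X
```

Without afterClusterTime, currentOp requests no lock in this scenario and never enters the queue. That is the behavior the comments above want to preserve.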

Generated at Thu Feb 08 04:27:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.