[SERVER-35289] $currentOp takes database locks when causal consistency is requested Created: 30/May/18 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | query-44-grooming | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Execution
|
| Operating System: | ALL |
| Participants: |
| Description |
|
The $currentOp aggregation stage needs to take no database locks, as the currentOp command does today. |
| Comments |
| Comment by Tess Avitabile (Inactive) [ 01/Jun/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
No, this is not blocking other work. It fell out of unrelated testing. However, it seems more important now that we can have a $currentOp blocked on a dropDatabase that is blocked on an open transaction. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bernard Gorman [ 01/Jun/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
tess.avitabile milkie behackett: A couple of followup questions to better determine how to prioritise and fix this:
Edit: it looks as though we currently return a generic InvalidOptions error for commands where allowsAfterClusterTime() == false. Would it make sense to return a new code for cases like $currentOp - something like "RetryWithoutAfterClusterTime" - so that drivers could tell that it is safe to resubmit a particular command without the causal metadata? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bernard Gorman [ 31/May/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Is that because the $currentOp aggregation stage doesn't end up consulting the allowsAfterClusterTime() function in current_op_common.h? The changes from max.hirschhorn, this looks like the right solution to me. A $currentOp aggregation won't consult the currentOp command's allowsAfterClusterTime() since that code is entirely separate from the aggregation's PipelineCommand, but we can add an override of the method in PipelineCommand (which currently defers to the default implementation in BasicCommand) to return false in the case that the aggregation starts with $currentOp. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Max Hirschhorn [ 31/May/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Is that because the $currentOp aggregation stage doesn't end up consulting the allowsAfterClusterTime() function in current_op_common.h? The changes from
We certainly could have the mongo shell and drivers not inject "afterClusterTime" for aggregations involving the $currentOp stage; however, this may run contra to the Driver team's goal of not special casing the behavior for the "aggregate" command by inspecting the pipeline.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Milkie [ 31/May/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We might consider having the shell detect when the user is attempting to run an aggregation with the $currentOp stage and then not send the causal metadata for that command. Or it could return an error. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Tess Avitabile (Inactive) [ 31/May/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This appears to happen when causal consistency is enabled. For example, I ran the following function under suite causally_consistent_jscore_txns_passthrough:
I saw that the aggregation did take locks:
The currentOp command did not take locks:
The aggregation did not take locks when causal consistency was not enabled. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Milkie [ 31/May/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I defer to tess.avitabile; she was experiencing the locking behavior and can help diagnose. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bernard Gorman [ 31/May/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
milkie - does it? Part of the work during the initial $currentOp implementation was to ensure that it avoided taking any locks (see here and the accompanying code review). There's a test here which is designed to confirm that $currentOp is able to retrieve results while the server is locked. Also: if the $currentOp agg stage takes locks, then I'd expect that the currentOp command also does? The latter has been internally implemented as a $currentOp aggregation ever since the agg stage was added. Where are you seeing these locks being taken? I don't think it's anywhere in the aggregation code itself - is it somewhere in the command preamble? |