[SERVER-22541] Aggregation plan executors should be owned by global cursor manager
Created: 09/Feb/16  Updated: 07/Jun/21  Resolved: 16/Mar/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: None
Fix Version/s: 3.5.5

Type: Bug
Priority: Major - P3
Reporter: J Rassi
Assignee: Charlie Swanson
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-26608 $out can attempt to upgrade locks if ... Closed
Duplicate
is duplicated by SERVER-24704 Stats for aggregation overcounted in Top Closed
Related
related to SERVER-17624 Interrupting aggregation operation ca... Closed
related to SERVER-27253 $lookup and $graphLookup do not incre... Closed
related to SERVER-27414 On mongos, cannot query view whose fi... Closed
related to SERVER-33959 CursorManager attempts to dispose of ... Closed
related to SERVER-26037 DocumentSourceCursor should report ap... Backlog
related to SERVER-55034 The profile command should not take S... Closed
is related to SERVER-25005 Execute queries in $lookup and $graph... Closed
is related to SERVER-23294 DocumentSourceMergeCursors should be ... Closed
is related to SERVER-29354 add ability to atomically replace vie... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query 11 (03/14/16), Query 2016-11-21, Query 2016-12-12, Query 2017-02-13, Query 2017-03-27
Participants:
Linked BF Score: 15

 Description   

Plan executors that own a PipelineProxyStage do not hold a collection lock while they are being iterated. However, all executors that are eligible for kill notifications must hold a lock while they are being iterated, as concurrent access to PlanExecutor::_killReason for registered PlanExecutor objects is protected by the collection lock.

As a result, if a plan executor that owns a PipelineProxyStage is killed during execution, the executing thread's read of _killReason in PlanExecutor::killed() races with the killing thread's write of _killReason in PlanExecutor::kill(). This data race is undefined behavior and can potentially crash the server.
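The race described above can be illustrated with a small sketch. It is not the server's code: ToyExecutor and ToyCollectionLock are hypothetical names, and a plain std::mutex stands in for the collection lock. It only shows why both the read in killed() and the write in kill() need to happen under the same lock.

```cpp
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a collection lock (assumption: a plain mutex is enough
// for illustration; the real server uses its own lock manager).
using ToyCollectionLock = std::mutex;

// Toy executor whose kill reason may only be touched while the lock is held.
class ToyExecutor {
public:
    explicit ToyExecutor(ToyCollectionLock& lock) : _lock(lock) {}

    // Called by the killing thread, for example when a collection is dropped.
    void kill(std::string reason) {
        std::lock_guard<ToyCollectionLock> guard(_lock);
        _killReason = std::move(reason);
    }

    // Called by the iterating thread. Without the guard, this read would race
    // with the write in kill().
    bool killed() const {
        std::lock_guard<ToyCollectionLock> guard(_lock);
        return _killReason.has_value();
    }

    std::string killReason() const {
        std::lock_guard<ToyCollectionLock> guard(_lock);
        return _killReason.value_or("");
    }

private:
    ToyCollectionLock& _lock;
    std::optional<std::string> _killReason;
};

// Kills the executor from a second thread while the calling thread polls killed().
inline std::string killFromAnotherThread(const std::string& reason) {
    ToyCollectionLock lock;
    ToyExecutor exec(lock);
    std::thread killer([&] { exec.kill(reason); });
    while (!exec.killed()) {
        std::this_thread::yield();
    }
    killer.join();
    return exec.killReason();
}
```

Removing either lock_guard in this sketch reproduces the shape of the bug: one thread writes the kill reason while another reads it with no synchronization between them.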

Aggregation cursors and plan executors should be owned by the global cursor manager, instead of by the collection's cursor manager. This correctly captures the fact that the lifetime of an aggregation cursor/executor is not tied to the lifetime of the collection, and this will prevent these cursors/executors from receiving kill notifications.

The underlying plan executor owned by DocumentSourceCursor should remain owned by the associated collection's cursor manager, and should be registered for receiving invalidations and kill notifications.
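A minimal sketch of the proposed ownership split follows, assuming toy types whose names (ToyAggCursor, ToyGlobalCursorManager, ToyCollectionCursorManager) are invented for illustration and do not match the server's classes. The aggregation cursor lives in a global registry, while only its inner, collection-level executor is registered with the collection's manager and so is the only part that receives kill notifications.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical toy executor; only the kill state is modeled.
struct ToyExecutor {
    bool killed = false;
    std::string killReason;
};

// Per-collection manager: every executor registered here is notified when the
// collection is invalidated (for example, dropped).
class ToyCollectionCursorManager {
public:
    void registerExecutor(ToyExecutor* exec) { _registered.insert(exec); }
    void deregisterExecutor(ToyExecutor* exec) { _registered.erase(exec); }

    void invalidateAll(const std::string& reason) {
        for (ToyExecutor* exec : _registered) {
            exec->killed = true;
            exec->killReason = reason;
        }
    }

private:
    std::unordered_set<ToyExecutor*> _registered;
};

// An aggregation cursor with a pipeline-level executor and the inner executor
// that actually reads from the collection.
struct ToyAggCursor {
    ToyExecutor outer;  // never registered for kill notifications
    ToyExecutor inner;  // registered with the collection's manager
};

// Global manager: owns aggregation cursors independently of any collection.
class ToyGlobalCursorManager {
public:
    long long registerCursor(std::unique_ptr<ToyAggCursor> cursor) {
        const long long id = _nextId++;
        _cursors.emplace(id, std::move(cursor));
        return id;
    }

    ToyAggCursor* find(long long id) {
        auto it = _cursors.find(id);
        return it == _cursors.end() ? nullptr : it->second.get();
    }

private:
    long long _nextId = 1;
    std::unordered_map<long long, std::unique_ptr<ToyAggCursor>> _cursors;
};
```

In this sketch, invalidating the collection marks only the inner executor as killed. The cursor itself stays registered globally and can surface the inner executor's kill reason the next time it is iterated.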



 Comments   
Comment by Githook User [ 15/Mar/17 ]

Author: Charlie Swanson (username: cswanson310, email: charlie.swanson@mongodb.com)

Message: SERVER-22541 Manage aggregation cursors on global cursor manager.

Moves registration of aggregation cursors to the global cursor manager.
This simplifies the logic for acquiring locks and resolving view
namespaces within the getMore and killCursors commands.
Branch: master
https://github.com/mongodb/mongo/commit/584ca76de9ee66b3e11987e640f5317ae40975e4

Comment by Githook User [ 15/Mar/17 ]

Author: Charlie Swanson (username: cswanson310, email: charlie.swanson@mongodb.com)

Message: SERVER-22541 Refactor RAII locking helpers.

Removes the class 'ScopedTransaction' and moves the responsibility of
abandoning the snapshot onto the GlobalLock class. Also renames the
AutoGetCollectionForRead class to AutoGetCollectionForReadCommand, and
adds a new AutoGetCollectionForRead class. Unlike
AutoGetCollectionForReadCommand, this new class will not update the
namespace on the CurrentOp object, nor will it add an entry to Top.
Branch: master
https://github.com/mongodb/mongo/commit/f05b9437fbdc53deecf55ed3c20e36af3d733953
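The split between the two helpers named in this commit message can be sketched roughly as below. All types are toy stand-ins with a Toy prefix, lock acquisition is omitted, and recording the Top entry in the destructor is an assumption of this sketch rather than something the commit message states.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for per-namespace usage stats ("Top") and the
// per-operation record ("CurrentOp"). The real interfaces differ.
struct ToyTop {
    std::map<std::string, int> readCounts;
};

struct ToyCurrentOp {
    std::string ns;
};

// Plain helper: models acquiring the collection for a read and nothing else.
// It does not touch CurrentOp or Top.
class ToyAutoGetCollectionForRead {
public:
    explicit ToyAutoGetCollectionForRead(std::string ns) : _ns(std::move(ns)) {}
    const std::string& ns() const { return _ns; }

private:
    std::string _ns;
};

// Command-level helper: wraps the plain helper, sets the namespace on CurrentOp,
// and adds one entry to Top when it goes out of scope (timing is an assumption).
class ToyAutoGetCollectionForReadCommand {
public:
    ToyAutoGetCollectionForReadCommand(std::string ns, ToyCurrentOp& op, ToyTop& top)
        : _inner(std::move(ns)), _top(top) {
        op.ns = _inner.ns();
    }

    ~ToyAutoGetCollectionForReadCommand() { ++_top.readCounts[_inner.ns()]; }

    ToyAutoGetCollectionForReadCommand(const ToyAutoGetCollectionForReadCommand&) = delete;
    ToyAutoGetCollectionForReadCommand& operator=(const ToyAutoGetCollectionForReadCommand&) = delete;

private:
    ToyAutoGetCollectionForRead _inner;
    ToyTop& _top;
};

// One command that re-acquires the collection several times internally should
// still be counted once in Top.
inline int countTopEntriesForOneCommand(int innerReacquisitions) {
    ToyTop top;
    ToyCurrentOp op;
    {
        ToyAutoGetCollectionForReadCommand cmd("test.coll", op, top);
        for (int i = 0; i < innerReacquisitions; ++i) {
            ToyAutoGetCollectionForRead inner("test.coll");
            (void)inner;
        }
    }
    return top.readCounts["test.coll"];
}
```

The design point is that internal re-acquisitions use the plain helper, so only the command-level entry point contributes to CurrentOp and Top.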

Comment by Githook User [ 15/Mar/17 ]

Author: Charlie Swanson (username: cswanson310, email: charlie.swanson@mongodb.com)

Message: SERVER-22541 Refactor RAII locking helpers.

Removes the class 'ScopedTransaction' and moves the responsibility of
abandoning the snapshot to the GlobalLock class. Also renames the
AutoGetCollectionForRead helper to AutoGetCollectionForReadCommand, and
adds a new AutoGetCollectionForRead class. Unlike
AutoGetCollectionForReadCommand, this new class will not update the
namespace on the CurrentOp object, nor will it add an entry to Top.
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/fcd6f14c86ef2ac0be92e4cc4ca8893dcf603b01

Comment by Charlie Swanson [ 15/Nov/16 ]

Bringing this into the current sprint since we believe it will provide a path forward for SERVER-25694.

Comment by J Rassi [ 22/Mar/16 ]

Bumping fix version back to "3.3 Desired".

This work was originally scheduled for 3.4 to fix a number of long-standing locking issues related to aggregation cursors. It has many tech debt benefits in the area of query/agg integration, but it requires changing the namespace of aggregation cursor operations from "<db>.<collection>" to "<db>.$cmd.aggregate.<collection>". The merge cursors pipeline stage (currently run on the database's primary shard for all sharded aggregations) only knows how to merge cursors on the original collection namespace, so it needs minor work to become namespace-agnostic for its input cursors (SERVER-23294). That work could easily be backported to 3.2.5, but versions of the server prior to 3.2.5 would fail the merge operation when trying to merge aggregation cursors created on 3.4+ mongod nodes. As a result, this change would force 3.2.0-3.2.4 users either to take downtime on their sharded aggregations while upgrading to 3.4, or to upgrade mongod through 3.2.5+ on their way to 3.4.
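As a rough illustration of the namespace change described above, the sketch below builds and parses a namespace of the form "<db>.$cmd.aggregate.<collection>". Both function names are hypothetical and plain strings are used in place of the server's namespace type.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: builds the aggregation cursor namespace for a collection.
inline std::string makeAggCursorNamespace(const std::string& db, const std::string& coll) {
    return db + ".$cmd.aggregate." + coll;
}

// Hypothetical helper: recovers the collection name from an aggregation cursor
// namespace in the given database. Returns std::nullopt for any other namespace,
// including a plain "<db>.<collection>" namespace.
inline std::optional<std::string> parseAggCursorNamespace(const std::string& db,
                                                          const std::string& ns) {
    const std::string prefix = db + ".$cmd.aggregate.";
    if (ns.size() <= prefix.size() || ns.compare(0, prefix.size(), prefix) != 0) {
        return std::nullopt;
    }
    return ns.substr(prefix.size());
}
```

A merging stage that accepts only the plain collection namespace would reject the first form, which is the compatibility problem the comment attributes to servers older than 3.2.5.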

At the moment, we're considering this upgrade restriction to be prohibitive, so we're unscheduling this from 3.4. Once 3.4 is released with the fix for SERVER-23294, we'll be able to commit this work for 3.6 without any upgrade restrictions.
