[SERVER-37750] Optimized $sample stage does not yield Created: 25/Oct/18 Updated: 29/Oct/23 Resolved: 15/Nov/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 3.4.0, 3.6.0, 4.0.0 |
| Fix Version/s: | 3.4.19, 3.6.10, 4.0.5, 4.1.6 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Charlie Swanson | Assignee: | Bernard Gorman |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||
| Backport Requested: |
v4.0, v3.6, v3.4
|
||||||||||||||||||||||||
| Sprint: | Query 2018-11-19 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||
| Linked BF Score: | 50 | ||||||||||||||||||||||||
| Description |
|
In the optimized $sample path where we use a pseudo-random cursor (docs), we need to insert a ShardFilterStage to ensure the $sample stage only returns documents which belong to this shard (i.e. it needs to filter out orphans and in-progress migrations). To construct this ShardFilterStage we need to consult the CollectionShardingState to get the metadata for the collection, including the shard key. Accessing this state needs to be done under a lock, so we take a lock to do so:
This lock is actually unnecessary, because the caller of 'createRandomCursorExecutor' already holds a lock on the collection - we must hold a lock during the entirety of PipelineD::createCursorSource, since we are interacting with a raw Collection*. Taking this extra lock may seem benign, but it actually has an unintended side-effect: double locking prevents yielding:
Because of this, the PlanExecutor powering the $sample stage will not yield, despite passing the explicit YIELD_AUTO policy (shown above). This is obviously surprising. I'm not entirely certain about the motivation for this behavior, but I believe this behavior was engineered to prevent surprise yielding of locks from operations which needed to execute some sort of sub-query. For example, a replication command or internal thread may need to find the most recent oplog entry, and would not want that operation to yield the locks on the oplog, despite that being the default behavior. It looks like this change can be traced back to this commit, which (I assume defensively - just to be sure) added the AutoGetCollection lock as part of an unrelated refactor. That extra lock has been there ever since, meaning this bug affects versions back to 3.4. |
| Comments |
| Comment by Githook User [ 22/Nov/18 ] |
|
Author: {'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com', 'username': 'gormanb'}Message: |
| Comment by Githook User [ 21/Nov/18 ] |
|
Author: {'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com', 'username': 'gormanb'}Message: |
| Comment by Githook User [ 21/Nov/18 ] |
|
Author: {'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com', 'username': 'gormanb'}Message: (cherry picked from commit 4177d6d22ab3329a8607bf80a62aa03d4fb2c528) |
| Comment by Githook User [ 20/Nov/18 ] |
|
Author: {'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com', 'username': 'gormanb'}Message: (cherry picked from commit 4177d6d22ab3329a8607bf80a62aa03d4fb2c528) |
| Comment by Githook User [ 20/Nov/18 ] |
|
Author: {'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com', 'username': 'gormanb'}Message: (cherry picked from commit 4177d6d22ab3329a8607bf80a62aa03d4fb2c528) |
| Comment by Githook User [ 15/Nov/18 ] |
|
Author: {'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com', 'username': 'gormanb'}Message: |