[SERVER-35967] $sample with explain(true) hangs
Created: 05/Jul/18 | Updated: 29/Oct/23 | Resolved: 30/Jul/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 3.6.0 |
| Fix Version/s: | 3.6.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Josef Ahmad | Assignee: | Ian Boros |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: |
| Sprint: | Query 2018-07-16, Query 2018-07-30, Query 2018-08-13 |
| Participants: |
| Description |
|
Observed in v3.6; unable to reproduce on v3.4. An aggregation with a $sample stage never returns when run with explain(true). db.currentOp() shows "planSummary": "MULTI_ITERATOR" and "numYields": 18619890, and numYields increases continuously. |
| Comments |
| Comment by Githook User [ 30/Jul/18 ] |
|
Author: Ian Boros (ian.boros@10gen.com) Message: |
| Comment by David Storch [ 05/Jul/18 ] |
|
One more note: In the process of investigating this issue, I found a separate problem with explaining $sample that affects 4.0 and master. See |
| Comment by David Storch [ 05/Jul/18 ] |
|
After investigating, I can confirm that this problem was introduced by This PlanExecutor encapsulates an execution plan which simply reads from a random cursor—a RecordCursor obtained by calling RecordStore::getRandomCursor(). Existing random cursor implementations sample via a pseudorandom btree walk. They also never return EOF, but rather will sample infinitely (with replacement) as a caller iterates the cursor. The explain code is doing just this—in its attempt to gather stats from the cursor, it is stuck in an infinite loop here checking for an EOF that will never come. The problem is that the explain code path iterates the underlying random cursor directly, circumventing the code in DocumentSourceSampleFromRandomCursor which is responsible for duplicate elimination, and, importantly, for signaling EOF for $sample queries. |
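The failure mode described above can be illustrated with a small Python model. This is only a sketch, not the server code: `random_cursor`, `sample_stage`, and `drain_for_stats` are hypothetical stand-ins for the random cursor, for the duplicate-eliminating stage that signals EOF, and for an explain-style loop that reads until EOF. The `max_attempts` bound and the `max_reads` cap are assumptions added so that the example always terminates.

```python
import random
from typing import Dict, Iterable, Iterator, List, Tuple


def random_cursor(records: List[Dict], seed: int = 0) -> Iterator[Dict]:
    """Hypothetical random cursor: samples with replacement and never ends."""
    rng = random.Random(seed)
    while True:
        yield rng.choice(records)


def sample_stage(cursor: Iterator[Dict], size: int,
                 max_attempts: int = 10_000) -> Iterator[Dict]:
    """Hypothetical $sample-like wrapper: drops duplicates and signals EOF.

    Stops after `size` distinct documents, or after `max_attempts` reads
    (an assumed safety bound so the sketch cannot spin forever).
    """
    seen = set()
    attempts = 0
    while len(seen) < size and attempts < max_attempts:
        doc = next(cursor)
        attempts += 1
        if doc["_id"] in seen:
            continue  # duplicate: sample again
        seen.add(doc["_id"])
        yield doc


def drain_for_stats(cursor: Iterable[Dict], max_reads: int) -> Tuple[int, bool]:
    """Reads until EOF, as an explain-style stats pass might.

    Returns (documents read, whether EOF was reached). The `max_reads`
    cap exists only so this demo returns; without it, draining the raw
    random cursor would loop forever.
    """
    reads = 0
    for _ in cursor:
        reads += 1
        if reads >= max_reads:
            return reads, False
    return reads, True


records = [{"_id": i} for i in range(100)]

# Reading the raw cursor directly never reaches EOF.
raw_reads, raw_eof = drain_for_stats(random_cursor(records), max_reads=1000)

# Reading through the wrapper ends after 5 distinct documents.
wrapped_reads, wrapped_eof = drain_for_stats(
    sample_stage(random_cursor(records), size=5), max_reads=1000)
```

In this model, draining the raw cursor stops only because of the artificial cap, while draining through the wrapper reaches EOF by itself. That matches the diagnosis above: the wrapper, not the cursor, is what terminates the iteration.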
| Comment by David Storch [ 05/Jul/18 ] |
|
I can reproduce this issue on the 3.6 branch but not on the master branch. My initial guess is that this is a regression due to |
| Comment by Ramon Fernandez Marina [ 05/Jul/18 ] |
|
I can confirm this behavior: in 3.4.4 the explain() command returns, but in 3.6.5 it does not. |