[SERVER-22279] SubplanStage fails to register its MultiPlanStage
Created: 22/Jan/16  Updated: 06/Sep/18  Resolved: 26/Jan/16
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.2.1, 3.3.0 |
| Fix Version/s: | 3.2.3, 3.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Marco Biscaro | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-only |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Steps To Reproduce: | 1. Restore the attached dump on an instance using WiredTiger
2. Run the query shown in the description (it works correctly at this point)
3. Create the following indexes:
4. Run the query used in step 2 again and the bug will be triggered |
| Sprint: | Integration F (02/01/16) |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Issue Status as of Jan 27, 2016

ISSUE SUMMARY
A query whose only top-level predicate is a $or is a "rooted" $or, and takes a special plan-selection path in which each branch of the $or is planned independently. By contrast, a query that combines a $or with an additional top-level predicate (here, on the c field) is not a rooted $or, due to the implicit AND between the two. A bug in the plan selection path for rooted $or predicates may cause the server to improperly save index cursor state in preparation for yielding locks, or to improperly restore index cursor state when re-acquiring locks after a yield.

USER IMPACT
For servers running the WiredTiger storage engine, this issue could cause poor performance due to suboptimal plan selection for a rooted $or query. The log message quoted in the original title below ("WTIndex::updatePosition - the new key is less than the previous key") is indicative of this issue.

Servers running MMAPv1 may experience the following symptoms:

WORKAROUNDS
A configuration option can divert queries from this plan-selection path. It should be set with care, as all rooted $or queries will be diverted from their typical execution path, potentially leading to a loss in performance for such queries.

AFFECTED VERSIONS
MongoDB 3.2.1 and 3.3.0 (see the Affects Version/s field above).

FIX VERSION
The fix is included in MongoDB 3.2.3 and 3.3.1 (see the Fix Version/s field above).

Original description
The SubplanStage fails to register its MultiPlanStage. This causes messages such as save, restore, and invalidate to fail to propagate to the subplans if a yield occurs while choosing a plan for subqueries, which can result in incorrect query execution. This was introduced by https://github.com/mongodb/mongo/commit/c832bc75#diff-d16091f48e3ffb1cc4c4cc6fd6319c2bR326.

Original title and description: WTIndex::updatePosition - the new key is less than the previous key

When I use a query with compound indexes, the following message is shown several times per second in the log:

A dump to reproduce the bug is attached. If I run the query without creating indexes, it works fine both on MMAPv1 and WiredTiger:

If I create the following indexes, it works fine on MMAPv1 but triggers the bug on WiredTiger:
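For illustration, the rooted vs. non-rooted $or distinction can be sketched with hypothetical filter documents; these are not the reporter's actual queries, which are not included in this export.

```javascript
// Hypothetical filters, for illustration only.

// A "rooted" $or: the $or is the only top-level predicate, so the
// subplanner can plan each branch independently.
const rootedOr = { $or: [{ a: 1 }, { b: 1 }] };

// NOT a rooted $or: the extra top-level predicate on c creates an
// implicit AND with the $or, so the special subplanning path is skipped.
const notRootedOr = { $or: [{ a: 1 }, { b: 1 }], c: 1 };

// A filter is a rooted $or when $or is its only top-level key.
function isRootedOr(filter) {
  const keys = Object.keys(filter);
  return keys.length === 1 && keys[0] === "$or";
}

console.log(isRootedOr(rootedOr));    // true
console.log(isRootedOr(notRootedOr)); // false
```

Only queries of the first shape reach the affected plan-selection path.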
|
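The failure mode described above (lifecycle messages not reaching the subplans across a yield) can be sketched as follows. The class and method names here are illustrative JavaScript, not MongoDB's actual C++ execution-stage classes.

```javascript
// Illustrative sketch of the bug's mechanism; names are hypothetical.

class Stage {
  constructor(children = []) {
    this.children = children;
    this.saved = false;
  }
  // Lifecycle messages must reach every registered child, or its
  // cursors keep stale state across a yield.
  saveState() {
    this.saved = true;
    for (const child of this.children) child.saveState();
  }
  restoreState() {
    this.saved = false;
    for (const child of this.children) child.restoreState();
  }
}

// Buggy subplan stage: it creates a multi-plan stage while choosing a
// plan but never registers it as a child, so saveState()/restoreState()
// issued around a yield never reach it.
class BuggySubplanStage extends Stage {
  planSubqueries() {
    this.multiPlan = new Stage(); // created, but NOT pushed to children
  }
}

// Fixed subplan stage: the multi-plan stage is registered as a child,
// so lifecycle messages propagate to it.
class FixedSubplanStage extends Stage {
  planSubqueries() {
    this.multiPlan = new Stage();
    this.children.push(this.multiPlan); // the fix: register the child
  }
}

const buggy = new BuggySubplanStage();
buggy.planSubqueries();
buggy.saveState(); // a yield begins
console.log(buggy.multiPlan.saved); // false -- the message was lost

const fixed = new FixedSubplanStage();
fixed.planSubqueries();
fixed.saveState();
console.log(fixed.multiPlan.saved); // true
```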
| Comments |
| Comment by Githook User [ 27/Jan/16 ] |
Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>
Message: Failure to do this caused save() and restore() not to be passed through to
(cherry picked from commit 27e452997f8be0ee0f513f9f55e1ba4fad4bc027)
| Comment by Githook User [ 26/Jan/16 ] |
Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>
Message: Failure to do this caused save() and restore() not to be passed through to
| Comment by Marco Biscaro [ 22/Jan/16 ] |
Curious... I hadn't noticed the huge number of yields. With MMAPv1, this number is pretty low, even on a "cold" query (the first query right after booting the OS):
| Comment by Scott Hernandez (Inactive) [ 22/Jan/16 ] |
Marco, I see that when these log messages happen (probably one for each yield/write-conflict exception), the operation also slows down, due to restarting the index seek/next:

Also interesting: after the first run, the query is fast and doesn't log any messages. And when the server is restarted, it is slow twice (not once, as it is after the index is created) before it becomes fast again.
| Comment by Marco Biscaro [ 22/Jan/16 ] |
Hi Scott. The only evidence of the problem I could find is the messages in the log, but the query hangs until a large number of messages have been printed (I guess it prints this message for each document in the collection). When I run this query on a collection with millions of documents, the server hangs for minutes while the messages are printed. Eventually, the correct result is returned.
| Comment by Scott Hernandez (Inactive) [ 22/Jan/16 ] |
Marco, we will try to reproduce this as well. Thanks for the reproduction steps and data. Other than the informational messages in the logs, is there any problem affecting your client or application? |