Type: Bug
Resolution: Duplicate
Priority: Blocker - P1
Affects Version/s: None
Component/s: None
Query Optimization
ALL
200
This appears to be similar to the issue in BF-41240.
I noticed this issue in my patch, but it appears unrelated to my changes and also reproduces on master. We are hitting an fassert with the following error when attempting to resume multiplanner (MP) trials in the CBR fallback for No MP Results:
"WTIndex::checkKeyIsOrdered: the new key is out of order with respect to the previous key"
Note that this only occurs on DEBUG builds, as indicated by a comment in the code, and is non-deterministic, which could explain why this specific failure has not yet shown up as a BF.
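For readers unfamiliar with this kind of check, here is a minimal sketch of what a debug-only key-ordering assertion in the spirit of WTIndex::checkKeyIsOrdered might do, assuming it simply requires each key returned by a forward cursor to be strictly greater than the previous one. The names OutOfOrderKeyError, OrderedKeyChecker and the debug_build flag are hypothetical illustrations, not the server's actual code.

```python
class OutOfOrderKeyError(RuntimeError):
    """Raised when a cursor returns a key that does not follow the previous one."""


class OrderedKeyChecker:
    """Hypothetical stand-in for a debug-only check such as
    WTIndex::checkKeyIsOrdered (assumed semantics: keys must strictly increase)."""

    def __init__(self, debug_build=True):
        self.debug_build = debug_build
        self.prev_key = None

    def check(self, key):
        # Non-debug builds skip the check entirely, matching the report that
        # the fassert only fires on DEBUG builds.
        if not self.debug_build:
            return
        if self.prev_key is not None and key <= self.prev_key:
            raise OutOfOrderKeyError(
                "the new key is out of order with respect to the previous key")
        self.prev_key = key


checker = OrderedKeyChecker()
for key in [1, 3, 5]:
    checker.check(key)  # strictly increasing keys pass the check
```

Calling checker.check(2) after the loop would raise OutOfOrderKeyError, while a checker built with debug_build=False accepts any sequence.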
After some investigation, the issue appears to manifest when the CBR NoMPResults fallback does the following:
1. Run the first phase of MP trials.
2. Fall back to CBR, i.e., call SamplingCE.
3. Not all plans can be estimated by CBR, so resume the MP trials.
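The three steps above can be modelled with a deliberately simplified, deterministic sketch. It assumes, hypothetically, that the MP trial and CBR sampling share one cursor-like resource, so sampling repositions it underneath the paused trial. Saving and restoring the trial's position around the sampling interval (the approach suggested by the title of SERVER-117639) avoids the out-of-order key in this model. SharedIndexCursor, sample_keys and run_fallback are invented names, and the real failure is non-deterministic and involves storage-engine resources this model does not capture.

```python
class OutOfOrderKeyError(RuntimeError):
    """Raised when a consumer sees a key that does not strictly increase."""


class SharedIndexCursor:
    """Hypothetical forward cursor over sorted index keys, shared by the
    MP trial and CBR sampling in this model."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.pos = 0

    def next(self):
        key = self.keys[self.pos]
        self.pos += 1
        return key

    def seek_to_start(self):
        self.pos = 0

    def save(self):
        return self.pos

    def restore(self, saved_pos):
        self.pos = saved_pos


def sample_keys(cursor, n):
    # Hypothetical CBR sampling: rewinds the shared cursor and reads n keys.
    cursor.seek_to_start()
    return [cursor.next() for _ in range(n)]


def run_fallback(keys, save_restore):
    cursor = SharedIndexCursor(keys)
    seen = []

    def mp_step():
        key = cursor.next()
        # Stand-in for the debug-only ordering check.
        if seen and key <= seen[-1]:
            raise OutOfOrderKeyError(
                "the new key is out of order with respect to the previous key")
        seen.append(key)

    # 1. Run the first phase of MP trials.
    for _ in range(2):
        mp_step()

    # 2. Fall back to CBR and sample, optionally saving/restoring MP state.
    saved = cursor.save() if save_restore else None
    sample_keys(cursor, 1)
    if save_restore:
        cursor.restore(saved)

    # 3. Not all plans were estimable, so resume the MP trials.
    for _ in range(2):
        mp_step()
    return seen


print(run_fallback([10, 20, 30, 40, 50, 60], save_restore=True))  # [10, 20, 30, 40]
```

Without save/restore, the resumed trial re-reads key 20 after sampling rewinds the cursor and trips the ordering check; with save/restore, it continues from 30.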
As seen in the BF linked above, a similar issue also occurs when we abandon the MP trials and try to save state after running CBR sampling and successfully picking the best plan.
I was able to reproduce the issue locally on the latest master commit with the following command. Because the failure is non-deterministic, it only hits this fassert on some test runs:
buildscripts/resmoke.py run --suites=no_passthrough_with_mongod --jobs=1 --repeatTestsMax=1000 --repeatTestsMin=2 --repeatTestsSecs=600.0 --continueOnFailure jstests/noPassthroughWithMongod/query/plan_ranking/cbrForNoMPResults.js
The specific test case is testReturnKeyIsPlannedWithMultiPlanner(), which exercises the code path described above.
I believe the PR for the explain changes (SERVER-115402) introduced the new flow in which we sometimes resume MP after sampling. Previously, we always returned the CBR results to the caller regardless of whether CBR fully succeeded, so sampling and MP trials were never interleaved.
- depends on: SERVER-117639 Save/restore state around multiplanning intervals to avoid resources' corruption (Closed)
- is duplicated by: SERVER-117639 Save/restore state around multiplanning intervals to avoid resources' corruption (Closed)
- is related to: SERVER-115402 Populate rejected plans from explain when using restrictive approach (Closed)