[SERVER-83091] $or query can trigger an infinite loop during plan enumeration Created: 09/Nov/23 Updated: 20/Dec/23 Resolved: 28/Nov/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 7.0.2, 7.0.3, 7.2.0-rc1 |
| Fix Version/s: | 7.3.0-rc0, 7.2.0-rc2, 7.0.5, 6.0.13, 5.0.24 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | David Storch | Assignee: | David Storch |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | bkp | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||||||
| Backport Requested: |
v7.2, v7.0, v6.0, v5.0
|
||||||||||||||||||||||||||||||||
| Sprint: | QO 2023-11-13, QO 2023-11-27, QO 2023-12-11 | ||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||||||
| Linked BF Score: | 14 | ||||||||||||||||||||||||||||||||
| Description |
|
Certain $or queries can trigger an infinite loop in the PlanEnumerator. While the plan space can be quite large for indexed $or queries, we have logic which is supposed to bail out of plan enumeration once a certain number of alternatives have been generated. This logic is not kicking in correctly. Note that this only affects the so-called "lockstep $or" enumeration order which was first implemented in
Note that the infinite loop does not include any interrupt points, so the affected queries are impossible to kill and can block the server from shutting down. It may also manifest as increased CPU activity, since the thread is spinning busily rather than deadlocked. As of this writing, I've confirmed that 7.0.2 is affected, but I plan to double-check whether more recent versions are affected by the bug as well. |
| Comments |
| Comment by Githook User [ 01/Dec/23 ] | |
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}Message: (cherry picked from commit 0b8d487a19477c72a64676db203b84763a363bbe) GitOrigin-RevId: 6dceb9b856961509390e3ff06f035eb65d81d74f | |
| Comment by Githook User [ 01/Dec/23 ] | |
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}Message: (cherry picked from commit 479bb27e76bff47398a7c1d3060e818645cef8ed) | |
| Comment by Githook User [ 28/Nov/23 ] | |
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}Message: (cherry picked from commit f84e9c8325f3575929accc645e857c8d00c93391) | |
| Comment by Githook User [ 28/Nov/23 ] | |
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}Message: (cherry picked from commit a36f67db89cda0f0b9fbe54a97290a21616efd5d) | |
| Comment by Githook User [ 28/Nov/23 ] | |
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}Message: | |
| Comment by David Storch [ 13/Nov/23 ] | |
|
I've solved the mystery of why this initially appeared as though it did not affect the master branch. The reason is that the issue is related to nested $or predicates, but these get simplified by the new boolean simplification module which was recently enabled under
Boolean simplification makes this issue much harder to reproduce, though it still seems theoretically possible to construct an example in which nested $or predicates do not get flattened (e.g. if boolean simplification doesn't kick in) that is subject to the bug. Moreover, the system shouldn't depend on boolean simplification to behave correctly. I'm marking the most recent release (7.2.0-rc0) as an affected version. The fix should be implemented in the master branch and backported from there in the usual way. | |
| Comment by David Storch [ 10/Nov/23 ] | |
|
After digging through my repro in a debugger, I think I have a sense of what's going wrong. The situation is that the PlanEnumerator memo data structure has two LockstepOrAssignment nodes, where one refers to the other as a subnode. The symptom is that during a call PlanEnumerator::nextMemo() to advance the memo state, we get stuck in the this while() loop associated with the uppermost node. I think the fundamental problem is that the lockstep enumeration algorithm assumes that each of its subnodes has a fixed number of enumeration states, and will consistently report when it has "wrapped around". For instance, imagine that a subnode has three enumeration states. The lockstep algorithm will only work correctly if every third recursive call to nextMemo() on this subnode returns true in order to indicate that it has wrapped around. When the subnode is itself a LockstepOrAssignment node, this assumption is violated. In particular, it is violated because the LockstepOrAssignment subnode will start returning true on every nextMemo() call as soon as it has reached internalQueryEnumerationMaxOrSolutions limit (which has a default value of 10). I think that we should probably fix this by relaxing the lockstep enumeration algorithm to work correctly even if a subnode starts wrapping around sooner than it did on the first round of plan enumeration. That way we don't have to fundamentally change the property that each OR node in the memo only allows a maximum of 10 distinct plan enumerations. The next steps for this ticket are to:
| |
| Comment by David Storch [ 10/Nov/23 ] | |
|
My script reproduces the issue on versions between 7.0.2 and 7.0.4-rc0 (the most recent 7.0.x release as of this writing). I could not reproduce it in the same way on 7.0.0, 7.0.1, or 7.2.0-rc1. This may be happenstance, or it may be that the bug really only affects certain versions on the 7.0 branch. I'll update the "affectsVersions" field accordingly for now. EDIT: I realized that the bug does not affect 7.0.0 or 7.0.1 because lockstep enumeration was not enabled as the default behavior until 7.0.2 (see |