Type: Improvement
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Querying, Testing Infrastructure
Labels:
- greenerbuild
- qexec-team

Assigned Teams: Query Execution
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

~~SERVER-48614~~ is an example of a subtle issue around correctness of the plan cache that was not caught by any of our hand-written or auto-generated correctness tests. The plan cache code has some intricate logic around computation of the plan cache key based on the query itself as well as the current set of indexes. Queries which cannot always share the same execution plan (due to correctness concerns around things like null semantics, collation, partial indexes, and wildcard indexes) must be assigned different plan cache keys. Failure to discriminate between queries that can and can't share an execution plan can lead to the system incorrectly using a cached plan, ultimately resulting in incorrect results.
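To make the key discrimination requirement concrete, here is a minimal toy sketch in Python. It is not the server's plan cache code: `PartialIndex`, `query_shape`, `naive_key`, and `discriminating_key` are hypothetical names invented for illustration, and the model assumes only single-field equality queries and a partial index whose filter is "field greater than some bound". It shows two queries with the same shape that must not share a cache key, because only one of them is eligible to use the partial index.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class PartialIndex:
    """Hypothetical model of an index with a filter of the form {field: {$gt: bound}}."""
    name: str
    field: str
    min_exclusive: int


def query_shape(query: Dict[str, int]) -> Tuple[str, ...]:
    # Toy "shape": the sorted field names, with the constants stripped out.
    return tuple(sorted(query))


def index_is_usable(index: PartialIndex, query: Dict[str, int]) -> bool:
    # An equality query may use the partial index only if its constant
    # satisfies the index's filter.
    value = query.get(index.field)
    return value is not None and value > index.min_exclusive


def naive_key(query: Dict[str, int], indexes: Sequence[PartialIndex]) -> tuple:
    # Buggy variant: ignores the indexes, so every query of a given shape collides.
    return (query_shape(query), ())


def discriminating_key(query: Dict[str, int], indexes: Sequence[PartialIndex]) -> tuple:
    # Adds one eligibility bit per index, so queries that cannot share a plan
    # end up with different keys.
    bits = tuple(index_is_usable(ix, query) for ix in indexes)
    return (query_shape(query), bits)


if __name__ == "__main__":
    indexes = [PartialIndex(name="a_1_partial", field="a", min_exclusive=0)]
    for q in ({"a": 5}, {"a": -5}):
        print(q, naive_key(q, indexes), discriminating_key(q, indexes))
```

In this toy model, `{"a": 5}` and `{"a": -5}` collide under `naive_key` but are kept apart by `discriminating_key`. A cached index scan plan built for the first query would silently miss documents if it were reused for the second.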

Bugs of this class can be hard to observe or reproduce because they depend on having the plan cache in a particular state. This state is in-memory only, and does not persist across server restarts. It is also per-node state that does not replicate. It cannot be injected directly, but rather must be established indirectly by running queries. Finally, the fact that plan cache entries can be either "active" or "inactive" (see https://docs.mongodb.com/manual/core/query-plans/#plan-cache-entry-state) can complicate testing.

To supplement our hand-written unit tests and integration tests for this part of the system, we could augment our various generational fuzz testing suites to provide better plan cache-related test coverage. I brainstormed a few ideas around this with ian.boros:

- We could introduce a flag which causes queries to circumvent the plan cache. Then we could have a fuzzer variant which compares a server with the plan cache enabled against a server with the plan cache disabled.
- We could run queries multiple times in the fuzzer and assert that we get the same result set each time. Multiple executions of a query shape are required in order to produce an active plan cache entry. This type of assertion would prove that queries get the same results when their plan is taken from the cache versus planned from scratch.
- Even without comparing the result sets from multiple runs of the same query, the existing generational fuzzer suites could change to run the same query multiple times. This would make it more likely that the server creates active plan cache entries, and therefore could expose plan cache bugs in suites such as the multiversion agg fuzzer.
- We could add some kind of command to pre-populate the cache and make the fuzzer use this tool to add cache entries before running queries.
- We could run the fuzzer with inactive cache entries disabled. This would increase the likelihood of plans being recovered from the cache during fuzz testing.
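As a rough illustration of the first two ideas combined, the following Python sketch runs each query several times against a toy server with the plan cache enabled and compares the results against a second toy server that bypasses the cache. Everything here is an assumption made for illustration: `ToyServer`, its `use_plan_cache` and `buggy_key` flags, and `differential_check` are hypothetical, and the inactive-to-active promotion rule (inactive on first sight, active on the second, reused from the third run on) is a deliberate simplification of the real behavior.

```python
from typing import Dict, List, Tuple

Doc = Dict[str, int]


class ToyServer:
    """Toy stand-in for a server with a partial index on "a" covering a > 0."""

    def __init__(self, docs: List[Doc], use_plan_cache: bool, buggy_key: bool):
        self.docs = docs
        self.use_plan_cache = use_plan_cache  # hypothetical "bypass the cache" flag
        self.buggy_key = buggy_key            # if True, the key ignores index eligibility
        self.cache: Dict[tuple, Tuple[str, bool]] = {}  # key -> (plan, is_active)

    def _plan(self, value: int) -> str:
        # Planning from scratch picks the partial index only when it is eligible.
        return "IXSCAN" if value > 0 else "COLLSCAN"

    def _key(self, value: int) -> tuple:
        return ("a_eq",) if self.buggy_key else ("a_eq", value > 0)

    def _execute(self, plan: str, value: int) -> List[Doc]:
        # The partial index only contains documents with a > 0.
        scanned = self.docs if plan == "COLLSCAN" else [d for d in self.docs if d["a"] > 0]
        return [d for d in scanned if d["a"] == value]

    def find(self, value: int) -> List[Doc]:
        if not self.use_plan_cache:
            return self._execute(self._plan(value), value)
        key = self._key(value)
        entry = self.cache.get(key)
        if entry is not None and entry[1]:
            return self._execute(entry[0], value)  # reuse the active cached plan
        plan = self._plan(value)
        # Simplified state machine: first sight creates an inactive entry,
        # the next planning pass for the same key makes it active.
        self.cache[key] = (plan, entry is not None)
        return self._execute(plan, value)


def differential_check(queries: List[int], docs: List[Doc],
                       buggy_key: bool, repeats: int = 3) -> List[int]:
    """Return the query constants whose cached and uncached results differ."""
    cached = ToyServer(docs, use_plan_cache=True, buggy_key=buggy_key)
    uncached = ToyServer(docs, use_plan_cache=False, buggy_key=buggy_key)
    mismatches = []
    for value in queries:
        for _ in range(repeats):
            if cached.find(value) != uncached.find(value):
                mismatches.append(value)
                break
    return mismatches


if __name__ == "__main__":
    docs = [{"a": 5}, {"a": -5}]
    print(differential_check([5, -5], docs, buggy_key=True, repeats=3))
    print(differential_check([5, -5], docs, buggy_key=True, repeats=1))
```

In this toy model, the buggy key is only caught when each query is repeated enough times for the cache entry to become active. With a single execution per query, the cached plan is never reused and the mismatch goes unnoticed, which is the motivation for running each fuzzer query multiple times.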

It's not clear which of these ideas gives us the most bang for our buck, but it would be nice to have something of this kind to mitigate the risk of future bugs such as ~~SERVER-48614~~.

duplicates

SERVER-67118 Add generational query fuzzer testing intended to verify that parameterized plan cache entries work correctly

Closed

is related to

SERVER-48614 Plan cache key computation for wildcard indexes with partialIndexFilter is incorrect, leading to incorrect query results

Closed

Assignee: [DO NOT USE] Backlog - Query Execution
Reporter: David Storch
Participants: [DO NOT USE] Backlog - Query Execution, David Storch, Nicholas Zolnierz, Ralf Strobel
Votes: 1
Watchers: 5

Created: Jun 10 2020 07:51:25 PM UTC
Updated: Jul 25 2024 02:17:15 PM UTC
Resolved: Jul 25 2024 02:16:11 PM UTC
