SERVER-48614 is an example of a subtle plan cache correctness issue that was not caught by any of our hand-written or auto-generated correctness tests. The plan cache code has intricate logic for computing the plan cache key from both the query itself and the current set of indexes. Queries that cannot always share the same execution plan (because of correctness concerns around null semantics, collation, partial indexes, and wildcard indexes) must be assigned different plan cache keys. If the key fails to discriminate between queries that can and can't share a plan, the system may apply a cached plan that is not valid for the incoming query and return incorrect results.
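To make this failure mode concrete, here is a toy Python model of a plan cache. It is not MongoDB's implementation; the collection, the partial index on `b > 0`, and the names `shape_key`, `discriminating_key`, and `PlanCache` are all hypothetical. Two queries share a shape (equality on `a`, `$gt` on `b`), but only one of them may safely use the partial index. A key built from the shape alone lets the second query reuse a plan that silently drops documents.

```python
# Toy collection; queries are (a_value, b_lower_bound) tuples meaning
# {a: a_value, b: {$gt: b_lower_bound}}.
DOCS = [
    {"_id": 1, "a": 1, "b": 10},
    {"_id": 2, "a": 1, "b": -1},
    {"_id": 3, "a": 2, "b": 7},
]

# Hypothetical partial index that only contains documents with b > 0.
PARTIAL_FILTER_MIN_B = 0


def full_scan(query):
    a, b_gt = query
    return [d["_id"] for d in DOCS if d["a"] == a and d["b"] > b_gt]


def partial_index_scan(query):
    a, b_gt = query
    indexed = [d for d in DOCS if d["b"] > PARTIAL_FILTER_MIN_B]
    return [d["_id"] for d in indexed if d["a"] == a and d["b"] > b_gt]


def plan_from_scratch(query):
    _, b_gt = query
    # The partial index is only correct if the query predicate implies the
    # index filter: b > b_gt >= 0 implies b > 0.
    return partial_index_scan if b_gt >= PARTIAL_FILTER_MIN_B else full_scan


def shape_key(query):
    # Buggy key: constants are stripped, so partial index eligibility is lost.
    return ("eq(a)", "gt(b)")


def discriminating_key(query):
    # Fixed key: also records whether the partial index may be used.
    _, b_gt = query
    return shape_key(query) + (b_gt >= PARTIAL_FILTER_MIN_B,)


class PlanCache:
    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.entries = {}

    def run(self, query):
        key = self.key_fn(query)
        if key not in self.entries:
            self.entries[key] = plan_from_scratch(query)
        return self.entries[key](query)
```

With `PlanCache(shape_key)`, running `(1, 5)` caches the partial index plan, and a later `(1, -5)` reuses it and returns only `_id` 1, missing `_id` 2 (whose `b` is `-1`). With `PlanCache(discriminating_key)`, the two queries get different keys and both return correct results.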
Bugs of this class can be hard to observe or reproduce because they depend on the plan cache being in a particular state. This state is held in memory only and does not persist across server restarts. It is also per-node state that is not replicated. It cannot be injected directly; instead, it must be established indirectly by running queries. Finally, the fact that plan cache entries can be either "active" or "inactive" (see https://docs.mongodb.com/manual/core/query-plans/#plan-cache-entry-state) further complicates testing.
To supplement our hand-written unit and integration tests for this part of the system, we could augment our generational fuzz testing suites to provide better plan cache coverage. Ian Boros and I brainstormed a few ideas:
- We could introduce a flag that causes queries to bypass the plan cache. A fuzzer variant could then compare a server with the plan cache enabled against a server with the plan cache disabled.
- We could run each query multiple times in the fuzzer and assert that it returns the same result set every time. Multiple executions of a query shape are required to produce an active plan cache entry, so this assertion would check that a query returns the same results whether its plan is taken from the cache or planned from scratch.
- Even without comparing the result sets from multiple runs of the same query, the existing generational fuzzer suites could be changed to run the same query multiple times. This would make it more likely that the server creates active plan cache entries, and could therefore expose plan cache bugs in suites such as the multiversion agg fuzzer.
- We could add a command that pre-populates the plan cache, and have the fuzzer use it to add cache entries before running queries.
- We could run the fuzzer with inactive cache entries disabled, so that cache entries are active as soon as they are created. This would make it more likely that plans are recovered from the cache during fuzz testing.
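The first two ideas (comparing against a cache-bypassing baseline, and running each query several times) could be combined into a single check. The Python sketch below is hypothetical: `find_divergent_queries`, `ToyCachedRunner`, and the three-document collection are illustrative stand-ins, not fuzzer or server APIs. The toy engine mimics a null semantics bug, where a cache key that ignores whether the constant is null lets `{a: null}` reuse an equality plan that skips documents missing `a`.

```python
from collections import Counter


def result_multiset(docs):
    # Order-insensitive view of a result set: without a sort, different plans
    # may legitimately return the same documents in different orders.
    return Counter(repr(sorted(d.items())) for d in docs)


def find_divergent_queries(run_cached, run_uncached, queries, repeats=3):
    """Run each query `repeats` times with the cache and compare every run
    against one uncached baseline. Returns the queries whose results differed."""
    divergent = []
    for query in queries:
        expected = result_multiset(run_uncached(query))
        for _ in range(repeats):
            if result_multiset(run_cached(query)) != expected:
                divergent.append(query)
                break
    return divergent


# Toy engine: a query is just the value v in {a: v}.
DOCS = [{"a": 1}, {"a": None}, {}]


def eq_plan(value):
    # Matches only documents where "a" is present and equal to value.
    return [d for d in DOCS if "a" in d and d["a"] == value]


def null_plan(_value):
    # Null semantics: {a: null} also matches documents where "a" is missing.
    return [d for d in DOCS if d.get("a") is None]


def choose_plan(value):
    return null_plan if value is None else eq_plan


def run_uncached(value):
    return choose_plan(value)(value)


class ToyCachedRunner:
    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.cache = {}

    def __call__(self, value):
        key = self.key_fn(value)
        if key not in self.cache:
            self.cache[key] = choose_plan(value)
        return self.cache[key](value)
```

With a buggy key such as `lambda v: "eq(a)"`, running the queries `[1, None]` flags `None`: the cached equality plan returns only `{"a": None}`, while the uncached null plan also returns `{}`. A key that records whether the constant is null, such as `lambda v: ("eq(a)", v is None)`, produces no divergences. Repeating each query matters because only the later runs exercise the cached path.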
It's not clear which of these ideas gives us the most bang for our buck, but it would be nice to have something of this kind to mitigate the risk of future bugs such as