[SERVER-56792] tassert() failure when index is dropped during SBE cached plan replanning
Created: 10/May/21  Updated: 29/Oct/23  Resolved: 17/May/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Rishab Joshi (Inactive)
Assignee: David Storch
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query Execution 2021-05-17, Query Execution 2021-05-31
Participants:
Linked BF Score: 41

Comments
Comment by Githook User [ 17/May/21 ]

Author: David Storch <david.storch@mongodb.com> (dstorch)

Message: SERVER-56792 Fail query in SBE cached planner and subplanner if an index is dropped during yield
Branch: master
https://github.com/mongodb/mongo/commit/66a536d56c7146547d30a965f0ecb35d611a9a42

Comment by David Storch [ 13/May/21 ]

kyle.suarez, not to my knowledge, though I was curious whether we could rely on something else, like the internal catalog ident. Setting that aside, my planned patch is to factor out the logic from RequiresAllIndicesStage and use it for both the classic and SBE runtime planners.
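
The idea of one shared check, run after each yield by both runtime planners, can be sketched roughly as follows. This is a toy model in JavaScript, not the server's C++: IndexGuard, checkAfterYield, and runPlannerWithYield are hypothetical names invented for illustration, and the actual RequiresAllIndicesStage logic may work differently.

```javascript
// Hypothetical sketch: none of these names are real server APIs.
class IndexGuard {
    // `catalog` is a Map from index name to an entry object. Holding the entry object itself
    // (rather than only its name) also detects a drop-and-recreate under the same name.
    constructor(catalog, indexNames) {
        this.catalog = catalog;
        this.captured = indexNames.map(name => ({name: name, entry: catalog.get(name)}));
    }

    // Call after restoring from a yield. Throws so that the query fails with an ordinary error.
    checkAfterYield() {
        for (const {name, entry} of this.captured) {
            if (this.catalog.get(name) !== entry) {
                throw new Error(`index '${name}' was dropped during yield`);
            }
        }
    }
}

// Stand-in for either runtime planner: both would reuse the same guard.
function runPlannerWithYield(catalog, indexNames, duringYield) {
    const guard = new IndexGuard(catalog, indexNames);
    duringYield();            // other threads run while this operation is yielded
    guard.checkAfterYield();  // fail the query if any captured index went away
    return "ok";
}

const indexCatalog = new Map([["a_1", {key: {a: 1}}], ["b_1", {key: {b: 1}}]]);

// No DDL during the yield: planning proceeds.
const quietResult = runPlannerWithYield(indexCatalog, ["a_1", "b_1"], () => {});

// The {b: 1} index is dropped during the yield: the query fails with an error.
let dropError = null;
try {
    runPlannerWithYield(indexCatalog, ["a_1", "b_1"], () => indexCatalog.delete("b_1"));
} catch (e) {
    dropError = e.message;
}
console.log(quietResult, "|", dropError);
```

In this sketch the guard is the only place that knows how to detect a dropped index, so neither planner duplicates the check.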

Comment by Kyle Suarez [ 13/May/21 ]

Do indexes have UUIDs like collections these days? Or does the collection itself have some sort of epoch that gets incremented when DDL operations are performed? That would be one way to detect a change like this and bail, but I'm not up to date on the latest in catalog land.

Comment by David Storch [ 13/May/21 ]

Another note: I suspect a similar issue exists for the SBE sub-planner.

Comment by David Storch [ 13/May/21 ]

After a bit of tinkering, I've found that I can reliably reproduce this bug in under a minute by applying the following patch to the test:

diff --git a/jstests/concurrency/fsm_workloads/drop_index_during_replan.js b/jstests/concurrency/fsm_workloads/drop_index_during_replan.js
index f4f68e0ffb..431b553094 100644
--- a/jstests/concurrency/fsm_workloads/drop_index_during_replan.js
+++ b/jstests/concurrency/fsm_workloads/drop_index_during_replan.js
@@ -45,6 +45,8 @@ var $config = (function() {
             // another thread has already dropped this index.
             db[collName].dropIndex({b: 1});
 
+            sleep(50);
+
             // Recreate the index that was dropped.
             assertAlways.commandWorkedOrFailedWithCode(db[collName].createIndex({b: 1}), [
                 ErrorCodes.IndexBuildAborted,
@@ -66,11 +68,15 @@ var $config = (function() {
             assertAlways.commandWorked(
                 db[collName].insert({a: "unique_value_" + i, b: "common_value_b"}));
         }
+
+        const yieldIterations = 1;
+        assertAlways.commandWorked(
+            db.adminCommand({setParameter: 1, internalQueryExecYieldIterations: yieldIterations}));
     }
 
     return {
         threadCount: 10,
-        iterations: 50,
+        iterations: 500,
         data: data,
         states: states,

In order for the bug to repro, the {b: 1} index has to be dropped during a yield, so this patch makes yields happen much more often. Furthermore, the index must not be rebuilt promptly; if the index is quickly rebuilt, then the query will use this newly constructed index and nothing goes wrong. This is why I added a sleep in between dropping and rebuilding the index. Finally, I made the test run for longer in order to ensure it repros reliably.

Also, after thinking about this some more, I don't believe that relaxing the tassert() to a uassert() is a correct solution. If we did that, we could look up the index by name and find an entirely different index from the one we used for planning that just happens to have the same name as before!
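
The same-name hazard can be illustrated with a toy catalog. This is a minimal sketch, not server code: ToyCatalog, its ident counter, and lookupByName are hypothetical names invented for illustration, and the real catalog is considerably more involved.

```javascript
// Toy model only: ToyCatalog and its methods are hypothetical, not MongoDB APIs.
class ToyCatalog {
    constructor() {
        this.nextIdent = 1;
        this.indexes = new Map();  // index name -> {ident, keyPattern}
    }

    createIndex(name, keyPattern) {
        // Every build gets a fresh ident, even if the name was used before.
        const entry = {ident: this.nextIdent++, keyPattern: keyPattern};
        this.indexes.set(name, entry);
        return entry;
    }

    dropIndex(name) {
        return this.indexes.delete(name);
    }

    lookupByName(name) {
        return this.indexes.get(name);
    }
}

const catalog = new ToyCatalog();

// Planning time: the plan remembers both the name and the ident of the index.
const planned = catalog.createIndex("b_1", {b: 1});
const plan = {indexName: "b_1", indexIdent: planned.ident};

// During a yield, another thread drops the index and builds a new one under the same name.
catalog.dropIndex("b_1");
catalog.createIndex("b_1", {b: 1});

// After the yield, a lookup by name succeeds...
const found = catalog.lookupByName(plan.indexName);
const foundByName = found !== undefined;
// ...but the entry is not the index the plan was built against.
const sameIndex = foundByName && found.ident === plan.indexIdent;

console.log(`found by name: ${foundByName}, same index: ${sameIndex}`);
```

Here the name lookup succeeds while the identity comparison fails, which is the case a name-based uassert() alone would not catch.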
