[SERVER-45921] Index builder invariants on this check (indexSpecs.size() > 1) while trying to start building index. Created: 01/Feb/20  Updated: 10/Apr/20  Resolved: 10/Apr/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Gregory Wlodarek
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-46560 Make Abort index build logic determin... Closed
Related
related to SERVER-44953 Secondaries should restart index buil... Closed
related to SERVER-45916 On primary, 2-phase index build clean... Closed
related to SERVER-46560 Make Abort index build logic determin... Closed
related to SERVER-46012 Aborting index builders through the I... Closed
is related to SERVER-45933 2 phase index build running with maxT... Closed
Operating System: ALL
Steps To Reproduce:

Base commit: c5cc18dd7484867d82959fc221eeb42efae94255

diff --git a/jstests/noPassthrough/index_killop_after_stepdown.js b/jstests/noPassthrough/index_killop_after_stepdown.js
index 67841202c0..df2a2fcbb0 100644
--- a/jstests/noPassthrough/index_killop_after_stepdown.js
+++ b/jstests/noPassthrough/index_killop_after_stepdown.js
@@ -31,7 +31,7 @@ let res = assert.commandWorked(primary.adminCommand(
 const hangAfterInitFailpointTimesEntered = res.count;
 
 res = assert.commandWorked(primary.adminCommand(
-    {configureFailPoint: 'hangBeforeIndexBuildAbortOnInterrupt', mode: 'alwaysOn'}));
+    {configureFailPoint: 'hangAfterIndexBuildAbortOnInterrupt', mode: 'alwaysOn'}));
 const hangBeforeAbortFailpointTimesEntered = res.count;
 
 const createIdx = IndexBuildTest.startIndexBuild(primary, coll.getFullName(), {a: 1});
@@ -57,7 +57,7 @@ try {
 
     // Wait for the command thread to abort the index build.
     assert.commandWorked(primary.adminCommand({
-        waitForFailPoint: "hangBeforeIndexBuildAbortOnInterrupt",
+        waitForFailPoint: "hangAfterIndexBuildAbortOnInterrupt",
         timesEntered: hangBeforeAbortFailpointTimesEntered + 1,
         maxTimeMS: kDefaultWaitForFailPointTimeout
     }));
diff --git a/src/mongo/db/commands/create_indexes.cpp b/src/mongo/db/commands/create_indexes.cpp
index 6c69b9ddc3..76cadea82d 100644
--- a/src/mongo/db/commands/create_indexes.cpp
+++ b/src/mongo/db/commands/create_indexes.cpp
@@ -78,6 +78,7 @@ MONGO_FAIL_POINT_DEFINE(createIndexesWriteConflict);
 // collection is created.
 MONGO_FAIL_POINT_DEFINE(hangBeforeCreateIndexesCollectionCreate);
 MONGO_FAIL_POINT_DEFINE(hangBeforeIndexBuildAbortOnInterrupt);
+MONGO_FAIL_POINT_DEFINE(hangAfterIndexBuildAbortOnInterrupt);
 
 constexpr auto kIndexesFieldName = "indexes"_sd;
 constexpr auto kCommandName = "createIndexes"_sd;
@@ -1021,6 +1022,7 @@ public:
                 }
                 return runCreateIndexesWithCoordinator(opCtx, dbname, cmdObj, errmsg, result);
             } catch (const DBException& ex) {
+                hangAfterIndexBuildAbortOnInterrupt.pauseWhileSet();
                 // We can only wait for an existing index build to finish if we are able to release
                 // our locks, in order to allow the existing index build to proceed. We cannot
                 // release locks in transactions, so we bypass the below logic in transactions.

Sprint: Execution Team 2020-02-10, Execution Team 2020-03-23, Execution Team 2020-04-06, Execution Team 2020-04-20
Participants:
Linked BF Score: 40

 Description   

When the createIndex thread marks the index build as aborted, it sets the abortTimestamp to a null timestamp. So, when the indexBuildCoordinatorThread sees this aborted flag, and assuming a stepdown has also happened, the stepped-down primary goes into this code block. This means the index build gets torn down (the index build is unregistered). But we don't remove the catalog entry, i.e., the index catalog entry for the aborted index build, with ready:false, remains present in the catalog table. Now assume a secondary had already started the index build before the abort on the primary. If that secondary gets elected as the new primary, it can go ahead and commit the index build. On receiving the commitIndexBuild oplog entry, the old primary (after SERVER-44953) will restart and try to initialize the index build. But since the catalog still has the index entry with ready:false (representing an in-progress/unfinished index build), this invariant check fails, leading to a crash.
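
The sequence above can be sketched as a toy model in plain JavaScript. This is only an illustration under stated assumptions: the class `Node`, its methods, and the `replay` helper are hypothetical names, not server code, and the exact invariant condition from the server is not reproduced. The model simply refuses to start an index build when an unfinished (ready:false) catalog entry already exists for that index. The "with cleanup" variant is included only for contrast and is not meant to describe the SERVER-46560 change.

```javascript
// Toy model only: every name here is hypothetical and is not a MongoDB server API.
class Node {
    constructor(name) {
        this.name = name;
        this.catalog = new Map();       // indexName -> {ready: boolean}
        this.activeBuilds = new Set();  // registered (in-memory) index builds
    }

    // Assumption: stands in for the server invariant. The model refuses to start
    // a build if an unfinished catalog entry already exists for the index.
    startIndexBuild(indexName) {
        const entry = this.catalog.get(indexName);
        if (entry && !entry.ready) {
            throw new Error(`invariant failure on ${this.name}: unfinished entry for ${indexName}`);
        }
        this.catalog.set(indexName, {ready: false});
        this.activeBuilds.add(indexName);
    }

    // Abort as described in the ticket: unregister the build, keep the catalog entry.
    abortWithoutCatalogCleanup(indexName) {
        this.activeBuilds.delete(indexName);
    }

    // Contrast case: also drop the unfinished catalog entry.
    abortWithCatalogCleanup(indexName) {
        this.activeBuilds.delete(indexName);
        this.catalog.delete(indexName);
    }

    commitIndexBuild(indexName) {
        this.catalog.set(indexName, {ready: true});
        this.activeBuilds.delete(indexName);
    }

    // Applying a commitIndexBuild oplog entry: restart the build if it is not registered.
    applyCommitOplogEntry(indexName) {
        if (!this.activeBuilds.has(indexName)) {
            this.startIndexBuild(indexName);
        }
        this.commitIndexBuild(indexName);
    }
}

// Replays the scenario; abortFn decides how the old primary aborts its build.
function replay(abortFn) {
    const oldPrimary = new Node("oldPrimary");
    const secondary = new Node("secondary");
    oldPrimary.startIndexBuild("a_1");
    secondary.startIndexBuild("a_1");   // secondary starts before the abort
    abortFn(oldPrimary);                // abort plus stepdown on the old primary
    secondary.commitIndexBuild("a_1");  // new primary commits the build
    try {
        oldPrimary.applyCommitOplogEntry("a_1");
        return "committed";
    } catch (e) {
        return "crashed";
    }
}
```

Calling `replay(n => n.abortWithoutCatalogCleanup("a_1"))` walks through the failing path described in this ticket, while `replay(n => n.abortWithCatalogCleanup("a_1"))` shows that the restart succeeds in this model once the stale ready:false entry is gone.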



 Comments   
Comment by Louis Williams [ 10/Apr/20 ]

The problem described in this ticket was fixed by SERVER-46560

Comment by Suganthi Mani [ 01/Feb/20 ]

I feel that if we can fix the way our index build abort works, there won't be any need for the commitIndex oplog entry on a secondary to restart the index build. We can also fix SERVER-45916.

Note that as long as restarting the index build on a commitIndex oplog entry exists, the 2PC consensus protocol (index build majority commit quorum) can't work.
