[SERVER-56279] Index build failure errors with NoMatchingDocument as cause of commit quorum failing Created: 22/Apr/21  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 0
Labels: execution_intern, neweng, techdebt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-57511 Only the index build operation should... Closed
Related
related to SERVER-48271 createIndexes command in fsm_workload... Closed
Assigned Teams:
Storage Execution
Sprint: Execution Team 2021-06-14, Execution Team 2021-07-12, Execution Team 2021-07-26
Participants:

 Description   

         Foreground jstests/concurrency/fsm_workloads/find_cmd_with_indexes_timeseries.js
         Error: assert failed : Create index failed: {
         	"ok" : 0,
         	"errmsg" : "Index build failed: f88cdbf6-170f-48e1-9563-905b96d785ae: Collection test7_fsmdb0.system.buckets.find_cmd_with_indexes_timeseries_fsmcoll0 ( 0b690fe9-098e-4f8c-857f-7e96030112af ) :: caused by :: failed to get commit quorum before committing index build: f88cdbf6-170f-48e1-9563-905b96d785ae :: caused by :: No matching IndexBuildEntry found with indexBuildUUID: f88cdbf6-170f-48e1-9563-905b96d785ae",
         	"code" : 47,
         	"codeName" : "NoMatchingDocument",
         	"$clusterTime" : {
         		"clusterTime" : Timestamp(1619076524, 105),
         		"signature" : {
         			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
         			"keyId" : NumberLong(0)
         		}
         	},
         	"operationTime" : Timestamp(1619076524, 105)
         }
 
         quietlyDoAssert@jstests/concurrency/fsm_libs/assert.js:55:18
         assert@src/mongo/shell/assert.js:151:9
         wrapAssertFn@jstests/concurrency/fsm_libs/assert.js:65:13
         assertWithLevel@jstests/concurrency/fsm_libs/assert.js:89:9
         processCreateIndex@jstests/concurrency/fsm_workloads/find_cmd_with_indexes_timeseries.js:70:9
         createTimeIndex@jstests/concurrency/fsm_workloads/find_cmd_with_indexes_timeseries.js:98:13
         runFSM@jstests/concurrency/fsm_libs/fsm.js:132:17
         @eval:8:9
         main@jstests/concurrency/fsm_libs/worker_thread.js:217:17
         @eval:5:12
         @eval:3:24
         _threadStartWrapper@:26:16
 
  :
 throwError@jstests/concurrency/fsm_libs/runner.js:354:23
 runWorkloads@jstests/concurrency/fsm_libs/resmoke_runner.js:202:5
 @jstests/concurrency/fsm_libs/resmoke_runner.js:283:1
 @jstests/concurrency/fsm_libs/resmoke_runner.js:1:2
 failed to load: jstests/concurrency/fsm_libs/resmoke_runner.js

I haven't investigated the code path on which this occurs, but I don't think this kind of error should happen.

This ticket is to improve the error message.



 Comments   
Comment by Dianna Hohensee (Inactive) [ 07/Jun/21 ]

This should wait on the outcome of SERVER-57511

Comment by Dianna Hohensee (Inactive) [ 03/Jun/21 ]

The dropIndexes codepath can abort an index build (it calls _completeAbort); I can see this in the log. However, _completeAbort also removes the indexBuilds document for the index build, and I find it weird that a thread other than the build thread would delete the index build state.

This produces a confusing index build error, NoMatchingDocument, which says that the commit quorum failed because a document couldn't be found.
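
The sequence described above can be sketched as a toy model. This is not server code: the class and function names (IndexBuildEntries, complete_abort, builder_commit) are hypothetical stand-ins, and the commit quorum value is only illustrative. It assumes the abort deletes the build's document before the builder thread reaches its commit quorum lookup, which would explain why the builder reports a missing document rather than an abort.

```python
# Toy model only: every name here is hypothetical, not MongoDB server code.


class NoMatchingDocument(Exception):
    """Stands in for error code 47 from the ticket."""


class IndexBuildEntries:
    """Toy stand-in for the collection holding one document per index build."""

    def __init__(self):
        self._docs = {}

    def insert(self, build_uuid, commit_quorum):
        self._docs[build_uuid] = {"commitQuorum": commit_quorum}

    def remove(self, build_uuid):
        self._docs.pop(build_uuid, None)

    def get_commit_quorum(self, build_uuid):
        try:
            return self._docs[build_uuid]["commitQuorum"]
        except KeyError:
            raise NoMatchingDocument(
                f"No matching IndexBuildEntry found with indexBuildUUID: {build_uuid}"
            ) from None


def complete_abort(entries, build_uuid):
    # Models the aborting thread (for example a dropIndexes caller)
    # deleting the build's document itself.
    entries.remove(build_uuid)


def builder_commit(entries, build_uuid):
    # Models the builder thread reaching the commit step after the abort.
    try:
        return entries.get_commit_quorum(build_uuid)
    except NoMatchingDocument as exc:
        raise NoMatchingDocument(
            "failed to get commit quorum before committing index build: "
            f"{build_uuid} :: caused by :: {exc}"
        ) from None


entries = IndexBuildEntries()
build_uuid = "f88cdbf6-170f-48e1-9563-905b96d785ae"
entries.insert(build_uuid, "votingMembers")  # illustrative quorum value

complete_abort(entries, build_uuid)  # another thread aborts first
try:
    builder_commit(entries, build_uuid)  # builder thread then tries to commit
    outcome = "committed"
except NoMatchingDocument as exc:
    outcome = str(exc)

print(outcome)
```

In this model the builder never learns that the build was aborted; it only sees that its document is gone, so the reported cause is the missing document.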

Comment by Dianna Hohensee (Inactive) [ 03/Jun/21 ]

More interesting logs from the failure:

[conn20133] "Index build: joined after abort","attr":{"buildUUID":{"uuid":{"$uuid":"f88cdbf6-170f-48e1-9563-905b96d785ae"}},"waitResult":{"code":0,"codeName":"OK"},"status":{"code":47,"codeName":"NoMatchingDocument","errmsg":"failed to get commit quorum before committing index build: f88cdbf6-170f-48e1-9563-905b96d785ae :: caused by :: No matching IndexBuildEntry found with indexBuildUUID: f88cdbf6-170f-48e1-9563-905b96d785ae"}}
[conn20147] "Index build: failed","attr":{"buildUUID":{"uuid":{"$uuid":"f88cdbf6-170f-48e1-9563-905b96d785ae"}},"error":{"code":47,"codeName":"NoMatchingDocument","errmsg":"failed to get commit quorum before committing index build: f88cdbf6-170f-48e1-9563-905b96d785ae :: caused by :: No matching IndexBuildEntry found with indexBuildUUID: f88cdbf6-170f-48e1-9563-905b96d785ae"}}

It looks like the index build is aborted, and, instead of the builder thread aborting cleanly, the thread errors oddly: "Index build: joined after abort".
