- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Index Maintenance
- None
- Fully Compatible
- ALL
- Execution Team 2020-01-27, Execution Team 2020-02-10
- 26
From the code, the issue appears to stem from IndexBuildsCoordinator::onStepUp(), which commits index builds regardless of whether they should be aborted.
I was able to reproduce this. Here is the timeline of events leading up to the problem:
- Node 1 (Primary): Start a geo index build that should fail at the end due to an invalid document for this type of index in the collection.
- Node 2 (Secondary): Start the same index build after seeing the start index build oplog entry, and reach the final phase of the build before the primary node does. The secondary node now waits for the commit/abort oplog entry from the primary.
- Node 1 (Primary): Step down after finishing the index build but before aborting it and sending the abort oplog entry to the secondary.
- Node 2 becomes primary, Node 1 becomes secondary.
- Node 2 (Primary): Step up and commit the index build. This was unexpected; I would expect an abort index build oplog entry here. Perhaps the onStepUp() function does not check the state of the index build.
- Node 2 (Primary): Running validate on this node will fail because we committed an invalid index and validate sees that as a case of corruption:
{ "valid" : false, "warnings" : [ ], "errors" : [ "exception during collection validation: Location16775: cannot extract [lng, lat] array or object from { _id: ObjectId('5e0f730ab7d4d029001cd7b7'), pos: \"invalid\" }" ], "extraIndexEntries" : [ ], "missingIndexEntries" : [ ], "advice" : "A corrupt namespace has been detected. See http://dochub.mongodb.org/core/data-recovery for recovery steps.", "ok" : 1, "$clusterTime" : { "clusterTime" : Timestamp(1578070806, 2), "signature" : { "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="), "keyId" : NumberLong(0) } }, "operationTime" : Timestamp(1578070806, 2) }
- Node 1 (Secondary): fasserts when trying to commit the index build after receiving the commit index build oplog entry from the new primary:
[IndexBuildsCoordinatorMongod-0] Fatal assertion 51101 Location16775: Index build: 21ddcd36-e6c3-4b41-a687-12304a79ac9a; Database: test :: caused by :: index build failed on this node but we received a commitIndexBuild oplog entry from the primary with timestamp: Timestamp(1578070806, 2) :: caused by :: cannot extract [lng, lat] array or object from { _id: ObjectId('5e0f730ab7d4d029001cd7b7'), pos: "invalid" } at src/mongo/db/index_builds_coordinator.cpp 1355
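The timeline above can be condensed into a small toy model. This is only an illustrative sketch in Python, not the server's C++ code: the names `IndexBuild`, `on_step_up_unconditional`, `on_step_up_checked`, and `apply_on_secondary` are hypothetical, and the model assumes a node records a local build error in a `failure` field, which the ticket does not confirm for the real coordinator.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """Oplog entry a newly elected primary writes for an unfinished build."""
    COMMIT = "commitIndexBuild"
    ABORT = "abortIndexBuild"


@dataclass
class IndexBuild:
    """Hypothetical stand-in for one in-progress index build on a node."""
    build_id: str
    # Assumption: set when the local build hit an error, such as a
    # document that cannot be geo-indexed.
    failure: Optional[str] = None


def on_step_up_unconditional(build: IndexBuild) -> Outcome:
    """Behavior described in the ticket: commit without looking at build state."""
    return Outcome.COMMIT


def on_step_up_checked(build: IndexBuild) -> Outcome:
    """Expected behavior: abort when the build is known to have failed."""
    return Outcome.ABORT if build.failure is not None else Outcome.COMMIT


def apply_on_secondary(build: IndexBuild, entry: Outcome) -> str:
    """A secondary applying the new primary's commit/abort oplog entry."""
    if entry is Outcome.COMMIT and build.failure is not None:
        # Corresponds to the fatal assertion hit by Node 1 in the timeline.
        raise RuntimeError(
            "index build failed on this node but received "
            f"{entry.value}: {build.failure}"
        )
    return "committed" if entry is Outcome.COMMIT else "aborted"
```

In this model, the unconditional step-up path hands a commit entry to a node whose build already failed, which raises; the checked path produces an abort entry that the same node applies cleanly.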
- is duplicated by
  - SERVER-39428 Record all indexing errors during simultaneous index builds for later constraint checking (Closed)
- is related to
  - SERVER-45852 Two-phase index build constraints should be checked at the completion of the index build (Closed)
- related to
  - SERVER-46246 index_failover_resolved_key_errors.js hangs when document level locking is not supported (Closed)
  - SERVER-47605 Single-phase index builds should only check constraint violations upon completion (Closed)
  - SERVER-44654 allow unique index builds to continue running on stepdown (Closed)