[SERVER-51526] Hybrid index builds can miss writes and crash in the event of a well-timed WriteConflictException Created: 13/Oct/20  Updated: 29/Oct/23  Resolved: 22/Oct/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.0, 4.4.0
Fix Version/s: 4.9.0, 4.4.3, 4.2.12

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-40820 Jstestfuzz suite to introduce spuriou... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2
Steps To Reproduce:

resmoke run --suite=concurrency_replication --mongodSetParameters='\{"failpoint.WTWriteConflictException": {mode: {activationProbability: 0.05}}}' jstests/concurrency/fsm_workloads/access_collection_in_transaction_after_catalog_changes.js

This will eventually fail in this way:

[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.491-04:00 F - [IndexBuildsCoordinatorMongod-0] Invariant failure \{"expr":"_indexBuildInterceptor->areAllWritesApplied(opCtx)","file":"src/mongo/db/catalog/index_build_block.cpp","line":208}
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D1E386 mongo::stack_trace_detail::(anonymous namespace)::LibunwindStepIteration::LibunwindStepIteration(mongo::StackTraceSink&)+0x76
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D1DD7C mongo::stack_trace_detail::(anonymous namespace)::printStackTraceImpl(mongo::stack_trace_detail::(anonymous namespace)::Options const&, mongo::StackTraceSink*)+0x8C
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D1E0C6 mongo::printStackTrace()+0x36
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D19230 mongo::(anonymous namespace)::printSignalAndBacktrace(int)+0x70
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D19159 mongo::(anonymous namespace)::abruptQuit(int)+0x29
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D188BB mongo::(anonymous namespace)::abruptQuitAction(int, siginfo_t*, void*)+0x1B
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51F421C70 funlockfile+0x50
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51F280E35 gsignal+0x145
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51F26B895 abort+0x127
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520CEB2D9 mongo::invariantFailed(char const*, char const*, unsigned int)+0x219
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520BC196C void mongo::invariantWithLocation<bool>(bool const&, char const*, char const*, unsigned int)+0x3C
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511858851 mongo::IndexBuildBlock::success(mongo::OperationContext*, mongo::Collection*)+0x341
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511B01FCD mongo::MultiIndexBlock::commit(mongo::OperationContext*, mongo::Collection*, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)+0x54D
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511C646AF mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)::$_3::operator()() const+0xCF
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511C614F2 auto mongo::writeConflictRetry<mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)::$_3>(mongo::OperationContext*, mongo::StringData, mongo::StringData, mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)::$_3&&)+0x162
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511C6130E mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)+0x1AE
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E70D9A mongo::IndexBuildsCoordinator::_insertKeysFromSideTablesAndCommit(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildAction, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, mongo::Timestamp const&)+0x138A
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD515689C49 mongo::IndexBuildsCoordinatorMongod::_waitForNextIndexBuildActionAndCommit(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildsCoordinator::IndexBuildOptions const&)+0x929
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6D073 mongo::IndexBuildsCoordinator::_buildIndex(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildsCoordinator::IndexBuildOptions const&)+0x353
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6BFA2 mongo::IndexBuildsCoordinator::_runIndexBuildInner(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, boost::optional<mongo::ResumeIndexInfo> const&)+0x1D2
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6B515 mongo::IndexBuildsCoordinator::_runIndexBuild(mongo::OperationContext*, mongo::UUID const&, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, boost::optional<mongo::ResumeIndexInfo> const&)::$_110::operator()() const+0x75
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.856-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6B1F2 mongo::IndexBuildsCoordinator::_runIndexBuild(mongo::OperationContext*, mongo::UUID const&, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, boost::optional<mongo::ResumeIndexInfo> const&)+0x4C2
[ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.856-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51568EC71 auto mongo::IndexBuildsCoordinatorMongod::_startIndexBuild(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mongo::UUID, std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> > const&, mongo::UUID const&, mongo::IndexBuildProtocol, mongo::IndexBuildsCoordinator::IndexBuildOptions, boost::optional<mongo::ResumeIndexInfo> const&)::$_2::operator()<mongo::Status>(mongo::Status)+0x5F1

Sprint: Execution Team 2020-11-02
Participants:

 Description   

The hybrid index build drain phase has a flaw that allows it to finish early in the event of a WCE and then complete without applying all writes. This does not lead to an index inconsistency but causes this invariant to fail.

Note: this crash cannot manifest from to a routine "write-write" conflict because the index builder has exclusive access to the index at this point. This could be caused by the storage engine rolling-back operations to relieve cache pressure.

The bug specifically is that the atEof variable is maintained outside of the writeConflictRetry of applySingleBatch but is not reset in the event of a WriteConflictException.

Imagine the following scenario:

  • Assume there are 10 records in the side writes table: Records 1 through 10
  • While draining a batch (of 10 in testing), we read Records 1 through 10 and insert their keys into the index. Once we reach the end of the table, we set atEof to true. While deleting the records from the side table, we get a WCE.
  • This starts the batch over again, but because atEof is already set to true, we fail to reapply the side writes that were abandoned in the previous batch.


 Comments   
Comment by Githook User [ 05/Jan/21 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-51526 Hybrid index builds can miss writes and crash in the event of a well-timed WriteConflictException

(cherry picked from commit c53dc2c22b5dbecb561bda893d374085de5c54d1)
(cherry picked from commit 832f6f4e90ec9e6bd072b3fc94cd3cf11b8021fd)
Branch: v4.2
https://github.com/mongodb/mongo/commit/09f147cd6c11149abe4401e09969ca3bcb1cf2f3

Comment by Githook User [ 20/Nov/20 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-51526 Hybrid index builds can miss writes and crash in the event of a well-timed WriteConflictException

(cherry picked from commit c53dc2c22b5dbecb561bda893d374085de5c54d1)
Branch: v4.4
https://github.com/mongodb/mongo/commit/832f6f4e90ec9e6bd072b3fc94cd3cf11b8021fd

Comment by Githook User [ 22/Oct/20 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-51526 Hybrid index builds can miss writes and crash in the event of a well-timed WriteConflictException
Branch: master
https://github.com/mongodb/mongo/commit/c53dc2c22b5dbecb561bda893d374085de5c54d1

Generated at Thu Feb 08 05:25:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.