Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-51526

Hybrid index builds can miss writes and crash in the event of a well-timed WriteConflictException

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.2.0, 4.4.0
    • Fix Version/s: 4.9.0, 4.4.3, 4.2.12
    • Component/s: None
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.4, v4.2
    • Steps To Reproduce:
      Hide

      resmoke run --suite=concurrency_replication --mongodSetParameters='\{"failpoint.WTWriteConflictException": {mode: {activationProbability: 0.05}}}' jstests/concurrency/fsm_workloads/access_collection_in_transaction_after_catalog_changes.js
      

      This will eventually fail in this way:

      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.491-04:00 F - [IndexBuildsCoordinatorMongod-0] Invariant failure \{"expr":"_indexBuildInterceptor->areAllWritesApplied(opCtx)","file":"src/mongo/db/catalog/index_build_block.cpp","line":208}
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D1E386 mongo::stack_trace_detail::(anonymous namespace)::LibunwindStepIteration::LibunwindStepIteration(mongo::StackTraceSink&)+0x76
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D1DD7C mongo::stack_trace_detail::(anonymous namespace)::printStackTraceImpl(mongo::stack_trace_detail::(anonymous namespace)::Options const&, mongo::StackTraceSink*)+0x8C
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D1E0C6 mongo::printStackTrace()+0x36
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D19230 mongo::(anonymous namespace)::printSignalAndBacktrace(int)+0x70
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D19159 mongo::(anonymous namespace)::abruptQuit(int)+0x29
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D188BB mongo::(anonymous namespace)::abruptQuitAction(int, siginfo_t*, void*)+0x1B
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51F421C70 funlockfile+0x50
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51F280E35 gsignal+0x145
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51F26B895 abort+0x127
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520CEB2D9 mongo::invariantFailed(char const*, char const*, unsigned int)+0x219
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520BC196C void mongo::invariantWithLocation<bool>(bool const&, char const*, char const*, unsigned int)+0x3C
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511858851 mongo::IndexBuildBlock::success(mongo::OperationContext*, mongo::Collection*)+0x341
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511B01FCD mongo::MultiIndexBlock::commit(mongo::OperationContext*, mongo::Collection*, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)+0x54D
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511C646AF mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)::$_3::operator()() const+0xCF
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511C614F2 auto mongo::writeConflictRetry<mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)::$_3>(mongo::OperationContext*, mongo::StringData, mongo::StringData, mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)::$_3&&)+0x162
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511C6130E mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)+0x1AE
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E70D9A mongo::IndexBuildsCoordinator::_insertKeysFromSideTablesAndCommit(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildAction, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, mongo::Timestamp const&)+0x138A
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD515689C49 mongo::IndexBuildsCoordinatorMongod::_waitForNextIndexBuildActionAndCommit(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildsCoordinator::IndexBuildOptions const&)+0x929
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6D073 mongo::IndexBuildsCoordinator::_buildIndex(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildsCoordinator::IndexBuildOptions const&)+0x353
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6BFA2 mongo::IndexBuildsCoordinator::_runIndexBuildInner(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, boost::optional<mongo::ResumeIndexInfo> const&)+0x1D2
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6B515 mongo::IndexBuildsCoordinator::_runIndexBuild(mongo::OperationContext*, mongo::UUID const&, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, boost::optional<mongo::ResumeIndexInfo> const&)::$_110::operator()() const+0x75
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.856-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6B1F2 mongo::IndexBuildsCoordinator::_runIndexBuild(mongo::OperationContext*, mongo::UUID const&, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, boost::optional<mongo::ResumeIndexInfo> const&)+0x4C2
      [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.856-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51568EC71 auto mongo::IndexBuildsCoordinatorMongod::_startIndexBuild(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mongo::UUID, std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> > const&, mongo::UUID const&, mongo::IndexBuildProtocol, mongo::IndexBuildsCoordinator::IndexBuildOptions, boost::optional<mongo::ResumeIndexInfo> const&)::$_2::operator()<mongo::Status>(mongo::Status)+0x5F1
      

      Show
      resmoke run --suite=concurrency_replication --mongodSetParameters='\{"failpoint.WTWriteConflictException": {mode: {activationProbability: 0.05}}}' jstests/concurrency/fsm_workloads/access_collection_in_transaction_after_catalog_changes.js This will eventually fail in this way: [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.491-04:00 F - [IndexBuildsCoordinatorMongod-0] Invariant failure \{"expr":"_indexBuildInterceptor->areAllWritesApplied(opCtx)","file":"src/mongo/db/catalog/index_build_block.cpp","line":208} [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D1E386 mongo::stack_trace_detail::(anonymous namespace)::LibunwindStepIteration::LibunwindStepIteration(mongo::StackTraceSink&)+0x76 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D1DD7C mongo::stack_trace_detail::(anonymous namespace)::printStackTraceImpl(mongo::stack_trace_detail::(anonymous namespace)::Options const&, mongo::StackTraceSink*)+0x8C [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D1E0C6 mongo::printStackTrace()+0x36 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D19230 mongo::(anonymous namespace)::printSignalAndBacktrace(int)+0x70 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D19159 mongo::(anonymous namespace)::abruptQuit(int)+0x29 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.852-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520D188BB mongo::(anonymous namespace)::abruptQuitAction(int, siginfo_t*, void*)+0x1B [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51F421C70 funlockfile+0x50 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51F280E35 gsignal+0x145 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51F26B895 abort+0x127 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.853-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520CEB2D9 mongo::invariantFailed(char const*, char const*, unsigned int)+0x219 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD520BC196C void mongo::invariantWithLocation<bool>(bool const&, char const*, char const*, unsigned int)+0x3C [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511858851 mongo::IndexBuildBlock::success(mongo::OperationContext*, mongo::Collection*)+0x341 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511B01FCD mongo::MultiIndexBlock::commit(mongo::OperationContext*, mongo::Collection*, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)+0x54D [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511C646AF mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)::$_3::operator()() const+0xCF [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.854-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511C614F2 auto mongo::writeConflictRetry<mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)::$_3>(mongo::OperationContext*, mongo::StringData, mongo::StringData, mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)::$_3&&)+0x162 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511C6130E mongo::IndexBuildsManager::commitIndexBuild(mongo::OperationContext*, mongo::CollectionWriter&, mongo::NamespaceString const&, mongo::UUID const&, std::function<void (mongo::BSONObj const&)>, std::function<void ()>)+0x1AE [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E70D9A mongo::IndexBuildsCoordinator::_insertKeysFromSideTablesAndCommit(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildAction, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, mongo::Timestamp const&)+0x138A [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD515689C49 mongo::IndexBuildsCoordinatorMongod::_waitForNextIndexBuildActionAndCommit(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildsCoordinator::IndexBuildOptions const&)+0x929 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6D073 mongo::IndexBuildsCoordinator::_buildIndex(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildsCoordinator::IndexBuildOptions const&)+0x353 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6BFA2 mongo::IndexBuildsCoordinator::_runIndexBuildInner(mongo::OperationContext*, std::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, boost::optional<mongo::ResumeIndexInfo> const&)+0x1D2 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.855-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6B515 mongo::IndexBuildsCoordinator::_runIndexBuild(mongo::OperationContext*, mongo::UUID const&, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, boost::optional<mongo::ResumeIndexInfo> const&)::$_110::operator()() const+0x75 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.856-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD511E6B1F2 mongo::IndexBuildsCoordinator::_runIndexBuild(mongo::OperationContext*, mongo::UUID const&, mongo::IndexBuildsCoordinator::IndexBuildOptions const&, boost::optional<mongo::ResumeIndexInfo> const&)+0x4C2 [ReplicaSetFixture:job0:secondary0] 2020-10-13T14:13:09.856-04:00 I CONTROL [IndexBuildsCoordinatorMongod-0] Frame: 0x7FD51568EC71 auto mongo::IndexBuildsCoordinatorMongod::_startIndexBuild(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mongo::UUID, std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> > const&, mongo::UUID const&, mongo::IndexBuildProtocol, mongo::IndexBuildsCoordinator::IndexBuildOptions, boost::optional<mongo::ResumeIndexInfo> const&)::$_2::operator()<mongo::Status>(mongo::Status)+0x5F1
    • Sprint:
      Execution Team 2020-11-02

      Description

      The hybrid index build drain phase has a flaw that allows it to finish early in the event of a WCE and then complete without applying all writes. This does not lead to an index inconsistency but causes this invariant to fail.

      Note: this crash cannot manifest from to a routine "write-write" conflict because the index builder has exclusive access to the index at this point. This could be caused by the storage engine rolling-back operations to relieve cache pressure.

      The bug specifically is that the atEof variable is maintained outside of the writeConflictRetry of applySingleBatch but is not reset in the event of a WriteConflictException.

      Imagine the following scenario:

      • Assume there are 10 records in the side writes table: Records 1 through 10
      • While draining a batch (of 10 in testing), we read Records 1 through 10 and insert their keys into the index. Once we reach the end of the table, we set atEof to true. While deleting the records from the side table, we get a WCE.
      • This starts the batch over again, but because atEof is already set to true, we fail to reapply the side writes that were abandoned in the previous batch.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              louis.williams Louis Williams
              Reporter:
              louis.williams Louis Williams
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: