Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-86395

Investigate DuplicateKey error while recovering convertToCapped from stable checkpoint

    • Catalog and Routing
    • Fully Compatible
    • ALL
    • v7.3, v7.0, v6.0, v5.0
    • CAR Team 2024-02-19

      When stressing the convertToCapped codepath in suites with stepdowns, this fatal assertion is constantly hit due to DuplicateKey errors.

      Purpose of this ticket is to assess whether it's a test-only issue (because in production that assertion is converted to a simple warning) or - otherwise - to identify the problem and fix it.

      Reproducer
      Simply force executing the convert_to_capped_collection FSM in a suite with stepdowns:

      ./buildscripts/resmoke.py run --suite concurrency_sharded_with_stepdowns --storageEngine=wiredTiger --jobs=1 --storageEngineCacheSizeGB=0.5 --dbpath=/tmp/testpath --runAllFeatureFlagTests --force-excluded-tests jstests/concurrency/fsm_workloads/convert_to_capped_collection.js
      

      Logs from a failed execution

      [j0:s1:n2] 2024-02-08T11:09:48.352+00:00 I  REPL     [BackgroundSync] Processing DDL command oplog entry in OplogBatcher{"oplogEntry":{"oplogEntry":{"op":"c","ns":"test0_fsmdb0.$cmd","ui":{      "$uuid":"491e153d-4706-480c-97cc-27d5ad54e1bd"},"o":{"renameCollection":"test0_fsmdb0.tmpdkL8n.convertToCapped.convert_to_capped_collection_5","to":"test0_fsmdb0.convert_to_capped_collectio      n_5","stayTemp":false,"dropTarget":{"$uuid":"5802bbae-1d09-40df-bf71-26cf8b547cfc"}},"o2":{"numRecords":1200},"ts":{"$timestamp":{"t":1707390584,"i":7853}},"t":6,"v":2,"wall":{"$date":"2024      -02-08T11:09:44.531Z"}}}}
      
      [j0:s1:n2] 2024-02-08T11:09:48.354+00:00 F  REPL     [ReplWriterWorker-5] Error applying operation while recovering from stable checkpoint. This can lead to data corruption.{"oplogEntry":{"      oplogEntry":{"op":"i","ns":"test0_fsmdb0.tmpJhbuV.convertToCapped.convert_to_capped_collection_8","ui":{"$uuid":"aa48d5a1-e825-40f2-aad9-70b6c0d0c64f"},"o":{"_id":{"$oid":"65c4b647eedaadff2      2434576"},"i":1181,"rand":0.752134560170723},"o2":{"_id":{"$oid":"65c4b647eedaadff22434576"}},"fromMigrate":true,"ts":{"$timestamp":{"t":1707390584,"i":6304}},"t":6,"v":2,"wall":{"$date":"2      024-02-08T11:09:44.367Z"}},"isForCappedCollection":true},"error":"DuplicateKey{ keyPattern: { _id: 1 }, keyValue: { _id: ObjectId('65c4b647eedaadff22434576') }, foundValue: 1182, duplicateR      id: 1182 }: E11000 duplicate key error collection: test0_fsmdb0.tmpJhbuV.convertToCapped.convert_to_capped_collection_8 index: _id_ dup key: { _id: ObjectId('65c4b647eedaadff22434576') } fo      und value: RecordId(1182)"}
      62198 [j0:s1:n2] 2024-02-08T11:09:48.354+00:00 F  ASSERT   [ReplWriterWorker-5] Fatal assertion{"msgid":5415000,"file":"src/mongo/db/repl/oplog.cpp","line":1389}
      [j0:s1:n2] 2024-02-08T11:09:48.355+00:00 F  ASSERT   [ReplWriterWorker-5]
       
       ***aborting after fassert() failure
       
       [j0:s1:n2] 2024-02-08T11:09:48.355+00:00 F  CONTROL  [ReplWriterWorker-5] Writing fatal message{"message":"\n"}
      [j0:s1:n2] 2024-02-08T11:09:48.355+00:00 F  CONTROL  [ReplWriterWorker-5] Writing fatal message{"message":"Got signal: 6 (Aborted).\n"}
      
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BE70A980 funlockfile+0x50
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BE345E87 gsignal+0xC7
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BE3477F1 abort+0x141
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0C091E4FB mongo::(anonymous namespace)::callAbort()+0x1B
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0C091ED82 mongo::fassertFailedWithLocation(int, char const*, unsigned int)+0x102
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0B9B9189F mongo::repl::OplogApplication::checkOnOplogFailureForRecovery(mongo::OperationContext*, mongo::NamespaceString const&, mongo::BSONObj const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x29F
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0B9B956C6 mongo::repl::applyOperation_inlock(mongo::OperationContext*, mongo::CollectionAcquisition&, mongo::repl::OplogEntryOrGroupedInserts const&, bool, mongo::repl::OplogApplication::Mode, bool, std::function<void ()>)+0x38D6
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BB2F1A70 mongo::repl::OplogApplierUtils::applyOplogEntryOrGroupedInsertsCommon(mongo::OperationContext*, mongo::repl::OplogEntryOrGroupedInserts const&, mongo::repl::OplogApplication::Mode, bool, std::function<void ()>, mongo::OpCounters*)::$_1::operator()() const+0xAD0
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BB2EF17F mongo::repl::OplogApplierUtils::applyOplogEntryOrGroupedInsertsCommon(mongo::OperationContext*, mongo::repl::OplogEntryOrGroupedInserts const&, mongo::repl::OplogApplication::Mode, bool, std::function<void ()>, mongo::OpCounters*)+0x53F
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BB2DB91B mongo::repl::applyOplogEntryOrGroupedInserts(mongo::OperationContext*, mongo::repl::OplogEntryOrGroupedInserts const&, mongo::repl::OplogApplication::Mode, bool)+0x10B
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BB2EBC9E std::_Function_handler<mongo::Status(mongo::OperationContext*, mongo::repl::OplogEntryOrGroupedInserts const&, mongo::repl::OplogApplication::Mode, bool), mongo::Status (*)(mongo::OperationContext*, mongo::repl::OplogEntryOrGroupedInserts const&, mongo::repl::OplogApplication::Mode, bool)>::_M_invoke(std::_Any_data const&, mongo::OperationContext*&&, mongo::repl::OplogEntryOrGroupedInserts const&, mongo::repl::OplogApplication::Mode&&, bool&&)+0x1E
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BB2F01EF mongo::repl::OplogApplierUtils::applyOplogBatchCommon(mongo::OperationContext*, std::vector<mongo::repl::ApplierOperation, std::allocator<mongo::repl::ApplierOperation> >*, mongo::repl::OplogApplication::Mode, bool, bool, std::function<mongo::Status(mongo::OperationContext*, mongo::repl::OplogEntryOrGroupedInserts const&, mongo::repl::OplogApplication::Mode, bool)>)+0x24F
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BB2DC7B7 mongo::repl::OplogApplierImpl::applyOplogBatchPerWorker(mongo::OperationContext*, std::vector<mongo::repl::ApplierOperation, std::allocator<mongo::repl::ApplierOperation> >*, std::vector<mongo::MultikeyPathInfo, std::allocator<mongo::MultikeyPathInfo> >*, bool)+0xF7
      [j0:s1:n2] 2024-02-08T11:09:48.446+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BB2DE959 auto mongo::unique_function<void (mongo::Status)>::makeImpl<mongo::repl::OplogApplierImpl::_applyOplogBatch(mongo::OperationContext*, std::vector<mongo::repl::OplogEntry, std::allocator<mongo::repl::OplogEntry> >)::$_2>(mongo::repl::OplogApplierImpl::_applyOplogBatch(mongo::OperationContext*, std::vector<mongo::repl::OplogEntry, std::allocator<mongo::repl::OplogEntry> >)::$_2&&)::SpecificImpl::call(mongo::Status&&)+0x119
      [j0:s1:n2] 2024-02-08T11:09:48.447+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0B3CF28F7 mongo::ThreadPool::Impl::_doOneTask(std::unique_lock<std::mutex>*)+0x1C7
      [j0:s1:n2] 2024-02-08T11:09:48.447+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0B3CF1CC7 mongo::ThreadPool::Impl::_consumeTasks()+0xC7
      [j0:s1:n2] 2024-02-08T11:09:48.447+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0B3CF1AB6 mongo::ThreadPool::Impl::_workerThreadBody(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1A6
      [j0:s1:n2] 2024-02-08T11:09:48.447+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0B3CF3085 std::thread::_State_impl<std::thread::_Invoker<std::tuple<mongo::stdx::thread::thread<mongo::ThreadPool::Impl::_startWorkerThread_inlock()::$_5, , 0>(&&)::{lambda()#1}> > >::_M_run()+0x65
      [j0:s1:n2] 2024-02-08T11:09:48.447+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BFDD2EE4 execute_native_thread_routine+0x14
      [j0:s1:n2] 2024-02-08T11:09:48.447+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BE6FF6DB start_thread+0xDB
      [j0:s1:n2] 2024-02-08T11:09:48.447+00:00 I  CONTROL  [ReplWriterWorker-5]   Frame: 0x7FA0BE42861F clone+0x3F
      

            Assignee:
            allison.easton@mongodb.com Allison Easton
            Reporter:
            pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: