Core Server / SERVER-59775

ReshardingDonorOplogIterator triggers an fassert() when it continues to run in member state SECONDARY following a stepdown

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.0.4, 5.1.0-rc0
    • Affects Version/s: 5.0.0
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.0
    • Sprint: Sharding 2021-09-06, Sharding 2021-09-20
    • Story Points: 1

      The design for PrimaryOnlyService cancels the cancellation token of each Instance on stepdown and shuts down its task executor. However, a task that is already running can continue running briefly after the node has transitioned to member state SECONDARY. ReshardingDonorOplogIterator reads from the oplog buffer collection locally using the default RecoveryUnit::ReadSource of kNoTimestamp, so a read issued during that window hits the fassert() in AutoGetCollectionForReadBase (src/mongo/db/db_raii.cpp) that forbids reading from a replicated collection on a secondary without a read timestamp or the PBWM lock.
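To make the failure mode concrete, here is a minimal C++ model of the check and the race. It assumes a simplified condition (SECONDARY, a kNoTimestamp read, and an operation that still conflicts with secondary batch application); MemberState, readWouldFassert, Node, and runOneBatch are hypothetical names, not MongoDB code. The real check in AutoGetCollectionForReadBase considers additional conditions, and real cancellation is asynchronous rather than a single flag.

```cpp
#include <cassert>

// Hypothetical, simplified model of the safety check. The real check lives in
// AutoGetCollectionForReadBase (src/mongo/db/db_raii.cpp) and considers more
// conditions than shown here.
enum class MemberState { kPrimary, kSecondary };
enum class ReadSource { kNoTimestamp, kNoOverlap, kLastApplied };

// Returns true when a local read of a replicated collection would hit the
// fatal assertion: the node is SECONDARY, the read has no timestamp, and the
// operation still expects to conflict with secondary batch application.
bool readWouldFassert(MemberState state, ReadSource source,
                      bool conflictsWithBatchApplication) {
    return state == MemberState::kSecondary &&
           source == ReadSource::kNoTimestamp && conflictsWithBatchApplication;
}

// Hypothetical model of a node whose stepdown both changes the member state
// and cancels the PrimaryOnlyService Instance's token.
struct Node {
    MemberState state = MemberState::kPrimary;
    bool tokenCanceled = false;

    void stepDown() {
        state = MemberState::kSecondary;  // The state transition is logged first...
        tokenCanceled = true;             // ...then the Instance is interrupted.
    }
};

// Models one batch of the iterator: the cancellation token is checked once,
// before the read. Returns true if the read would trigger the fassert.
bool runOneBatch(Node& node, bool stepDownMidBatch) {
    if (node.tokenCanceled) {
        return false;  // Canceled before starting: no read is attempted.
    }
    if (stepDownMidBatch) {
        node.stepDown();  // Stepdown lands after the cancellation check.
    }
    // The read uses the default read source and has not opted out.
    return readWouldFassert(node.state, ReadSource::kNoTimestamp, true);
}
```

With stepDownMidBatch set, runOneBatch returns true: the cancellation check passed while the node was still PRIMARY, but the read itself executes as SECONDARY, matching the ordering in the log excerpt below.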

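      Moreover, ReshardingDonorOplogIterator depends on a guarantee that it will read the write committed by the ReshardingOplogFetcher thread after being notified via awaitInsert(). A RecoveryUnit::ReadSource::kNoOverlap read can be taken at a timestamp earlier than that write, so kNoOverlap isn't a suitable alternative. Instead, we'll have ReshardingDonorOplogIterator use ShouldNotConflictWithSecondaryBatchApplicationBlock, which marks the operation as not conflicting with secondary batch application and therefore suppresses the fassert() for its kNoTimestamp reads.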
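The two points above can be sketched in a small self-contained C++ model. It assumes that a kNoOverlap read is taken at min(lastApplied, all_durable) while an untimestamped read sees every committed write; StorageModel, Locker, and ShouldNotConflictBlock are hypothetical stand-ins, and the guard only imitates the RAII shape of ShouldNotConflictWithSecondaryBatchApplicationBlock rather than reproducing its implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical visibility model. Assumption: kNoOverlap reads at
// min(lastApplied, allDurable), so a write the fetcher just committed may be
// invisible until every earlier write is durable.
struct StorageModel {
    std::vector<std::uint64_t> committedTs;  // commit timestamps of buffered entries
    std::uint64_t lastApplied = 0;
    std::uint64_t allDurable = 0;

    // Number of committed entries visible at the given read timestamp.
    std::size_t visibleAt(std::uint64_t readTs) const {
        return static_cast<std::size_t>(
            std::count_if(committedTs.begin(), committedTs.end(),
                          [readTs](std::uint64_t ts) { return ts <= readTs; }));
    }
    std::size_t visibleNoOverlap() const {
        return visibleAt(std::min(lastApplied, allDurable));
    }
    // An untimestamped read sees every committed write.
    std::size_t visibleNoTimestamp() const { return committedTs.size(); }
};

// Hypothetical stand-in for the lock-state flag consulted by the fassert check.
struct Locker {
    bool shouldConflictWithSecondaryBatchApplication = true;
};

// RAII guard in the spirit of ShouldNotConflictWithSecondaryBatchApplicationBlock:
// opts the operation out of conflicting with batch application for the guard's
// lifetime and restores the previous value on scope exit.
class ShouldNotConflictBlock {
public:
    explicit ShouldNotConflictBlock(Locker& locker)
        : _locker(locker), _prev(locker.shouldConflictWithSecondaryBatchApplication) {
        _locker.shouldConflictWithSecondaryBatchApplication = false;
    }
    ~ShouldNotConflictBlock() {
        _locker.shouldConflictWithSecondaryBatchApplication = _prev;
    }
    ShouldNotConflictBlock(const ShouldNotConflictBlock&) = delete;
    ShouldNotConflictBlock& operator=(const ShouldNotConflictBlock&) = delete;

private:
    Locker& _locker;
    bool _prev;
};
```

For example, with entries committed at timestamps 5 and 10, lastApplied at 10, and allDurable held back at 7 by an in-flight write, visibleNoOverlap() reports one entry while visibleNoTimestamp() reports two, which is why the iterator keeps the untimestamped read and opts out of the conflict check instead.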


      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.607+00:00 I  REPL     21358   [ReplCoord-1] "Replica set state transition","attr":{"newState":"SECONDARY","oldState":"PRIMARY"}
      ...
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.608+00:00 F  STORAGE  4728700 [ReshardingRecipientService-1] "Reading from replicated collection on a secondary without read timestamp or PBWM lock","attr":{"collection":"config.localReshardingOplogBuffer.cea06672-2ba3-4d95-8b23-a4cfc596f4df.shard1"}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.608+00:00 F  ASSERT   23089   [ReshardingRecipientService-1] "Fatal assertion","attr":{"msgid":4728700,"file":"src/mongo/db/db_raii.cpp","line":334}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.608+00:00 F  ASSERT   23090   [ReshardingRecipientService-1] "\n\n***aborting after fassert() failure\n\n"
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.608+00:00 F  CONTROL  4757800 [ReshardingRecipientService-1] "Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}
      ...
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.614+00:00 I  REPL     5123007 [ReplCoord-1] "Interrupting PrimaryOnlyService due to stepDown","attr":{"service":"ReshardingRecipientService","numInstances":1,"numOperationContexts":3}
      ...
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"557020E99237","b":"55700D006000","o":"13E93237","s":"_ZN5mongo25fassertFailedWithLocationEiPKcj","s+":"D7"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F13DA0B","b":"55700D006000","o":"12137A0B","s":"_ZN5mongo28AutoGetCollectionForReadBaseINS_25AutoGetCollectionLockFreeENS_32AutoGetCollectionForReadLockFree13EmplaceHelperEEC1EPNS_16OperationContextERKS3_b","s+":"15FB"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F13F8D2","b":"55700D006000","o":"121398D2","s":"_ZN5boost15optional_detail13optional_baseIN5mongo28AutoGetCollectionForReadBaseINS2_25AutoGetCollectionLockFreeENS2_32AutoGetCollectionForReadLockFree13EmplaceHelperEEEE9constructIJRPNS2_16OperationContextERS6_RbEEEvNS_11optional_ns15in_place_init_tEDpOT_","s+":"42"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F135399","b":"55700D006000","o":"1212F399","s":"_ZN5mongo12_GLOBAL__N_138acquireCollectionAndConsistentSnapshotIZNS_32AutoGetCollectionForReadLockFreeC1EPNS_16OperationContextERKNS_21NamespaceStringOrUUIDENS_25AutoGetCollectionViewModeENS_6Date_tEE3$_1ZNS2_C1ES4_S7_S8_S9_E3$_2ZNS2_C1ES4_S7_S8_S9_E3$_3EEDaS4_bRNS_24CollectionCatalogStasherET_T0_T1_","s+":"199"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F134DD4","b":"55700D006000","o":"1212EDD4","s":"_ZN5mongo32AutoGetCollectionForReadLockFreeC1EPNS_16OperationContextERKNS_21NamespaceStringOrUUIDENS_25AutoGetCollectionViewModeENS_6Date_tE","s+":"1F4"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F13ECFF","b":"55700D006000","o":"12138CFF","s":"_ZN5mongo35AutoGetCollectionForReadCommandBaseINS_32AutoGetCollectionForReadLockFreeEEC2EPNS_16OperationContextERKNS_21NamespaceStringOrUUIDENS_25AutoGetCollectionViewModeENS_6Date_tENS_16AutoStatsTracker7LogModeE","s+":"4F"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F1402EF","b":"55700D006000","o":"1213A2EF","s":"_ZN5boost15optional_detail13optional_baseIN5mongo39AutoGetCollectionForReadCommandLockFreeEE9constructIJRPNS2_16OperationContextERKNS2_21NamespaceStringOrUUIDERNS2_25AutoGetCollectionViewModeERNS2_6Date_tERNS2_16AutoStatsTracker7LogModeEEEEvNS_11optional_ns15in_place_init_tEDpOT_","s+":"4F"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F1368FB","b":"55700D006000","o":"121308FB","s":"_ZN5mongo44AutoGetCollectionForReadCommandMaybeLockFreeC2EPNS_16OperationContextERKNS_21NamespaceStringOrUUIDENS_25AutoGetCollectionViewModeENS_6Date_tENS_16AutoStatsTracker7LogModeE","s+":"8B"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701E4493CF","b":"55700D006000","o":"114433CF","s":"_ZN5boost15optional_detail13optional_baseIN5mongo44AutoGetCollectionForReadCommandMaybeLockFreeEE9constructIJRPNS2_16OperationContextERKNS2_21NamespaceStringOrUUIDENS2_25AutoGetCollectionViewModeENS2_6Date_tENS2_16AutoStatsTracker7LogModeEEEEvNS_11optional_ns15in_place_init_tEDpOT_","s+":"4F"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701E43C9C7","b":"55700D006000","o":"114369C7","s":"_ZN5mongo28CommonMongodProcessInterface40attachCursorSourceToPipelineForLocalReadEPNS_8PipelineE","s+":"4F7"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701E58982B","b":"55700D006000","o":"1158382B","s":"_ZN5mongo17shardVersionRetryIZNS_19sharded_agg_helpers22attachCursorToPipelineEPNS_8PipelineENS_20ShardTargetingPolicyEN5boost8optionalINS_7BSONObjEEEE3$_4EEDaPNS_16OperationContextEPNS_12CatalogCacheENS_15NamespaceStringENS_10StringDataEOT_","s+":"37B"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701E5890BD","b":"55700D006000","o":"115830BD","s":"_ZN5mongo19sharded_agg_helpers22attachCursorToPipelineEPNS_8PipelineENS_20ShardTargetingPolicyEN5boost8optionalINS_7BSONObjEEE","s+":"53D"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701E3FA8DF","b":"55700D006000","o":"113F48DF","s":"_ZN5mongo27ShardServerProcessInterface28attachCursorSourceToPipelineEPNS_8PipelineENS_20ShardTargetingPolicyEN5boost8optionalINS_7BSONObjEEE","s+":"5F"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F488150","b":"55700D006000","o":"12482150","s":"_ZN5mongo20DocumentSourceLookUp13buildPipelineERKNS_8DocumentE","s+":"E90"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F48581B","b":"55700D006000","o":"1247F81B","s":"_ZN5mongo20DocumentSourceLookUp12unwindResultEv","s+":"AAB"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F483B56","b":"55700D006000","o":"1247DB56","s":"_ZN5mongo20DocumentSourceLookUp9doGetNextEv","s+":"F6"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701CFC28BC","b":"55700D006000","o":"FFBC8BC","s":"_ZN5mongo14DocumentSource7getNextEv","s+":"21C"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F53C6EE","b":"55700D006000","o":"125366EE","s":"_ZN5mongo8Pipeline7getNextEv","s+":"DE"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701D698F3C","b":"55700D006000","o":"10692F3C","s":"_ZN5mongo28ReshardingDonorOplogIterator10_fillBatchERNS_8PipelineE","s+":"AC"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701D699E2C","b":"55700D006000","o":"10693E2C","s":"_ZN5mongo28ReshardingDonorOplogIterator12getNextBatchESt10shared_ptrINS_8executor12TaskExecutorEENS_17CancellationTokenENS_33CancelableOperationContextFactoryE","s+":"54C"}}
      

      https://evergreen.mongodb.com/lobster/build/c6979c6e3c82b5fa2586cea47ff21636/test/612a1acec2ab686fd51b1f68#bookmarks=0%2C37457%2C37460%2C37506%2C37797%2C160261%2C160414&f~=100~d20022%5C%7C

            Assignee: Max Hirschhorn (max.hirschhorn@mongodb.com)
            Reporter: Max Hirschhorn (max.hirschhorn@mongodb.com)
            Votes: 0
            Watchers: 4
