[SERVER-32210] Stepping down recipient shard's primary while migrating session information can fassert Created: 07/Dec/17  Updated: 30/Oct/23  Resolved: 28/Dec/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.6.4, 3.7.1

Type: Bug Priority: Major - P3
Reporter: Jack Mulrow Assignee: Andy Schwerin
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-42614 Remove error code 17405 and use Error... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Participants:

 Description   

When migrating session information, the recipient calls repl::logOp, which fasserts that the server can accept writes. If the recipient shard's primary steps down, this fassert can fail, crashing the server.

Example failure: https://evergreen.mongodb.com/task/mongodb_mongo_master_windows_64_2k8_debug_concurrency_sharded_with_stepdowns_and_balancer_WT_patch_b8f64cc3fde6d041f3e90b1cb2e153b0b15f6c47_5a20e4efe3c33173de00c68d_17_12_01_05_13_55

Backtrace:

d20013| 2017-12-01T05:48:06.246+0000 F REPL     [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] logOp() but can't accept write to collection db9.coll9
d20013| 2017-12-01T05:48:06.246+0000 F -        [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] Fatal Assertion 17405 at src\mongo\db\repl\oplog.cpp 392
d20013| 2017-12-01T05:48:06.246+0000 F -        [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala]
d20013|
d20013| ***aborting after fassert() failure
d20013|
d20013|
 
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    ...\src\mongo\util\stacktrace_windows.cpp(244)                                   mongo::printStackTrace+0x67x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    ...\src\mongo\util\signal_handlers_synchronous.cpp(182)                          mongo::`anonymous namespace'::printSignalAndBacktrace+0x115x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    ...\src\mongo\util\signal_handlers_synchronous.cpp(238)                          mongo::`anonymous namespace'::abruptQuit+0x131x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    d:\th\minkernel\crts\ucrt\src\appcrt\misc\signal.cpp(522)                        raise+0x1128x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    d:\th\minkernel\crts\ucrt\src\appcrt\startup\abort.cpp(71)                       abort+0x57x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    ...\src\mongo\util\assert_util.cpp(148)                                          mongo::fassertFailedWithLocation+0x393x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    ...\src\mongo\db\repl\oplog.cpp(392)                                             mongo::repl::_logOpsInner+0x244x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    ...\src\mongo\db\repl\oplog.cpp(449)                                             mongo::repl::logOp+0x698x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    ...\src\mongo\db\s\session_catalog_migration_destination.cpp(285)                <lambda_e4192caf70ab7ab2108a9fb64254bfc7>::operator()+0x489x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    ...\src\mongo\db\concurrency\write_conflict_exception.h(91)                      mongo::writeConflictRetry<<lambda_e4192caf70ab7ab2108a9fb64254bfc7> >+0x171x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    ...\src\mongo\db\s\session_catalog_migration_destination.cpp(315)                mongo::`anonymous namespace'::processSessionOplog+0x1979x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    ...\src\mongo\db\s\session_catalog_migration_destination.cpp(440)                mongo::SessionCatalogMigrationDestination::_retrieveSessionStateFromSource+0x2367x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    c:\program files (x86)\microsoft visual studio 14.0\vc\include\thr\xthread(247)  std::_LaunchPad<std::unique_ptr<std::tuple<std::_Binder<std::_Unforced,void (__cdecl mongo::SessionCatalogMigrationDestination::*)(mongo::ServiceContext * __ptr64) __ptr64,mongo::SessionCatalogMigrationDestination * __ptr64 const,mongo::ServiceContext * __ptr64 & __ptr64> >,std::default_delete<std::tuple<std::_Binder<std::_Unforced,void (__cdecl mongo::SessionCatalogMigrationDestination::*)(mongo::ServiceContext * __ptr64) __ptr64,mongo::SessionCatalogMigrationDestination * __ptr64 const,mongo::ServiceContext * __ptr64 & __ptr64> > > > >::_Run+0x115x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    c:\program files (x86)\microsoft visual studio 14.0\vc\include\thr\xthread(210)  std::_Pad::_Call_func+0x9x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    d:\th\minkernel\crts\ucrt\src\appcrt\startup\thread.cpp(92)                      invoke_thread_procedure+0x44x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe    d:\th\minkernel\crts\ucrt\src\appcrt\startup\thread.cpp(115)                     thread_start<unsigned int (__cdecl*)(void * __ptr64)>+0x147x
d20013| 2017-12-01T05:48:06.504+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] kernel32.dll                                                                                   BaseThreadInitThunk+0x13x
d20013| 2017-12-01T05:48:06.504+0000 F -        [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] Got signal: 22 (SIGABRT).
d20013| 2017-12-01T05:48:06.505+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] *** unhandled exception 0x0000000E at 0x000007FEFCDFA06D, terminating
d20013| 2017-12-01T05:48:06.505+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] *** stack trace for unhandled exception:
d20013| 2017-12-01T05:48:06.511+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] KERNELBASE.dll                                                                                   RaiseException+0x61x
d20013| 2017-12-01T05:48:06.511+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      ...\src\mongo\util\signal_handlers_synchronous.cpp(239)                          mongo::`anonymous namespace'::abruptQuit+0x155x
d20013| 2017-12-01T05:48:06.511+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      d:\th\minkernel\crts\ucrt\src\appcrt\misc\signal.cpp(522)                        raise+0x1128x
d20013| 2017-12-01T05:48:06.511+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      d:\th\minkernel\crts\ucrt\src\appcrt\startup\abort.cpp(71)                       abort+0x57x
d20013| 2017-12-01T05:48:06.511+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      ...\src\mongo\util\assert_util.cpp(148)                                          mongo::fassertFailedWithLocation+0x393x
d20013| 2017-12-01T05:48:06.511+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      ...\src\mongo\db\repl\oplog.cpp(392)                                             mongo::repl::_logOpsInner+0x244x
d20013| 2017-12-01T05:48:06.512+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      ...\src\mongo\db\repl\oplog.cpp(449)                                             mongo::repl::logOp+0x698x
d20013| 2017-12-01T05:48:06.512+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      ...\src\mongo\db\s\session_catalog_migration_destination.cpp(285)                <lambda_e4192caf70ab7ab2108a9fb64254bfc7>::operator()+0x489x
d20013| 2017-12-01T05:48:06.512+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      ...\src\mongo\db\concurrency\write_conflict_exception.h(91)                      mongo::writeConflictRetry<<lambda_e4192caf70ab7ab2108a9fb64254bfc7> >+0x171x
d20013| 2017-12-01T05:48:06.512+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      ...\src\mongo\db\s\session_catalog_migration_destination.cpp(315)                mongo::`anonymous namespace'::processSessionOplog+0x1979x
d20013| 2017-12-01T05:48:06.512+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      ...\src\mongo\db\s\session_catalog_migration_destination.cpp(440)                mongo::SessionCatalogMigrationDestination::_retrieveSessionStateFromSource+0x2367x
d20013| 2017-12-01T05:48:06.512+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      c:\program files (x86)\microsoft visual studio 14.0\vc\include\thr\xthread(247)  std::_LaunchPad<std::unique_ptr<std::tuple<std::_Binder<std::_Unforced,void (__cdecl mongo::SessionCatalogMigrationDestination::*)(mongo::ServiceContext * __ptr64) __ptr64,mongo::SessionCatalogMigrationDestination * __ptr64 const,mongo::ServiceContext * __ptr64 & __ptr64> >,std::default_delete<std::tuple<std::_Binder<std::_Unforced,void (__cdecl mongo::SessionCatalogMigrationDestination::*)(mongo::ServiceContext * __ptr64) __ptr64,mongo::SessionCatalogMigrationDestination * __ptr64 const,mongo::ServiceContext * __ptr64 & __ptr64> > > > >::_Run+0x115x
d20013| 2017-12-01T05:48:06.512+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      c:\program files (x86)\microsoft visual studio 14.0\vc\include\thr\xthread(210)  std::_Pad::_Call_func+0x9x
d20013| 2017-12-01T05:48:06.512+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      d:\th\minkernel\crts\ucrt\src\appcrt\startup\thread.cpp(92)                      invoke_thread_procedure+0x44x
d20013| 2017-12-01T05:48:06.512+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] mongod.exe      d:\th\minkernel\crts\ucrt\src\appcrt\startup\thread.cpp(115)                     thread_start<unsigned int (__cdecl*)(void * __ptr64)>+0x147x
d20013| 2017-12-01T05:48:06.512+0000 I CONTROL  [sessionCatalogMigration-fsm_all_sharded_with_stepdowns_and_bala] kernel32.dll                                                                                     BaseThreadInitThunk+0x13x



 Comments   
Comment by Githook User [ 07/Mar/18 ]

Author:

{'email': 'schwerin@mongodb.com', 'name': 'Andy Schwerin', 'username': 'amschwerin'}

Message: SERVER-32210 logOp should uassert, not fassert when not primary.

(cherry picked from commit ee3d6a0f0a2f6dd23a772786326f35bfc864f7d4)
Branch: v3.6
https://github.com/mongodb/mongo/commit/8cdfd1829d9fe3601ede6268cf855cc284549236

Comment by Githook User [ 28/Dec/17 ]

Author:

{'username': 'amschwerin', 'email': 'schwerin@mongodb.com', 'name': 'Andy Schwerin'}

Message: SERVER-32210 logOp should uassert, not fassert when not primary.
Branch: master
https://github.com/mongodb/mongo/commit/ee3d6a0f0a2f6dd23a772786326f35bfc864f7d4

Comment by Andy Schwerin [ 07/Dec/17 ]

That fassert should be a uassert.

The caller shouldn't be calling repl::logOp directly, either. It should call OpObserver::onInternalOpMessage. Because all other callers of logOp are in op observers, and we know that op observers are allowed to throw exceptions, the only thing we need to validate is that the callers of _logOpsInner inside oplog.cpp are themselves exception safe.

Generated at Thu Feb 08 04:29:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.