- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 3.0.7, 3.2.0
- Component/s: Sharding
- Fully Compatible
- ALL
- Sharding E (01/08/16), Sharding F (01/29/16)
The cache of the database/collection/chunk metadata and the cache of the registered shards are separate and independent from each other.
After a shard is completely removed (that is, first put into draining mode and then all of its chunks are moved off it), every mongos instance other than the one on which removeShard was run will quickly refresh its shard cache and notice that the shard no longer exists. However, the chunk metadata that references the just-removed shard is not invalidated.
The next time a query is run against one of these stale mongos instances, it uses the stale chunk information, fails to find the shard, and crashes with the following stack trace (in 3.2):
s20006| 2015-12-14T15:39:48.165-0500 D SHARDING [conn1] found 1 shards listed on config server(s)
s20006| 2015-12-14T15:39:48.168-0500 I CONTROL [conn1] *** unhandled exception (access violation) at 0x00007FF65E08DEE3, terminating
s20006| 2015-12-14T15:39:48.168-0500 I CONTROL [conn1] *** access violation was a read from 0x0000000000000018
s20006| 2015-12-14T15:39:48.168-0500 I CONTROL [conn1] *** stack trace for unhandled exception:
s20006| 2015-12-14T15:39:48.742-0500 I CONTROL [conn1] mongos.exe c:\program files (x86)\microsoft visual studio 12.0\vc\include\xstring(2159) std::basic_string<char,std::char_traits<char>,std::allocator<char> >::compare+0x53
s20006| 2015-12-14T15:39:48.742-0500 I CONTROL [conn1] mongos.exe c:\program files (x86)\microsoft visual studio 12.0\vc\include\xstring(2489) std::operator==<char,std::char_traits<char>,std::allocator<char> >+0x32
s20006| 2015-12-14T15:39:48.742-0500 I CONTROL [conn1] mongos.exe ...\src\mongo\s\client\shard.cpp(59) mongo::Shard::isConfig+0x32
s20006| 2015-12-14T15:39:48.743-0500 I CONTROL [conn1] mongos.exe ...\src\mongo\s\query\cluster_find.cpp(251) mongo::`anonymous namespace'::runQueryWithoutRetrying+0x640
s20006| 2015-12-14T15:39:48.743-0500 I CONTROL [conn1] mongos.exe ...\src\mongo\s\query\cluster_find.cpp(365) mongo::ClusterFind::runQuery+0x486
s20006| 2015-12-14T15:39:48.743-0500 I CONTROL [conn1] mongos.exe ...\src\mongo\s\commands\cluster_find_cmd.cpp(162) mongo::`anonymous namespace'::ClusterFindCmd::run+0x49b
s20006| 2015-12-14T15:39:48.743-0500 I CONTROL [conn1] mongos.exe ...\src\mongo\s\s_only.cpp(128) mongo::Command::execCommandClientBasic+0x43b
s20006| 2015-12-14T15:39:48.743-0500 I CONTROL [conn1] mongos.exe ...\src\mongo\s\s_only.cpp(171) mongo::Command::runAgainstRegistered+0x33d
s20006| 2015-12-14T15:39:48.743-0500 I CONTROL [conn1] mongos.exe ...\src\mongo\s\strategy.cpp(237) mongo::Strategy::clientCommandOp+0x881
s20006| 2015-12-14T15:39:48.743-0500 I CONTROL [conn1] mongos.exe ...\src\mongo\s\request.cpp(110) mongo::Request::process+0x486
s20006| 2015-12-14T15:39:48.743-0500 I CONTROL [conn1] mongos.exe ...\src\mongo\s\server.cpp(141) mongo::ShardedMessageHandler::process+0xfb
s20006| 2015-12-14T15:39:48.744-0500 I CONTROL [conn1] mongos.exe ...\src\mongo\util\net\message_server_port.cpp(231) mongo::PortMessageServer::handleIncomingMsg+0x509
s20006| 2015-12-14T15:39:48.744-0500 I CONTROL [conn1] mongos.exe c:\program files (x86)\microsoft visual studio 12.0\vc\include\functional(1150) std::_Bind<1,void * __ptr64,void * __ptr64 (__cdecl*const)(void * __ptr64),mongo::`anonymous namespace'::MessagingPortWithHandler * __ptr64>::_Do_call<,0>+0x6e
s20006| 2015-12-14T15:39:48.744-0500 I CONTROL [conn1] mongos.exe c:\program files (x86)\microsoft visual studio 12.0\vc\include\functional(1138) std::_Bind<1,void * __ptr64,void * __ptr64 (__cdecl*const)(void * __ptr64),mongo::`anonymous namespace'::MessagingPortWithHandler * __ptr64>::operator()<>+0x56
s20006| 2015-12-14T15:39:48.744-0500 I CONTROL [conn1] mongos.exe c:\program files (x86)\microsoft visual studio 12.0\vc\include\functional(1150) std::_Bind<0,void,std::_Bind<1,void * __ptr64,void * __ptr64 (__cdecl*const)(void * __ptr64),mongo::`anonymous namespace'::MessagingPortWithHandler * __ptr64> >::_Do_call<>+0x35
s20006| 2015-12-14T15:39:48.744-0500 I CONTROL [conn1] mongos.exe c:\program files (x86)\microsoft visual studio 12.0\vc\include\functional(1138) std::_Bind<0,void,std::_Bind<1,void * __ptr64,void * __ptr64 (__cdecl*const)(void * __ptr64),mongo::`anonymous namespace'::MessagingPortWithHandler * __ptr64> >::operator()<>+0x56
s20006| 2015-12-14T15:39:48.744-0500 I CONTROL [conn1] mongos.exe c:\program files (x86)\microsoft visual studio 12.0\vc\include\thr\xthread(196) std::_LaunchPad<std::_Bind<0,void,std::_Bind<1,void * __ptr64,void * __ptr64 (__cdecl*const)(void * __ptr64),mongo::`anonymous namespace'::MessagingPortWithHandler * __ptr64> > >::_Run+0x51
s20006| 2015-12-14T15:39:48.744-0500 I CONTROL [conn1] mongos.exe c:\program files (x86)\microsoft visual studio 12.0\vc\include\thr\xthread(188) std::_LaunchPad<std::_Bind<0,void,std::_Bind<1,void * __ptr64,void * __ptr64 (__cdecl*const)(void * __ptr64),mongo::`anonymous namespace'::MessagingPortWithHandler * __ptr64> > >::_Go+0x28
s20006| 2015-12-14T15:39:48.744-0500 I CONTROL [conn1] MSVCP120D.dll std::_Pad::_Release+0xd9
s20006| 2015-12-14T15:39:48.754-0500 I CONTROL [conn1] MSVCR120D.dll beginthreadex+0x1f5
s20006| 2015-12-14T15:39:48.754-0500 I CONTROL [conn1] MSVCR120D.dll endthreadex+0x1d7
s20006| 2015-12-14T15:39:48.754-0500 I CONTROL [conn1] KERNEL32.DLL BaseThreadInitThunk+0x22
In 3.0, there is no crash, but all finds will start failing with the following error:
2015-12-12T04:57:28.872+0000 I - [conn31] Assertion: 13129:can't find shard for: red_7
....
mongos(_ZN5mongo15printStackTraceERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEE+0x39) [0x10e64c9b9]
mongos(_ZN5mongo10logContextEPKc+0x100) [0x10e603280]
mongos(_ZN5mongo11msgassertedEiPKc+0x13A) [0x10e5f07ba]
mongos(_ZN5mongo11msgassertedEiRKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEE+0x1A) [0x10e5f067a]
mongos(_ZN5mongo15StaticShardInfo13findWithRetryERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE+0x1A6) [0x10e599a56]
mongos(_ZN5mongo15StaticShardInfo8findCopyERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE+0x21) [0x10e5963e1]
mongos(_ZN5mongo5Shard5resetERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE+0x31) [0x10e5933f1]
mongos(_ZN5mongo5Shard4makeERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE+0x9F) [0x10e4c320f]
mongos(_ZN5mongo11dbgrid_cmds14RemoveShardCmd3runEPNS_16OperationContextERKNSt3__112basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEERNS_7BSONObjEiRSA_RNS_14BSONObjBuilderEb+0x80) [0x10e4ea500]
mongos(_ZN5mongo7Command22execCommandClientBasicEPNS_16OperationContextEPS0_RNS_11ClientBasicEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x2FC) [0x10e590c7c]
mongos(_ZN5mongo7Command20runAgainstRegisteredEPKcRNS_7BSONObjERNS_14BSONObjBuilderEi+0x123) [0x10e509523]
mongos(_ZN5mongo8Strategy15clientCommandOpERNS_7RequestE+0x52C) [0x10e5a334c]
mongos(_ZN5mongo7Request7processEi+0x4B2) [0x10e58fca2]
mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x65) [0x10e1c72d5]
mongos(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x33C) [0x10e60dabc]
mongos(_ZN5boost12_GLOBAL__N_112thread_proxyEPv+0xB1) [0x10e67fa71]
libsystem_pthread.dylib(_pthread_body+0x83) [0x7fff8e4ff05a]
libsystem_pthread.dylib(_pthread_body+0x0) [0x7fff8e4fefd7]
libsystem_pthread.dylib(thread_start+0xD) [0x7fff8e4fc3ed]
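The failure mode described in this report can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions and not the mongos implementation: the class and function names (`ShardRegistry`, `ChunkCache`, `route_unchecked`, `route_checked`) and the host strings are hypothetical, and only the shard name `red_7` and the error text are taken from the log above. It models two independent caches, refreshes only the shard cache after a removal, and then routes a query through the stale chunk metadata.

```python
from collections import namedtuple

# Hypothetical toy model; none of these names come from the mongos source.
Shard = namedtuple("Shard", ["shard_id", "host"])


class ShardRegistry:
    """Stand-in for the cache of registered shards."""

    def __init__(self, shards):
        self._shards = dict(shards)

    def reload(self, shards_on_config_server):
        # A refresh replaces the whole map, so removed shards disappear.
        self._shards = dict(shards_on_config_server)

    def find(self, shard_id):
        # Returns None for an unknown shard (the analogue of a null pointer).
        return self._shards.get(shard_id)


class ChunkCache:
    """Stand-in for cached chunk metadata; nothing here is invalidated on reload."""

    def __init__(self, chunks):
        self._chunks = list(chunks)  # (min_key, max_key_exclusive, shard_id)

    def shard_for(self, key):
        for lo, hi, shard_id in self._chunks:
            if lo <= key < hi:
                return shard_id
        raise KeyError(key)


def route_unchecked(chunks, registry, key):
    # Uses the lookup result without checking it, loosely like the 3.2 crash.
    shard = registry.find(chunks.shard_for(key))
    return shard.host  # AttributeError when shard is None


def route_checked(chunks, registry, key):
    # Checks the lookup result and fails with an error, loosely like 3.0.
    shard_id = chunks.shard_for(key)
    shard = registry.find(shard_id)
    if shard is None:
        raise LookupError("can't find shard for: " + shard_id)
    return shard.host


if __name__ == "__main__":
    red_6 = Shard("red_6", "host6:27018")
    red_7 = Shard("red_7", "host7:27018")
    registry = ShardRegistry({"red_6": red_6, "red_7": red_7})
    chunks = ChunkCache([(0, 100, "red_6"), (100, 200, "red_7")])

    # red_7 is removed elsewhere; only the shard cache is refreshed here.
    registry.reload({"red_6": red_6})

    try:
        route_checked(chunks, registry, 150)
    except LookupError as err:
        print(err)
```

In this model the stale `ChunkCache` still maps keys 100 to 199 to `red_7`, so every query in that range fails until the chunk metadata is refreshed as well, while queries routed to `red_6` keep working.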
- is related to
  SERVER-23878 Exclude remove3.js from sharding_legacy_multiversion (Closed)
- related to
  SERVER-21527 Race between remove shard and access to database may cause NULL pointer crash (Closed)