[SERVER-8028] mongos asserts, 100% CPU
Created: 25/Dec/12  Updated: 15/Feb/13  Resolved: 25/Dec/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.2.2
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Aristarkh Zagorodnikov
Assignee: Unassigned
Resolution: Duplicate
Votes: 0
Labels: assertion, mongos
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux 3.2 x64


Operating System: ALL
Participants:

 Description   

After reconfiguring one of the shard replica sets, mongos started logging the following assertion failure (once every 1-3 seconds) and consumed 100% CPU:

Tue Dec 25 17:43:56 [conn67701]   Assertion failure n == a src/mongo/s/shard.h 105
0x80e931 0x7d810d 0x6aa7d9 0x6c1d58 0x73071f 0x732724 0x769f88 0x76a7c5 0x76c415 0x771d59 0x76f560 0x770a52 0x56ce2e 0x58371c 0x589c79 0x7815a2 0x75a99b 0x500751 0x7fcb51 0x7fb094207e9a
 /usr/bin/mongos(_ZN5mongo15printStackTraceERSo+0x21) [0x80e931]
 /usr/bin/mongos(_ZN5mongo12verifyFailedEPKcS1_j+0xfd) [0x7d810d]
 /usr/bin/mongos(_ZN5mongo17ChunkRangeManager12_insertRangeESt23_Rb_tree_const_iteratorISt4pairIKNS_7BSONObjEN5boost10shared_ptrIKNS_5ChunkEEEEESB_+0x539) [0x6aa7d9]
 /usr/bin/mongos(_ZN5mongo12ChunkManager18loadExistingRangesERKSs+0x788) [0x6c1d58]
 /usr/bin/mongos(_ZN5mongo8DBConfig15getChunkManagerERKSsbb+0x51f) [0x73071f]
 /usr/bin/mongos(_ZN5mongo8DBConfig23getChunkManagerIfExistsERKSsbb+0x34) [0x732724]
 /usr/bin/mongos(_ZN5mongo17checkShardVersionEPNS_12DBClientBaseERKSsN5boost10shared_ptrIKNS_12ChunkManagerEEEbi+0x778) [0x769f88]
 /usr/bin/mongos(_ZN5mongo17checkShardVersionEPNS_12DBClientBaseERKSsN5boost10shared_ptrIKNS_12ChunkManagerEEEbi+0xfb5) [0x76a7c5]
 /usr/bin/mongos(_ZN5mongo14VersionManager19checkShardVersionCBEPNS_12DBClientBaseERKSsbi+0x35) [0x76c415]
 /usr/bin/mongos(_ZN5mongo17ClientConnections13checkVersionsERKSs+0x149) [0x771d59]
 /usr/bin/mongos(_ZN5mongo15ShardConnection5_initEv+0x2c0) [0x76f560]
 /usr/bin/mongos(_ZN5mongo15ShardConnectionC1ERKNS_5ShardERKSsN5boost10shared_ptrIKNS_12ChunkManagerEEE+0xa2) [0x770a52]
 /usr/bin/mongos(_ZN5mongo27ParallelSortClusteredCursor28setupVersionAndHandleSlaveOkEN5boost10shared_ptrINS_23ParallelConnectionStateEEERKNS_5ShardENS2_IS5_EERKNS_15NamespaceStringERKSsNS2_IKNS_12ChunkManagerEEE+0x2ee) [0x56ce2e]
 /usr/bin/mongos(_ZN5mongo27ParallelSortClusteredCursor9startInitEv+0xd3c) [0x58371c]
 /usr/bin/mongos(_ZN5mongo27ParallelSortClusteredCursor8fullInitEv+0x9) [0x589c79]
 /usr/bin/mongos(_ZN5mongo13ShardStrategy7queryOpERNS_7RequestE+0x472) [0x7815a2]
 /usr/bin/mongos(_ZN5mongo7Request7processEi+0x1fb) [0x75a99b]
 /usr/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x71) [0x500751]
 /usr/bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x411) [0x7fcb51]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fb094207e9a]
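
For readers who want to work with the trace above, here is a minimal Python sketch that splits one frame line into its parts, assuming each frame sits on a single line in the form `binary(symbol+0xOFFSET) [0xADDRESS]`. The name `parse_frame` and the regular expression are illustrative assumptions and are not part of mongos or any MongoDB tooling. The extracted mangled symbol could then be handed to a demangler such as `c++filt`.

```python
import re

# Assumed frame layout, inferred from the trace in this ticket:
#   /path/to/binary(mangled_symbol+0xOFFSET) [0xADDRESS]
# The symbol may be empty, as in the libpthread frame.
FRAME_RE = re.compile(
    r"^\s*(?P<binary>\S+)\((?P<symbol>[^()+]*)\+0x(?P<offset>[0-9a-fA-F]+)\)"
    r"\s+\[0x(?P<address>[0-9a-fA-F]+)\]\s*$"
)


def parse_frame(line):
    """Return a dict describing one stack frame, or None if the line does not match."""
    match = FRAME_RE.match(line)
    if match is None:
        return None
    return {
        "binary": match.group("binary"),
        "symbol": match.group("symbol"),  # still mangled; demangle separately
        "offset": int(match.group("offset"), 16),
        "address": int(match.group("address"), 16),
    }


if __name__ == "__main__":
    frame = parse_frame(" /usr/bin/mongos(_ZN5mongo12verifyFailedEPKcS1_j+0xfd) [0x7d810d]")
    print(frame["symbol"], hex(frame["address"]))
```

Lines that are not frames (for example the assertion message itself) return `None`, so the function can be applied to every line of a log excerpt and the non-matches filtered out.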

Restarting mongos fixed the problem.



 Comments   
Comment by Aristarkh Zagorodnikov [ 25/Dec/12 ]

Thanks. Since it happens rarely and is caught by our monitoring, it will be easier to restart mongos until 2.2.3 arrives.

Comment by Eliot Horowitz (Inactive) [ 25/Dec/12 ]

This is because of SERVER-7704.
Until 2.2.3 comes out, you can:

  • use 2.2 nightly
  • bounce mongos when this happens

This is often caused by a replica set reconfig.
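
The "bounce mongos when this happens" workaround depends on noticing the failure. As a rough sketch of how monitoring might detect it, the Python below counts log lines carrying the two signatures quoted in this ticket. The function names and the `threshold` default are hypothetical choices for illustration, not anything provided by MongoDB, and the restart itself is left to whatever process supervisor is in use.

```python
# Substrings copied from the log lines quoted in this ticket.
SIGNATURES = (
    "Assertion failure n == a src/mongo/s/shard.h 105",
    "assertion src/mongo/s/shard.h:105",
)


def count_shard_assertions(lines):
    """Return how many log lines contain one of the known signatures."""
    return sum(1 for line in lines if any(sig in line for sig in SIGNATURES))


def needs_bounce(lines, threshold=3):
    """Hypothetical policy: flag mongos for a restart once `threshold` matches are seen."""
    return count_shard_assertions(lines) >= threshold


if __name__ == "__main__":
    sample = [
        "Tue Dec 25 17:43:56 [conn67701]   Assertion failure n == a src/mongo/s/shard.h 105",
        "Tue Dec 25 17:50:24 [conn185834] warning: could not autosplit collection a.fs.chunks"
        " :: caused by :: 0 assertion src/mongo/s/shard.h:105",
    ]
    print(count_shard_assertions(sample), needs_bounce(sample, threshold=2))
```

Because the assertion repeats every 1-3 seconds while the problem persists, a small threshold over a recent slice of the log is probably enough to separate it from a one-off message.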

Comment by Aristarkh Zagorodnikov [ 25/Dec/12 ]

These failures are also followed by:

Tue Dec 25 17:50:24 [conn185834] warning: could not autosplit collection a.fs.chunks :: caused by :: 0 assertion src/mongo/s/shard.h:105
