[SERVER-7679] mongos crash while changing mongod master Created: 15/Nov/12  Updated: 15/Feb/13  Resolved: 04/Jan/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.2.0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Anton V. Volokhov
Assignee: Bill Hayward
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Hi!
I'm using MongoDB 2.2.0 on Ubuntu 12.04 x86_64.
The cluster configuration is 2 shards, each with 2 mongod instances and 1 arbiter.
mongos crashed suddenly while the mongod master was changing.

Here is a slice of the mongos log:
Wed Nov 14 11:49:50 [ReplicaSetMonitorWatcher] Primary for replica set mongodb-sh1 changed to mongodb03gt.load.net:27017
Wed Nov 14 11:49:51 [LockPinger] DBClientCursor::init call() failed
Wed Nov 14 11:49:51 [LockPinger] warning: distributed lock pinger 'mongodb01gt.load.net:27018/vi.load.net:27000:1351511864:1804289383' detected an exception while pinging. :: caused by :: DBClientBase::findN: transport error: mongodb01gt.load.net:27018 ns: admin.$cmd query: { getlasterror: 1, $auth: { local: { __system: 2 } } }
Wed Nov 14 11:49:51 [conn13111996] warning: No primary detected for set mongodb-sh1
terminate called without an active exception
Wed Nov 14 11:49:51 [conn13111996] warning: db exception when initializing on mongodb-sh1:mongodb-sh1/mongodb01gt.load.net:27017,mongodb03gt.load.net:27017, current connection state is { state: { conn: "mongodb-sh1/mongodb01gt.load.net:27017,mongodb03gt.load.net:27017", vinfo: "mongodb-sh1:mongodb-sh1/mongodb01gt.load.net:27017,mongodb03gt.load.net:27017", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 10009 ReplicaSetMonitor no master found for set: mongodb-sh1
Wed Nov 14 11:49:51 [conn13111996] trying reconnect to mongodb01gt.load.net:27017
Wed Nov 14 11:49:51 [mongosMain] connection accepted from 127.0.0.1:60007 #13169200 (13 connections now open)
Received signal 6
Backtrace: Wed Nov 14 11:49:51 [conn13111996] reconnect mongodb01gt.load.net:27017 failed couldn't connect to server mongodb01gt.load.net:27017
Wed Nov 14 11:49:51 [mongosMain] connection accepted from 127.0.0.1:60008 #13169201 (14 connections now open)
0x57b2a0 0x7fb65bf8e4c0 0x7fb65bf8e445 0x7fb65bf91bab 0x7fb65c8dc69d 0x7fb65c8da846 0x7fb65c8da873 0x7844ec Wed Nov 14 11:49:51 [conn13169200] end connection 127.0.0.1:60007 (12 connections now open)
0x7b1934 0x7b227b 0x68113b 0x7fb65cb2ce9a 0x7fb65c04a4bd
Wed Nov 14 11:49:51 [conn13169201] end connection 127.0.0.1:60008 (12 connections now open)
/usr/bin/mongos(_ZN5mongo17printStackAndExitEi+0x60)[0x57b2a0]
/lib/x86_64-linux-gnu/libc.so.6(+0x364c0)[0x7fb65bf8e4c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7fb65bf8e445]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7fb65bf91bab]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d)[0x7fb65c8dc69d]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846)[0x7fb65c8da846]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873)[0x7fb65c8da873]
/usr/bin/mongos(_ZN5mongo8Balancer3runEv+0x93c)[0x7844ec]
/usr/bin/mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xb4)[0x7b1934]
/usr/bin/mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x7b)[0x7b227b]
/usr/bin/mongos[0x68113b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fb65cb2ce9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb65c04a4bd]
===
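
The symbolized stack frames in the log above are interleaved with ordinary mongos log lines, which makes the backtrace hard to read. A minimal Python sketch for pulling out only the symbolized frames might look like the following. The function name `extract_frames` and the regular expression are illustrative assumptions based on the frame format seen in this log; they are not part of any MongoDB tooling.

```python
import re

# Matches symbolized backtrace frames of the form seen in the log above, e.g.
#   /usr/bin/mongos(_ZN5mongo8Balancer3runEv+0x93c)[0x7844ec]
#   /lib/x86_64-linux-gnu/libc.so.6(+0x364c0)[0x7fb65bf8e4c0]
#   /usr/bin/mongos[0x68113b]
# The pattern is an assumption derived from those lines only.
FRAME_RE = re.compile(
    r"^(?P<module>/\S+?)"                      # path of the binary or library
    r"(?:\((?P<symbol>[^+)]*)"                 # optional mangled symbol
    r"(?:\+(?P<offset>0x[0-9a-f]+))?\))?"      # optional offset into the symbol
    r"\[(?P<address>0x[0-9a-f]+)\]$"           # absolute address
)


def extract_frames(log_text):
    """Return (module, symbol, address) tuples for each symbolized frame.

    Lines that are not backtrace frames (timestamps, connection messages,
    raw address lists) are skipped. The symbol is None when the frame
    carries no symbol name.
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            symbol = match.group("symbol") or None
            frames.append((match.group("module"), symbol, match.group("address")))
    return frames
```

The mangled symbols returned this way can be passed to a demangler such as `c++filt`; for example, `_ZN5mongo8Balancer3runEv` demangles to `mongo::Balancer::run()`, which suggests the abort was raised from the balancer thread.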



 Comments   
Comment by Anton V. Volokhov [ 06/Jan/13 ]

It really was a duplicate of SERVER-7061; upgrading to 2.2.1 fixed the issue. Thanks!

Comment by Ian Whalen (Inactive) [ 04/Jan/13 ]

Hi Anton, I'm closing this out since we haven't heard from you, but please reopen if upgrading to 2.2.1 didn't fix your issue.

Comment by Anton V. Volokhov [ 16/Nov/12 ]

Thanks! I hope it will help.

Comment by Randolph Tan [ 15/Nov/12 ]

Hi,

This looks like a bug fixed by SERVER-7061. Can you try upgrading to 2.2.1 and see if this fixes things?

Thanks!
