[SERVER-4481] Assertion failure in Replica Set IP address change Created: 13/Dec/11  Updated: 04/Apr/23  Resolved: 16/Dec/11

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.1
Fix Version/s: None

Type: Bug
Priority: Minor - P4
Reporter: Andrew Levy
Assignee: Kristina Chodorow (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 5.5 x86_64

 Description   

I'm using hostnames, not IPs, for my replica sets. I recently upgraded a secondary server in a replica set, and its underlying IP address changed. I updated the hosts files on all my boxes, except I forgot to update the hosts file on the secondary itself (which had an entry for itself pointing to the old IP address). I added the secondary to the set before I caught my mistake, and it ended up crashing with this error:


Tue Dec 13 04:39:07 [rsMgr] replset msgReceivedNewConfig version: version: 13
Tue Dec 13 04:39:07 [rsMgr] replSet info saving a newer config version to local.system.replset
Tue Dec 13 04:39:07 [rsMgr] replSet saveConfigLocally done
Tue Dec 13 04:39:07 [rsMgr] self doesn't match: 3
Tue Dec 13 04:39:07 [rsMgr] Assertion failure false db/repl/rs.cpp 440
0x57eeb6 0x589d6b 0x7c214b 0x7c32f2 0x7c4080 0x7f5ec5 0x5939f3 0x591d25 0x591383 0x578d0f 0x57adc4 0xaa4560 0x2aaaaacce617 0x2aaaab748c2d
mongod(_ZN5mongo12sayDbContextEPKc+0x96) [0x57eeb6]
mongod(_ZN5mongo8assertedEPKcS1_j+0xfb) [0x589d6b]
mongod(_ZN5mongo11ReplSetImpl14initFromConfigERNS_13ReplSetConfigEb+0xadb) [0x7c214b]
mongod(_ZN5mongo7ReplSet13haveNewConfigERNS_13ReplSetConfigEb+0xd2) [0x7c32f2]
mongod(_ZN5mongo7Manager20msgReceivedNewConfigENS_7BSONObjE+0x2e0) [0x7c4080]
mongod(_ZN5boost6detail8function26void_function_obj_invoker0INS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo7ManagerENS7_7BSONObjEEENS3_5list2INS3_5valueIPS8_EENSC_IS9_EEEEEEvE6invokeERNS1_15function_bufferE+0x65) [0x7f5ec5]
mongod(_ZNK5boost9function0IvEclEv+0x243) [0x5939f3]
mongod(_ZN5mongo4task6Server6doWorkEv+0x225) [0x591d25]
mongod(_ZN5mongo4task4Task3runEv+0x33) [0x591383]
mongod(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xbf) [0x578d0f]
mongod(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0x57adc4]
mongod(thread_proxy+0x80) [0xaa4560]
/lib64/libpthread.so.0 [0x2aaaaacce617]
/lib64/libc.so.6(clone+0x6d) [0x2aaaab748c2d]
Tue Dec 13 04:39:07 [rsMgr] replSet error unexpected exception in haveNewConfig() : 0 assertion db/repl/rs.cpp:440
Tue Dec 13 04:39:07 [rsMgr] replSet error fatal, stopping replication

— (repeats)
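For context, the assertion at db/repl/rs.cpp:440 ("self doesn't match") fires when the node receives a new replica set config and cannot find any member entry that resolves to itself, which is exactly what a stale hosts entry causes. The snippet below is a minimal, hypothetical sketch of that kind of self-identification check, written in Python purely for illustration; find_self_index and the example hostnames are made up and are not the actual mongod code:

import socket

# Sketch of a "which config member am I?" check. With a stale /etc/hosts
# entry, the node's own hostname resolves to the old IP, no config member
# matches, and the real server asserts and stops replication.
def find_self_index(member_hosts, listen_port=27017):
    try:
        local_ips = set(socket.gethostbyname_ex(socket.gethostname())[2])
    except socket.gaierror:
        local_ips = set()
    local_ips.add("127.0.0.1")

    for i, host_and_port in enumerate(member_hosts):
        host, _, port = host_and_port.partition(":")
        port = int(port) if port else 27017
        try:
            resolved = socket.gethostbyname(host)
        except socket.gaierror:
            continue  # unresolvable member, keep looking
        if resolved in local_ips and port == listen_port:
            return i
    return None  # no member resolves to this machine: the failure mode in this ticket

if __name__ == "__main__":
    members = ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"]
    if find_self_index(members) is None:
        raise AssertionError("self doesn't match")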

When I noticed the hosts file error, I corrected it. I then attempted to restart mongod with the --repair flag, but got repeated entries like this in the log:

Tue Dec 13 04:44:37 [initandlisten] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: local.oplog.rs top: { opid: 8, active: true, waitingForLock: false, secs_running: 0, op: "getmore", ns:
"local.oplog.rs", query: {}, client: "0.0.0.0:0", desc: "initandlisten", threadId: "0x2aaaab9cce00", numYields: 0 }

— (repeats)

I killed the mongod process again and restarted it without the flag. This time it repaired and was successfully added back to the set.



 Comments   
Comment by Kristina Chodorow (Inactive) [ 13/Dec/11 ]

You should still run a repair. Try restarting it:

  • with --repair,
  • without --replSet, and
  • on a different port.

Keep in mind that repair removes corrupted data rather than fixing it, so you may end up with less data on your secondary. You might want to resync it or restore from a backup afterwards.
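For reference, a minimal sketch of that standalone repair invocation, wrapped in Python only for illustration; the dbpath and port values below are placeholders, not taken from this ticket:

import subprocess

def run_standalone_repair(dbpath="/var/lib/mongo", port=37017):
    # --repair rewrites the data files, dropping anything unreadable;
    # leaving out --replSet keeps the node standalone while it runs;
    # a non-default --port keeps clients and other members from connecting.
    cmd = [
        "mongod",
        "--dbpath", dbpath,   # placeholder data directory
        "--port", str(port),  # placeholder non-default port
        "--repair",
        # intentionally no --replSet
    ]
    subprocess.check_call(cmd)

if __name__ == "__main__":
    run_standalone_repair()

Once the repair finishes, restart with your usual --replSet options, and resync or restore if data is missing.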

In the future, you might want to turn on journaling, which will eliminate this problem altogether.
