[SERVER-1770] Replica set fails to add member Created: 10/Sep/10  Updated: 30/Mar/12  Resolved: 10/Sep/10

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.6.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jozef Sovcik Assignee: Kyle Banker
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

a) 4 replica-set nodes running according to rs.status() are all healthy
(ip096=172.29.1.96, ip108=172.29.1.108, ip119=172.29.1.119, ip120=172.29.1.120)
b) primary is ip096 (attached log is taken from primary)
c) adding 5th node (ip121=172.29.1.121)
d) all using standard ports (27017)
e) ruby 1.9.2
f) mongo driver


Attachments: File add_rs_member.rb     Text File log_rsadd_using_ruby.txt     Text File log_rsadd_using_shell.txt    
Operating System: Linux
Participants:

 Description   

Hi, I'm trying to use Ruby to configure a MongoDB replica set, and there seems to be a bug when running rs.add() through eval.

When I use db.eval to run rs.add() from a Ruby script, I get the following error:
{"retval"=>{"assertion"=>"need most members up to reconfigure, not ok : ip108", "assertionCode"=>13144, "errmsg"=>"db assertion failure", "ok"=>0.0}, "ok"=>1.0}

The log file on the primary shows the following error (the full connection log is attached as "log_rsadd_using_ruby.txt"):
Fri Sep 10 16:47:31 [conn71] admin Assertion failure !dbMutex.isWriteLocked() db/repl/heartbeat.cpp 115
Fri Sep 10 16:47:31 [conn71] replSet cmufcc requestHeartbeat ip096:27017 : 0 assertion db/repl/heartbeat.cpp:115
Fri Sep 10 16:47:31 [conn71] admin Assertion failure !dbMutex.isWriteLocked() db/repl/heartbeat.cpp 115
Fri Sep 10 16:47:31 [conn71] replSet cmufcc requestHeartbeat ip108 : 0 assertion db/repl/heartbeat.cpp:115
Fri Sep 10 16:47:31 [conn71] replSet replSetReconfig exception: need most members up to reconfigure, not ok : ip108

It worked when I added the member from the mongo shell using the same rs.add(), so I'm sure all members were up.
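
In the db.eval result quoted above, the outer "ok" is 1.0 even though the nested "retval" carries the failure, so a script that checks only the top-level field would miss it. A minimal sketch of a check, assuming the result always has the shape shown in this report (the helper name eval_failure is hypothetical and not part of the driver):

```ruby
# Hypothetical helper: returns the nested assertion message when the evaluated
# function failed, or nil otherwise. Assumes the result shape quoted above.
def eval_failure(result)
  retval = result['retval']
  # A failed server-side call shows up as a nested hash with "ok" => 0.0.
  return nil unless retval.is_a?(Hash) && retval['ok'] == 0.0
  retval['assertion'] || retval['errmsg']
end
```

With the result from this report, the helper would return the "need most members up to reconfigure, not ok : ip108" message.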



 Comments   
Comment by Jozef Sovcik [ 10/Sep/10 ]

Great!

Comment by Kyle Banker [ 10/Sep/10 ]

Added.

http://www.mongodb.org/display/DOCS/Server-side+Code+Execution#Server-sideCodeExecution-Limitationsof%7B%7Beval%7D%7D

Comment by Jozef Sovcik [ 10/Sep/10 ]

I would still recommend either
a) mentioning this on your wiki page, or
b) fixing the issue,

so that others do not waste time trying to find out why rs.add does not work...
It might also be good to mention, alongside the 'eval' or 'command' method, that some JS functions may fail because of locking on some objects.

Comment by Jozef Sovcik [ 10/Sep/10 ]

one more correction

config = connection['local']['system.replset'].find_one

Anyway, it works. Thanks a lot!
My compliments to your great support!

Based on this experience I wrote an 'add_rs_member' method for the Mongo::Connection class.
You can find it here: http://gist.github.com/574143

Comment by Kyle Banker [ 10/Sep/10 ]

First line should read

config = connection['local']['system.replSet'].find_one

Comment by Kyle Banker [ 10/Sep/10 ]

I believe the problem is that eval takes a write lock and that this conflicts with the requirements of reinitiating a replica set. It is much better to do this manually, like so:

config = connection['local']['system.replSet'].findOne
config['version'] += 1
config['members'] << {"_id" => 4, "host" => "ipaddress"}

connection['admin'].command({ :replSetReconfig => config})

Please report back if this works. Include the relevant Ruby file you've written if it fails.
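
The manual approach above can be sketched as a small helper that builds the new configuration without touching the database. This is only a sketch under stated assumptions: the helper name config_with_new_member is hypothetical, the next "_id" is assumed to be one more than the highest existing member id, and the collection name 'system.replset' and the find_one call follow the corrections posted in the follow-up comments.

```ruby
# Hypothetical helper: returns a copy of the replica-set config with one more
# member and an incremented version. The input config is left unchanged.
def config_with_new_member(config, host)
  members = config['members'].map(&:dup)
  # Assumption: the new member takes the next free id after the highest one.
  next_id = (members.map { |m| m['_id'] }.max || -1) + 1
  members << { '_id' => next_id, 'host' => host }
  config.merge('version' => config['version'] + 1, 'members' => members)
end

# Intended use with the driver calls quoted in this thread (not executed here,
# since it needs a running replica set):
#   config = connection['local']['system.replset'].find_one
#   new_config = config_with_new_member(config, 'ip121:27017')
#   connection['admin'].command({ :replSetReconfig => new_config })
```

Keeping the config manipulation separate from the driver calls makes it possible to inspect the new configuration before sending replSetReconfig.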

Comment by Jozef Sovcik [ 10/Sep/10 ]

I have it all right here in front of me, so for the next couple of hours I'm available to help.

Comment by Jozef Sovcik [ 10/Sep/10 ]

Forgot to mention: Linux distro: Debian Lenny 64bit

Comment by Eliot Horowitz (Inactive) [ 10/Sep/10 ]

Kyle can help figure out what the root issue is here.

Comment by Jozef Sovcik [ 10/Sep/10 ]

I have also attached the Ruby script I used for testing.
