Core Server / SERVER-12020

Removing and adding RS member fails with code 13144

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.4.8
    • Component/s: Replication
    • Labels:
    • ALL

      Remove a member and promptly re-add it. The first attempt to re-add it fails; the second succeeds:

      rs:PRIMARY> rs.remove('localhost:27018')
      2013-12-09T17:30:05.241-0500 DBClientCursor::init call() failed
      2013-12-09T17:30:05.241-0500 Error: error doing query: failed at src/mongo/shell/query.js:81
      2013-12-09T17:30:05.243-0500 trying reconnect to 127.0.0.1:27017
      2013-12-09T17:30:05.243-0500 reconnect 127.0.0.1:27017 ok
      rs:PRIMARY> var config = rs.conf()
      rs:PRIMARY> config.members.push({_id: 1, host: 'localhost:27018'})
      2
      rs:PRIMARY> rs.reconfig(config)
      {
      	"errmsg" : "exception: need most members up to reconfigure, not ok : localhost:27018",
      	"code" : 13144,
      	"ok" : 0
      }
      rs:PRIMARY> rs.reconfig(config)
      { "ok" : 1 }
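Until this is fixed, the session above suggests a simple workaround: retry the reconfig when it fails with code 13144. The sketch below is a hypothetical helper, `reconfigWithRetry`, not part of the mongo shell. It assumes, as in the 2.4.8 session above, that `rs.reconfig()` returns the result document instead of throwing, and it takes the reconfig function as a parameter so it can be exercised without a server.

```javascript
// Hypothetical helper (not a mongo shell built-in): retries a replica-set
// reconfig while it fails with code 13144 ("need most members up to
// reconfigure"), the transient error shown above.
function reconfigWithRetry(reconfigFn, config, maxAttempts) {
  var result;
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    result = reconfigFn(config);
    // Stop on success, or on any error other than 13144.
    if (result.ok === 1 || result.code !== 13144) {
      return result;
    }
  }
  // Every attempt failed with 13144; return the last result document.
  return result;
}
```

In the shell this would be called as `reconfigWithRetry(function (c) { return rs.reconfig(c); }, config, 3)`. Retrying only on 13144 keeps genuine configuration errors from being masked.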
      

      The primary logs:

      replSet cmufcc requestHeartbeat localhost:27018 : 9001 socket exception [SEND_ERROR] server [127.0.0.1:27018]
      replSet replSetReconfig exception: need most members up to reconfigure, not ok : localhost:27018
      

      I think the offending code is at rs_initiate.cpp:98. The primary seems to believe it still has a cached connection to the removed member, but the member closed its side of that connection when it was removed. The first reconfig attempt uses the stale connection, fails, and clears the cache; the second attempt opens a new connection and succeeds.
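A toy model of the hypothesis above may make the failure sequence easier to follow. The names `FakeSocket` and `ConnectionCache` are hypothetical and this is not the code in rs_initiate.cpp; it only assumes, as the report suggests, that the primary caches one connection per member and evicts it after a failed use.

```javascript
// Toy model (hypothetical names, not MongoDB source) of a per-host
// connection cache whose entry goes stale when the peer closes its side.
function FakeSocket() {
  this.closedByPeer = false;
}

function ConnectionCache() {
  this.conns = {};
}

// Returns the cached socket for host, opening a new one if none is cached.
ConnectionCache.prototype.get = function (host) {
  if (!this.conns[host]) {
    this.conns[host] = new FakeSocket();
  }
  return this.conns[host];
};

// Sends a heartbeat. If the cached socket was closed by the peer, the
// stale entry is evicted and false is returned; the next call reconnects.
ConnectionCache.prototype.heartbeat = function (host) {
  var sock = this.get(host);
  if (sock.closedByPeer) {
    delete this.conns[host];
    return false;
  }
  return true;
};

var cache = new ConnectionCache();
cache.heartbeat("localhost:27018");               // true: socket opened and cached
cache.get("localhost:27018").closedByPeer = true; // member removed, closes its side
cache.heartbeat("localhost:27018");               // false: stale socket evicted
cache.heartbeat("localhost:27018");               // true: fresh socket
```

The second and third heartbeats mirror the failed and successful `rs.reconfig(config)` calls in the session above.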

            Assignee:
            Unassigned
            Reporter:
            jesse@mongodb.com A. Jesse Jiryu Davis
            Votes:
            0
            Watchers:
            5
