Core Server / SERVER-1922

need to restart mongod to clear stale shard meta-data

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 1.7.1
    • Component/s: None
    • Labels: None
    • Environment: ALL

      Problem:
      After dropping a shard and re-creating it with the same name, the following error was seen in the logs:

      Sun Oct 10 20:37:43 [conn173932] DBException in process: setShardVersion failed!

      { "errmsg" : "exception: gotShardHost different than what i had before before [set3/rs3a:27018] got [set3/rs3a:27018,rs3b:27018] ", "code" : 13299, "ok" : 0 }

      Reproduce:

      • turn the balancer off (the update body was garbled in the original; { $set : { stopped : true } } is the documented form)
        db.settings.update( { _id : "balancer" }, { $set : { stopped : true } }, true )

      • create a 2-member replset, "foo"
      • add the shard with a single member
        db.runCommand( { addshard : "foo/node1", maxSize: 409600, name : "shard1" } );

      • remove the shard
        db.runCommand( { removeshard : "foo/node1" } );

      • add the shard again, but with both nodes
        db.runCommand( { addshard : "foo/node1,node2", maxSize: 409600, name : "shard1" } );
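
      Taken together as a mongo shell sketch (run against a mongos from the admin database; "foo", node1/node2, and "shard1" are the placeholder names used above):

      // 1. stop the balancer (upsert into config.settings)
      db.getSiblingDB("config").settings.update( { _id : "balancer" }, { $set : { stopped : true } }, true );

      // 2. add the shard with a single member of the replset
      db.runCommand( { addshard : "foo/node1", maxSize: 409600, name : "shard1" } );

      // 3. remove the shard
      db.runCommand( { removeshard : "foo/node1" } );

      // 4. re-add it under the same name, now listing both members; operations
      //    routed through mongos then fail with the 13299 error shown above
      db.runCommand( { addshard : "foo/node1,node2", maxSize: 409600, name : "shard1" } );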

      Workaround:
      Since the members of the shard were part of a replset, the following was performed to clear the error (see the sketch after this list):

      • find the current master by looking at the output of rs.status()
      • on the current master, run rs.stepDown()
      • restart that mongod process
      • repeat until all members of the replset have been restarted
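
      As a shell sketch of that rolling restart (rs.status() and rs.stepDown() are the standard replset helpers; the restart itself happens at the OS level, and the flags shown are illustrative):

      // print each member's state to locate the current master
      rs.status().members.forEach( function(m) { print( m.name + " : " + m.stateStr ); } );

      // on the master, step down so the restart does not interrupt writes
      rs.stepDown();

      // then restart that mongod from the OS and repeat for each member, e.g.
      //   kill <pid>; mongod --replSet foo --port 27018 --dbpath /data/db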

      Business Case:

      • Reliability
        Need to deal with stale meta-data more gracefully and automatically

            Assignee:
            Eliot Horowitz (Inactive)
            Reporter:
            Alvin Richards (Inactive)
            Votes:
            0
            Watchers:
            2
