Core Server / SERVER-40118

Allow users to initiate the replica set with the specified term (election id)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None

      Description

      Background

      When a shard goes down and loses all its data, recovering it with minimal impact on the cluster requires that the term (electionId) and configuration version of the recovered shard be the same as or higher than the electionId and configuration version cached for this shard in the Replica Set Monitor on the mongos, the CSRS, and the other shards. Otherwise operations may fail with the following error (XXX is the name of the down shard):

      "Could not find host matching read preference { mode: \"primary\" } for set XXX"
      

      Issue

      Currently when we initiate the replica set configuration, we can specify the configuration version. However, there is no way to specify the (initial) term (electionId) for the replica set.
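
      As a minimal sketch of what is possible today, the snippet below builds a replSetInitiate command document with an explicit configuration version. The helper names (build_initiate_command, initiate) are hypothetical, and `client` is assumed to be a pymongo MongoClient connected directly to one member; whether a version other than 1 is accepted may depend on the server version.

```python
def build_initiate_command(set_name, hosts, config_version=1):
    """Build a replSetInitiate command document (hypothetical helper)."""
    config = {
        "_id": set_name,
        # The configuration version can be chosen by the caller.
        "version": config_version,
        "members": [{"_id": i, "host": h} for i, h in enumerate(hosts)],
    }
    # The config document carries no field for the initial term (electionId),
    # which is the gap this ticket describes.
    return {"replSetInitiate": config}


def initiate(client, command):
    """Send the command through the admin database.

    `client` is assumed to behave like pymongo.MongoClient, i.e. it
    exposes client.admin.command(<command document>).
    """
    return client.admin.command(command)
```

      For example, build_initiate_command("XXX", ["host1:27018", "host2:27018"], config_version=7) yields a config whose version is 7, but there is nowhere to put a term.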

      There are some workarounds for this issue:

      • One workaround is to shut down the replica set, update the term in the local.replset.election collection, and then restart the shard. This is not feasible for a shard using the In-Memory storage engine, because its data (including the local database) is lost when the shard is restarted.
      • Another workaround is to restart the whole cluster. This is painful, especially for large sharded clusters. In addition, for a sharded cluster that uses the In-Memory storage engine, we can't stop all the members at the same time, or the data on those shards will be lost. We would therefore need to restart the mongos/CSRS/shard members in a rolling fashion, which requires a lot of effort.
      • A third workaround is to step down the primary on the shard repeatedly until the new term (election id) matches the term from before the shard went down. If that term was high, this workaround might not be feasible.
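
      The step-down workaround can be sketched roughly as the loop below. It is an illustration only: get_term and step_down are hypothetical callables supplied by the caller (with pymongo they might wrap the replSetGetStatus "term" field and the replSetStepDown command), and the assumption that each step-down raises the term by one is a simplification.

```python
import time


def raise_term_by_stepdowns(get_term, step_down, target_term,
                            max_stepdowns, wait_secs=0.0):
    """Step the primary down repeatedly until the term reaches target_term.

    Returns the number of step-downs issued. Raises RuntimeError if
    max_stepdowns is used up before the target is reached.
    """
    issued = 0
    while get_term() < target_term:
        if issued >= max_stepdowns:
            raise RuntimeError(
                f"term still below {target_term} after {issued} step-downs")
        step_down()
        issued += 1
        time.sleep(wait_secs)  # give the set time to elect a new primary
    return issued


def estimated_hours(current_term, target_term, secs_per_election):
    """Rough wall-clock estimate, assuming one term per step-down."""
    return max(0, target_term - current_term) * secs_per_election / 3600.0
```

      With illustrative numbers, estimated_hours(1, 3601, 10) gives 10.0, which is why a high previous term makes this workaround impractical.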

      As described above, these workarounds are either not feasible or require a lot of effort. It would be nice if we could specify the term/electionId when initiating the replica set.
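
      To illustrate why an initial term would help, here is a simplified model of the condition described in the Background section. It is not the actual monitor implementation: ShardView and accepted_as_primary are hypothetical names, and a plain integer stands in for the electionId.

```python
from typing import NamedTuple


class ShardView(NamedTuple):
    """What is known about a shard's replica set (simplified)."""
    config_version: int
    term: int  # stand-in for the electionId


def accepted_as_primary(cached: ShardView, reported: ShardView) -> bool:
    """True when the reported values are the same as or higher than the cached ones."""
    return (reported.config_version >= cached.config_version
            and reported.term >= cached.term)
```

      Under this model, a shard that was re-initiated with the old configuration version but a fresh term, such as ShardView(7, 1) against a cached ShardView(7, 42), is rejected, whereas one initiated with the previous term or higher would be accepted.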

              People

              Assignee:
              jason.carey Jason Carey
              Reporter:
              linda.qin Linda Qin