[SERVER-40118] Allow users to initiate the replica set with the specified term (election id) Created: 14/Mar/19 Updated: 27/Jun/19 Resolved: 27/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Linda Qin | Assignee: | Mira Carey |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Participants: | |
| Description |
|
Background
When a shard is down and has lost all its data, recovering that shard with minimal impact on the cluster requires that the term (electionId) and configuration version of the recovered shard be the same as or higher than the electionId/configuration version cached for that shard in the Replica Set Monitor on the mongos, CSRS, and other shards. Otherwise the operation may fail with the following error (set XXX is the name of the down shard):
Issue
Currently, when we initiate the replica set configuration, we can specify the configuration version. However, there is no way to specify the (initial) term (electionId) for the replica set. For the issue above, there are some workarounds:
As noted above, those workarounds are either not feasible or require a lot of effort. It would be nice if we could specify the term/electionId when initiating the replica set. |
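For illustration, a minimal sketch in the shell of what is possible today versus what this ticket asks for; the set name, host, and numbers are placeholders, and the term field shown is hypothetical (it is not an accepted option today):

```javascript
// Sketch only: "shardXXX" and the host below are placeholders.
rs.initiate({
  _id: "shardXXX",                        // must match the lost shard's replica set name
  version: 100,                           // seeding a high configuration version is possible today
  // term: 100,                           // HYPOTHETICAL: the missing knob this ticket requests
  members: [
    { _id: 0, host: "shardXXX-node0.example.net:27018" }
  ]
})
```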
| Comments |
| Comment by Ratika Gandhi [ 27/Jun/19 ] |
|
Will be addressed by |
| Comment by Linda Qin [ 17/May/19 ] |
Yes, that's correct. A targeted query that doesn't hit the down shard can still return results (see the sketch after this comment).
Yup. I think this would be very nice. |
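For concreteness, a sketch of the two cases being discussed; the collection name and shard key are made up:

```javascript
// Assume a collection "orders" sharded on { customerId: 1 }, with one shard down.

// Targeted by the shard key, so the down shard is never contacted and results return:
db.orders.find({ customerId: 12345 })

// Scatter-gather; allowPartialResults is meant to let this succeed minus the
// unreachable shard, but as discussed elsewhere in this ticket it currently does
// not help when the shards are replica sets:
db.orders.find({}).allowPartialResults()
```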
| Comment by Alyson Cabral (Inactive) [ 17/May/19 ] |
|
Thanks for your detailed response, Linda! To clarify, we can still access the data on the other shards, we just can't return results for queries that target the down shard, right? It seems to me like you'd rarely want to keep the metadata around for the lost data. That could lead a shard to believe it has much more data than it actually has in practice. Putting aside the in-memory problems for now, would it be best to have a way to force the shard to be removed and solve the allowPartialResults problem? |
| Comment by Linda Qin [ 17/May/19 ] |
|
This could happen for any engine, but for the in-memory storage engine it's harder to recover. For the other engines, which persist the replica set configuration in the local database, we could hack the term in the local.replset.election collection to work around this issue, but this is impossible for the in-memory storage engine.
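For concreteness, a rough sketch (not a supported procedure) of what that hack could look like on a persistent engine; the term value is a placeholder, and the write would typically be done with the node started standalone (without --replSet):

```javascript
// Unsupported sketch: run against a data-bearing node of the lost shard,
// typically started standalone so writes to the "local" database are allowed.
const localDb = db.getSiblingDB("local");

// Inspect the persisted last-vote document, e.g. { term: NumberLong(3), candidateIndex: 0, ... }
localDb.getCollection("replset.election").find();

// Bump the stored term so the shard's new electionId compares greater than or
// equal to the value cached by the Replica Set Monitors (100 is just a placeholder).
localDb.getCollection("replset.election").updateOne(
  {},
  { $set: { term: NumberLong(100) } }
);
```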
This problem should not exist if we remove the shard and add a new shard. However, in the situation discussed here (a shard is down and has lost all its data), it seems impossible to remove this shard without starting it up (this shard still owns chunks, and removing it would require moving all of its chunks to the other shards; we could hack the config database to re-assign the chunks on this down shard to the other shards (sketched below), but I assume this would require restarting the whole cluster, and for a sharded cluster using the in-memory storage engine that would mean losing all the data on the other shards too...). The benefit of keeping a shard that has no backups is that the customer would still be able to access the data on the other shards, without needing to re-build the whole cluster (which includes sharding the collections, pre-splitting the chunks, etc.). Also note that currently allowPartialResults does not work when the sharded cluster is backed by replica sets ( |
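For what it's worth, a sketch of what "hacking the config database" could amount to; this is entirely unsupported, the shard names are placeholders, and in practice routers and shards would also need their cached metadata refreshed (hence the restarts mentioned above):

```javascript
// Unsupported sketch: run against the CSRS primary (or via a mongos).
// "shardXXX" is the lost shard, "shardYYY" a surviving shard.
const configDb = db.getSiblingDB("config");

// Hand every chunk still owned by the lost shard to a surviving shard.
configDb.chunks.updateMany(
  { shard: "shardXXX" },
  { $set: { shard: "shardYYY" } }
);

// Databases whose primary shard was the lost shard also need a new primary.
configDb.databases.updateMany(
  { primary: "shardXXX" },
  { $set: { primary: "shardYYY" } }
);

// Finally, remove the lost shard's entry so routers stop expecting it.
configDb.shards.deleteOne({ _id: "shardXXX" });
```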
| Comment by Andy Schwerin [ 16/May/19 ] |
|
I think it does have backups. This is about putting the shard back together from a backup. Edit: Wait, that's a different ticket, I guess? This does seem to say "all data lost". I think there's a related case where you're restoring some stale data because it's all you've got. |
| Comment by Gregory McKeon (Inactive) [ 16/May/19 ] |
|
Does this problem exist if you remove the shard and add a new shard? What is the benefit of keeping a shard that has no backups? kevin.pulo |
| Comment by Kevin Pulo [ 23/Apr/19 ] |
|
Sharded clusters in general. The only requirement is that all members of the shard are irrevocably lost (with no backups), and the shard is then re-created from scratch with rs.initiate(). This situation is more likely with in-memory, but can happen for any engine. |
| Comment by Judah Schvimer [ 22/Apr/19 ] |
|
linda.qin, is this only possible with the in-memory storage engine, or sharded clusters in general? |