-
Type: Improvement
-
Resolution: Duplicate
-
Priority: Major - P3
-
Affects Version/s: 2.4.3
-
Component/s: Replication
Consider a replica set of 3 machines, dd-db1, dd-db2 and dd-db3, with assigned priorities of 1.5, 1.0 and 0.7 respectively. dd-db1 is primary; the other two are secondaries. Now restart dd-db1 and dd-db3: the primary shifts (according to priorities) to dd-db2, which is correct. Now restart dd-db2. It also loses its primary state, which is also correct. But if writes were coming in at a steady pace, the oplog of dd-db2 is now several operations ahead of dd-db1 and dd-db3. This leaves the replica set without a primary: dd-db2 is the freshest member and should become primary, but dd-db1 is up and has a higher priority.
I understand that it's a hard choice: either ignore priorities in favor of freshness, or ignore freshness (and possibly cause rollbacks, with likely data loss) and favor priorities. I still think either of these solutions is better than leaving the replica set in an indefinite "no primary" state. By the way, temporarily shutting down the higher-priority server helps: the freshest server becomes primary, and the restarted higher-priority server catches up and becomes primary again after a new election.
P.S. We've seen this with 2.2 as well; we moved to 2.4, but it still appears to occur.
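To make the deadlock concrete, here is a minimal Python sketch of the behaviour described in this report. It is a toy model, not MongoDB's election code: the `Member` class, the `can_be_elected` and `elect` helpers, and the optime values are hypothetical, and the two veto rules (a fresher member is up, or a higher-priority member is up) are assumptions inferred from the symptoms above. Majority checks and heartbeats are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    priority: float
    optime: int        # last applied oplog position; higher means fresher
    up: bool = True


def can_be_elected(candidate: Member, members: List[Member]) -> bool:
    """Assumed veto rules: any reachable member that is fresher, or that
    has a higher priority, blocks the candidate."""
    if not candidate.up:
        return False
    for other in members:
        if other is candidate or not other.up:
            continue
        if other.optime > candidate.optime:
            return False   # vetoed: another member is fresher
        if other.priority > candidate.priority:
            return False   # vetoed: a higher-priority member is up
    return True


def elect(members: List[Member]) -> Optional[str]:
    """Return the name of the first electable member, or None."""
    for member in members:
        if can_be_elected(member, members):
            return member.name
    return None


# State after the restarts in the report: dd-db2 is a few operations ahead.
members = [
    Member("dd-db1", 1.5, optime=100),
    Member("dd-db2", 1.0, optime=105),
    Member("dd-db3", 0.7, optime=100),
]
stuck = elect(members)        # every candidate is vetoed by someone

# Workaround from the report: temporarily shut down dd-db1.
members[0].up = False
workaround = elect(members)   # the freshest member can now win

# dd-db1 comes back, catches up, and a new election is held.
members[0].up = True
members[0].optime = 105
recovered = elect(members)

print(stuck, workaround, recovered)
```

In this model the first election yields no primary because dd-db1 is vetoed for being stale and dd-db2 is vetoed for having a lower priority than a reachable member; removing either veto corresponds to one of the two trade-offs named above.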
- duplicates
-
SERVER-9934 Slow failovers after step down due to sleeping rsBackgroundSync
- Closed
- is related to
-
SERVER-10621 Replication failover hangs indefinitely when priorities conflict with replication
- Closed