[SERVER-41924] Rollback occurred when higher-priority PRIMARY rejoined replica set after storage failure Created: 26/Jun/19 Updated: 16/Oct/21 Resolved: 04/Sep/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 4.0.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Alexander A | Assignee: | Danny Hatcher (Inactive) |
| Resolution: | Incomplete | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
Hello! We're testing the failover capabilities of MongoDB 4.0.10, and one of our tests simulates a block storage failure, as described below. Replica set nodes: Test process:
This leads to a primary failure, and the client (pymongo in our case) receives an exception:
3. Wait for the script to finish. Since it received one exception and is not designed to retry failed writes, we should have 9999 documents written to the replica set at this point, so check db.collection.count() to confirm. I repeated the test several times and noticed that the problem does not occur if I set equal priorities on all nodes. When the primary has priority 1, it rejoins as a secondary and successfully replicates all 9999 documents. Can someone explain this behavior of the replica set? Is it a bug? |
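The reporter's script is not attached to the ticket, so the following is only a minimal sketch of the insert loop the description implies: write documents one at a time, count any exception, and never retry. The function name `insert_without_retry`, the `seq` field, and the total of 10000 documents (inferred from 9999 written plus 1 exception) are assumptions made for illustration. The function accepts any object with an `insert_one` method, such as a pymongo `Collection`.

```python
def insert_without_retry(collection, total=10000):
    """Insert `total` documents one at a time and never retry a failure.

    `collection` is any object exposing insert_one(doc), for example a
    pymongo Collection. Returns (written, failed) as seen by the client.
    """
    written = 0
    failed = 0
    for i in range(total):
        try:
            # "seq" is a hypothetical field used to identify each document.
            collection.insert_one({"seq": i})
            written += 1
        except Exception:
            # With pymongo this would typically be a subclass of
            # pymongo.errors.PyMongoError; the failed write is only counted.
            failed += 1
    return written, failed
```

After the run, `written` can be compared with the count reported by the replica set. Note that these are client-side counts only: a write acknowledged by a primary that later fails can still be rolled back if it was not replicated to a majority, which is why a write concern of `majority` is commonly used in failover tests.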
| Comments |
| Comment by Danny Hatcher (Inactive) [ 29/Jul/19 ] |
|
aanodin, have you had a chance to review my previous comment? |
| Comment by Danny Hatcher (Inactive) [ 05/Jul/19 ] |
|
I've taken a look at the log files, but it's hard to piece together exactly what is happening. I see nodes other than db-2 marked as DOWN, as well as config server issues. Does the problem reproduce if you remove the mongos/config servers from the cluster so that it's simply a 5-member replica set? Regardless, could you please test again and provide the exact timestamps, logs, "diagnostic.data" folders, and the script you are using to insert the documents? |
| Comment by Alexander A [ 01/Jul/19 ] |
|
Hello Daniel, I've uploaded the log files via the form you provided. I hope you can see them. The experiment took place on June 25. |
| Comment by Danny Hatcher (Inactive) [ 28/Jun/19 ] |
|
Hello, In order for us to investigate, please provide the full mongod log files from every node in the replica set. You can use our Secure Uploader which only MongoDB employees can access. |