[SERVER-41924] Rollback occurred when higher-priority PRIMARY rejoined replica set after storage failure Created: 26/Jun/19  Updated: 16/Oct/21  Resolved: 04/Sep/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.0.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alexander A Assignee: Danny Hatcher (Inactive)
Resolution: Incomplete Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Hello!

We're testing MongoDB 4.0.10 failover capabilities, and one of our tests simulates a block storage failure as described below.

Replica set nodes:
mongo-test-arb1 MONGOS
mongo-test-arb2 MONGOS
mongo-test-db1 ARBITER
mongo-test-db2 PRIMARY
mongo-test-db3 SECONDARY
mongo-test-db4 SECONDARY
mongo-test-db5 SECONDARY
All nodes have 1 vote and priority 1 except PRIMARY which has priority 10.
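For clarity, a replica set configuration matching that description might look like the following mongo-shell config fragment (hostnames and replica set name are taken from the node list above; the `_id` values and port are illustrative):

```javascript
// Hypothetical config matching the reported topology:
// one arbiter plus four data-bearing nodes, db2 with priority 10.
// priority and votes default to 1 for the remaining members.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-test-db1:27018", arbiterOnly: true },
    { _id: 1, host: "mongo-test-db2:27018", priority: 10 },
    { _id: 2, host: "mongo-test-db3:27018" },
    { _id: 3, host: "mongo-test-db4:27018" },
    { _id: 4, host: "mongo-test-db5:27018" }
  ]
})
```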

Test process:
1. Run a script that writes 10000 documents to the replica set using 1 thread with w:majority. Writes go through one of the mongos instances.
2. While the script is running, run this command on mongo-test-db2 (PRIMARY):

echo 1 > /sys/block/sda/device/delete ; sleep 300 ; echo b > /proc/sysrq-trigger

This causes a primary failure, and the client (pymongo in our case) receives an exception:

Write results unavailable from mongo-test-db2:27018 :: caused by :: Connection closed by peer

3. Wait for the script to finish. Since it received 1 exception and is not designed to retry failed writes, we should have 9999 documents written to the replica set at this point; check db.collection.count() to confirm.
4. Wait for the primary to restart (300 seconds) and recheck db.collection.count(). With this configuration I usually get numbers around 9200-9300. This means the server rolls back about 700 documents, and I can see them in the rollback directory under /var/lib.
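The insert script from step 1 was not attached to the ticket, but it can be sketched as below. This is an assumption-laden reconstruction: the collection name, host, and document shape are hypothetical. The core loop takes any object with an insert_one() method, so the behavior (one failure, no retry, 9999 acknowledged writes) can be checked without a live server.

```python
def write_documents(collection, n=10000):
    """Insert n documents one at a time; stop at the first error.

    Returns the number of acknowledged inserts. Mirrors the reported
    behaviour: a failed write is not retried, so a single primary
    failure mid-run leaves fewer than n documents written.
    """
    written = 0
    for i in range(n):
        try:
            collection.insert_one({"seq": i})
        except Exception:
            # e.g. pymongo's AutoReconnect / ConnectionFailure when
            # the primary's storage device is deleted mid-test
            break
        written += 1
    return written

if __name__ == "__main__":
    # Requires a live deployment; host and names are illustrative.
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern
    client = MongoClient("mongodb://mongo-test-arb1:27017")  # via mongos
    coll = client.test.get_collection(
        "failover", write_concern=WriteConcern(w="majority"))
    print(write_documents(coll))
```

With this shape of script, one connection error during the storage failure yields exactly 9999 successful writes, which is why the reporter expects db.collection.count() to return 9999 before the old primary rejoins.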

I repeated the test several times and noted that if I set equal priorities on all nodes, the problem does not occur. When the primary has priority 1, it rejoins as a secondary and successfully replicates all 9999 documents.

Can someone explain this replica set behavior? Is it a bug?



 Comments   
Comment by Danny Hatcher (Inactive) [ 29/Jul/19 ]

aanodin, have you had a chance to review my previous comment?

Comment by Danny Hatcher (Inactive) [ 05/Jul/19 ]

I've taken a look at the log files but it's hard to piece together exactly what is happening. I see nodes other than db-2 marked as DOWN as well as config server issues. Does the problem reproduce if you remove the mongos/config servers from the cluster so that it's simply a 5-member replica set? Regardless, could you please test again and provide the exact timestamps, logs, "diagnostic.data" folders, and the script you are using to insert the documents?

Comment by Alexander A [ 01/Jul/19 ]

Hello Daniel,

I've uploaded log files via the form you provided. Hope you can see them.

The experiment took place on June 25.

Comment by Danny Hatcher (Inactive) [ 28/Jun/19 ]

Hello,

In order for us to investigate, please provide the full mongod log files from every node in the replica set. You can use our Secure Uploader which only MongoDB employees can access.

Generated at Thu Feb 08 04:59:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.