[SERVER-48675] Simple 4.4.0-rc7 repset test - restarted member not catching up on oplog [or count drift] Created: 09/Jun/20  Updated: 13/May/21  Resolved: 13/May/21

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.4.0-rc7
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Paul Done
Assignee: Bruce Lucas (Inactive)
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 20.04


Attachments: SERVER-48675-440-rc11.tgz, SERVER-48675.png, server-48765.tgz
Issue Links:
Related
Operating System: ALL
Steps To Reproduce:

Follow this process: https://github.com/pkdone/MongoDB-AUTO-HA

In monitor.sh, replace .countDocuments({}) with the method that was used previously: .count()

Increase the document insertion rate by reducing the sleep in insert.py from 0.025s to 0.005s, so that the count variance appears sooner and is more obvious

Ensure the laptop/workstation is under load; you may need to kill three or four primaries before witnessing the behaviour
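The difference between the two count methods mentioned in the steps above can be checked per member with a sketch like the following. It assumes PyMongo, where estimated_document_count() is the metadata-based count (the counterpart of the shell's .count() with no filter) and count_documents({}) counts by scanning the matching documents. The helper names count_drift and report are hypothetical and are not part of the linked repository.

```python
from typing import Dict, Tuple


def count_drift(collection) -> Tuple[int, int, int]:
    """Return (metadata_count, scanned_count, drift) for one collection handle.

    `collection` is expected to behave like a pymongo Collection.
    """
    # Metadata-based count: fast, but may be inaccurate after an unclean shutdown.
    fast = collection.estimated_document_count()
    # Count obtained by actually matching documents with an empty filter.
    exact = collection.count_documents({})
    return fast, exact, fast - exact


def report(members: Dict[str, object]) -> Dict[str, Tuple[int, int, int]]:
    """Compute the drift for every member, keyed by a caller-chosen label."""
    return {name: count_drift(coll) for name, coll in members.items()}
```

Against a running replica set, each handle could be obtained with something like MongoClient("localhost:27017", directConnection=True)["mydb"]["mycoll"]; the host, database, and collection names here are placeholders. A non-zero drift on a restarted member would suggest the variance comes from the metadata count rather than from missing documents.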

Participants:

 Description   

A simple replica set test with 3 mongod servers in one replica set works fine on 4.2.6 and on many other earlier versions of MongoDB over the years.

I just tried it on 4.4.0-rc7: when a primary is killed and then restarted, it does not seem to catch up on the oplog.
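One way to tell whether a restarted member is really behind on the oplog, rather than only reporting a different count, is to compare the optimeDate of each member in the output of replSetGetStatus. The following is a minimal sketch under that assumption; the function name member_lag_seconds is hypothetical, and the input is the plain document returned by the command.

```python
from typing import Dict


def member_lag_seconds(status: dict) -> Dict[str, float]:
    """Return each member's lag behind the primary, in seconds.

    `status` is the document returned by the replSetGetStatus command.
    """
    members = status["members"]
    primaries = [m for m in members if m.get("stateStr") == "PRIMARY"]
    if not primaries:
        raise ValueError("no primary found in replSetGetStatus output")
    primary_time = primaries[0]["optimeDate"]
    # Members without an optimeDate (for example, unreachable ones) are skipped.
    return {
        m["name"]: (primary_time - m["optimeDate"]).total_seconds()
        for m in members
        if "optimeDate" in m
    }
```

With PyMongo, the status document could be fetched with client.admin.command("replSetGetStatus"). A lag that keeps shrinking after the restart would indicate the member is catching up, even if .count() still disagrees.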

To reproduce, follow this process: https://github.com/pkdone/MongoDB-AUTO-HA

More info to follow below



 Comments   
Comment by Daniel Pasette (Inactive) [ 10/Jun/20 ]

Wonder if this is just a side effect of “replicate before journaling”

Comment by Paul Done [ 09/Jun/20 ]

As per last comment - probably works as intended

Comment by Bruce Lucas (Inactive) [ 09/Jun/20 ]

paul.done can you please attach log files and ftdc (diagnostic.data) for the whole replica set covering a test, along with a timeline - when did you start the test, when did you do the node restart?

Generated at Thu Feb 08 05:17:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.