[SERVER-42910] Oplog query with higher timestamp but lower term than the sync source shouldn't time out due to afterClusterTime Created: 20/Aug/19  Updated: 29/Oct/23  Resolved: 21/Aug/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.0.13, 4.2.1, 4.3.1

Type: Bug Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Siyuan Zhou
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-33812 First initial sync oplog read batch f... Closed
related to SERVER-35200 Speed up failure detection in the Opl... Closed
is related to SERVER-42219 Oplog buffer not always empty when pr... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2, v4.0
Sprint: Repl 2019-08-26
Participants:

 Description   

SERVER-33812 attached afterClusterTime to all oplog queries. A node with a higher timestamp but a lower term than its sync source should roll back after receiving an empty batch. For example, the old primary has (ts: 9, term: 1), while the new primary has (ts: 8, term: 2). However, the oplog query instead fails with MaxTimeMSExpired, due to the timeout added in SERVER-35200. I believe the query times out while waiting for afterClusterTime. In production, the old primary will very likely still roll back once new writes arrive with an even higher timestamp, for example from the periodic no-op writer. However, this is still a liveness issue.
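To make the ordering problem concrete, here is a minimal Python sketch under simplifying assumptions: optimes are compared by term first and then by timestamp, the sync source answers a query of the form {ts: {$gte: lastFetched.ts}}, and afterClusterTime is modeled as waiting until the source's newest timestamp reaches lastFetched.ts. OpTime, MaxTimeMSExpired, run_oplog_query, should_roll_back and the tick-based wait are hypothetical stand-ins, not MongoDB's actual code.

```python
from dataclasses import dataclass
from functools import total_ordering
from typing import List


@total_ordering
@dataclass(frozen=True)
class OpTime:
    # Simplified optime (assumption): ordered by term first, then timestamp.
    ts: int
    term: int

    def __lt__(self, other: "OpTime") -> bool:
        return (self.term, self.ts) < (other.term, other.ts)


class MaxTimeMSExpired(Exception):
    """Hypothetical stand-in for the MaxTimeMSExpired error the fetcher sees."""


def run_oplog_query(source_oplog: List[OpTime], pending_writes: List[OpTime],
                    last_fetched: OpTime, max_wait_ticks: int,
                    use_after_cluster_time: bool = True) -> List[OpTime]:
    """Hypothetical model of the sync source answering {ts: {$gte: last_fetched.ts}}.

    With afterClusterTime, the source first waits until its newest timestamp
    reaches last_fetched.ts; on each tick one pending write (if any) arrives.
    """
    oplog = list(source_oplog)
    pending = list(pending_writes)
    if use_after_cluster_time:
        ticks = 0
        while max(e.ts for e in oplog) < last_fetched.ts:
            if ticks == max_wait_ticks:
                raise MaxTimeMSExpired()  # maxTimeMS reached before the wait ended
            if pending:
                oplog.append(pending.pop(0))
            ticks += 1
    return [e for e in oplog if e.ts >= last_fetched.ts]


def should_roll_back(batch: List[OpTime], last_fetched: OpTime) -> bool:
    # Simplified check: the source does not start with our last fetched entry.
    return not batch or batch[0] != last_fetched


old_primary_last = OpTime(ts=9, term=1)
new_primary_oplog = [OpTime(ts=7, term=1), OpTime(ts=8, term=2)]

# The new primary is "ahead" even though its newest timestamp is lower.
assert OpTime(ts=8, term=2) > old_primary_last

try:
    run_oplog_query(new_primary_oplog, [], old_primary_last, max_wait_ticks=3)
except MaxTimeMSExpired:
    print("timed out waiting for afterClusterTime")

# A later write at a higher timestamp unblocks the wait.
batch = run_oplog_query(new_primary_oplog, [OpTime(ts=10, term=2)],
                        old_primary_last, max_wait_ticks=3)
print(batch, should_roll_back(batch, old_primary_last))
```

In this model the query stops timing out only once a newer write (here ts: 10 in term 2, which in practice might come from the periodic no-op writer) reaches the sync source. Skipping the afterClusterTime wait returns an empty batch immediately, which the simplified check treats as a reason to roll back.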


 Comments   
Comment by Githook User [ 26/Sep/19 ]

Author:

Siyuan Zhou (visualzhou, siyuan.zhou@mongodb.com)

Message: SERVER-42910 Oplog query with higher timestamp but lower term than the sync source shouldn't time out due to afterClusterTime.

(cherry picked from commit f54709196711c63a429b71f47c584661286d675f)
Branch: v4.0
https://github.com/mongodb/mongo/commit/220255882c3047868b9ecd43a36de09b1ecbb5da

Comment by Githook User [ 05/Sep/19 ]

Author:

Siyuan Zhou (visualzhou, siyuan.zhou@mongodb.com)

Message: SERVER-42910 Oplog query with higher timestamp but lower term than the sync source shouldn't time out due to afterClusterTime.

(cherry picked from commit f54709196711c63a429b71f47c584661286d675f)
Branch: v4.2
https://github.com/mongodb/mongo/commit/b198f2b3b2502a92f76ea491254b9b6f10dd38ff

Comment by Siyuan Zhou [ 23/Aug/19 ]

We need to backport this to 4.2 since SERVER-42219 needs to be backported and will uncover this issue.

Comment by Githook User [ 21/Aug/19 ]

Author:

Siyuan Zhou (visualzhou, siyuan.zhou@mongodb.com)

Message: SERVER-42910 Oplog query with higher timestamp but lower term than the sync source shouldn't time out due to afterClusterTime.
Branch: master
https://github.com/mongodb/mongo/commit/f54709196711c63a429b71f47c584661286d675f

Comment by Siyuan Zhou [ 20/Aug/19 ]

Good point! Agreed that we don't need to backport this liveness fix.

Comment by Eric Milkie [ 20/Aug/19 ]

I'm not sure it's necessary. I thought the effect of this problem was simply that choosing a sync source might be delayed, for the old primary only, until a write happens, and the nodes that experience this delay are destined to roll back anyway.

Comment by Siyuan Zhou [ 20/Aug/19 ]

milkie, I think we should backport this everywhere SERVER-33812 was backported, i.e. back to 3.6.

Comment by Eric Milkie [ 20/Aug/19 ]

Definitely possible; the old primary can continue to accept writes in the old term for quite a while before it finally notices the term has changed.

Comment by Suganthi Mani [ 20/Aug/19 ]

siyuan.zhou 

the old primary has (ts: 9, term: 1), while the new primary has (ts: 8, term: 2).  

Is this even possible? Or is it possible during primary drain mode?

Generated at Thu Feb 08 05:01:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.