[SERVER-40707] Secondary couldn't signal OplogWaiters to advance the lastCommittedOpTime in Chained replication mode Created: 18/Apr/19 Updated: 06/Dec/22 Resolved: 16/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.4.4, 3.4.19, 3.4.20 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | FirstName lipengchong | Assignee: | Backlog - Replication Team |
| Resolution: | Won't Fix | Votes: | 1 |
| Labels: | SWNA | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Assigned Teams: |
Replication
|
||||||||||||
| Operating System: | ALL | ||||||||||||
| Steps To Reproduce: | a sharding cluster ,three config is chaining. a->b->c we can easily get majorityOpTime < opTime in rs.isMaster().lastWrite of c。
|
||||||||||||
| Participants: | |||||||||||||
| Description |
|
A replSet has three members, aa/bb/cc. In this situation , aa->bb->cc , aa is primary and the sync source of bb , bb is the sync source of cc . aa advances the lastCommittedOpTime , and it signal OplogWaiters , so bb can advance his lastCommittedOpTime , but sometimes cc couldn't advance its lastCommittedOpTime.
the only way to set _localOplogCollection is in function getLocalOplogCollection,and called here.
but for secondary ,it returned early because of oplog Disabled. So because _localOplogCollection = NULL , when secondary update _lastCommittedOpTime , it cann't signal OplogWaiters. |
| Comments |
| Comment by Gregory McKeon (Inactive) [ 16/May/19 ] | |||
|
Given that this is fixed in 3.6, the recommended resolution is to upgrade to 3.6. As a workaround in 3.4, you can disable chaining on the config server replica set. | |||
| Comment by Siyuan Zhou [ 22/Apr/19 ] | |||
|
lpc, I saw the problem and agreed about your analysis. | |||
| Comment by FirstName lipengchong [ 20/Apr/19 ] | |||
|
Hi Siyuan, thanks for your response! First , I know about the process of the last majority committed OpTime . But you can consider this problem , for the chain A->B->C , A determine the last majority committed OpTime ,and node B learn it after received a getmore command from Node C. So Node C cann't learned the the last majority committed OpTime until the a new oplog or the getmore timeout ( 5 seconds). At that time ,we can get delay queries with read concern "majority" . The key problem is node C send a getmore command to node B and blocked, then node B update the last majority committed OpTime and it doesn't signal the getmore command right now. So isn't it a bug here ?
So in the foregoing scenario, node B can signal the getmore query of Node C, then Node c learn the last majority committed OpTime.Maybe the purpose of the foregoing changes in 3.6 is not fixing this problem, but in fact it fixed the problem. | |||
| Comment by Siyuan Zhou [ 19/Apr/19 ] | |||
|
lpc, thanks for filing the ticket!
This is expected and doesn't necessarily mean a bug, since Node C doesn't know it's the last to receive a write and it's always the primary's job to determine the last majority committed OpTime. Node C has to apply the operation (updating the "lastWrite"), report its progress to the primary via its sync source, then learn the the last majority committed OpTime from the primary via its sync source. Thus the delay is expected. The code you pointed out might be a reason to delay queries with read concern "majority" on a secondary at the end of a chain, but before diving into the code, I'm still not sure of the problem or unexpected behavior you observed. Could you please describe the unexpected behavior and your expectation? A reproduction would be very helpful. eric.sedor, re-assigning to you to help get a reproduction or a clarification of the unexpected behavior. This might be an issue we have fixed in 3.6, but we need to confirm the problem before evaluating the fix. Thanks, | |||
| Comment by Tess Avitabile (Inactive) [ 19/Apr/19 ] | |||
|
Apologies, I'm out sick today and on vacation next week. Judah is also on vacation today. Assigning to Andy to triage. |