[SERVER-39621] Disabled chaining should enforce sync source change when the primary steps down even if the oplog fetcher isn't killed on sync source Created: 15/Feb/19  Updated: 29/Oct/23  Resolved: 08/May/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.0.12
Fix Version/s: 4.4.1, 4.7.0, 4.2.16

Type: Improvement Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Samyukta Lanka
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
is duplicated by SERVER-21537 chainingAllowed = false not being enf... Closed
is duplicated by SERVER-43615 syncSource diagnostics are incorrect Closed
Problem/Incident
Related
related to SERVER-44603 Consider having tailable readPreferen... Backlog
is related to SERVER-49708 Election with chaining disabled doesn... Closed
is related to SERVER-74425 Can not Select Primary when chaining ... Closed
is related to SERVER-58412 Changing settings.chainingEnabled in ... Closed
Backwards Compatibility: Fully Compatible
Backport Requested: v4.4, v4.2, v4.0, v3.6
Sprint: Repl 2020-05-18
Participants:
Case:
Linked BF Score: 6

 Description   

Since we no longer kill readers or close their connections on stepdown, nodes syncing from the old primary may never get a chance to choose a new sync source, even if chaining is disabled.

SERVER-21537 has similar symptoms when the heartbeats are stale. Both of them can be addressed in the Smart Chaining project.
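
The rule the ticket title asks for can be sketched as a small predicate. This is only an illustration, not the server's code: `SyncSourceView` and `should_change_sync_source` are hypothetical names, and how a node learns that its source stepped down (heartbeats, fetch response metadata, or something else) is an assumption left outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class SyncSourceView:
    """What a syncing node currently believes about its sync source (hypothetical model)."""
    host: str
    is_primary: bool  # assumed to be learned somehow; the mechanism is not modeled here


def should_change_sync_source(chaining_allowed: bool, source: SyncSourceView) -> bool:
    """Return True when the node must drop its current sync source.

    With chaining disabled, a secondary may only sync from the primary, so a
    source that has stepped down is no longer valid, even if the open oplog
    cursor on it is still returning data.
    """
    if chaining_allowed:
        # Chaining permits syncing from a secondary, so a stepdown alone
        # does not force a change in this simplified model.
        return False
    return not source.is_primary


old_primary = SyncSourceView(host="node-a:27017", is_primary=False)
print(should_change_sync_source(False, old_primary))  # True
print(should_change_sync_source(True, old_primary))   # False
```

The point of the check is that it does not depend on the fetcher's cursor failing: the decision is made from the source's role, so it still fires when the getMore on the old primary keeps succeeding.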



 Comments   
Comment by Githook User [ 04/Aug/21 ]

Author: Samy Lanka (samy.lanka@mongodb.com, username lankas)

Message: SERVER-39621 Change sync source when primary steps down and chaining is disabled

(cherry picked from commit 2ffaa9d4efefffc7045b6b47d9380299b28dfd7a)
(cherry picked from commit 03db775aaf4e3167894092aa4bdfbb980b06c703)
Branch: v4.2
https://github.com/mongodb/mongo/commit/31194b8dd00e0862d2b0ebc5d6502360724e7297

Comment by Githook User [ 21/Aug/20 ]

Author: Samy Lanka (samy.lanka@mongodb.com, username lankas)

Message: SERVER-39621 Change sync source when primary steps down and chaining is disabled

(cherry picked from commit 2ffaa9d4efefffc7045b6b47d9380299b28dfd7a)
Branch: v4.4
https://github.com/mongodb/mongo/commit/03db775aaf4e3167894092aa4bdfbb980b06c703

Comment by Siyuan Zhou [ 20/Aug/20 ]

evin.roesle, it should be similar to the 4.4 backport. I'd say less than a day.

Comment by Evin Roesle [ 20/Aug/20 ]

siyuan.zhou, do you think there is any risk in backporting to 4.2, and what extra complexity would it involve? How much time do you estimate: less than a day, or more?

Comment by Evin Roesle [ 14/May/20 ]

Since we are so close to GA, I do not think we should backport this ticket to 4.4 at this time.

Comment by Tess Avitabile (Inactive) [ 13/May/20 ]

Sounds good, then I don't think we should backport to earlier versions.

evin.roesle, do you think we should backport to 4.4? This close to GA, we would need to get special permission from Kelsey.

Comment by Tess Avitabile (Inactive) [ 11/May/20 ]

evin.roesle, do you think this ticket should be backported to earlier branches?

samy.lanka, can you weigh in on the complexity of the backport?

Comment by Githook User [ 08/May/20 ]

Author: Samy Lanka (samy.lanka@mongodb.com, username lankas)

Message: SERVER-39621 Change sync source when primary steps down and chaining is disabled
Branch: master
https://github.com/mongodb/mongo/commit/2ffaa9d4efefffc7045b6b47d9380299b28dfd7a

Comment by Judah Schvimer [ 26/Feb/19 ]

siyuan.zhou, how would the sync source know that the oplog read was being used for oplog fetching? Would it see that it's an internal connection, or just assume based on the OplogReplay flag? SERVER-37904 might make this harder as well by making chaining not necessarily a replica set wide configuration.

Comment by Kelsey Schubert [ 22/Feb/19 ]

This ticket would also help prior versions of MongoDB in cases where no active getmore was running against the primary when it stepped down.

Comment by Tess Avitabile (Inactive) [ 15/Feb/19 ]

I think the effect of the Avoid Closing Connections project on this change was small. We never closed connections between replica set members on stepdown, since these connections used hangUpOnStepDown:false. Additionally, we never killed cursors on stepdown. The only change is that if there was an active getMore on the sync source, it will no longer be killed after the Avoid Closing Connections project. Before this project, it was possible that a node would continue syncing from an old primary if there had not been an active getMore at the time of the stepdown.
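
The four cases described in this comment can be laid out as a toy truth table. The function and parameter names below are hypothetical and model only the reasoning above (a fetcher is interrupted on stepdown only when an active getMore exists and the sync source kills it), not the actual replication code.

```python
def keeps_syncing_from_old_primary(active_getmore_at_stepdown: bool,
                                   kills_active_getmore_on_stepdown: bool) -> bool:
    """Return True if the syncing node is never interrupted by the stepdown.

    Connections between replica set members and idle cursors survive a
    stepdown in both the old and new behavior, so the only possible
    interruption is an active getMore being killed.
    """
    interrupted = active_getmore_at_stepdown and kills_active_getmore_on_stepdown
    return not interrupted


# Before the Avoid Closing Connections project: active getMores were killed.
print(keeps_syncing_from_old_primary(True, True))    # False
print(keeps_syncing_from_old_primary(False, True))   # True

# After the project: active getMores are no longer killed.
print(keeps_syncing_from_old_primary(True, False))   # True
print(keeps_syncing_from_old_primary(False, False))  # True
```

Only the first case ever forced a sync source re-selection, which is why the gap existed before the project as well and why an explicit check on the sync source's role is needed.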
