[SERVER-43615] syncSource diagnostics are incorrect
Created: 24/Sep/19  Updated: 06/Dec/22  Resolved: 30/Sep/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Danny Hatcher (Inactive) Assignee: Backlog - Replication Team
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2019-09-24 at 6.21.40 PM.png    
Issue Links:
Duplicate
duplicates SERVER-39621 Disabled chaining should enforce sync... Closed
Assigned Teams:
Replication
Operating System: ALL
Steps To Reproduce:

1. Start a 3 member replica set on 4.0.2 or later.
2. Reconfigure replication to not allow chaining.
3. Stepdown the Primary.
4. Observe the syncSourceId for the node that doesn't step up.
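Step 4 can also be checked programmatically. The sketch below is a hypothetical helper (`find_stale_sync_sources` is not part of MongoDB or any driver). It takes a dict shaped like the `replSetGetStatus` response and lists the secondaries whose `syncSourceId` does not match the primary's `_id`. With chaining disabled, every secondary is expected to sync directly from the primary, so the list should be empty. The sketch assumes each member entry carries `_id`, `name`, `stateStr` and `syncSourceId`, and it treats a missing `syncSourceId` as -1 (no sync source).

```python
from typing import Any, Dict, List


def find_stale_sync_sources(status: Dict[str, Any]) -> List[str]:
    """Hypothetical diagnostic helper (not a MongoDB or driver API).

    Returns the names of SECONDARY members whose syncSourceId differs from
    the PRIMARY's _id. With chaining disabled, each secondary is expected to
    sync directly from the primary, so a non-empty result is suspicious.
    """
    members = status.get("members", [])
    primary_ids = [m["_id"] for m in members if m.get("stateStr") == "PRIMARY"]
    if len(primary_ids) != 1:
        # No single primary (e.g. during an election): nothing to compare against.
        return []
    primary_id = primary_ids[0]
    # Assumption: a missing syncSourceId means "no sync source" (-1).
    return [
        m["name"]
        for m in members
        if m.get("stateStr") == "SECONDARY"
        and m.get("syncSourceId", -1) != primary_id
    ]


if __name__ == "__main__":
    # Illustrative status after stepping down localhost:27017 (sample data,
    # not real output): localhost:27019 still reports the old primary (_id 0).
    sample_status = {
        "members": [
            {"_id": 0, "name": "localhost:27017", "stateStr": "SECONDARY", "syncSourceId": 1},
            {"_id": 1, "name": "localhost:27018", "stateStr": "PRIMARY", "syncSourceId": -1},
            {"_id": 2, "name": "localhost:27019", "stateStr": "SECONDARY", "syncSourceId": 0},
        ]
    }
    print(find_stale_sync_sources(sample_status))  # ['localhost:27019']
```

With PyMongo, a status document of this shape can be fetched with `client.admin.command("replSetGetStatus")`. The sample data above is illustrative only and was not captured from the reporter's cluster.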


 Description   

In 4.0.1, the syncSourceId for a node identifies the member of the replica set the node is syncing from. Starting in 4.0.2, the syncSourceId for a node doesn't change unless the node is either stepping down or stepping up itself.

Additionally, when chaining is disabled, 4.0.1 logs a message explicitly stating the new sync source, but 4.0.2 does not.

4.0.1:

2019-09-24T18:12:41.842-0400 I REPL     [replexec-2] Member localhost:27018 is now in state PRIMARY
2019-09-24T18:12:42.335-0400 I REPL     [rsBackgroundSync] chaining not allowed, choosing primary as sync source candidate: localhost:27018

4.0.2:

2019-09-24T18:04:18.868-0400 I REPL     [replexec-0] Member localhost:27018 is now in state PRIMARY



 Comments   
Comment by Siyuan Zhou [ 30/Sep/19 ]

Thanks for the explanation! Closing this as a dup.

Comment by Danny Hatcher (Inactive) [ 30/Sep/19 ]

My statement referred to a situation in which I would have expected a Secondary node to change its sync source (a change of Primary with chaining disabled), but it didn't. However, as you pointed out, this does appear to be a duplicate of SERVER-39621. When stepping down the Primary, the sync source for the Secondary doesn't change, but when killing the Primary process, it does. I'm comfortable closing this as a dupe of that.

Comment by Siyuan Zhou [ 30/Sep/19 ]

daniel.hatcher, I'd like to clarify your observation a little before investigating.

Starting in 4.0.2, the syncSourceId for a node doesn't change unless the node is either stepping down or stepping up itself.

In 4.0.2, we always return the sync source ID if the node has a sync source, or -1 if it does not. It sounds reasonable to start choosing a new sync source on stepdown and to reset the sync source ID to -1 on stepup, so the statement doesn't sound like a bug to me, as long as the node doesn't need to change its sync source during steady-state replication.

The chaining symptoms you described sound similar to SERVER-21537 and SERVER-39621.

Generated at Thu Feb 08 05:03:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.