[SERVER-26973] resync.js needs to wait 1 minute for blacklisted sync source to be removed from blacklist before giving up on initial sync Created: 09/Nov/16  Updated: 06/Dec/17  Resolved: 04/May/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.5.7

Type: Bug Priority: Major - P3
Reporter: Benety Goh Assignee: Spencer Brody (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2017-05-08
Participants:
Linked BF Score: 0

 Description   

The recovering node (in maintenance mode) will fassert if we attempt to resync while the only viable sync source is still blacklisted.



 Comments   
Comment by Githook User [ 04/May/17 ]

Author: Spencer T Brody <spencer@mongodb.com> (username: stbrody)

Message: SERVER-26973 Increase max number of attempts to find a sync source for initial sync in resync.js.
Branch: master
https://github.com/mongodb/mongo/commit/22ec067a8e1d3d2c0e9536400d4603d3552d9347

Comment by Spencer Brody (Inactive) [ 21/Apr/17 ]

I think the proper solution is to increase numInitialSyncConnectAttempts, so that it will continue trying to find a new sync source until the blacklist period ends.
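The proposal above comes down to making the retry window outlast the blacklist. Below is a minimal sketch in JavaScript, the language of resync.js, built on stated assumptions. The function names `chooseSyncSourceWithRetries` and `minAttemptsToOutlast`, the fixed 1-second wait between attempts, and the starting value of 10 attempts are hypothetical and do not come from the MongoDB source. Only the 1-minute blacklist period comes from this issue. The sketch also assumes, as the comment does, that the blacklist entry simply expires and is not renewed while the retries run.

```javascript
// Hypothetical model of the sync-source retry loop discussed in this issue.
// Assumptions (not from the MongoDB source): a fixed wait between attempts and a
// single candidate sync source whose blacklist entry expires at a known time.

// Returns the attempt number that found a usable sync source, or null if every
// attempt failed (the point where the real node would give up on initial sync).
function chooseSyncSourceWithRetries({ maxAttempts, retryWaitMs, blacklistedUntilMs }) {
  let nowMs = 0; // simulated clock; initial sync starts at t = 0
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // The only candidate becomes usable once its blacklist entry has expired.
    if (nowMs >= blacklistedUntilMs) {
      return attempt;
    }
    nowMs += retryWaitMs; // wait before the next attempt
  }
  return null;
}

// Smallest attempt count whose last attempt happens at or after the blacklist
// expiry: attempt k runs at (k - 1) * retryWaitMs.
function minAttemptsToOutlast(blacklistMs, retryWaitMs) {
  return Math.ceil(blacklistMs / retryWaitMs) + 1;
}

const BLACKLIST_MS = 60 * 1000; // 1-minute blacklist period from this issue
const RETRY_WAIT_MS = 1000;     // assumed wait between attempts (hypothetical)

console.log(chooseSyncSourceWithRetries({
  maxAttempts: 10, retryWaitMs: RETRY_WAIT_MS, blacklistedUntilMs: BLACKLIST_MS,
})); // null: 10 attempts cover only 9 seconds
const needed = minAttemptsToOutlast(BLACKLIST_MS, RETRY_WAIT_MS);
console.log(needed); // 61
console.log(chooseSyncSourceWithRetries({
  maxAttempts: needed, retryWaitMs: RETRY_WAIT_MS, blacklistedUntilMs: BLACKLIST_MS,
})); // 61
```

Under these assumptions, 10 attempts run at t = 0 through 9 seconds, so the loop gives up long before the 60-second blacklist ends. At least 61 attempts are needed for the last one to land at or after the expiry.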

Comment by Spencer Brody (Inactive) [ 21/Apr/17 ]

Actually, sleeping for a minute doesn't work. The problem is after a minute passes the node will try again to establish a sync source and again see that it's too stale and re-blacklist the node for another minute, ad infinitum.

Comment by Spencer Brody (Inactive) [ 21/Apr/17 ]

Interestingly, this test fails locally for me every time, and even increasing the number of initial sync retries to 3 didn't fix it.

Comment by Judah Schvimer [ 14/Apr/17 ]

benety.goh, do we need to always wait 1 minute? This test rarely fails, so it seems like a waste to make it 1 minute longer. Can we just increase "numInitialSyncConnectAttempts" so that it spends more time trying to find a sync source?

Generated at Thu Feb 08 04:13:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.