[SERVER-53026] Secondary cannot restart replication
Created: 23/Nov/20  Updated: 29/Oct/23  Resolved: 02/Dec/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.0.22, 3.6.22, 4.2.12, 4.4.4

Type: Bug
Priority: Major - P3
Reporter: A. Jesse Jiryu Davis
Assignee: A. Jesse Jiryu Davis
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-33747 Arbiter tries to start data replicati... Closed
is related to SERVER-52680 Removed node on startup stuck in STAR... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4, v4.2, v4.0
Sprint: Repl 2020-11-30, Repl 2020-12-14
Participants:
Linked BF Score: 0

 Description   

After SERVER-33747, initial_sync_document_validation.js times out on 3.6. The problem is that, with my SERVER-33747 change, ReplicationCoordinatorImpl::_startDataReplication exits early if it has ever been called before. The sequence _startDataReplication -> _stopDataReplication -> _startDataReplication therefore no longer actually restarts data replication.
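
To make the failure mode concrete, here is a toy Python model of the start -> stop -> start sequence. It is only a sketch, not the server's C++ code: the class, the method names, the start_was_called flag, and the reset_guard_on_stop switch are all hypothetical, and clearing the guard on stop is just one way such a guard could allow a restart, not necessarily what the committed fix does.

```python
class ToyReplicationCoordinator:
    """Toy stand-in for a replication coordinator. Not MongoDB code."""

    def __init__(self, reset_guard_on_stop):
        # Hypothetical switch: False models the reported behavior,
        # True models one possible remedy (an assumption, not the real fix).
        self.reset_guard_on_stop = reset_guard_on_stop
        self.start_was_called = False  # hypothetical "ever called before" guard
        self.replicating = False

    def start_data_replication(self):
        if self.start_was_called:
            return  # early exit: a later call does nothing
        self.start_was_called = True
        self.replicating = True

    def stop_data_replication(self):
        self.replicating = False
        if self.reset_guard_on_stop:
            self.start_was_called = False


def restart(coordinator):
    """Run start -> stop -> start and report whether replication is running."""
    coordinator.start_data_replication()
    coordinator.stop_data_replication()
    coordinator.start_data_replication()
    return coordinator.replicating
```

With the guard never cleared, restart(ToyReplicationCoordinator(reset_guard_on_stop=False)) returns False, which matches the symptom: the second start is a no-op and replication stays stopped. With the guard cleared on stop, the same sequence returns True.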

This shows up in initial_sync_document_validation.js because the test calls the "resync" command. The "resync" command was removed after 3.6 (SERVER-31239), leaving ReplicationCoordinatorImpl::resyncData with one caller, replSetSyncFrom. The method was completely removed after 4.4 (SERVER-46831).

Does the replSetSyncFrom command suffer the same deadlock? In all versions?



 Comments   
Comment by Githook User [ 16/Dec/20 ]

Author: A. Jesse Jiryu Davis <jesse@mongodb.com> (ajdavis)

Message: SERVER-53026 Fix "resync" command

(cherry picked from commit a574d23ec0b7d06b8d872bf64136308f541a796d)
(cherry picked from commit 68bf17aa3b19d0b7f53b7a1b6fe1ebbafdf558d2)
(cherry picked from commit 265e3c7d0d40457f0e8483d3ed4161ac3896d04a)
Branch: v4.4
https://github.com/mongodb/mongo/commit/703ba17679225f1d0e27e21ab0c84408fe3140da

Comment by Githook User [ 14/Dec/20 ]

Author: A. Jesse Jiryu Davis <jesse@mongodb.com> (ajdavis)

Message: SERVER-53026 Fix "resync" command

(cherry picked from commit a574d23ec0b7d06b8d872bf64136308f541a796d)
(cherry picked from commit 68bf17aa3b19d0b7f53b7a1b6fe1ebbafdf558d2)
Branch: v4.2
https://github.com/mongodb/mongo/commit/265e3c7d0d40457f0e8483d3ed4161ac3896d04a

Comment by Githook User [ 13/Dec/20 ]

Author: A. Jesse Jiryu Davis <jesse@mongodb.com> (ajdavis)

Message: SERVER-53026 Fix "resync" command

(cherry picked from commit a574d23ec0b7d06b8d872bf64136308f541a796d)
Branch: v4.0
https://github.com/mongodb/mongo/commit/68bf17aa3b19d0b7f53b7a1b6fe1ebbafdf558d2

Comment by Githook User [ 02/Dec/20 ]

Author: A. Jesse Jiryu Davis <jesse@mongodb.com> (ajdavis)

Message: SERVER-53026 Fix "resync" command
Branch: v3.6
https://github.com/mongodb/mongo/commit/a574d23ec0b7d06b8d872bf64136308f541a796d

Comment by A. Jesse Jiryu Davis [ 02/Dec/20 ]

I'm fixing this on 3.6 first, then I'll investigate how much forward-porting is required.

Comment by A. Jesse Jiryu Davis [ 02/Dec/20 ]

I guess replSetSyncFrom doesn't suffer this deadlock, since it's used in many JS tests which are not failing. Only the JS tests that use the "resync" command have timeouts due to my SERVER-33747 change.
