[SERVER-50063] Oplog fetcher can return network errors or CallbackCanceled when shutting down Created: 31/Jul/20  Updated: 29/Oct/23  Resolved: 10/Aug/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.4.1, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Lingzhi Deng Assignee: Lingzhi Deng
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Repl 2020-08-10, Repl 2020-08-24
Participants:
Linked BF Score: 0

 Description   

The test OplogFetcherReturnsCallbackCanceledIfShutdownAfterRunQueryScheduled expects the failpoint "hangAfterOplogFetcherCallbackScheduled" to stay enabled, so that oplogFetcher->shutdown() runs before the oplog fetcher tries to connect and initialize the cursor. Unfortunately, getOplogFetcherAfterConnectionCreated disables the same failpoint, which breaks the test's expectation.

As part of the oplog fetcher shutdown, we also force the DBClientConnection to close (shutdownAndDisallowReconnect). So if shutdown runs while the oplog fetcher is performing network operations, the oplog fetcher can fail with a network error instead of the CallbackCanceled that the unit test expects.

I am not sure whether this is a "bug", or whether the error code matters while the oplog fetcher is shutting down. But it might be a good idea to consolidate the error code to CallbackCanceled in _finishCallback if _isShuttingDown() returns true.
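
The consolidation idea can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual server change: ErrorCode, Status, and consolidateStatusOnShutdown are hypothetical stand-ins for the server's own types, and the isShuttingDown flag stands in for the result of _isShuttingDown(). The sketch also assumes that an OK status is left untouched, which the ticket does not specify.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the server's error codes and Status type.
enum class ErrorCode { OK, CallbackCanceled, HostUnreachable, NetworkTimeout };

struct Status {
    ErrorCode code;
    std::string reason;
    bool isOK() const { return code == ErrorCode::OK; }
};

// Hypothetical helper: if the fetcher is shutting down, report any failure
// as CallbackCanceled and keep the original reason in the message.
// Assumption: an OK status passes through unchanged.
Status consolidateStatusOnShutdown(Status status, bool isShuttingDown) {
    if (isShuttingDown && !status.isOK() &&
        status.code != ErrorCode::CallbackCanceled) {
        return Status{ErrorCode::CallbackCanceled,
                      "oplog fetcher shutting down; original error: " + status.reason};
    }
    return status;
}

// Example: a network error seen during shutdown is reported as CallbackCanceled,
// while the same error outside shutdown keeps its original code.
void exampleUsage() {
    Status duringShutdown = consolidateStatusOnShutdown(
        Status{ErrorCode::HostUnreachable, "connection closed"}, true);
    assert(duringShutdown.code == ErrorCode::CallbackCanceled);

    Status normalRun = consolidateStatusOnShutdown(
        Status{ErrorCode::HostUnreachable, "connection closed"}, false);
    assert(normalRun.code == ErrorCode::HostUnreachable);
}
```

With a mapping like this, callers that wait for the fetcher to finish would only need to handle one error code during shutdown, regardless of whether the forced connection close interrupted a network operation.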



 Comments   
Comment by Githook User [ 24/Aug/20 ]

Author:

{'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}

Message: SERVER-50063: Consolidate OplogFetcher return code to CallbackCanceled when shutting down

(cherry picked from commit a12299145c35b0cebea60f303613f85472b2915f)
Branch: v4.4
https://github.com/mongodb/mongo/commit/e161282063c7ab698982dba88e5514dc9faf0cc9

Comment by Githook User [ 10/Aug/20 ]

Author:

{'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}

Message: SERVER-50063: Consolidate OplogFetcher return code to CallbackCanceled when shutting down
Branch: master
https://github.com/mongodb/mongo/commit/a12299145c35b0cebea60f303613f85472b2915f

Comment by Lingzhi Deng [ 31/Jul/20 ]

We should either:
1. Fix the test (either rewrite it so the failpoint is always set, or allow network errors), OR
2. Consolidate the error code to CallbackCanceled when shutting down.
CC samy.lanka

Generated at Thu Feb 08 05:21:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.