[SERVER-25966] Add initial sync unittests for metadata retries Created: 06/Sep/16  Updated: 05/Apr/17  Resolved: 29/Dec/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.5.2

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Benety Goh
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File initial_sync_retry_exhaust.log    
Issue Links:
Related
is related to SERVER-25702 add support to OplogFetcher for resta... Closed
is related to SERVER-23750 Make DataReplicator::initialSync the ... Closed
is related to SERVER-25874 Add serverParameter configuration for... Closed
is related to SERVER-27052 Add asynchronous operation support to... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2016-10-10, TIG 2016-11-21, Repl 2016-12-12, Repl 2017-02-13
Participants:

 Description   

Write tests that send failed responses and then successful responses for the following commands, checking that retries lead to a successful result:

  1. listIndexes
  2. find on a collection during a collection clone
  3. sync source selection
  4. find on the oplog when querying the upstream node for the last oplog entry
  5. listDatabases
  6. listCollections
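
A minimal sketch of the fail-then-succeed test shape described above, written in Python for brevity rather than as server C++ unit test code. `FlakyRemote`, `run_with_retries`, and `RetryableError` are hypothetical names for illustration only. They are not part of the MongoDB code base and do not mirror the API of any real class:

```python
class RetryableError(Exception):
    """Stand-in for a transient failure such as a network error (hypothetical)."""


class FlakyRemote:
    """Mock remote that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success, response):
        self.failures_left = failures_before_success
        self.response = response
        self.calls = 0

    def run_command(self, name):
        self.calls += 1
        if self.failures_left > 0:
            self.failures_left -= 1
            raise RetryableError(f"{name} failed (simulated)")
        return self.response


def run_with_retries(remote, name, max_attempts):
    """Run a command, retrying on RetryableError for up to max_attempts tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return remote.run_command(name)
        except RetryableError as err:
            last_error = err
    # Retries exhausted: surface the most recent failure to the caller.
    raise last_error


# Command names taken from the list above; the responses are placeholders.
COMMANDS = ["listIndexes", "find", "listDatabases", "listCollections"]


def check_retries_succeed():
    """Fail each command twice, then succeed; record the result and call count."""
    results = {}
    for name in COMMANDS:
        remote = FlakyRemote(failures_before_success=2, response={"ok": 1})
        results[name] = (run_with_retries(remote, name, max_attempts=3), remote.calls)
    return results
```

Each entry returned by `check_retries_succeed()` pairs the final response with the number of calls made, so a test can assert both that the retry succeeded and that exactly the expected number of attempts was used.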

Write tests for the OplogFetcher to ensure that it retries and does not cause initial sync to fail on retryable errors.

Write tests for the DataReplicator to check that more severe errors lead to a restart of initial sync:

  1. Rollback
  2. Sync source changes
  3. Retry exhaust on any of the above retryable errors
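
The restart behaviour these tests would check can be sketched as an outer loop around a single initial sync attempt. This is an illustrative Python model under stated assumptions, not the DataReplicator implementation: the exception classes, `run_initial_sync`, and `scripted_attempts` are all hypothetical names, and the attempt limit is an arbitrary parameter.

```python
class RollbackDetected(Exception):
    """Hypothetical marker: the sync source rolled back during the attempt."""


class SyncSourceChanged(Exception):
    """Hypothetical marker: the sync source changed during the attempt."""


class RetriesExhausted(Exception):
    """Hypothetical marker: a retryable command ran out of retries."""


# Errors that abandon the current attempt and start initial sync over.
RESTART_ERRORS = (RollbackDetected, SyncSourceChanged, RetriesExhausted)


def run_initial_sync(attempt_fn, max_attempts):
    """Call attempt_fn(attempt_number) until it succeeds.

    Restarts on RESTART_ERRORS and re-raises the last error once
    max_attempts is reached. Returns (result, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_fn(attempt), attempt
        except RESTART_ERRORS:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


def scripted_attempts(outcomes):
    """Build an attempt function that raises or returns each outcome in order."""
    def attempt_fn(attempt):
        outcome = outcomes[attempt - 1]
        if isinstance(outcome, Exception):
            raise outcome
        return outcome
    return attempt_fn
```

A test in this style scripts one failure of each kind followed by a success, for example `run_initial_sync(scripted_attempts([RollbackDetected(), SyncSourceChanged(), RetriesExhausted(), "synced"]), max_attempts=10)`, and asserts on the number of attempts used.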


 Comments   
Comment by Githook User [ 29/Dec/16 ]

Author: Benety Goh (benety) <benety@mongodb.com>

Message: SERVER-25966 added CollectionCloner tests for metadata retries
Branch: master
https://github.com/mongodb/mongo/commit/03510b1b55475fbbd8c710c70511c6397ca846e9

Comment by Benety Goh [ 27/Dec/16 ]

See this commit for the metadata retry changes:

https://github.com/mongodb/mongo/commit/eba32f352cffd1dbe8ca451bde5944b997bfebf5

Comment by Judah Schvimer [ 22/Nov/16 ]

I think that sync source change testing can be done in unittests. We will revisit this after SERVER-27052 (DataReplicator refactor) is complete.

Comment by Robert Guo (Inactive) [ 22/Nov/16 ]

judah.schvimer: Giving this back to you for another look. I think this ticket can be closed, given that we already have tests for most of the items in the description (see my 16/Nov/16 comment for details).

The only thing missing is sync source change causing initial sync to restart, which I believe can be more easily tested from JavaScript, possibly using the replSetSyncFrom command.

Comment by Robert Guo (Inactive) [ 16/Nov/16 ]

I was able to find existing unit tests for almost all of these scenarios.

  • Retries of various commands are handled by RemoteCommandRetryScheduler and tested in SchedulerShouldRetryUntilSuccessfulResponseIsReceived
  • OplogFetcher retry is tested in OplogFetcherResetsRestartCounterOnSuccessfulFetcherResponse and OplogFetcherAbortsWithOriginalResponseErrorOnFailureToScheduleNewFetcher
  • Rollback is tested in FailOnRollback

Retries are handled internally by RemoteCommandRetryScheduler, so there's no need for additional testing of successful retries of individual commands. Failure after exhausting retries should be handled the same way in every case, by bubbling up the failure to the caller of doInitialSync. As a sanity check, I failed each of the above commands (logs are attached to this ticket). In all cases, the failures correctly triggered an fassert on the initial sync node, as expected and without side effects.

Sync source selection uses different logic from the other commands and could benefit from additional testing.
