[SERVER-31264] CollectionCloner should ignore NamespaceNotFound errors
Created: 26/Sep/17  Updated: 06/Dec/22  Resolved: 12/Dec/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Backlog - Replication Team
Resolution: Duplicate
Votes: 0
Labels: initialSync, neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-31267 CollectionCloner fails if collection ... Closed
Related
is related to SERVER-33644 getMissingDoc in initial sync needs t... Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:
Case:
Linked BF Score: 68

 Description   

If a collection is dropped on the sync source after we call listIndexes but before we start fetching, we will get a NamespaceNotFound error. We should ignore it rather than failing initial sync.
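The race described above can be illustrated with a minimal Python sketch. It is not the server's C++ CollectionCloner: `OperationFailure`, `FakeSyncSource`, `clone_collection`, and the `before_fetch` hook are all hypothetical names invented for this illustration, and the sync source is an in-memory stand-in. The sketch only shows the proposed behavior, treating NamespaceNotFound on the fetch as "collection was dropped, skip it" instead of failing the whole sync.

```python
class OperationFailure(Exception):
    """Hypothetical stand-in for a command error returned by the sync source."""

    def __init__(self, code_name):
        super().__init__(code_name)
        self.code_name = code_name


class FakeSyncSource:
    """Hypothetical in-memory sync source; not a real MongoDB client."""

    def __init__(self, collections):
        self.collections = collections  # namespace -> list of documents

    def list_indexes(self, ns):
        if ns not in self.collections:
            raise OperationFailure("NamespaceNotFound")
        return [{"name": "_id_"}]

    def find(self, ns):
        if ns not in self.collections:
            raise OperationFailure("NamespaceNotFound")
        return list(self.collections[ns])


def clone_collection(source, ns, before_fetch=None):
    """Clone one collection and return (status, docs)."""
    source.list_indexes(ns)
    if before_fetch is not None:
        before_fetch()  # test hook: lets a drop land between the two steps
    try:
        docs = source.find(ns)
    except OperationFailure as exc:
        if exc.code_name == "NamespaceNotFound":
            # Assumption: a missing namespace here means the collection was
            # dropped on the sync source, so there is nothing left to clone.
            return ("dropped", [])
        raise  # any other error still fails the clone
    return ("ok", docs)


source = FakeSyncSource({"test.a": [{"_id": 1}], "test.b": [{"_id": 2}]})
# Simulate the drop happening after listIndexes but before the fetch.
status_a, docs_a = clone_collection(
    source, "test.a", before_fetch=lambda: source.collections.pop("test.a")
)
status_b, docs_b = clone_collection(source, "test.b")
```

In this sketch only the fetch step is wrapped, which matches the window the description names; an error from `list_indexes` itself still propagates.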



 Comments   
Comment by Matthew Russotto [ 19/Oct/19 ]

The original bug in the description went away when we did SERVER-31267, and there's even a test for it in jstests/replsets/initial_sync_drop_collection.js. The QueryPlannerKilled issue Dan Gottlieb talks about will go away with Resumable Initial Sync (after QueryPlannerKilled we will resume).

Comment by Judah Schvimer [ 17/Oct/19 ]

matthew.russotto, will this go away with Resumable Initial Sync?

Comment by Spencer Brody (Inactive) [ 01/Feb/18 ]

Hmmm... we could, but I worry that without SERVER-31695 or SERVER-32089, we're still going to have issues with renameCollection.

Comment by Max Hirschhorn [ 01/Feb/18 ]

spencer, is it possible to have this ticket picked up by a member of the Replication team in the next iteration? The TIG team's goal in making the changes in SERVER-31093, based on a conversation with judah.schvimer, was to be able to assert that with the addition of UUIDs, processing renameCollection operations ought to "just work" without any retries. Needing to partially revert those changes in SERVER-33060 due to this issue and others is undesirable from a coverage perspective.

Comment by Daniel Gottlieb (Inactive) [ 26/Jan/18 ]

There's another manifestation of concurrent collection drops during initial sync: a drop can occur while the CollectionCloner is scanning the collection being dropped. In that case, the clone fails with a QueryPlannerKilled error.

I think it may make sense to fix both bugs in the same ticket; testing that one bug is fixed will find the other. But if I'm wrong about that, feel free to split it up.
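As a rough illustration of handling both manifestations in one place, the Python sketch below simulates a scan that is killed partway through. All names (`OperationFailure`, `DROP_ERRORS`, `scan`, `clone_with_scan`) are hypothetical, the error names are copied from this ticket's text, and discarding the partially cloned documents is an assumption of the sketch, not a statement of what the server does.

```python
class OperationFailure(Exception):
    """Hypothetical stand-in for a command error returned by the sync source."""

    def __init__(self, code_name):
        super().__init__(code_name)
        self.code_name = code_name


# Error names as written in this ticket. Treating both as "the collection
# was dropped" is the assumption being sketched.
DROP_ERRORS = {"NamespaceNotFound", "QueryPlannerKilled"}


def scan(docs, kill_after=None):
    """Yield documents; simulate a drop after `kill_after` documents."""
    for i, doc in enumerate(docs):
        if kill_after is not None and i == kill_after:
            raise OperationFailure("QueryPlannerKilled")
        yield doc


def clone_with_scan(cursor):
    """Consume a cursor and return (status, docs)."""
    cloned = []
    try:
        for doc in cursor:
            cloned.append(doc)
    except OperationFailure as exc:
        if exc.code_name in DROP_ERRORS:
            # Assumption: the source collection is gone, so the partial
            # copy is discarded instead of failing the whole sync.
            return ("dropped", [])
        raise  # unrelated errors still propagate
    return ("ok", cloned)


docs = [{"_id": 1}, {"_id": 2}, {"_id": 3}]
killed = clone_with_scan(scan(docs, kill_after=2))
complete = clone_with_scan(scan(docs))
```

Keeping the set of drop-related errors in one place means a single test that drops a collection at different points in the clone exercises both code paths.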

Comment by Benety Goh [ 29/Sep/17 ]

This can also happen during the count command.

Generated at Thu Feb 08 04:26:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.