[SERVER-31267] CollectionCloner fails if collection is dropped between getMore calls
Created: 26/Sep/17
Updated: 30/Oct/23
Resolved: 28/Nov/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.6.2, 3.7.1

Type: Bug
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Matthew Russotto
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: initial_sync_drop_collection.js
Issue Links:
Backports
Depends
depends on SERVER-31695 Support queries across collection ren... Backlog
is depended on by SERVER-31466 CollectionCloner fails when a collect... Closed
Duplicate
is duplicated by SERVER-31264 CollectionCloner should ignore Namesp... Closed
Related
related to SERVER-32136 initial_sync_drop_collection.js shoul... Closed
related to SERVER-32783 CollectionCloner::shutdown() should n... Closed
related to SERVER-32089 Support rename collections during ini... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v3.6, v3.4
Sprint: Repl 2017-10-23, Repl 2017-11-13, Repl 2017-12-04
Participants:
Linked BF Score: 68

Comments
Comment by Githook User [ 03/Jan/18 ]

Author: Matthew Russotto (mtrussotto) <matthew.russotto@10gen.com>

Message: SERVER-31267 CollectionCloner fails if collection is dropped between getMore calls

(cherry picked from commit 43ab81ebc79de03844e55ff92224bdfb69e050f1)
Branch: v3.6
https://github.com/mongodb/mongo/commit/be7e525b14dddf56fbec90190da60dc020abb0f9

Comment by Githook User [ 28/Nov/17 ]

Author: Matthew Russotto (mtrussotto) <matthew.russotto@10gen.com>

Message: SERVER-31267 CollectionCloner fails if collection is dropped between getMore calls
Branch: master
https://github.com/mongodb/mongo/commit/43ab81ebc79de03844e55ff92224bdfb69e050f1

Comment by Matthew Russotto [ 24/Oct/17 ]

The attached test demonstrates the bug: in a system with http://mongodbcr.appspot.com/157960001, it fails with [1] != [2] when checking the secondary collection.

Comment by Matthew Russotto [ 24/Oct/17 ]

This actually isn't safe. The problem is that we can't distinguish between a cursor killed because the collection was renamed and a cursor killed because the collection was dropped. In the rename case, we can get the following sequence of events:

1) Insert object in OldColl
2) Start initial sync
3) Rename collection to NewColl
4) Initial sync continues, ignores error
5) NewColl ends up empty

To make matters worse, this bug exists now; the window is just smaller. If the collection is renamed on the primary after we list the collections but before we run the find, we continue, believing the collection is empty. (never mind, just a bug in my test)
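
The rename sequence above can be illustrated with a small toy model. This is only a sketch, not the server's CollectionCloner: `ToyPrimary`, `CursorKilled`, `initial_sync`, and the oplog replay are hypothetical stand-ins. It assumes that both drop and rename kill open cursors, and that the cloner reacts to a killed cursor by assuming the collection was dropped.

```python
import itertools


class CursorKilled(Exception):
    """Toy stand-in for the error a getMore returns once its cursor is gone."""


class ToyPrimary:
    """Hypothetical in-memory primary; not a model of the real server."""

    def __init__(self, collections):
        self.collections = {n: list(d) for n, d in collections.items()}
        self.cursors = {}  # cursor id -> [collection name, next position]
        self.oplog = []    # operations performed after initial sync started
        self._ids = itertools.count(1)

    def find(self, name, batch_size):
        cursor_id = next(self._ids)
        self.cursors[cursor_id] = [name, batch_size]
        return cursor_id, self.collections[name][:batch_size]

    def get_more(self, cursor_id, batch_size):
        if cursor_id not in self.cursors:
            raise CursorKilled(cursor_id)
        name, pos = self.cursors[cursor_id]
        self.cursors[cursor_id][1] = pos + batch_size
        return self.collections[name][pos:pos + batch_size]

    def _kill_cursors(self, name):
        # Assumption: drop and rename both invalidate open cursors.
        self.cursors = {c: s for c, s in self.cursors.items() if s[0] != name}

    def drop(self, name):
        self._kill_cursors(name)
        del self.collections[name]
        self.oplog.append(("drop", name))

    def rename(self, old, new):
        self._kill_cursors(old)
        self.collections[new] = self.collections.pop(old)
        self.oplog.append(("rename", old, new))


def initial_sync(primary, concurrent_op, batch_size=1):
    """Clone every collection, then replay the toy oplog on the copy."""
    secondary = {}
    for name in list(primary.collections):  # stands in for listCollections
        cursor_id, docs = primary.find(name, batch_size)
        cloned = list(docs)
        if concurrent_op is not None:       # runs once, between find and getMore
            concurrent_op()
            concurrent_op = None
        try:
            while True:
                batch = primary.get_more(cursor_id, batch_size)
                if not batch:
                    break
                cloned.extend(batch)
            secondary[name] = cloned
        except CursorKilled:
            # The unsafe idea: assume the collection was dropped and move on.
            secondary[name] = []
    for op in primary.oplog:
        if op[0] == "drop":
            secondary.pop(op[1], None)
        elif op[0] == "rename":
            secondary[op[2]] = secondary.pop(op[1], [])
    return secondary


docs = [{"_id": 1}, {"_id": 2}]

drop_primary = ToyPrimary({"OldColl": docs})
after_drop = initial_sync(drop_primary, lambda: drop_primary.drop("OldColl"))

rename_primary = ToyPrimary({"OldColl": docs})
after_rename = initial_sync(
    rename_primary, lambda: rename_primary.rename("OldColl", "NewColl")
)
```

In the drop case the copy matches the primary (neither has the collection). In the rename case the copy ends up with an empty NewColl while the toy primary's NewColl still holds both documents, which is the divergence described in step 5.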

Comment by Matthew Russotto [ 09/Oct/17 ]

If the collection is dropped and recreated between calls, the CollectionCloner instead fails with "CursorNotFound". When the cursor is killed, we do know whether it is because the collection is being dropped, so we could add that to the ClientCursor and return CollectionDropped instead; not sure if it's worth it? It may be harder than that, because we usually release the cursor entirely in that case, so we don't have the state.
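
The idea of remembering why a cursor was killed can be sketched as follows. This is a hypothetical illustration only: `ToyCursorManager`, its `tombstones` map, and `classify` are invented names, and keeping a tombstone after the cursor is released is an assumption made to work around the "we don't have the state" problem, not something the server is known to do.

```python
import itertools


class CursorNotFound(Exception):
    """Toy error: the cursor id is unknown and no reason was recorded."""


class CollectionDropped(Exception):
    """Toy error: the cursor was killed because its collection was dropped."""


class ToyCursorManager:
    """Hypothetical cursor registry that remembers why cursors were killed."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.open = {}        # cursor id -> collection name
        self.tombstones = {}  # killed cursor id -> recorded reason

    def open_cursor(self, collection):
        cursor_id = next(self._ids)
        self.open[cursor_id] = collection
        return cursor_id

    def kill_for(self, collection, reason):
        # Release the cursors but keep a small record of the reason.
        doomed = [c for c, name in self.open.items() if name == collection]
        for cursor_id in doomed:
            del self.open[cursor_id]
            self.tombstones[cursor_id] = reason

    def get_more(self, cursor_id):
        if cursor_id in self.open:
            return []  # batch contents are irrelevant to this sketch
        if self.tombstones.get(cursor_id) == "drop":
            raise CollectionDropped(cursor_id)
        raise CursorNotFound(cursor_id)


def classify(manager, cursor_id):
    """How a cloner might react to each outcome (assumed policy)."""
    try:
        manager.get_more(cursor_id)
        return "continue"
    except CollectionDropped:
        return "skip collection"
    except CursorNotFound:
        return "fail initial sync"


manager = ToyCursorManager()
live = manager.open_cursor("kept")
dropped = manager.open_cursor("coll")
manager.kill_for("coll", "drop")

outcomes = [classify(manager, live), classify(manager, dropped), classify(manager, 99)]
```

Here `outcomes` distinguishes a live cursor, a cursor killed by a drop, and an unknown cursor id. A kill recorded with any other reason still surfaces as `CursorNotFound`, so the sketch only helps when the drop is known at kill time.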

Comment by Judah Schvimer [ 09/Oct/17 ]

matthew.russotto, do you have a repro script to show it?

Comment by Matthew Russotto [ 09/Oct/17 ]

This also occurs in 3.4.

Comment by Judah Schvimer [ 26/Sep/17 ]

We need to determine if this also occurs in 3.4.

Generated at Thu Feb 08 04:26:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.