[SERVER-38722] CollectionCloner should handle QueryPlanKilled on collection drop Created: 20/Dec/18 Updated: 29/Oct/23 Resolved: 28/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Replication |
| Affects Version/s: | 3.4.19, 3.6.10, 4.0.6 |
| Fix Version/s: | 3.6.12, 4.0.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | David Storch |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.0, v3.6, v3.4 |
| Sprint: | Query 2019-02-25, Query 2019-03-11 |
| Participants: |
| Description |
|
The CollectionCloner, used in the first phase of initial sync, has code to interpret certain error codes from the getMore command as indicative of a collection drop. In 4.0 and older branches, it handles OperationFailed and CursorNotFound. However, a collection drop can also result in a getMore returning QueryPlanKilled. This error code should be handled as well. This was fixed in 4.1.7 by |
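
As a rough illustration of the check described above, the following sketch classifies a getMore error code as a probable collection drop. It is not the server's actual code: the `ErrorCode` enum is a simplified stand-in for the real error code type, and `indicatesCollectionDrop` is a hypothetical helper name chosen for this example.

```cpp
// Simplified stand-in for the server's error codes (assumption for this
// sketch; the enumerator values here are arbitrary, not the real codes).
enum class ErrorCode {
    OK,
    CursorNotFound,
    OperationFailed,
    QueryPlanKilled,
    NetworkTimeout,
};

// Hypothetical helper: returns true when a failed getMore should be
// interpreted as "the source collection was dropped" rather than as a
// fatal cloning error. QueryPlanKilled is the code this ticket adds to
// the two that older branches already handle.
constexpr bool indicatesCollectionDrop(ErrorCode code) {
    return code == ErrorCode::CursorNotFound ||
           code == ErrorCode::OperationFailed ||
           code == ErrorCode::QueryPlanKilled;
}
```

A caller would presumably treat a `true` result as a benign end of cloning for that collection and propagate any other error as a failure.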
| Comments |
| Comment by Githook User [ 06/Mar/19 ] |
|
Author: {'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@10gen.com'} Message: |
| Comment by David Storch [ 28/Feb/19 ] |
|
The fix for this ticket will be included in the 4.0.7 release. This ensures that 4.0 nodes running 4.0.7 or newer can correctly tolerate a collection drop when syncing from a 4.2 node. Note that in Users who may drop a collection during initial sync, and who may initial sync a 4.0 node from a 4.2 node during the 4.0 => 4.2 upgrade, should perform a minor version upgrade to at least 4.0.7 before attempting the major version upgrade to 4.2. I expect this situation to be unusual; most users should be able to upgrade directly to 4.2 from any 4.0 minor release. |
| Comment by Githook User [ 28/Feb/19 ] |
|
Author: {'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@10gen.com'} Message: |
| Comment by Judah Schvimer [ 27/Feb/19 ] |
|
Since |
| Comment by David Storch [ 27/Feb/19 ] |
|
judah.schvimer tess.avitabile, given that The fix for this ticket applies cleanly on 4.0 and 3.6, so I do plan to backport it with your approval to these newest two stable branches. |
| Comment by Tess Avitabile (Inactive) [ 02/Jan/19 ] |
|
I don't think this will backport cleanly to 3.4, since the check for OperationFailed and CursorNotFound was added in |
| Comment by Judah Schvimer [ 02/Jan/19 ] |
|
The CollectionCloner code should be very similar in v3.4, so I'd vote for it if it's a clean backport as expected. |
| Comment by David Storch [ 20/Dec/18 ] |
|
tess.avitabile, yep, exactly. In fact, I think it might be the case that 4.0 raises QueryPlanKilled or CursorNotFound, but never OperationFailed. Thanks, I'll request backport, at least to 4.0. Should we backport even further back as well? |
| Comment by Tess Avitabile (Inactive) [ 20/Dec/18 ] |
|
Are you saying that it's possible for 4.0 to raise QueryPlanKilled if there is a collection drop? If so, then yes, we would be interested in a backport for the second piece. |
| Comment by David Storch [ 20/Dec/18 ] |
|
tess.avitabile siyuan.zhou, I believe there is a bug affecting 4.0. There are two pieces to this work:
Are you interested in a backport for the second piece? The first piece cannot be backported, so the CollectionCloner must continue to handle CursorNotFound and OperationFailed in older branches. |