[SERVER-32029] ChangeStream resumeAfter does not work on unsharded collections if there is more than one shard in the system Created: 18/Nov/17 Updated: 30/Oct/23 Resolved: 27/Nov/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.0-rc7, 3.7.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Shane Harvey | Assignee: | Charlie Swanson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.6 |
| Participants: |
| Description |
|
Using $changeStream with resumeAfter appears to be broken on mongos in 3.7.0-39-g4edcb81:
|
| Comments |
| Comment by Githook User [ 27/Nov/17 ] |
|
Author: {'name': 'Charlie Swanson', 'username': 'cswanson310', 'email': 'charlie.swanson@mongodb.com'} Message: (cherry picked from commit 64ec315a09f162772b7d0191a03267c23231a7e8) |
| Comment by Charlie Swanson [ 27/Nov/17 ] |
|
I'm resolving this issue, which has been re-purposed to only fix the resumeAfter issue for unsharded collections through mongos - the remaining work is being tracked by |
| Comment by Githook User [ 27/Nov/17 ] |
|
Author: {'name': 'Charlie Swanson', 'username': 'cswanson310', 'email': 'charlie.swanson@mongodb.com'}Message: |
| Comment by Charlie Swanson [ 22/Nov/17 ] |
|
Hi shane.harvey,

Sorry for going silent for a while. There are two issues at play here, and I'll try to shed some light on them now.

First, for any change stream against a mongos we need to keep some of the change stream machinery executing on mongos in order to detect and retry significant events like a collection becoming sharded or a chunk migrating. This holds even for an unsharded collection. When executing against a sharded collection, we currently scatter the change stream to all shards, defensively preparing for chunk migrations (work to optimize this is planned for the future). However, in this (unsharded) case we expect to forward the oplog scanning portion of the change stream to only the primary shard for the database. This was not working correctly: we mistakenly forwarded the oplog scan portion to all shards. This means that when we receive the change stream and try to resume on a shard that doesn't know about the collection (any shard that's not the primary), we error. It looks like we don't have any test coverage of unsharded collections when there is more than one shard (oops!). This bug isn't so hard to fix, and fixing it should solve the resuming problem for unsharded collections.

Separately, there is a similar problem for sharded collections. If a collection is sharded but not present on all shards, then some shards will not know about the collection, and will also mistakenly error upon resuming. This bug is harder to fix, because it's hard to know whether the collection doesn't exist on a shard because it was dropped or because that shard doesn't own any chunks for it.

I spent much of yesterday trying to fix both bugs, but the patch is getting large and cumbersome. I'm going to adjust course and try to get in a fix for the unsharded collections case ASAP, then split off another SERVER ticket for the bug concerning sharded collections. I'll also work to resolve that as quickly as possible, but it will likely take a little while to get all the pieces in place and through code review, evergreen, etc. |
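
The targeting rule described in the comment above can be sketched roughly as follows. This is an illustrative Python model only, not the server's code (the real logic lives in mongos and is written in C++). The names `CollectionRouting`, `shards_for_oplog_scan`, and `buggy_shards_for_oplog_scan` are hypothetical and were invented for this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class CollectionRouting:
    """Hypothetical routing info for one collection, as seen from mongos."""
    primary_shard: str
    # None models an unsharded collection; otherwise the shards owning chunks.
    chunk_owners: Optional[FrozenSet[str]] = None


def shards_for_oplog_scan(routing: CollectionRouting,
                          all_shards: Iterable[str]) -> FrozenSet[str]:
    """Intended behavior: pick the shards that run the oplog-scanning portion."""
    if routing.chunk_owners is None:
        # Unsharded: only the database's primary shard knows the collection.
        return frozenset({routing.primary_shard})
    # Sharded: scatter to every shard, defensively, in case chunks migrate.
    return frozenset(all_shards)


def buggy_shards_for_oplog_scan(routing: CollectionRouting,
                                all_shards: Iterable[str]) -> FrozenSet[str]:
    """Model of the bug: the scan is sent everywhere, sharded or not."""
    return frozenset(all_shards)


if __name__ == "__main__":
    all_shards = ["shard0", "shard1"]
    unsharded = CollectionRouting(primary_shard="shard0")
    print(sorted(shards_for_oplog_scan(unsharded, all_shards)))
    print(sorted(buggy_shards_for_oplog_scan(unsharded, all_shards)))
```

In this model, the buggy variant targets `shard1` for an unsharded collection whose primary is `shard0`. That corresponds to the case where a resume lands on a shard that has never heard of the collection and errors.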
| Comment by Charlie Swanson [ 20/Nov/17 ] |
|
Thanks for reporting, shane.harvey. I can reproduce this (the 'g' is not part of the githash, since of course it is not a hexadecimal digit, which was my error). I've assigned this to myself to investigate. I'm also surprised our tests didn't catch this. |
| Comment by Shane Harvey [ 20/Nov/17 ] |
|
I see the same error on 3.6.0-rc4-41-ge608b8b downloaded from https://evergreen.mongodb.com/task/mongodb_mongo_v3.6_enterprise_osx_1010_compile_e608b8b3490ac7a1bbb717411d6499c2d45b21f6_17_11_17_23_14_24. |
| Comment by Charlie Swanson [ 20/Nov/17 ] |
|
shane.harvey, what commit is g4edcb81? I can't seem to find that in the history. Or if that's not a commit reference, ramon.fernandez, can you give some insight as to where that version string comes from? |