[SERVER-31885] changeStream cursor is not returned on a mongos when the database does not exist Created: 08/Nov/17 Updated: 30/Oct/23 Resolved: 05/Dec/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 3.6.0-rc3 |
| Fix Version/s: | 3.6.1, 3.7.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Shane Harvey | Assignee: | Bernard Gorman |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.6 |
| Sprint: | Query 2017-12-18 |
| Description |
|
On a replica set, a changeStream cursor can be created on a collection whose database does not exist yet.
The same operation on a mongos does not return a cursor.
This is surprising when combined with the fact that WiredTiger drops a database once it has no more collections. |
| Comments |
| Comment by Githook User [ 06/Dec/17 ] |
|
Author: Bernard Gorman (username: gormanb, email: bernard.gorman@gmail.com)
Message: (cherry picked from commit f9c698b67e6e08c05f4667d222a053f8f612d350) |
| Comment by Githook User [ 05/Dec/17 ] |
|
Author: Bernard Gorman (username: gormanb, email: bernard.gorman@gmail.com)
Message: |
| Comment by Spencer Brody (Inactive) [ 30/Nov/17 ] |
|
I think the behavior actually is the same for normal queries. For normal queries, if the collection or database doesn't exist, returning no results with no cursor is the right behavior. |
| Comment by Bernard Gorman [ 30/Nov/17 ] |
|
spencer: I believe this behaviour difference exists for aggregation in general, rather than just for $changeStream. david.storch, am I right in thinking that we should change the behaviour on mongod to match that of mongos for all aggregations? |
| Comment by Alyson Cabral (Inactive) [ 28/Nov/17 ] |
|
Can we link a docs ticket to this? |
| Comment by Spencer Brody (Inactive) [ 28/Nov/17 ] |
|
Conclusion from discussion with kaloian.manassiev, david.storch, and schwerin is that we should change both mongos and mongod to affirmatively error if starting a change stream when the database doesn't exist, to keep feature parity between sharded and unsharded systems. This will be targeted for an early 3.6.x release, but not in time for 3.6.0. |
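As a toy illustration of the behaviour agreed above (not the server's implementation), the check amounts to refusing to open the stream when the database is unknown, identically on sharded and unsharded deployments. Everything named here is hypothetical, including `DatabaseDoesNotExist`, `known_databases`, and the shape of the returned value; the ticket does not state which error the server returns.

```python
class DatabaseDoesNotExist(Exception):
    """Hypothetical error type; the real server error code is not given in this ticket."""


def open_change_stream(known_databases, db_name, coll_name):
    """Toy model: error out instead of returning an empty result or a cursor.

    `known_databases` stands in for whatever catalog the node consults.
    """
    if db_name not in known_databases:
        # Affirmative error, the same on mongos and mongod in this model.
        raise DatabaseDoesNotExist(f"database {db_name!r} does not exist")
    # Stand-in for a cursor: just record what would be watched.
    return {"ns": f"{db_name}.{coll_name}", "pipeline": [{"$changeStream": {}}]}
```

In this model a typo in the database name fails immediately rather than silently yielding a stream that never reports changes, which is the concern raised elsewhere in this thread.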
| Comment by Spencer Brody (Inactive) [ 13/Nov/17 ] |
|
I think we should be trying to move away from implicit creation rather than adding more of it. Also, in sharding there is an actual storage cost associated with creating a database, so a typo in the database name when opening the changeStream could keep that database in the catalog indefinitely. Finally, the access control story gets complicated if logically read-only operations now need the createDatabase privilege. So my vote would be to not make changeStreams implicitly create databases. The asymmetry here between mongos and mongod is definitely disappointing, however. I wonder if we should make changeStreams on a replica set error if the database doesn't yet exist, for the sake of consistency. This is part of a larger problem: the inconsistency between what a database represents in sharded and unsharded systems. Databases are much more real concepts in sharding, where they have associated storage costs and need to be associated with a home shard, whereas on a mongod databases are just logical groupings of collections with no real data or metadata specific to them. |
| Comment by Charlie Swanson [ 13/Nov/17 ] |
|
It's certainly possible. I don't know what criteria we use for that decision, but I'd imagine that historically write operations implicitly create and reads don't? I'm not even sure we have a list of the commands that do implicitly create a database, and I know we've historically had bugs in this area. Personally I think your proposal of having the change stream create the database would be fine. I do worry that some users are going to typo the database name and think they're getting changes when they actually aren't. I can't decide if that risk is worth the hassle of having everyone first type a create command before opening the stream. It sounds like you're more concerned with the ops headache and want to make it easy to do this though? |
| Comment by Alyson Cabral (Inactive) [ 13/Nov/17 ] |
|
Understood. Though, I believe it is important to provide a way to guarantee you see every change. However, it seems like if that database is created, you can still create the change stream first. Crazy idea, but would there be an issue with making opening a change stream one of those operations that creates the database? |
| Comment by Charlie Swanson [ 09/Nov/17 ] |
|
alyson.cabral You couldn't necessarily do it in that order, but if you create the collection with the 'create' command, then open the change stream, you'll be able to see all writes to the collection (assuming you didn't start writing between creating and opening the stream). Alternatively, any operation that creates the database would suffice, such as enabling sharding on the database, or inserting into another collection. For instance, this test runs the create command, then opens the change stream, then sees all subsequent writes: https://github.com/mongodb/mongo/blob/f19da233faba9a42b7fbe84b38df7bb7f1a9e496/jstests/sharding/change_streams_unsharded_becomes_sharded.js#L31-L38 |
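The ordering described above (create the collection explicitly, open the change stream, and only then write) can be sketched in Python. This is a minimal sketch rather than the linked server test: `open_stream_on_new_collection` is a hypothetical helper, and it assumes a PyMongo-style `db` object that exposes `create_collection(name)` and `db[name].watch()`.

```python
def open_stream_on_new_collection(db, coll_name):
    """Create the collection first, then open a change stream on it.

    Hypothetical helper. `db` is assumed to behave like a PyMongo Database.
    Creating the collection also creates the database, so the stream is
    opened against a database that already exists.
    """
    # Step 1: explicit create, which brings the database into existence.
    db.create_collection(coll_name)
    # Step 2: open the stream before any writes are issued.
    stream = db[coll_name].watch()
    # Step 3 (caller): start writing only after this returns. Writes made
    # between steps 1 and 2 would not be seen by the stream.
    return stream
```

The helper only fixes the order of the two calls; it does not close the window between them, so the caller still has to make sure nothing writes to the collection until the stream is open.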
| Comment by Alyson Cabral (Inactive) [ 09/Nov/17 ] |
|
What's the behavior when the collection is actually created? This is what I want our users to be able to do: I want there to be a way to see every change on a new collection without this awkward gap between collection creation and change stream creation. If this sharding behavior precludes that, this is important to me, otherwise it's much lower on the list. |
| Comment by Charlie Swanson [ 09/Nov/17 ] |
|
I'm not sure if this should be query or replication backlog, went with Replication since that's where most change stream tickets are these days. The code responsible is here - if the mongos doesn't know about a database it will return an empty result set for any aggregation, just as we do for any query on a non-existent database. I agree this is confusing, probably worth fixing in the case of change streams, but doesn't seem particularly high-priority. alyson.cabral any thoughts on priority? |