[SERVER-31885] changeStream cursor is not returned on a mongos when the database does not exist Created: 08/Nov/17  Updated: 30/Oct/23  Resolved: 05/Dec/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.6.0-rc3
Fix Version/s: 3.6.1, 3.7.1

Type: Bug Priority: Major - P3
Reporter: Shane Harvey Assignee: Bernard Gorman
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Documented
is documented by DOCS-11098 Docs for SERVER-31885: changeStream c... Closed
Related
is related to PYTHON-1405 Test ChangeStreams on mongos Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Sprint: Query 2017-12-18
Participants:

 Description   

On a replica set, a changeStream cursor can be created on a collection whose database does not exist yet:

> db.dropDatabase()
{ "ok" : 1, "operationTime" : Timestamp(1510184057, 1) }
> db.runCommand({"aggregate":"database-does-not-exist", "pipeline":[{'$changeStream': {}}], "cursor": {}})
{
	"cursor" : {
		"firstBatch" : [ ],
		"id" : NumberLong("5437951372965323737"),
		"ns" : "test.database-does-not-exist"
	},
	"ok" : 1,
	"operationTime" : Timestamp(1510184077, 1)
}

The same operation on a mongos does not return an open cursor; the reply's cursor id is 0, so the cursor is already closed:

MongoDB Enterprise mongos> db.dropDatabase()
{
	"dropped" : "test",
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1510184413, 9),
		"signature" : {
			"hash" : BinData(0,"lFkT1OPozsQoRVqqU/5bXxUNhng="),
			"keyId" : NumberLong("6486191337619062789")
		}
	},
	"operationTime" : Timestamp(1510184413, 9)
}
MongoDB Enterprise mongos> db.runCommand({"aggregate":"database-does-not-exist", "pipeline":[{'$changeStream': {}}], "cursor": {}})
{
	"result" : [ ],
	"cursor" : {
		"id" : NumberLong(0),
		"ns" : "test.database-does-not-exist",
		"firstBatch" : [ ]
	},
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1510184417, 1),
		"signature" : {
			"hash" : BinData(0,"KsuOoKtQYUkl2peKWjMv4kJHzzE="),
			"keyId" : NumberLong("6486191337619062789")
		}
	},
	"operationTime" : Timestamp(1510184417, 1)
}

This is surprising given that WiredTiger drops a database once it has no more collections.
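The practical difference between the two replies above is the cursor id: a non-zero id can be used for getMore, while an id of 0 means the server has already closed the cursor. The sketch below shows how a client could tell the two cases apart. `changeStreamCursorIsOpen` is a hypothetical helper, not part of any driver, and it assumes the reply has already been decoded into a plain object with the cursor id as a Number or BigInt.

```javascript
// Hypothetical helper (not a driver API): returns true only if an aggregate
// reply for a $changeStream pipeline left an open cursor to iterate.
// Assumes cursor ids are decoded as Number or BigInt values.
function changeStreamCursorIsOpen(reply) {
  if (reply.ok !== 1 || !reply.cursor) {
    return false;
  }
  // A cursor id of 0 means the server already closed the cursor,
  // so there is nothing to call getMore on.
  return BigInt(reply.cursor.id) !== 0n;
}

// Reply shapes modelled on the two transcripts above (ids as BigInt,
// because 5437951372965323737 does not fit exactly in a JS Number).
const replicaSetReply = {
  cursor: { firstBatch: [], id: 5437951372965323737n, ns: "test.database-does-not-exist" },
  ok: 1,
};
const mongosReply = {
  result: [],
  cursor: { id: 0n, ns: "test.database-does-not-exist", firstBatch: [] },
  ok: 1,
};

console.log(changeStreamCursorIsOpen(replicaSetReply)); // true
console.log(changeStreamCursorIsOpen(mongosReply));     // false
```

Because both replies report `"ok" : 1`, checking only the `ok` field would not reveal that the mongos stream is unusable.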



 Comments   
Comment by Githook User [ 06/Dec/17 ]

Author:

{'name': 'Bernard Gorman', 'username': 'gormanb', 'email': 'bernard.gorman@gmail.com'}

Message: SERVER-31885 Prohibit $changeStream from running on a non-existent database

(cherry picked from commit f9c698b67e6e08c05f4667d222a053f8f612d350)
Branch: v3.6
https://github.com/mongodb/mongo/commit/e47886706c3dc93ec8b6df15d78fb0c1e91af47d

Comment by Githook User [ 05/Dec/17 ]

Author:

{'username': 'gormanb', 'email': 'bernard.gorman@gmail.com', 'name': 'Bernard Gorman'}

Message: SERVER-31885 Prohibit $changeStream from running on a non-existent database
Branch: master
https://github.com/mongodb/mongo/commit/f9c698b67e6e08c05f4667d222a053f8f612d350

Comment by Spencer Brody (Inactive) [ 30/Nov/17 ]

I think the behavior actually is the same for normal queries. For normal queries, if the collection or database doesn't exist, returning no results and no cursor is the right behavior.

Comment by Bernard Gorman [ 30/Nov/17 ]

spencer: I believe this behaviour difference exists for aggregation in general, rather than just for $changeStream. david.storch, am I right in thinking that we should change the behaviour on mongod to match that of mongos for all aggregations?

Comment by Alyson Cabral (Inactive) [ 28/Nov/17 ]

Can we link a docs ticket to this?

Comment by Spencer Brody (Inactive) [ 28/Nov/17 ]

Conclusion from discussion with kaloian.manassiev, david.storch, and schwerin is that we should change both mongos and mongod to affirmatively error if starting a change stream when the database doesn't exist, to keep feature parity between sharded and unsharded systems. This will be targeted for an early 3.6.x release, but not in time for 3.6.0.
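The decision above can be modelled as a small branch on the pipeline. This is a hypothetical sketch, not the server's code: `routeAggregateOnUnknownDb` and its `afterFix` flag are invented for illustration, and the error message is a placeholder rather than the server's actual wording. It assumes, as the fix's commit title ("Prohibit $changeStream from running on a non-existent database") suggests, that only change streams are affected and other aggregations keep returning an empty result.

```javascript
// Hypothetical model, not server code: the reply to an aggregate on a
// database that does not exist, before and after the agreed change.
function routeAggregateOnUnknownDb(dbName, collName, pipeline, afterFix) {
  const isChangeStream =
    pipeline.length > 0 &&
    Object.prototype.hasOwnProperty.call(pipeline[0], "$changeStream");

  if (afterFix && isChangeStream) {
    // After the change: opening a change stream on a missing database errors.
    // The message text is a placeholder, not the server's real error.
    return { ok: 0, errmsg: `database ${dbName} does not exist` };
  }

  // Before the change (and presumably still for ordinary aggregations):
  // an empty result with an already-closed cursor, as in the mongos reply.
  return { ok: 1, cursor: { id: 0, ns: `${dbName}.${collName}`, firstBatch: [] } };
}

const streamPipeline = [{ $changeStream: {} }];
console.log(routeAggregateOnUnknownDb("test", "database-does-not-exist", streamPipeline, false).cursor.id); // 0
console.log(routeAggregateOnUnknownDb("test", "database-does-not-exist", streamPipeline, true).ok);         // 0
console.log(routeAggregateOnUnknownDb("test", "coll", [{ $match: {} }], true).ok);                         // 1
```

Returning an explicit error makes the "typo in the database name" case mentioned in the thread visible to the caller, instead of silently producing a stream that never reports changes.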

Comment by Spencer Brody (Inactive) [ 13/Nov/17 ]

I think we should be trying to move away from implicit creation rather than adding more of it. Also, in sharding there is an actual storage cost associated with creating a database, so a typo in the database name when opening a changeStream could keep that database in the catalog indefinitely. Finally, the access control story gets complicated if logically read-only operations now need to have the createDatabase privilege. So my vote would be to not make changeStreams implicitly create databases.

The asymmetry here between mongos and mongod is definitely disappointing, however. I wonder if we should make changeStreams on a replica set error if the database doesn't yet exist, for the sake of consistency. This is a part of a larger problem related to the inconsistency between what a database represents in sharded and unsharded systems. Databases are much more real concepts in sharding where they have associated storage costs and need to be associated with a home shard, whereas on a mongod databases are just logical groupings of collections with no real data or metadata specific to them.

kaloian.manassiev

Comment by Charlie Swanson [ 13/Nov/17 ]

It's certainly possible. I don't know what criteria we use for that decision, but I'd imagine it's historically been that write operations implicitly create databases and reads don't? I'm not even sure we have a list of the commands that implicitly create a database. I know we've also historically had bugs such as SERVER-20852 where the behavior differs on mongos vs. mongod; my guess is that no one has ever given it much thought.

Personally, I think your proposal of having the change stream create the database would be fine. I do worry that some users are going to typo the database name and think they're getting changes when they actually aren't. I can't decide if that risk is worth the hassle of having everyone first run a create command before opening the stream. It sounds like you're more concerned with the ops headache and want to make this easy, though?

Comment by Alyson Cabral (Inactive) [ 13/Nov/17 ]

Understood. That said, I believe it is important to provide a way to guarantee you see every change. It seems that if the database already exists, you can still open the change stream before creating the collection. Crazy idea, but would there be an issue with making opening a change stream one of those operations that creates the database?

Comment by Charlie Swanson [ 09/Nov/17 ]

alyson.cabral You couldn't necessarily do it in that order, but if you create the collection with the 'create' command, then open the change stream, you'll be able to see all writes to the collection (assuming you didn't start writing between creating and opening the stream). Alternatively, any operation that creates the database would suffice, such as enabling sharding on the database, or inserting into another collection.

For instance, this test runs the create command, then opens the change stream, then sees all subsequent writes: https://github.com/mongodb/mongo/blob/f19da233faba9a42b7fbe84b38df7bb7f1a9e496/jstests/sharding/change_streams_unsharded_becomes_sharded.js#L31-L38
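The workaround described above (run `create`, then open the change stream) can be sketched with two commands. This is an illustration under assumptions: `openChangeStreamAfterCreate` is a hypothetical helper, `db` is assumed to expose a mongo-shell-style `runCommand()`, and the `fakeDb` object exists only so the sketch runs without a server. An already-existing collection is tolerated by checking for the `NamespaceExists` code name.

```javascript
// Hypothetical helper: create the collection (and therefore its database)
// first, then open a $changeStream on it. `db` is assumed to provide a
// mongo-shell-style runCommand(command) that returns the reply document.
function openChangeStreamAfterCreate(db, collName) {
  const created = db.runCommand({ create: collName });
  // An existing collection is fine; any other failure is fatal.
  if (created.ok !== 1 && created.codeName !== "NamespaceExists") {
    throw new Error(`create failed: ${created.errmsg}`);
  }
  const reply = db.runCommand({
    aggregate: collName,
    pipeline: [{ $changeStream: {} }],
    cursor: {},
  });
  if (reply.ok !== 1) {
    throw new Error(`$changeStream failed: ${reply.errmsg}`);
  }
  return reply.cursor;
}

// Stand-in connection for illustration only: records the command names
// in the order they are issued and returns canned replies.
const issued = [];
const fakeDb = {
  runCommand(cmd) {
    issued.push(Object.keys(cmd)[0]);
    if ("create" in cmd) return { ok: 1 };
    return { ok: 1, cursor: { id: 42, ns: "test." + cmd.aggregate, firstBatch: [] } };
  },
};

const cursor = openChangeStreamAfterCreate(fakeDb, "coll");
console.log(issued.join(",")); // create,aggregate
```

As noted above, any write that lands between the `create` and the `aggregate` is not seen by the stream, so writers should start only after the cursor has been opened.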

Comment by Alyson Cabral (Inactive) [ 09/Nov/17 ]

What's the behavior when the collection is actually created?

This is what I want our users to be able to do:
1) open a client session that has causal consistency by default and within that client session
2) open a change stream
3) create a collection
4) start listening to changes

I want there to be a way to see every change on a new collection without this awkward gap between collection creation and change stream creation. If this sharding behavior precludes that, this is important to me, otherwise it's much lower on the list.

Comment by Charlie Swanson [ 09/Nov/17 ]

I'm not sure whether this belongs in the Query or Replication backlog; I went with Replication since that's where most change stream tickets are these days. The code responsible is here: if the mongos doesn't know about a database, it returns an empty result set for any aggregation, just as we do for any query on a non-existent database. I agree this is confusing and probably worth fixing in the case of change streams, but it doesn't seem particularly high-priority.

alyson.cabral any thoughts on priority?

Generated at Thu Feb 08 04:28:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.