[SERVER-49289] Support specifying a collection by its UUID to the aggregate command Created: 02/Jul/20  Updated: 29/Oct/23  Resolved: 26/Aug/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.7.0

Type: New Feature Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Jack Mulrow
Resolution: Fixed Votes: 0
Labels: PM-234-M2, PM-234-T-data-clone, query-work-resharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt Dependency
has to be done before SERVER-49785 Write and test aggregation pipeline f... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-09-07
Participants:

 Description   

This allows the sender of the command to rely on NamespaceNotFound if the collection has been dropped, even if a collection with the same name was later re-created.

It should be fine for the AutoGetCollectionForRead in DocumentSourceCursor::loadBatch() to continue using a NamespaceString rather than a UUID because the PlanExecutor is guaranteed to be killed (and the aggregation pipeline to eventually error) if the collection is dropped while the cursor is being iterated.
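The drop-and-re-create guarantee described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's implementation: `ToyCatalog`, its methods, and the `NamespaceNotFound` exception class are hypothetical stand-ins, and the model assumes that every (re-)created collection receives a fresh UUID and that the lookup is done by UUID alone.

```python
import uuid


class NamespaceNotFound(Exception):
    """Hypothetical stand-in for the server's NamespaceNotFound error."""


class ToyCatalog:
    """Hypothetical in-memory catalog mapping collection UUID -> namespace."""

    def __init__(self):
        self._ns_by_uuid = {}

    def create(self, ns):
        # Assumption: each creation gets a fresh UUID, even if the name is reused.
        coll_uuid = uuid.uuid4()
        self._ns_by_uuid[coll_uuid] = ns
        return coll_uuid

    def drop(self, ns):
        # Remove every UUID entry that points at this namespace.
        for coll_uuid in [u for u, n in self._ns_by_uuid.items() if n == ns]:
            del self._ns_by_uuid[coll_uuid]

    def aggregate_by_uuid(self, coll_uuid):
        # Resolve the UUID; a dropped collection's UUID no longer resolves.
        if coll_uuid not in self._ns_by_uuid:
            raise NamespaceNotFound(f"no collection with UUID {coll_uuid}")
        return self._ns_by_uuid[coll_uuid]


def run_demo():
    catalog = ToyCatalog()
    original = catalog.create("test.coll")
    catalog.aggregate_by_uuid(original)  # resolves while the collection exists
    catalog.drop("test.coll")
    catalog.create("test.coll")  # same name, different UUID
    try:
        catalog.aggregate_by_uuid(original)
    except NamespaceNotFound:
        return "NamespaceNotFound"
    return "resolved"
```

In this model a sender holding the original UUID gets the error after the drop even though a collection named test.coll exists again, which is the property the description relies on.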



 Comments   
Comment by Githook User [ 26/Aug/20 ]

Author: Jack Mulrow (jack.mulrow@mongodb.com, username jsmulrow)

Message: SERVER-49289 Add collectionUUID option to aggregate
Branch: master
https://github.com/mongodb/mongo/commit/81169c43da6fe06789bcec195909872943b85f53

Comment by Jack Mulrow [ 13/Aug/20 ]

Got it. Routers do have each sharded collection's UUID in the routing table, but the UUID is a boost::optional, which makes me worried we might not always have access to it. I'm guessing that's a holdover from the upgrade/downgrade work when we added UUIDs, though, so it's probably safe to use them. And your idea for unsharded collections sounds good to me.

Comment by Max Hirschhorn [ 13/Aug/20 ]

The aggregation pipeline for collection cloning requires merging the aggregations from the different donor shards together.

Doesn't mongos (or a mongod acting as a router) know the collection UUID for any sharded collection? If so, couldn't it assume that any collection UUID it doesn't know about belongs to an unsharded collection and route to the primary shard for the database?
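The routing idea in this comment can be sketched as follows. This is a hedged illustration only: `target_shards`, the shape of `routing_table`, and the shard names are hypothetical and do not reflect the router's actual data structures. A `None` UUID stands in for a routing table entry whose UUID is absent.

```python
import uuid
from typing import Dict, List


def target_shards(
    collection_uuid: uuid.UUID,
    routing_table: Dict[str, dict],
    primary_shard: str,
) -> List[str]:
    """Pick the shards to target for an aggregate identified by collection UUID.

    routing_table is a hypothetical mapping:
        namespace -> {"uuid": UUID or None, "shards": [shard names]}
    """
    for entry in routing_table.values():
        # Entries without a UUID can never match and are skipped.
        if entry.get("uuid") is not None and entry["uuid"] == collection_uuid:
            return list(entry["shards"])
    # Unknown UUID: assume the collection is unsharded and route to the
    # primary shard for the database.
    return [primary_shard]
```

For example, with a table holding one sharded collection on shardA and shardB, a lookup with that collection's UUID returns both shards, while any other UUID falls back to the primary shard.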

Comment by Jack Mulrow [ 10/Aug/20 ]

I think it does. Is it a hard requirement for the sender to act as a router in this case? From looking through ClusterAggregate there is a lot of logic that may need to change (like figuring out how to resolve a UUID on a router for targeting / getting the required privileges), so I imagine we could save some work by only changing single replica set aggregate to support a UUID if that's possible.

Unless you're saying ClusterAggregate would accept a UUID only if a collection name was also included (as opposed to in place of a collection name), so it can still target by collection name but we can use the UUID on the targeted shards to verify the correct version of the collection still exists. I don't think that would be too much more work, assuming I'm not overlooking something.

Comment by Max Hirschhorn [ 10/Aug/20 ]

"That is, it won't go through mongos nor will the sender be acting as a router the same way mongos does."

I misspoke - the sender will be acting as a router the same way mongos does. jack.mulrow, does that mean we'd want ClusterAggregate to accept a UUID in addition to a collection name?

Comment by Max Hirschhorn [ 21/Jul/20 ]

Resharding is going to send an aggregate command with a UUID directly to a replica set shard. That is, it won't go through mongos nor will the sender be acting as a router the same way mongos does. jack.mulrow, I don't think it is necessary for ClusterAggregate to accept a UUID in addition to a collection name.

Comment by Jack Mulrow [ 21/Jul/20 ]

max.hirschhorn, is this ticket just for supporting specifying a collection by UUID for single replica set aggregate or does cluster aggregate need to support it as well?
