[SERVER-49289] Support specifying a collection by its UUID to the aggregate command Created: 02/Jul/20 Updated: 29/Oct/23 Resolved: 26/Aug/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Jack Mulrow |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-234-M2, PM-234-T-data-clone, query-work-resharding | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Sharding 2020-09-07 | ||||||||
| Participants: | |||||||||
| Description |
|
This allows the sender of the command to rely on NamespaceNotFound if the collection has been dropped, even if a collection with the same name was later re-created.
It should be fine for the AutoGetCollectionForRead in DocumentSourceCursor::loadBatch() to continue using a NamespaceString rather than a UUID because the PlanExecutor is guaranteed to be killed (and the aggregation pipeline to eventually error) if the collection is dropped while the cursor is being iterated. |
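The drop-and-re-create guarantee described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's implementation: `ToyCatalog`, its methods, and the Python `NamespaceNotFound` exception are hypothetical stand-ins, and the only behavior taken from the ticket is that a lookup by UUID fails once the original collection is dropped, even if the name is reused.

```python
import uuid


class NamespaceNotFound(Exception):
    """Hypothetical stand-in for the server's NamespaceNotFound error."""


class ToyCatalog:
    """Hypothetical in-memory catalog; each collection incarnation gets a fresh UUID."""

    def __init__(self):
        self._uuid_by_name = {}
        self._name_by_uuid = {}

    def create(self, name):
        # A re-created collection never reuses the UUID of a dropped one.
        coll_uuid = uuid.uuid4()
        self._uuid_by_name[name] = coll_uuid
        self._name_by_uuid[coll_uuid] = name
        return coll_uuid

    def drop(self, name):
        coll_uuid = self._uuid_by_name.pop(name)
        del self._name_by_uuid[coll_uuid]

    def resolve(self, name_or_uuid):
        """Return the collection name, or raise NamespaceNotFound."""
        if isinstance(name_or_uuid, uuid.UUID):
            if name_or_uuid not in self._name_by_uuid:
                raise NamespaceNotFound(str(name_or_uuid))
            return self._name_by_uuid[name_or_uuid]
        if name_or_uuid not in self._uuid_by_name:
            raise NamespaceNotFound(name_or_uuid)
        return name_or_uuid
```

In this model, a sender that captured the UUID before the drop gets an error afterwards, whereas a lookup by name would silently resolve to the new incarnation of the collection.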
| Comments |
| Comment by Githook User [ 26/Aug/20 ] |
|
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}Message: |
| Comment by Jack Mulrow [ 13/Aug/20 ] |
|
Got it. Routers do have each sharded collection's UUID in the routing table, but the UUID is a boost::optional which makes me worried we might not always have access to it. I'm guessing that's a holdover from upgrade/downgrade work when we added UUIDs though so it's probably safe to use them. And your idea for unsharded collections sounds good to me. |
| Comment by Max Hirschhorn [ 13/Aug/20 ] |
|
The aggregation pipeline for collection cloning requires merging the aggregations from the different donor shards together. Doesn't mongos (or a mongod acting as a router) know the collection UUID for any sharded collection? And so it could assume that any collection UUID it doesn't know about is unsharded and route to the primary shard for the database? |
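As an editorial illustration of the routing idea in the comment above, here is a minimal sketch. It is not how ClusterAggregate is implemented: `target_shards`, the `routing_table` dictionary, and the shard names are all hypothetical, and the only rule taken from the comment is that a UUID missing from the routing table is treated as an unsharded collection on the database's primary shard.

```python
import uuid


def target_shards(coll_uuid, routing_table, primary_shard):
    """Pick the shards to send an aggregate to, given a collection UUID.

    routing_table is a hypothetical dict mapping the UUID of each sharded
    collection to the set of shard ids that own its chunks.
    """
    shards = routing_table.get(coll_uuid)
    if shards is None:
        # Unknown UUID: assume the collection is unsharded and lives on
        # the primary shard for the database.
        return [primary_shard]
    return sorted(shards)
```

If the UUID turns out not to exist on the primary shard either, the shard itself would be the one to report the missing collection; this sketch does not model that step.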
| Comment by Jack Mulrow [ 10/Aug/20 ] |
|
I think it does. Is it a hard requirement for the sender to act as a router in this case? From looking through ClusterAggregate there is a lot of logic that may need to change (like figuring out how to resolve a UUID on a router for targeting / getting the required privileges), so I imagine we could save some work by only changing single replica set aggregate to support a UUID if that's possible. Unless you're saying ClusterAggregate would accept a UUID only if a collection name was also included (as opposed to in place of a collection name), so it can still target by collection name but we can use the UUID on the targeted shards to verify the correct version of the collection still exists. I don't think that would be too much more work, assuming I'm not overlooking something. |
| Comment by Max Hirschhorn [ 10/Aug/20 ] |
I misspoke - the sender will be acting as a router the same way mongos does. jack.mulrow does that mean we'd want ClusterAggregate to accept a UUID in addition to a collection name? |
| Comment by Max Hirschhorn [ 21/Jul/20 ] |
|
Resharding is going to run an aggregate command with a UUID directly on a replica set shard. That is, it won't go through mongos, nor will the sender be acting as a router the same way mongos does. jack.mulrow, I don't think it is necessary for ClusterAggregate to accept a UUID in addition to a collection name. |
| Comment by Jack Mulrow [ 21/Jul/20 ] |
|
max.hirschhorn, is this ticket just for supporting specifying a collection by UUID for single replica set aggregate or does cluster aggregate need to support it as well? |