[SERVER-31785] Use multiple shards in sharded collections passthroughs Created: 01/Nov/17  Updated: 30/Oct/23  Resolved: 19/Jan/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Querying, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.7.2

Type: Improvement Priority: Major - P3
Reporter: Charlie Swanson Assignee: Charlie Swanson
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-24308 Sharding suites should run on at leas... Closed
Related
related to SERVER-32667 Add aggregation passthrough through m... Closed
related to SERVER-32668 Add aggregation passthrough with shar... Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 2017-12-04, Query 2017-12-18, Query 2018-01-01, Query 2018-01-15, Query 2018-01-29
Participants:

 Description   

The sharded_collections_jscore_passthrough and aggregation_sharded_collections_passthrough implicitly shard all collections, but use resmoke.py's default sharded cluster configuration, which has only one shard.

We should increase the number of shards to 2 in order to stress the merging logic on mongos.
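For illustration, a resmoke suite's YAML can override the fixture's shard count. The snippet below is a sketch only: the class and option names follow resmoke.py's ShardedClusterFixture, but the exact suite files changed in this ticket may be laid out differently.

```yaml
# Sketch: suite config overriding the default one-shard fixture.
executor:
  fixture:
    class: ShardedClusterFixture
    num_shards: 2   # default is 1; two shards exercise the merging logic on mongos
```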



 Comments   
Comment by Charlie Swanson [ 19/Jan/18 ]

I'm going to resolve this issue, but there are still some outstanding issues related to (and discussed on) this ticket: SERVER-32668 and SERVER-32667. Please watch those tickets for updates on those passthrough suites.

Comment by Githook User [ 19/Jan/18 ]

Author:

{'name': 'Charlie Swanson', 'email': 'charlie.swanson@mongodb.com', 'username': 'cswanson310'}

Message: SERVER-31785 Use 2 shards in sharded jscore passthrough.
Branch: master
https://github.com/mongodb/mongo/commit/87c9442cc30d4101693bb8ccb6fd4509aa048558

Comment by Charlie Swanson [ 11/Jan/18 ]

esha.maharishi I'm sorry to say that I'm still not convinced. I don't agree with this point:

Assume an agg pipeline can accept a namespace in N positions. Rather than needing to handle (and test) mongos being stale in each of those N positions in each way it can be stale (chunk has moved; coll has become sharded; coll has been dropped and recreated as unsharded; coll has been dropped and recreated and sharded with a different shard key), agg code would only need to handle/test mongos being stale in one way in each position. This is because no matter which way it was stale, it will be refreshed and re-routed from the top.

The plan is to simply move the code and use the same code in each place. If we do that, then testing that each place can handle each type of staleness seems redundant. We would want to test that each place can handle staleness at all, but (as you mention) that'd be true no matter the implementation.

I do agree that it would help in making aggregate less special, but I don't find that quite motivating enough.

Also, I want to re-iterate the point that we're not adding any special logic here. We're just extracting and re-using the code that currently lives in cluster_aggregate. I would like to talk about whether we can simplify some of that though - we might be able to find some compromise.

Comment by Esha Maharishi (Inactive) [ 11/Jan/18 ]

(This might be the wrong place to discuss it - I put it here since there was discussion about exercising different agg code paths in mongos).

The reason I suggest this now is exactly because changes are currently being made to the agg path for sharded $lookup. It would be unfortunate to write a bunch of specialized retry logic for sharded $lookup, then throw it all out later (or be resistant to throwing it out later).

These are the benefits I think would come of it:

  • Assume an agg pipeline can accept a namespace in N positions. Rather than needing to handle (and test) mongos being stale in each of those N positions in each way it can be stale (chunk has moved; coll has become sharded; coll has been dropped and recreated as unsharded; coll has been dropped and recreated and sharded with a different shard key), agg code would only need to handle/test mongos being stale in one way in each position. This is because no matter which way it was stale, it will be refreshed and re-routed from the top.
  • Aggregation becomes one less "special" path from sharding's perspective: sharding does not need to remember how aggregation's retry logic differs from the write path's, which differs from find's, which differs from metadata commands and reads like count/distinct.

If the performance degradation I described for long-running commands doesn't seem likely to have a big effect in practice (because migrations are an order of magnitude longer), then I would push a second time to consider it. But if you still disagree, I'll accept the decision.

charlie.swanson david.storch

Comment by David Storch [ 11/Jan/18 ]

+1 for Charlie's response. For 3.6 we probably should have just let the exception propagate up to the next level, but for sharded $lookup we're going to want the custom stale shard version retry code in place.

Also, it feels like this discussion is happening on the wrong ticket?

Comment by Charlie Swanson [ 10/Jan/18 ]

esha.maharishi I hear your argument, and it is a compelling vision for the sharding protocol for commands. I want to push back here a little bit to argue that while this makes a lot of sense for the current version of the aggregate command, it makes less sense in a world with sharded $lookup. Once we implement that, any of potentially very many namespaces has the potential to trigger a StaleConfig error at some point during execution. This seems to make things more complicated, and it's arguably less confusing/magical to see the retrying at the site of the cursor establishment instead of all the way back at the top of the call stack.

Stepping back a bit, I don't strongly disagree with this approach, but I don't really see any reason why now is the right time to do it. Yes, we are shifting some of that code around in the sharded $lookup project, but we aren't adding any new targeting logic. The plans thus far all involve re-using the existing implementation. If anything, Mathias's changes for PM-864 will make that code more legible.

I'd also like to mention that the query team already has long-term plans to consolidate the number of read commands, potentially implementing all of them in terms of aggregation. While that may not happen for a while, it would be an alternative approach for removing duplication in the shard versioning protocol.

Comment by Esha Maharishi (Inactive) [ 10/Jan/18 ]

And nicholas.zolnierz@mongodb.com

Comment by Esha Maharishi (Inactive) [ 10/Jan/18 ]

I'd like to put up for consideration removing the retry logic for StaleShardVersion from the aggregation code on mongos.

Some code paths on mongos, such as count and distinct, already follow the pattern I'm proposing:

                                     strategy.cpp::runCommand()
                                   /                          \ 
                         ClusterCountCmd::run()         DistinctCmd::run()
                                  \                           /
                                scatterGather() OR establishCursors()
                                                 |
                                        AsyncRequestsSender
                                                 |
                             NetworkInterfaceASIO::scheduleRemoteCommand()

The idea is that scatterGather() and establishCursors() (which are the same except scatterGather() is for non-cursor-generating commands, while establishCursors() is for cursor-generating commands) will check each response that comes over the network for StaleShardVersion. If a response with StaleShardVersion is encountered, these functions will throw a StaleShardVersion exception.

The StaleShardVersion exception is caught at the top in strategy.cpp::runCommand(), where the routing table cache entry for the namespace is invalidated, and the command is re-run from the very beginning.

This removes the need for each command's code to handle StaleShardVersion at all (i.e., removes loops, which are often nested in confusing ways, particularly in the aggregation code today). It makes each command's code very streamlined: just route based on what's currently in the routing table cache. Do not worry if the cache is stale; if it is, the command will automatically be retried from the top with a refreshed cache.
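The retry-from-top pattern described above can be sketched as follows. This is a toy illustration in Python with hypothetical names (the real implementation is C++ in strategy.cpp); it only shows the control flow: the command body throws, the top-level dispatcher invalidates the routing table cache entry and reruns the command from the very beginning.

```python
class StaleShardVersion(Exception):
    """Raised when a shard reports that the router's routing info is stale."""


class RoutingTableCache:
    """Stand-in for mongos's routing table cache."""

    def __init__(self):
        self.generation = 0

    def invalidate(self, ns):
        # Forces a refresh of the cached routing info on next access.
        self.generation += 1


def run_command(cache, ns, command, max_retries=10):
    # Top-level dispatch loop (analogous to strategy.cpp::runCommand):
    # catch StaleShardVersion thrown by the scatter/gather layer,
    # invalidate the cache entry, and rerun the whole command.
    for _ in range(max_retries):
        try:
            return command(cache)
        except StaleShardVersion:
            cache.invalidate(ns)
    raise RuntimeError("exhausted StaleShardVersion retries")


def make_count_command():
    # Simulates a command whose first attempt hits a stale routing table.
    attempts = {"n": 0}

    def count(cache):
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise StaleShardVersion()
        return 42

    return count
```

Note that the command body itself contains no retry loop at all; it just routes based on whatever is currently in the cache, which is the streamlining the proposal is after.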

The only performance drawback is if a StaleShardVersion is encountered late into a long-running command (e.g., for sharded $lookup), since the command will have to be restarted. This can happen if migrations are completing more quickly than the long-running command can finish - but I think in general, this is unlikely, because a migration takes minutes to hours, while a long-running command takes seconds to minutes.

The general pattern I'm proposing has been in place for years, but many code paths on mongos, like find and agg, don't follow it. This has historical reasons: 'find' used ParallelSortClusteredCursor pre-3.2, and 'agg' did pre-3.6. ParallelSortClusteredCursor caught StaleShardVersions, invalidated routing table cache entries, and retried requests internally. (This was already problematic, because it didn't allow agg code to split the pipeline differently on hearing StaleShardVersion; in 3.6, we made the agg code start catching StaleShardVersion, but we should have let the agg code pass it back up to strategy.cpp::runCommand()).

I propose to update the find and agg paths to let StaleShardVersion exceptions propagate all the way up, rather than entangle even more shard versioning logic in them in 3.8 for things like sharded $lookup.

david.storch charlie.swanson schwerin

Comment by David Storch [ 02/Jan/18 ]

Thanks Max, that sounds good to me.

Do you know if there's any behavior difference for the "aggregate" command when sharding isn't enabled for the database?

There is still an "agg passthrough" code path that is used when sharding isn't enabled for the database.

Comment by Max Hirschhorn [ 29/Dec/17 ]

1. Does the plan I describe above for aggregation sharded passthroughs make sense to you?

Given that mongos has special logic for aggregation when targeting a single shard (i.e. Pipeline::unsplitFromSharded()), your proposal of having three variants of running the aggregation tests against a sharded cluster makes sense. Do you know if there's any behavior difference for the "aggregate" command when sharding isn't enabled for the database? We run db.adminCommand({enableSharding: "test"}) in sharding_jscore_passthrough.yml.

2. What's your opinion on whether we should maintain multiple jsCore passthroughs with sharded collections, as I describe above?

I'm not aware that commands such as "find" and "count" have special logic for when they target a single shard, so I think until we add anything like that it'd be fine to change sharded_collections_jscore_passthrough.yml to have >1 shard and not add any new test suites for running jsCore tests against a sharded cluster.

Comment by David Storch [ 28/Dec/17 ]

kyle.suarez charlie.swanson, as we discussed last week, I think the desirable end state for our aggregation testing is to have three sharding passthroughs:

  • Unsharded collections.
  • Sharded collection which resides on a single shard.
  • Sharded collection which resides on multiple shards.

These three cases can exercise different logic in the sharding path, so I think we have actually lost some coverage by converting aggregation_sharded_collections_passthrough to use multiple shards. On the other hand, I'm unsure as to whether it makes sense to have a variant of sharded_collections_jscore_passthrough that uses a single shard. I'm not sure if commands other than aggregation have special logic for the "sharded collection with single targeted shard" case.

max.hirschhorn, could you answer the following:

  1. Does the plan I describe above for aggregation sharded passthroughs make sense to you?
  2. What's your opinion on whether we should maintain multiple jsCore passthroughs with sharded collections, as I describe above?
Comment by Kyle Suarez [ 22/Dec/17 ]

Throwing this back to open because I'm going on vacation. The work left to do is to update the sharded_collections_jscore_passthrough to use multiple shards and fix up any tests that might break.

Comment by Githook User [ 14/Dec/17 ]

Author:

{'name': 'Kyle Suarez', 'email': 'kyle.suarez@mongodb.com', 'username': 'ksuarz'}

Message: SERVER-31785 use multiple shards in aggregation_sharded_collections_passthrough
Branch: master
https://github.com/mongodb/mongo/commit/47247293f18ea581954f6fcf4c0018b7828e3c3a

Comment by Charlie Swanson [ 01/Nov/17 ]

david.storch if someone has time for this in the coming weeks, I think this would be good to do soon to stress the new usages of ARM for agg.

Generated at Thu Feb 08 04:28:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.