[SERVER-36482] Add mapping of namespace to ChunkVersion for versioned commands involving multiple collections Created: 07/Aug/18 Updated: 28/Aug/18 Resolved: 28/Aug/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Nicholas Zolnierz | Assignee: | Kyle Suarez |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Sprint: | Query 2018-09-10 | ||||||||
| Participants: | |||||||||
| Description |
|
This becomes very relevant for $out to a sharded collection as well as sharded $lookup. Currently the cluster aggregate command will attach a single version based on the state of the execution namespace, however this may be incorrect if one of the shards wishes grab a collection lock on one of the other involved namespaces. The title describes one possible approach, though I'm not sure how the implementation would look since it may involve modifying the operation context. Another option would be to have the shards temporarily modify the version on the op context when accessing a foreign collection. Edit: Looks like such a map exists already, so the problem is that the involved namespaces do not have entries and thus default to unsharded. |
| Comments |
| Comment by Kyle Suarez [ 28/Aug/18 ] | |||||
|
We're not going to do this. Our consensus is that a sharded aggregation that touches multiple namespaces will do shard versioning individually for each involved namespace. The infrastructure in the $out project will be tolerant to shard version changes such that writes will always be correctly routed to the shards that own them. A related aside from our discussion is that it's important that sharded aggregations clean up cursors after failure; that work will be tracked in | |||||
| Comment by Kyle Suarez [ 13/Aug/18 ] | |||||
|
kaloian.manassiev, esha.maharishi, nicholas.zolnierz, I put something on the calendar so we can discuss in person, and then I'll summarize the results of our meeting here. | |||||
| Comment by Kaloian Manassiev [ 11/Aug/18 ] | |||||
|
Having multiple collection shardVersions being sent implies some kind of atomicity is required, as in "I am expecting these two collections on this shard to be exactly at these versions". Since distributed $lookup is in a sense a two-phase operation - first check the main collection's version and then check the looked-up collection's version, this doesn't look like an atomic operation to me. kyle.suarez, can you schedule a meeting next week to go over the scenarios for $lookup and see whether multiple versions can be avoided? I would also like to go over the usages of isSharded() that nicholas.zolnierz pointed out in this comment. | |||||
| Comment by Esha Maharishi (Inactive) [ 08/Aug/18 ] | |||||
|
kyle.suarez, the shard definitely might not have a fresh routing table for "bar", in much the same way a mongos might not have a fresh routing table when routing. The shard should attach the shardVersion from its CatalogCache when sending remote requests on "bar", so that the shard can be alerted that its routing table for "bar" was stale and refresh it. | |||||
| Comment by Kyle Suarez [ 08/Aug/18 ] | |||||
I believe there's two things we want: it's important that "bar" is allowed to be sharded and that we know that we're going to have an up-to-date routing table with regard to which shards own which chunks for "bar". | |||||
| Comment by Esha Maharishi (Inactive) [ 08/Aug/18 ] | |||||
|
Yeah. Here is my brain dump for when you all triage this ticket:
shardVersion can be sent for two reasons: 1) To ensure the sender targeted the right set of shards 2) To ensure a collection was unsharded, as the sender believed.
Today, a shardVersion sent in a network message looks like this:
This is interpreted by the shard as: 1) the sender's believes shardVersion for the "main" namespace for the command (e.g., the value of the first key in the cmdObj) is 5. 2) the sender expects all other namespaces accessed by the command to be unsharded. Until 4.0, since both $out and $lookup always took CollectionLocks that did shardVersion checks, both $out and $lookup were guaranteed to only run on unsharded collections.
I think this ticket is requesting mongos to be able to send something like:
and for the shard to interpret this message to mean: 1) the sender believes the shardVersion for "foo" is 5 and for "bar" is 6 2) the sender expects all other namespaces accessed besides "foo" and "bar" to be unsharded.
However, is it really important that the sender believes the shardVersion for "bar" is 6? Or is it simply important that "bar" is allowed to be sharded? If the latter, the shard can access "bar" under a collection lock that does not do a shardVersion check. | |||||
| Comment by Nicholas Zolnierz [ 08/Aug/18 ] | |||||
|
Yeah there's likely no problem for $out at the moment, but it still feels fragile in places where we use the auto getter which checks the shard version such as here and here. I know the first spot is a bug which I have fixed in a follow-up, but I'm less certain about the second which is in a method used by $lookup. | |||||
| Comment by Kyle Suarez [ 08/Aug/18 ] | |||||
|
I agree with the assessment that it's not a strict requirement for $out. I do think that it will be a problem for $lookup, though, because in a sharded $lookup world there will be any number of sharded or unsharded ("single chunk") collections in the shards part of the pipeline. | |||||
| Comment by Esha Maharishi (Inactive) [ 08/Aug/18 ] | |||||
|
Hmm. Maybe someone from sharding can work with query to figure out what the exact requirement is. I think it's possible that this is not required for the $out project (I am not sure about $lookup), because for $out: 1) mongos targets shards based on the input collection's chunk distribution, so it only needs to send the shardVersion for the input collection 2) the shard targets shards based on the output collection's chunk distribution, so it only needs to send shardVersion for the output collection Therefore, no node needs to send shardVersion for two namespaces. | |||||
| Comment by Kyle Suarez [ 08/Aug/18 ] | |||||