[SERVER-36813] Test the defaulting of uniqueKey when the catalog cache is stale Created: 22/Aug/18 Updated: 29/Oct/23 Resolved: 01/Nov/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.5 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Charlie Swanson | Assignee: | Charlie Swanson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Query 2018-09-10, Query 2018-09-24, Query 2018-10-08, Query 2018-10-22, Query 2018-11-05 |
| Participants: | |
| Linked BF Score: | 45 |
| Description |
|
We added some testing of what happens when an $out executes with a stale catalog cache, but we'd like to target the functionality introduced in the linked ticket. This will include the interesting test case of seeing which uniqueKey is used when a mongos has a stale cache entry indicating that a collection is sharded when it actually is not, or that it is sharded but now with a different shard key. |
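A minimal sketch of that stale-mongos scenario, assuming standard jstest helpers (a ShardingTest with two mongos) and the 4.1-era extended $out syntax; the exact stage options here are an assumption, not a confirmed API:

```javascript
// Hypothetical jstest sketch: make st.s1's routing cache stale about the $out
// target, then run the aggregation through the stale mongos. The extended $out
// syntax (modes, defaulted uniqueKey) is assumed from the 4.1 development series.
const st = new ShardingTest({shards: 2, mongos: 2});
const dbName = "test";

assert.commandWorked(st.s0.adminCommand({enableSharding: dbName}));
assert.writeOK(st.s0.getDB(dbName).source.insert({_id: 0, a: 1}));

// Warm up the second mongos so it caches routing info saying the target
// collection is unsharded.
assert.writeOK(st.s1.getDB(dbName).target.insert({_id: 0, a: 0}));

// Shard the target through the first mongos only; st.s1's cached view
// (unsharded) no longer matches reality (sharded by {a: 1}).
assert.commandWorked(st.s0.getDB(dbName).target.createIndex({a: 1}));
assert.commandWorked(
    st.s0.adminCommand({shardCollection: dbName + ".target", key: {a: 1}}));

// Run $out through the stale mongos without specifying a uniqueKey. Which key
// it defaults to (just {_id: 1}, or the actual shard key after a refresh) is
// the behavior this ticket wants covered by tests.
st.s1.getDB(dbName).source.aggregate(
    [{$out: {to: "target", mode: "replaceDocuments"}}]);

st.stop();
```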
| Comments |
| Comment by Githook User [ 01/Nov/18 ] |
|
Author: Charlie Swanson <charlie.swanson@mongodb.com> (cswanson310). Message: Before doing so, refresh the catalog cache to make sure the mongos |
| Comment by Charlie Swanson [ 04/Sep/18 ] |
|
kaloian.manassiev, failing the cluster writes will not fully solve the problem. By that point we have already generated a whole batch of writes with the wrong shard key, which may have led to errors like those described above relating to |
| Comment by Kaloian Manassiev [ 04/Sep/18 ] |
|
Failing cluster writes on epoch mismatch will solve the problem of the shard key changing out from under a long-running $out aggregation, so the listIndexes call is only an optimization to avoid restarting a long-running aggregation on the first stale version, right? Otherwise, what kinds of bugs would be possible? |
| Comment by Nicholas Zolnierz [ 04/Sep/18 ] |
|
It's unfortunate, but it sounds like a good plan to me. Per in-person discussion, we may take a latency hit, but it's unlikely to affect users since the $out will likely be bottlenecked on the writes. |
| Comment by Charlie Swanson [ 04/Sep/18 ] |
|
OK, I've brainstormed a lot and I'll post a brief update here. There is definitely a large potential for bugs here, because we are not using any sort of shard versioning protocol on the $out namespace, only on the source namespace of the aggregation. For example, if your mongos is very stale, either because no one has asked it about the $out collection in a while or because it was just restarted, the mongos which parses the $out could end up filling in the wrong uniqueKey for the $out stage. Or, if the aggregation takes a couple of hours to run and the collection becomes sharded or is dropped and recreated during that period, the uniqueKey could be stale.

The user impact of using the wrong default uniqueKey is hard to quantify. Because the default uniqueKey will always include _id, and because you can only end up with duplicate _ids if you managed to insert them on different shards, there will likely still be at most one document matching the uniqueKey for each document output from the $out stage, so there is a pretty low likelihood of a difference in behavior there. However, we will enforce certain constraints, as described in the linked ticket.

While investigating this issue, I realized that the targeter used inside the ClusterWriter for all writes on mongos will keep retrying the writes, even if the collection has been dropped and re-sharded, or has become sharded recently. This is similar in that it will probably usually do the right thing, but it doesn't feel obviously correct. For example, imagine you are halfway through applying a batch of writes and the collection is dropped and re-created. It's not obvious whether this batch should fail; I don't think it does currently. As I discussed with kaloian.manassiev, we could probably tighten this up a bit by tracking within the ChunkManagerTargeter whether or not the collection's UUID has changed during the course of targeting. If it has, it seems pretty reasonable to abort the operation: at that point the shard key may have changed, and it's hard to say whether the user wanted the operation to keep going.

After speaking more with asya, it seems like the "concurrent change in shard key" problem isn't worth spending much effort solving. Instead, we should focus on the scenario where the mongos is stale. To achieve this, I propose we always issue a listIndexes command to the primary shard for the $out's target database, and that this command be versioned (attach a shard version to the command, which I checked does result in a StaleConfig exception if the mongos is stale). On a stale shard version error, we should retry the entire aggregate command from the top. One of nicholas.zolnierz's patches for

How's that plan sound to everyone? cc kyle.suarez and david.storch |
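For the proposal in that last paragraph, here is a simplified, hypothetical control-flow sketch in shell-style JavaScript of "versioned listIndexes, then retry the whole aggregate on StaleConfig". It is not the real mongos C++ code, and versionedListIndexes, defaultUniqueKeyFrom, dispatchPipelineToShards, and refreshCatalogCache are all made-up names used only to illustrate the flow:

```javascript
// Hypothetical sketch of the proposed mongos-side flow; none of these helpers
// exist in the server, they only illustrate the retry-from-the-top behavior.
function runClusterAggregateWithOut(request) {
    const kMaxAttempts = 10;  // bound the retries rather than looping forever
    for (let attempt = 0; attempt < kMaxAttempts; attempt++) {
        try {
            // Ask the primary shard of the $out target database for the target
            // collection's indexes, attaching the shard version this mongos
            // currently believes. A stale mongos gets a StaleConfig error back
            // instead of silently acting on outdated routing information.
            const indexes = versionedListIndexes(request.outTargetNss);
            const uniqueKey =
                request.explicitUniqueKey || defaultUniqueKeyFrom(indexes);
            return dispatchPipelineToShards(request, uniqueKey);
        } catch (e) {
            if (e.code !== ErrorCodes.StaleConfig) {
                throw e;
            }
            // The routing info was stale: refresh the catalog cache and retry
            // the entire aggregate command from the top, as proposed above.
            refreshCatalogCache(request.outTargetNss);
        }
    }
    throw new Error("exhausted retries resolving the $out uniqueKey");
}
```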