- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Aggregation Framework, Sharding
- None
- Fully Compatible
- ALL
- Query 2018-11-19, Query 2018-12-03
As part of SERVER-36813 we added a constraint that the epoch of $out's targeted collection must be agreed upon by all participating processes and must not change during the course of the aggregation. This unintentionally prevents an $out from executing on a shard which does not know of (does not have any chunks for) the targeted collection: nothing ensures that shard's routing table is up to date, and there is no good logic to refresh it (especially if we want to avoid the deadlock described in SERVER-37398).
For this ticket, we'd like to add logic to do the following:
1) Detect at parse time if the shard executing the $out is more stale than the mongos. This has to happen without refreshing the CatalogCache because doing so needs to take a lock which could induce the deadlock described in SERVER-37398.
2) If we detect the shard is stale, throw a new type of exception. Catch this exception in the shard's service entry point and, now that all locks have been released, have the mongod refresh its catalog for the $out's targeted collection. After starting this refresh, return an error to the mongos to signal that it should retry the entire command.
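
The two steps above can be sketched as a small, self-contained Python model. This is only an illustration of the control flow, not the server's implementation: `StaleEpochError`, `Shard`, `handle_command`, and `run_on_router` are hypothetical names. Staleness is simplified to "the shard's cached epoch differs from the one the mongos sent", and the refresh is performed synchronously, whereas the ticket only requires that it be started before the error is returned.

```python
class StaleEpochError(Exception):
    """Hypothetical exception: shard's cached epoch disagrees with the router's."""

    def __init__(self, namespace):
        super().__init__(f"stale routing info for {namespace}")
        self.namespace = namespace


class Shard:
    """Toy stand-in for a mongod; all names here are assumptions."""

    def __init__(self, config_epochs):
        self.config_epochs = config_epochs  # authoritative epochs (config server)
        self.cached_epochs = {}             # this shard's possibly stale cache
        self.refreshes = 0

    def parse_out(self, namespace, router_epoch):
        # Step 1: detect staleness using only the cached value.
        # No refresh happens here, so no refresh-related lock is taken.
        if self.cached_epochs.get(namespace) != router_epoch:
            raise StaleEpochError(namespace)

    def refresh(self, namespace):
        # Bring the cache up to date for one namespace.
        self.cached_epochs[namespace] = self.config_epochs[namespace]
        self.refreshes += 1

    def handle_command(self, namespace, router_epoch):
        # Step 2: the "service entry point". By the time the exception
        # reaches this frame, parsing has unwound and holds no locks.
        try:
            self.parse_out(namespace, router_epoch)
        except StaleEpochError as exc:
            self.refresh(exc.namespace)
            return {"ok": 0, "retry": True}
        return {"ok": 1}


def run_on_router(shard, namespace, router_epoch, max_attempts=3):
    """Toy mongos: retry the whole command when the shard asks for it.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        reply = shard.handle_command(namespace, router_epoch)
        if reply["ok"]:
            return attempt
    raise RuntimeError("shard still stale after retries")
```

In this model, a shard that has never seen the target collection fails the first attempt, refreshes, and succeeds on the retry. If the router's own epoch were out of date, the retries would be exhausted instead, which is a case this sketch does not try to resolve.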