[SERVER-78356] Encode flag indicating whether all data is present on a single shard into the SBE plan cache key Created: 22/Jun/23 Updated: 02/Aug/23 Resolved: 02/Aug/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | Backlog - Query Optimization |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Optimization
|
| Participants: |
| Description |
|
For queries against unsharded collections, the query system currently generates plans that do not perform shard filtering (that is, filtering out orphaned documents). Such plans can then be cached in the SBE plan cache. If the collection later becomes sharded, cached plans that do not perform shard filtering are no longer valid. To prevent invalid cached plans from being reused after a collection becomes sharded, we currently encode the collection's sharding epoch into the SBE plan cache key. This relies on the sharding epoch changing when a collection transitions from unsharded to sharded. The problem is that the sharding team is working on eliminating the concept of unsharded collections and may alter the semantics of the sharding epoch so that it is no longer bumped when a collection is first sharded. To prepare for this change, we should explicitly encode whether or not the collection is sharded into the SBE plan cache key. That way, the sharding team will not unwittingly break the SBE plan cache with a subtle change to unrelated logic around the collection sharding epoch. |
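The concern in the description can be illustrated with a minimal Python sketch. It is a toy model, not the server's implementation: `CollectionState`, `PlanCacheKey`, `make_key`, and the field names are all hypothetical. It assumes a scenario in which the sharding epoch is not bumped when a collection is first sharded, and shows that keys built from the epoch and timestamp alone would then collide, while keys that also carry an explicit sharded flag would not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CollectionState:
    """Hypothetical snapshot of the collection metadata relevant to the key."""
    epoch: str
    timestamp: int
    is_sharded: bool


@dataclass(frozen=True)
class PlanCacheKey:
    """Hypothetical cache key; the real SBE key encodes more than this."""
    query_shape: str
    epoch: str
    timestamp: int
    is_sharded: Optional[bool]  # None when the flag is not encoded


def make_key(query_shape: str, coll: CollectionState,
             encode_sharded_flag: bool) -> PlanCacheKey:
    # Optionally include the explicit "is sharded" bit in the key.
    flag = coll.is_sharded if encode_sharded_flag else None
    return PlanCacheKey(query_shape, coll.epoch, coll.timestamp, flag)


# Assumed scenario: the collection becomes sharded but its epoch and
# timestamp stay the same.
unsharded = CollectionState(epoch="e1", timestamp=100, is_sharded=False)
sharded_same_epoch = CollectionState(epoch="e1", timestamp=100, is_sharded=True)

shape = "{a: {$eq: ?}}"

# Without the flag, both states map to the same key, so a plan cached
# without shard filtering could be reused after sharding.
keys_collide_without_flag = (
    make_key(shape, unsharded, False) == make_key(shape, sharded_same_epoch, False)
)

# With the flag, the keys differ and the stale plan is not found.
keys_collide_with_flag = (
    make_key(shape, unsharded, True) == make_key(shape, sharded_same_epoch, True)
)
```

In this model the explicit flag makes the key independent of how the epoch behaves, which is the decoupling the description asks for.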
| Comments |
| Comment by Ben Shteinfeld [ 02/Aug/23 ] |
|
After discussion with david.storch@mongodb.com, pierlauro.sciarelli@mongodb.com, sergi.mateo-bellido@mongodb.com and marcos.grillo@mongodb.com, we are closing this ticket as Won't Do. The current SBE plan cache key includes the sharding epoch and the collection timestamp, which is sufficient to correctly invalidate the cache when a collection becomes sharded or is dropped and recreated. This prevents us from reusing a cached plan that omits a shard filter when one is necessary. This ticket has an interesting connection to SERVER-77914. If this ticket were done, it would enable an optimization that avoids shard filtering for a sharded, splittable collection that happens to have all of its data on a single shard. However, if we did this, we would need to take care to put this bit of information into the plan cache key, so that we never use a cached plan that omits shard filtering on a collection that has since had a chunk migration and now requires a shard filtering stage. |
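The caveat about the optimization can be sketched as follows. This is a hypothetical model in Python, not server code: `build_plan`, `get_plan`, the tuple-based key, and the stage lists are assumptions made for illustration. The point is only that when the "all data on a single shard" bit is part of the key, a chunk migration that flips the bit causes a cache miss rather than reuse of a plan without a shard filter.

```python
from typing import Dict, Tuple

# Hypothetical model: a plan is a tuple of stage names, and the cache key is
# (query shape, all_data_on_single_shard).
Plan = Tuple[str, ...]
Key = Tuple[str, bool]


def build_plan(all_data_on_single_shard: bool) -> Plan:
    # The assumed optimization: skip shard filtering only while every
    # document of the collection lives on one shard.
    if all_data_on_single_shard:
        return ("IXSCAN", "FETCH")
    return ("IXSCAN", "SHARDING_FILTER", "FETCH")


def get_plan(cache: Dict[Key, Plan], shape: str,
             all_data_on_single_shard: bool) -> Plan:
    # Because the bit is part of the key, a change in data placement leads
    # to a different cache entry instead of a stale hit.
    key = (shape, all_data_on_single_shard)
    if key not in cache:
        cache[key] = build_plan(all_data_on_single_shard)
    return cache[key]


cache: Dict[Key, Plan] = {}
shape = "{a: {$eq: ?}}"

# Before any chunk migration: all data is on one shard.
plan_before = get_plan(cache, shape, True)

# After a chunk migration: data spans shards, so the bit flips.
plan_after = get_plan(cache, shape, False)
```

If the bit were left out of the key, the second lookup in this model would return the first plan, which is exactly the failure mode the comment warns about.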