[SERVER-78356] Encode flag indicating whether all data is present on a single shard into the SBE plan cache key Created: 22/Jun/23  Updated: 02/Aug/23  Resolved: 02/Aug/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Optimization
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

For queries against unsharded collections, the query system currently generates plans which do not perform shard filtering (that is, filtering out orphaned documents). Such plans can then get cached in the SBE plan cache. If the collection later becomes sharded, cached plans which do not perform shard filtering are no longer valid. To prevent invalid cached plans from being reused when a collection becomes sharded, we currently encode the collection's sharding epoch into the SBE plan cache key.

This depends on the collection's sharding epoch changing when a collection transitions from unsharded to sharded. The problem is that the sharding team is working on eliminating the concept of unsharded collections and may alter the semantics of the sharding epoch so that it no longer gets bumped when a collection is first sharded. To prepare for this change, we should explicitly encode whether or not the collection is sharded into the SBE plan cache key. That way the sharding team will not unwittingly break the SBE plan cache with a subtle change to unrelated logic around the collection sharding epoch.
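
To illustrate the concern, here is a minimal Python sketch of a cache key with and without an explicit sharded flag. It is not the server's actual key encoding: `PlanCacheKey`, `make_key`, the `encode_flag` switch and the string epoch values are all hypothetical names invented for this example. It assumes the scenario described above, where the epoch is not bumped when the collection is first sharded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanCacheKey:
    """Hypothetical stand-in for an SBE plan cache key (illustration only)."""
    query_shape: str
    sharding_epoch: str
    # None means the flag is not part of the key at all.
    is_sharded: Optional[bool] = None


def make_key(query_shape: str, sharding_epoch: str,
             is_sharded: bool, encode_flag: bool) -> PlanCacheKey:
    """Build a key, optionally encoding the explicit sharded flag."""
    return PlanCacheKey(query_shape, sharding_epoch,
                        is_sharded if encode_flag else None)


# Assumed scenario: the collection becomes sharded but the epoch stays the same.
EPOCH = "epoch-1"

# Epoch-only key: the unsharded and sharded states produce the same key,
# so a cached plan without shard filtering could be reused.
before = make_key("{a: 1}", EPOCH, is_sharded=False, encode_flag=False)
after = make_key("{a: 1}", EPOCH, is_sharded=True, encode_flag=False)
collides_without_flag = before == after

# Key with the explicit flag: the two states produce different keys.
before_f = make_key("{a: 1}", EPOCH, is_sharded=False, encode_flag=True)
after_f = make_key("{a: 1}", EPOCH, is_sharded=True, encode_flag=True)
collides_with_flag = before_f == after_f
```

In this sketch the epoch-only keys compare equal while the keys carrying the flag do not, which is the property the ticket asks for.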



 Comments   
Comment by Ben Shteinfeld [ 02/Aug/23 ]

After discussion with david.storch@mongodb.com, pierlauro.sciarelli@mongodb.com, sergi.mateo-bellido@mongodb.com and marcos.grillo@mongodb.com we are closing this ticket as Won't Do. The current SBE plan cache key includes the sharding epoch and collection timestamp, which is sufficient to correctly invalidate the cache when a collection becomes sharded or is dropped/recreated. Keeping both in the key will prevent us from forgetting to apply a shard filter in a plan when it is necessary.

This ticket has an interesting connection to SERVER-77914. If this ticket were done, it would enable an optimization to avoid performing shard filtering for a sharded, splittable collection which happens to have all of its data live on a single shard. However, if we did this, we would need to take care to put this bit of information into the plan cache key, to avoid using a cached plan that omits shard filtering on a collection which has since had a chunk migration and now requires a shard filtering stage.
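
A rough Python sketch of that hazard, under stated assumptions: the cache is a plain dict, the key is a tuple of (query shape, sharding epoch, all-data-on-one-shard bit), and `build_plan`, `get_plan` and the plan strings are hypothetical placeholders rather than real server stages or APIs.

```python
from typing import Dict, Tuple

# Hypothetical key: (query shape, sharding epoch, all data on a single shard).
Key = Tuple[str, str, bool]


def build_plan(single_shard: bool) -> str:
    """Return a placeholder plan; shard filtering is skipped only when
    all data lives on one shard (the optimization under discussion)."""
    return "IXSCAN" if single_shard else "SHARDING_FILTER <- IXSCAN"


def get_plan(cache: Dict[Key, str], shape: str, epoch: str,
             single_shard: bool) -> str:
    """Look up a plan by key, building and caching it on a miss."""
    key = (shape, epoch, single_shard)
    if key not in cache:
        cache[key] = build_plan(single_shard)
    return cache[key]


cache: Dict[Key, str] = {}

# All data on one shard: the plan without shard filtering gets cached.
plan_before = get_plan(cache, "{a: 1}", "epoch-1", single_shard=True)

# After a chunk migration the bit flips. Because the bit is part of the key,
# the lookup misses and a plan with shard filtering is built instead.
plan_after = get_plan(cache, "{a: 1}", "epoch-1", single_shard=False)
```

If the bit were left out of the key, the second lookup would hit the first entry and return the plan that omits shard filtering.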
