[SERVER-77916] Provide an API that indicates an approximate number of orphans on a shard Created: 08/Jun/23  Updated: 13/Jul/23  Resolved: 13/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Ben Shteinfeld Assignee: Matt Panton
Resolution: Won't Fix Votes: 0
Labels: sharding-product-sync
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-77914 Provide an API that indicates whether... Backlog
Assigned Teams:
Sharding EMEA
Sprint: Sharding EMEA 2023-07-10, Sharding EMEA 2023-07-24
Participants:

 Description   

As part of the scope for PM-2650, the query team is requesting from the sharding team an API that indicates the approximate fraction of documents that are orphans (documents scheduled to be deleted by the range deleter that will still be returned by a collection/index scan in the current snapshot). This information will be used during query planning to estimate the cost of different access paths.
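
To illustrate how a planner might consume such a fraction, here is a minimal sketch. The function name and signature are hypothetical and are not part of any existing or proposed server API. It assumes only that shard filtering drops orphans, so its selectivity is one minus the orphan fraction.

```python
def estimate_after_shard_filter(scanned_docs: float, orphan_fraction: float) -> float:
    """Estimate how many scanned documents survive shard filtering.

    Hypothetical helper for illustration only; orphan_fraction stands in
    for the value the requested API would report.
    """
    if not 0.0 <= orphan_fraction <= 1.0:
        raise ValueError("orphan_fraction must be in [0, 1]")
    # Shard filtering removes orphans, so its selectivity is 1 - orphan_fraction.
    return scanned_docs * (1.0 - orphan_fraction)
```

With an orphan fraction of 0, the estimate equals the number of scanned documents, meaning the shard filter does not change the cardinality estimate.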



 Comments   
Comment by Ben Shteinfeld [ 13/Jul/23 ]

Discussed offline with matt.panton@mongodb.com jordi.serra-torrens@mongodb.com xiaochen.wu@mongodb.com david.storch@mongodb.com and richard.hausman@mongodb.com. Because we expect the number of orphans to be very low in the majority of cases, and because we chose to represent shard filtering in the physical plan space (which doesn't have cardinality estimation), we decided that cardinality estimation of shard filtering is not worth the effort and provides very little value. We will just assume that the selectivity of shard filtering is 100%. As a result, this request is now obsolete and I'm closing it as "Won't fix".

If, in the future, we find that our assumption about the costing of shard filtering is causing suboptimal plans to be chosen, we may revisit it then.

Comment by Ben Shteinfeld [ 12/Jun/23 ]

I think wrapping it into one API, with the semantics that 0% means definite, is OK if that is preferable for you. Your question about yield/restore made me think more about what the semantics of this API will actually be. How long will the information returned by the API remain true? If it says there are no orphans, will this hold for the duration of the operation context, for as long as the collection is locked, or with no guarantee at all?

My understanding of the current implementation in the classic engine is that yield/restore will perform a shard version check when restoring and re-acquiring the collection pointer (code pointer). This means that if orphans appeared, the shard version check should fail and throw, and we'll retry the query. If this understanding is correct, how would the situation you described above occur?
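
The understanding described above can be modeled with a toy sketch. This is not the server's implementation; every class and function name here is hypothetical. It assumes only what the comment states: a version check runs on restore, a mismatch throws, and the query is retried.

```python
class StaleShardVersion(Exception):
    """Hypothetical error: the shard version changed across a yield."""


class Collection:
    """Toy stand-in for a sharded collection on one shard."""

    def __init__(self, docs, shard_version=1):
        self.docs = docs
        self.shard_version = shard_version


def scan_with_yields(coll, on_yield=None):
    """Scan all documents, yielding after each one and checking on restore."""
    planned_version = coll.shard_version
    results = []
    for doc in list(coll.docs):
        results.append(doc)
        if on_yield is not None:
            on_yield(coll)  # other operations (e.g. a migration) may run here
        # Restore: re-check the shard version the plan was built against.
        if coll.shard_version != planned_version:
            raise StaleShardVersion()
    return results


def run_query(coll, on_yield=None, max_attempts=3):
    """Retry the whole scan when a restore detects a stale shard version."""
    for _ in range(max_attempts):
        try:
            return scan_with_yields(coll, on_yield)
        except StaleShardVersion:
            continue  # replan against the new shard version
    raise RuntimeError("query did not complete after retries")
```

In this model, a version bump during a yield discards the partial results and the scan restarts, so a plan built under a "no orphans" assumption never returns results computed across a version change.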

Comment by Kaloian Manassiev [ 08/Jun/23 ]

Can this be the same API as that of SERVER-77914, with the difference that zero is "definite" but anything more than zero is not?

Also ben.shteinfeld@mongodb.com, do you require some kind of notification post-yield/restore to tell you that orphans have now appeared? Basically related to the thread that I had put in the scope for what happens if you plan for no orphans and then they appear.

Generated at Thu Feb 08 06:36:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.