[SERVER-69219] Always broadcast queries using read concern level "available" to avoid missing unowned documents when filter includes equality on shard key fields Created: 28/Aug/22  Updated: 20/Apr/23  Resolved: 20/Apr/23

Status: Closed
Project: Core Server
Component/s: Distributed Query Planning
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Nicholas Zolnierz
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-76353 Add the ability to force a query to b... Open
Assigned Teams:
Query Optimization
Sprint: QE 2023-02-06, QE 2023-02-20, QE 2023-03-06, QE 2023-03-20, QE 2023-04-03
Participants:

 Description   

Read concern level "available" is used to find unowned documents in a sharded cluster. Allowing these queries to be targeted to a subset of the shards is counterproductive because it means unowned documents within a particular chunk range cannot easily be searched for.

One workaround is to run an aggregation pipeline that includes {$replaceWith: "$$ROOT"}, as it prevents the shard targeting optimization.
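A minimal sketch of that workaround, written as a plain JavaScript helper that only builds the aggregate command document. The helper name buildBroadcastAggregate, the collection name "orders", and the shard key field customerId are hypothetical. Placing $replaceWith ahead of $match is an assumption about how the workaround is meant to be used; the ticket says only that the stage prevents the targeting optimization.

```javascript
// Hypothetical helper (not part of any MongoDB API): builds an aggregate
// command document that reads at read concern level "available".
// Assumption: the $replaceWith stage goes before $match so the equality
// filter on the shard key is not used to target a subset of shards.
function buildBroadcastAggregate(collectionName, filter) {
  return {
    aggregate: collectionName,
    pipeline: [
      { $replaceWith: "$$ROOT" }, // no-op rewrite of each document
      { $match: filter },         // equality on the (hypothetical) shard key
    ],
    cursor: {},
    readConcern: { level: "available" },
  };
}

// "orders" and customerId are illustrative names only.
const cmd = buildBroadcastAggregate("orders", { customerId: 42 });
console.log(JSON.stringify(cmd, null, 2));
```

In a shell connected to mongos, the resulting document could be passed to db.runCommand(cmd). Whether the query is actually broadcast may depend on the server version, so it is worth confirming with explain output.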



 Comments   
Comment by Nicholas Zolnierz [ 06/Sep/22 ]

Thanks, Max and Andy. I've pinged christopher.harris@mongodb.com to see if this change would be useful for TS debugging, as it seems like there's no internal reason for us to do this. I'll move it back into Needs Triage and discuss with the team.

Comment by Max Hirschhorn [ 06/Sep/22 ]

That's a good point: the documentation only says "may", so I was overzealous in calling this a bug. I'd be happy to change the title and issue type so it reads as an improvement request.

Looking through Jira for tickets related to counting unowned documents, I found this comment mentioning a pattern to "Count orphans in a given chunk or shard key range". I wonder how common that is for the Support team and/or Cluster Operator to run. I don't believe the $expr equivalent of the min/max fields in the find command has the same targeting behavior of broadcasting to all shards.

As I had brought up in SERVER-57767, I feel that now that read concern level "available" is no longer the default consistency level for secondary reads, we can repurpose it to truly mean "return the state of the collection as it exists in the Btrees of the individual shards and for stronger read concern levels apply ownership filtering". SERVER-57767 ultimately got closed as Won't Do, so I'm happy for this ticket to get closed too.

Comment by Andy Schwerin [ 05/Sep/22 ]

I don't think this is particularly a priority, but also there's no real use case for today's "available" read concern. Because the targeting ignores shard versioning errors, you don't even know that the query will return all non-orphan documents. For example, a chunk of a collection that migrates to a shard that did not previously own documents for that collection might not be seen in a query at read concern "available".

We used to have "available" read concern by default on secondaries, but now that we do not, I do not expect any users are utilizing it.

Comment by Nicholas Zolnierz [ 02/Sep/22 ]

max.hirschhorn@mongodb.com can you clarify the priority for this behavior? Based on the docs, we "may" return orphan docs but it doesn't appear to be a strict requirement for RC available.

We could certainly avoid any shard targeting from mongos with readConcern "available", but wouldn't that hurt performance for users who don't really care about getting all orphans?

Comment by Max Hirschhorn [ 30/Aug/22 ]

Andy noted that we would only want to consider addressing this in MongoDB 5.0+, where the default read concern level is now "local", so that secondary reads outside of causally consistent sessions on older versions do not change to broadcasting.

Generated at Thu Feb 08 06:12:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.