[SERVER-61957] Incorrect check for whether a collection is sharded in cluster_aggregate.cpp Created: 08/Dec/21  Updated: 06/Dec/22

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Esha Maharishi (Inactive) Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Query Execution
Sprint: QE 2021-12-27, QE 2022-01-10, QE 2022-04-04, QE 2022-02-07, QE 2022-02-21, QE 2022-03-07, QE 2022-03-21, QE 2022-01-24, QE 2022-04-18, QE 2022-05-02
Participants:

 Description   

Currently, cluster_aggregate.cpp asserts that, for each involved namespace, either all of the pipeline stages work with sharded collections or the namespace is not sharded (via this function that is passed in).

However, it is not correct for a router to use its routing table cache to check if a collection is sharded, because the cache can be stale: the collection could have been dropped and recreated as unsharded.

Instead, the router could assume the collection is unsharded and the data node could error if the collection is actually sharded.
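As a rough illustration, the problematic check behaves roughly like the following sketch (plain Python, all names hypothetical; the real logic is C++ inside cluster_aggregate.cpp):

    # Minimal model of the check described above; all names are hypothetical.
    def assert_involved_namespaces_ok(involved_namespaces,
                                      all_stages_allow_sharded,
                                      is_sharded_according_to_cache):
        for ns in involved_namespaces:
            # Trips when some stage cannot handle a sharded collection AND the
            # router's (possibly stale) cache says the namespace is sharded.
            if not all_stages_allow_sharded and is_sharded_according_to_cache(ns):
                raise AssertionError(
                    "pipeline contains a stage that cannot run on the "
                    "sharded collection " + ns)

    # The staleness problem: the cache still remembers 'test.orders' as sharded
    # even though it was dropped and recreated as an unsharded collection.
    stale_cache = {"test.orders"}
    try:
        assert_involved_namespaces_ok(
            ["test.orders"],
            all_stages_allow_sharded=False,
            is_sharded_according_to_cache=lambda ns: ns in stale_cache)
    except AssertionError as exc:
        print(exc)  # the query fails even though the collection is unsharded

The proposed fix amounts to dropping the cache lookup on the router and letting the shard that owns the data reject the pipeline if the collection really is sharded.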



 Comments   
Comment by Esha Maharishi (Inactive) [ 13/Dec/21 ]

steve.la, an example user setup is:

  • User creates and shards a collection
  • User drops the collection and recreates it as unsharded
  • User runs a query that uses the collection in a stage not supported on sharded collections and gets back an error.

It does not bring down any node in the cluster; it just fails the query.
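A minimal pymongo sketch of the steps above, assuming a mongos at localhost:27017 and a server version on which $lookup still requires its 'from' collection to be unsharded (database and collection names are made up):

    from pymongo import MongoClient
    from pymongo.errors import OperationFailure

    client = MongoClient("mongodb://localhost:27017")  # connect to the mongos

    # 1. Create and shard a collection.
    client.admin.command("enableSharding", "test")
    client.admin.command("shardCollection", "test.orders", key={"_id": "hashed"})

    # 2. Drop it and recreate it as an unsharded collection.
    client.test.orders.drop()
    client.test.orders.insert_one({"_id": 1, "custId": 1})

    # 3. Use the collection in a stage that does not support sharded
    #    collections.  The mongos' routing table cache still says test.orders
    #    is sharded, so the aggregate fails even though it no longer is.
    try:
        list(client.test.customers.aggregate([
            {"$lookup": {"from": "orders", "localField": "_id",
                         "foreignField": "custId", "as": "orders"}}
        ]))
    except OperationFailure as exc:
        print("aggregate failed:", exc)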

It seems like this code is at least 3 years old.

There is a workaround (the user can call flushRouterConfig against the mongos, or restart the mongos, after which the mongos will load a fresh list of sharded collections). I also hope all stages will work with sharded collections within the next one or two years.
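A sketch of that workaround from a driver (flushRouterConfig is the real admin command; the connection string and namespace are made up):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # connect to the mongos

    # Flush the entire cached routing table ...
    client.admin.command("flushRouterConfig")
    # ... or only the cache entry for a single namespace.
    client.admin.command("flushRouterConfig", "test.orders")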

I mainly wanted to raise this because I've seen similar bugs a few times (SERVER-61333, SERVER-42788). I see SERVER-45186 is a generic ticket about this. It sounds fine to close this ticket, SERVER-61333, and SERVER-42788 as dupes of SERVER-45186 and just prioritize SERVER-45186.
