Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Querying, Sharding
Labels:
- qexec-team

Assigned Teams: Query Execution
Case:
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Aggregate commands always establish their cursors on the shards with batchSize:0. This ensures that the cursors are constructed with the correct shard version and SHARDING_FILTER stage, but does so without performing any query execution work. In contrast, find cursors are established using the default batchSize (or the user's batchSize if one is explicitly provided). This means that in addition to checking the shard version, the initial find command sent to the shards can do an arbitrary amount of query execution work in order to produce the first batch, and thus can take an arbitrarily long time.
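As a rough illustration of the difference, the two establishment requests can be sketched as command documents. The helper names below are hypothetical, and the real requests built by mongos carry additional fields (such as the shard version) that are omitted here.

```python
def aggregate_establishment_cmd(collection, pipeline):
    # Hypothetical helper. Cursor establishment for aggregate always asks for
    # batchSize 0, so the shard opens the cursor and returns an empty first
    # batch without doing query execution work.
    return {
        "aggregate": collection,
        "pipeline": pipeline,
        "cursor": {"batchSize": 0},
    }


def find_establishment_cmd(collection, filter_doc, user_batch_size=None):
    # Hypothetical helper. Cursor establishment for find only carries a
    # batchSize when the user supplied one; otherwise the default applies and
    # the shard must produce a non-empty first batch before responding.
    cmd = {"find": collection, "filter": filter_doc}
    if user_batch_size is not None:
        cmd["batchSize"] = user_batch_size
    return cmd
```

For example, `find_establishment_cmd("orders", {"status": "open"})` contains no batchSize field at all, whereas the aggregate request always contains `"cursor": {"batchSize": 0}`.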

If the initial find requests could be cancelled during cursor establishment, this would not be a problem. Suppose that a find operation runs on two shards. The shard version check passes on shard one, which then begins a 6-hour query execution in order to generate the first batch. The shard version check promptly fails on shard two, since shard version checking takes place prior to query execution. If mongos were to act on this stale shard version error from shard two, it could respond by cancelling the work on shard one and restarting the entire sharded operation from the cursor establishment phase. However, the find command running on shard one cannot currently be cancelled by mongos, since mongos has neither a cursor id nor an op id that it could use as a handle to kill the operation. Instead, the operation thread on mongos remains blocked waiting for shard one to respond. Only after shard one spends 6 hours generating the first batch can the catalog cache be refreshed and the operation retried.
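The two-shard scenario can be modelled with a small simulation. This is only a sketch: `shard_find`, `establish_cursors_blocking` and `StaleShardVersion` are made-up names, the 6-hour first batch is shrunk to a fraction of a second, and asyncio tasks stand in for the remote requests that mongos sends.

```python
import asyncio
import time


class StaleShardVersion(Exception):
    """Simulated stale shard version error (hypothetical name)."""


async def shard_find(name, version_ok, first_batch_seconds):
    # The shard version is checked before any query execution work.
    if not version_ok:
        raise StaleShardVersion(name)
    # Stand-in for the work needed to produce the first batch.
    await asyncio.sleep(first_batch_seconds)
    return {"shard": name, "firstBatch": []}


async def establish_cursors_blocking(requests):
    # Models the current behaviour: every shard must respond before the
    # stale version error can be acted on.
    results = await asyncio.gather(*requests, return_exceptions=True)
    for result in results:
        if isinstance(result, StaleShardVersion):
            raise result
    return results


def run_blocking_demo(slow_seconds=0.2):
    async def main():
        start = time.monotonic()
        try:
            await establish_cursors_blocking([
                shard_find("shard1", True, slow_seconds),  # passes, then runs long
                shard_find("shard2", False, 0),            # fails immediately
            ])
            stale = False
        except StaleShardVersion:
            stale = True
        return stale, time.monotonic() - start

    return asyncio.run(main())
```

Calling `run_blocking_demo()` reports the stale version error only after roughly `slow_seconds` have elapsed, even though shard two failed immediately.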

There are a few approaches we could consider for fixing this issue:

- Make find operations establish cursors using batchSize:0 in the same way that aggregate operations do. This would make it much more likely that all shards respond within a reasonable time frame during the cursor establishment phase, and thus would help prevent getting stuck in a stale shard version retry loop. However, for small queries this could cause additional round trips between mongos and mongod, and therefore could regress performance.
- Enhance the cursor establishment code so that the initial find operations can be canceled when an error is received from another shard. This may be difficult to implement under the current design.
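Both approaches can be sketched against a toy model of cursor establishment. All names here are hypothetical and timings are scaled down. Note that cancelling a local asyncio task is far easier than the real problem: in the server, mongos would still need some handle with which to kill the operation on the remote shard.

```python
import asyncio
import time


class StaleShardVersion(Exception):
    """Simulated stale shard version error (hypothetical name)."""


async def shard_find(name, version_ok, batch_size, seconds_per_doc=0.005):
    # The version check happens before any query execution work.
    if not version_ok:
        raise StaleShardVersion(name)
    # Stand-in for producing the first batch; batchSize 0 does no work.
    await asyncio.sleep(batch_size * seconds_per_doc)
    return name


async def establish_wait_for_all(coros):
    # First approach: keep waiting for every shard, but rely on batchSize 0
    # so that every shard answers quickly.
    results = await asyncio.gather(*coros, return_exceptions=True)
    for result in results:
        if isinstance(result, Exception):
            raise result
    return results


async def establish_cancel_on_error(coros):
    # Second approach: stop waiting as soon as any shard reports an error
    # and cancel the requests that are still outstanding.
    tasks = [asyncio.create_task(c) for c in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    failed = [t for t in done if t.exception() is not None]
    if failed:
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        raise failed[0].exception()
    return [t.result() for t in tasks]


def timed_attempt(establish, batch_size):
    async def main():
        start = time.monotonic()
        try:
            await establish([
                shard_find("shard1", True, batch_size),
                shard_find("shard2", False, batch_size),
            ])
            stale = False
        except StaleShardVersion:
            stale = True
        return stale, time.monotonic() - start

    return asyncio.run(main())
```

In this model, `timed_attempt(establish_wait_for_all, 0)` and `timed_attempt(establish_cancel_on_error, 100)` both surface the stale version error almost immediately. The model does not capture the cost noted for the first approach, namely the extra getMore round trip needed to fetch results for small queries.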

related to

SERVER-43657 Investigate whether there are situations in which we don't need to set batchSize to 0 when opening a cursor for a pipeline

Backlog

Assignee: [DO NOT USE] Backlog - Query Execution
Reporter: Steve Choi (Inactive)
Participants: [DO NOT USE] Backlog - Query Execution, Steve Choi
Votes: 0
Watchers: 8

Created: Jan 23 2020 11:42:24 PM UTC
Updated: Oct 04 2023 02:54:34 PM UTC
