Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-45738

Sharded find operations wait for an entire batch to be generated before retrying on stale shard version errors

    • Type: Icon: Improvement Improvement
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Querying, Sharding
    • Labels:
    • Query Execution

      aggregate commands will always establish cursors with batchSize:0. This ensures that the cursors are constructed with the correct shard version and SHARDING_FILTER stage, but does so without performing any query execution work. On the other hand, find cursors are established using the default batchSize (or the user's batchSize if one is explicitly provided). This means that in addition to checking the shard version, the initial find command sent to the shards can do an arbitrary amount of query execution work in order to produce the first batch and thus can take an arbitrarily long time.

      If the initial find requests could be cancelled during cursor establishment, this would all be fine. Suppose that a find operation runs on two shards. The shard version check passes on shard one, and begins a 6 hour query execution in order to generate the first batch. The shard version check promptly fails on the second shard, since shard version checking takes place prior to query execution. If mongos were to notice this stale shard version error from shard one, it could respond by cancelling the work on shard two and restarting the entire sharded operation from the cursor establishment phase. However, the find command running on shard one cannot currently be cancelled by mongos, since the mongos does not have either a cursor id or an op id that it could use as a handle to kill the operation. Instead, the operation thread on mongos will remain blocked waiting for shard one to respond. Only after shard one spends 6 hours generating the first batch can the catalog cache be refreshed and the operation retried.

      There are a few approaches we could consider for fixing this issue:

      • Make find operations establish cursors using batchSize:0 in the same way that aggregate operations do. This would make it much more likely that all shards respond within a reasonable time frame during the cursor establishment phase, and thus would help prevent getting stuck in a stale shard version retry loop. However, for small queries this could cause additional round trips between mongos and mongod, and therefore could regress performance.
      • Enhance the cursor establishment code so that the initial find operations can be canceled when an error is received from another shard. This may be difficult to implement under the current design.

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            steve.choi@mongodb.com Steve Choi (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated: