Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-37750

Optimized $sample stage does not yield

    XMLWordPrintable

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.0, v3.6, v3.4
    • Sprint:
      Query 2018-11-19
    • Case:
    • Linked BF Score:
      50

      Description

      In the optimized $sample path where we use a pseudo-random cursor (docs), we need to insert a ShardFilterStage to ensure the $sample stage only returns documents which belong to this shard (i.e. it needs to filter out orphans and in-progress migrations). To construct this ShardFilterStage we need to consult the CollectionShardingState to get the metadata for the collection, including the shard key. Accessing this state needs to be done under a lock, so we take a lock to do so:

      createRandomCursorExecutor

          auto stage = stdx::make_unique<MultiIteratorStage>(opCtx, ws.get(), collection);
          stage->addIterator(std::move(rsRandCursor));
       
          {
              AutoGetCollectionForRead autoColl(opCtx, collection->ns());
       
              // If we're in a sharded environment, we need to filter out documents we don't own.
              if (ShardingState::get(opCtx)->needCollectionMetadata(opCtx, collection->ns().ns())) {
                  auto shardFilterStage = stdx::make_unique<ShardFilterStage>(
                      opCtx,
                      CollectionShardingState::get(opCtx, collection->ns())->getMetadata(opCtx),
                      ws.get(),
                      stage.release());
                  return PlanExecutor::make(opCtx,
                                            std::move(ws),
                                            std::move(shardFilterStage),
                                            collection,
                                            PlanExecutor::YIELD_AUTO);
              }
          }
       
          return PlanExecutor::make(
              opCtx, std::move(ws), std::move(stage), collection, PlanExecutor::YIELD_AUTO);
      

      This lock is actually unnecessary, because the caller of 'createRandomCursorExecutor' already holds a lock on the collection - we must hold a lock during the entirety of PipelineD::createCursorSource, since we are interacting with a raw Collection*.

      Taking this extra lock may seem benign, but it actually has an unintended side-effect: double locking prevents yielding:

      PlanYieldPolicy

      PlanYieldPolicy::PlanYieldPolicy(PlanExecutor* exec, PlanExecutor::YieldPolicy policy)
          : _policy(exec->getOpCtx()->lockState()->isGlobalLockedRecursively() ? PlanExecutor::NO_YIELD
                                                                               : policy),
      

      Because of this, the PlanExecutor powering the $sample stage will not yield, despite passing the explicit YIELD_AUTO policy (shown above).

      This is obviously surprising. I'm not entirely certain about the motivation for this behavior, but I believe this behavior was engineered to prevent surprise yielding of locks from operations which needed to execute some sort of sub-query. For example, a replication command or internal thread may need to find the most recent oplog entry, and would not want that operation to yield the locks on the oplog, despite that being the default behavior.

      It looks like this change can be traced back to this commit, which (I assume defensively - just to be sure) added the AutoGetCollection lock as part of an unrelated refactor. That extra lock has been there ever since, meaning this bug affects versions back to 3.4.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                9 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: