Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 3.4.19, 3.6.10, 4.0.5, 4.1.6
Affects Version/s: 3.4.0, 3.6.0, 4.0.0
Component/s: Aggregation Framework
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v4.0, v3.6, v3.4
Sprint:
Query 2018-11-19
Case:
Linked BF Score:
50
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

In the optimized $sample path where we use a pseudo-random cursor (docs), we need to insert a ShardFilterStage to ensure the $sample stage only returns documents which belong to this shard (i.e. it needs to filter out orphans and in-progress migrations). To construct this ShardFilterStage we need to consult the CollectionShardingState to get the metadata for the collection, including the shard key. Accessing this state needs to be done under a lock, so we take a lock to do so:

createRandomCursorExecutor

    auto stage = stdx::make_unique<MultiIteratorStage>(opCtx, ws.get(), collection);
    stage->addIterator(std::move(rsRandCursor));

    {
        AutoGetCollectionForRead autoColl(opCtx, collection->ns());

        // If we're in a sharded environment, we need to filter out documents we don't own.
        if (ShardingState::get(opCtx)->needCollectionMetadata(opCtx, collection->ns().ns())) {
            auto shardFilterStage = stdx::make_unique<ShardFilterStage>(
                opCtx,
                CollectionShardingState::get(opCtx, collection->ns())->getMetadata(opCtx),
                ws.get(),
                stage.release());
            return PlanExecutor::make(opCtx,
                                      std::move(ws),
                                      std::move(shardFilterStage),
                                      collection,
                                      PlanExecutor::YIELD_AUTO);
        }
    }

    return PlanExecutor::make(
        opCtx, std::move(ws), std::move(stage), collection, PlanExecutor::YIELD_AUTO);

This lock is actually unnecessary, because the caller of 'createRandomCursorExecutor' already holds a lock on the collection - we must hold a lock during the entirety of PipelineD::createCursorSource, since we are interacting with a raw Collection*.

Taking this extra lock may seem benign, but it actually has an unintended side-effect: double locking prevents yielding:

PlanYieldPolicy

PlanYieldPolicy::PlanYieldPolicy(PlanExecutor* exec, PlanExecutor::YieldPolicy policy)
    : _policy(exec->getOpCtx()->lockState()->isGlobalLockedRecursively() ? PlanExecutor::NO_YIELD
                                                                         : policy),

Because of this, the PlanExecutor powering the $sample stage will not yield, despite passing the explicit YIELD_AUTO policy (shown above).

This is obviously surprising. I'm not entirely certain about the motivation for this behavior, but I believe this behavior was engineered to prevent surprise yielding of locks from operations which needed to execute some sort of sub-query. For example, a replication command or internal thread may need to find the most recent oplog entry, and would not want that operation to yield the locks on the oplog, despite that being the default behavior.

It looks like this change can be traced back to this commit, which (I assume defensively - just to be sure) added the AutoGetCollection lock as part of an unrelated refactor. That extra lock has been there ever since, meaning this bug affects versions back to 3.4.

related to

SERVER-36871 $sample can loop infinitely on orphaned data

Closed

SERVER-36872 Comment out $sample tests in testshard1.js temporarily

Closed

SERVER-24664 Get rid of calls to ShardingState::getCollectionMetadata

Closed

Assignee:: Bernard Gorman
Reporter:: Charlie Swanson
Participants:: Bernard Gorman, Charlie Swanson, Githook User
Votes:: 0 Vote for this issue
Watchers:: 8 Start watching this issue

Created:: Oct 25 2018 05:28:31 PM UTC
Updated:: Oct 29 2023 10:27:11 PM UTC
Resolved:: Nov 15 2018 03:36:49 AM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates