Core Server / SERVER-27772

processing afterClusterTime > clusterTime on secondary

    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.5.6
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Sprint: Sharding 2017-03-27, Sharding 2017-04-17

      In a multi-shard environment, a mongos that sends read requests to multiple secondaries may run into a performance degradation if the readAfterClusterTime specifies a time that is ahead of the primary's clusterTime. In this case a secondary, which learns the new time the moment the command arrives, will communicate with the primary via a heartbeat and then wait for the oplog to replicate far enough to satisfy the readConcern. This can add up to a significant delay: 2 sec heartbeat wait + 10 sec noop writer wait + replication wait, i.e. > 10 seconds.

      To solve this, the secondary's LogicalTime_LOG needs to be advanced. For that the secondary will need to:

      • communicate the new clusterTime to the primary so the primary can set its LogicalTime_MEM; once the primary receives the message it will advance its time as described in the “Primary operation” subsection;
      • wait until the oplog entry with the aforementioned clusterTime is replicated.

      1. Add a global function noopWrite. Note: it is global because there is no good place for it; I plan to put it into read_concern.cpp.
      (alternatively it can be a method on ReplicationCoordinator)

      Status ReplicationCoordinatorImpl::noopWrite(OperationContext* opCtx, BSONObj msgObj, StringData note) {
          Lock::GlobalLock lock(opCtx, MODE_IX, 1);
          if (!lock.isLocked()) {
              return {ErrorCodes::LockFailed, "Global lock is not available"};
          }
          opCtx->lockState()->lockMMAPV1Flush();

          // canAcceptWritesForDatabase("admin") is a proxy for being the primary;
          // note that passing "local" would cause it to return true even on a secondary.
          auto replCoord = repl::ReplicationCoordinator::get(opCtx);
          if (!replCoord->canAcceptWritesForDatabase(opCtx, "admin")) {
              return {ErrorCodes::NotMaster, "Not a primary"};
          }

          MONGO_WRITE_CONFLICT_RETRY_LOOP_BEGIN {
              WriteUnitOfWork uow(opCtx);
              opCtx->getClient()->getServiceContext()->getOpObserver()->onOpMessage(opCtx, msgObj);
              uow.commit();
          }
          MONGO_WRITE_CONFLICT_RETRY_LOOP_END(opCtx, note, repl::rsOplogName);
          return Status::OK();
      }
      

      2. To catch the oplog up to the cluster time, do a noop write to the local oplog if on the primary node, or send a request to the primary if on a secondary.

      Status makeNoopWriteIfNeeded(OperationContext* opCtx, LogicalTime clusterTime) {
          auto replCoord = repl::ReplicationCoordinator::get(opCtx);
          auto lastAppliedTime = LogicalTime(replCoord->getMyLastAppliedOpTime().getTimestamp());
          if (clusterTime > lastAppliedTime) {
              // use Shard::runCommand with a PrimaryOnly readPreference and idempotent retries
              auto shardingState = ShardingState::get(opCtx);
              invariant(shardingState);
              auto myShard =
                  Grid::get(opCtx)->shardRegistry()->getShard(opCtx, shardingState->getShardName());
              if (!myShard.isOK()) {
                  return myShard.getStatus();
              }
              // TODO: add jira to return CannotTargetItself if it becomes primary;
              // catch it in the status and issue a direct noopWrite
              auto swRes = myShard.getValue()->runCommand(
                  opCtx,
                  ReadPreferenceSetting(ReadPreference::PrimaryOnly),
                  "admin",
                  BSON("appendOplogNote" << 1 << "data" << BSON("append noop write" << 1)),
                  Shard::RetryPolicy::kIdempotent);
              return swRes.getStatus();
          }
          return Status::OK();
      }
      

      3. Call makeNoopWriteIfNeeded from waitForReadConcern so that it attempts to catch up the oplog.

      4. Rewrite appendOplogNote

      a. use MONGO_INITIALIZER instead of static initialization
      b. in the run method, call ReplicationCoordinatorImpl::noopWrite unless the cmdObj has a clusterTime <= lastAppliedOpTime

          virtual bool run(OperationContext* opCtx,
                           const string& dbname,
                           BSONObj& cmdObj,
                           int,
                           string& errmsg,
                           BSONObjBuilder& result) {
              BSONElement dataElement;
              auto dataStatus = bsonExtractTypedField(cmdObj, "data", Object, &dataElement);
              if (!dataStatus.isOK()) {
                  return appendCommandStatus(result, dataStatus);
              }
              auto replCoord = repl::ReplicationCoordinator::get(opCtx);
              if (!replCoord->isReplEnabled()) {
                  return appendCommandStatus(
                      result,
                      {ErrorCodes::NoReplicationEnabled,
                       "Must have replication set up to run \"appendOplogNote\""});
              }
              return appendCommandStatus(result, noopWrite(opCtx, dataElement.Obj(), "appendOplogNote"));
          }
      };
      
      

            Assignee:
            Misha Tyulenev (misha.tyulenev@mongodb.com)
            Reporter:
            Misha Tyulenev (misha.tyulenev@mongodb.com)
            Votes: 0
            Watchers: 5
