[SERVER-28639] Prevent multiple noopWrites to opLog Created: 05/Apr/17  Updated: 08/Jan/24  Resolved: 21/Jun/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.5.10

Type: Improvement Priority: Major - P3
Reporter: Misha Tyulenev Assignee: Misha Tyulenev
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Sharding 2017-06-19, Sharding 2017-07-10
Participants:

 Description   

Filed as a follow up of SERVER-27772.

It may be possible that multiple threads receive readConcern: {afterClusterTime: <clusterTime>} while <clusterTime> > lastAppliedOpTime. In this case multiple noop writes should be prevented, since a single noop write is enough to advance the oplog.

Implementation plan:

Because makeNoopWriteIfNeeded can be scheduled from multiple threads, it needs shared state that lets later threads wait on an operation that is already in progress. Hence makeNoopWriteIfNeeded needs to become a method of a newly added CallbackSerializer class, which holds that runtime state.

The instance of the class will be placed on the ServiceContext.

class CallbackSerializer {
public:
    // Runs 'callback' if no invocation is currently in progress; otherwise waits for
    // the in-progress invocation and returns its Status.
    Status runCallbackSerially(OperationContext* opCtx,
                               stdx::function<Status(OperationContext*)> callback);

private:
    stdx::mutex _mutex;
    std::shared_ptr<Notification<Status>> _writeRequest;
};
 
Status CallbackSerializer::runCallbackSerially(OperationContext* opCtx,
                                               stdx::function<Status(OperationContext*)> callback) {
    stdx::unique_lock<stdx::mutex> lk(_mutex);

    auto writeRequest = _writeRequest;
    if (!writeRequest) {
        // This thread is the first one in: run the callback and publish its result.
        _writeRequest = std::make_shared<Notification<Status>>();
        writeRequest = _writeRequest;
        lk.unlock();

        Status status = Status::OK();
        try {
            status = callback(opCtx);
        } catch (const DBException& ex) {
            status = ex.toStatus();
        }
        writeRequest->set(status);

        lk.lock();
        _writeRequest = nullptr;
        return status;
    } else {
        // Another thread already started the operation; wait for its result outside the mutex.
        lk.unlock();
        try {
            return writeRequest->get(opCtx);
        } catch (const DBException& ex) {
            return ex.toStatus();
        }
    }
}
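
A possible way to hang the instance off the ServiceContext, so that the CallbackSerializer::get(opCtx) accessor used in the read_concern path below resolves to a single per-process object, is the usual decoration pattern. This is only a sketch; the accessor name and exact plumbing are left to the implementation:

namespace {
// One CallbackSerializer per ServiceContext (sketch; not the committed code).
const auto getCallbackSerializer = ServiceContext::declareDecoration<CallbackSerializer>();
}  // namespace

CallbackSerializer* CallbackSerializer::get(OperationContext* opCtx) {
    // Depending on the branch this may need to go through opCtx->getClient()->getServiceContext().
    return &getCallbackSerializer(opCtx->getServiceContext());
}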

The scheduling will be done from the existing code path in read_concern:

       const auto writeCallback = [this, myShard](OperationContext* opCtx) {
            auto swRes = myShard.getValue()->runCommand(
                opCtx,
                ReadPreferenceSetting(ReadPreference::PrimaryOnly),
                "admin",
                BSON("appendOplogNote" << 1 << "maxClusterTime" << clusterTime.asTimestamp() << "data"
                                       << BSON("noop write for afterClusterTime read concern" << 1)),
                Shard::RetryPolicy::kIdempotent);
            return swRes.getStatus();
        };

        auto callbackSerializer = CallbackSerializer::get(opCtx);
        try {
            return callbackSerializer->runCallbackSerially(opCtx, writeCallback);
        } catch (const DBException& ex) {
            return ex.toStatus();
        }



 Comments   
Comment by Githook User [ 21/Jun/17 ]

Author: Misha Tyulenev (mikety) <misha@mongodb.com>

Message: SERVER-28639 Prevent multiple noopWrites to opLog
Branch: master
https://github.com/mongodb/mongo/commit/1f60552a59ad9c6bf4087721335a8dd7981d2b98

Comment by Mira Carey [ 15/Jun/17 ]

In principle a generic type would be interesting. I suspect that std::async with a deferred policy, or std::call_once (with a once_flag), could provide a good backend, but we can tackle that when we get there.
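
A minimal sketch of what a std::call_once-backed serializer could look like, using plain standard-library types (std::string stands in for Status; this illustrates the suggestion above and is not MongoDB code):

#include <functional>
#include <memory>
#include <mutex>
#include <string>

// Each "generation" groups the callers that arrive while one invocation is in flight.
struct Generation {
    std::once_flag flag;
    std::string result;  // stand-in for a Status
};

class CallOnceSerializer {
public:
    std::string runCallbackSerially(std::function<std::string()> callback) {
        std::shared_ptr<Generation> gen;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (!_current)
                _current = std::make_shared<Generation>();
            gen = _current;
        }

        // Exactly one caller executes the callback; std::call_once blocks the
        // others until it finishes, after which they all read the shared result.
        // (If the callback throws, the once_flag stays unset and a waiting
        // caller retries, which is the behavior we want here.)
        std::call_once(gen->flag, [&] {
            gen->result = callback();
            std::lock_guard<std::mutex> lk(_mutex);
            if (_current == gen)
                _current.reset();  // later callers start a fresh generation
        });
        return gen->result;
    }

private:
    std::mutex _mutex;
    std::shared_ptr<Generation> _current;
};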

Comment by Misha Tyulenev [ 15/Jun/17 ]

Thanks kaloian.manassiev.
mira.carey@mongodb.com, please ack the "callback serializer" class. I think that to make it generic it should also be able to schedule different callbacks.
So as a first solution I'll add it to an anonymous namespace and file a JIRA ticket for converting it to a generic, production-grade class.

Comment by Kaloian Manassiev [ 15/Jun/17 ]

LGTM, except that OplogNoopWriter is not really an oplog writer anymore, but just a "callback serializer". Perhaps put it in mongo/util/concurrency.

Also, if you have this class, you can make sure that secondaries do not send more than one refresh at a time.
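
A purely illustrative sketch of that reuse, assuming the refresh can be expressed as a callback (doRefreshFromPrimary is a placeholder, not an existing function):

    auto serializer = CallbackSerializer::get(opCtx);
    // Concurrent refresh requests on this node coalesce into a single round trip
    // while one refresh is already in flight.
    Status refreshStatus = serializer->runCallbackSerially(opCtx, [](OperationContext* opCtx) {
        return doRefreshFromPrimary(opCtx);  // placeholder for the actual refresh entry point
    });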

Comment by Misha Tyulenev [ 15/Jun/17 ]

Updated the description per feedback

Comment by Kaloian Manassiev [ 14/Jun/17 ]

"OplogNoopWriter will go to the ServiceContext"

My question was where you would hook the call into the OplogNoopWriter (not where it will be stored). The reason I am asking is that I suspect it would have to be in AppendOplogNoteCmd, which the writer is using, and that would make the command serialized as well. I think this is probably alright, but we should check with the repl team whether that is the case.

"The second case, while it might happen, is limited to the number of secondaries"

I think the problem in that case might be the number of concurrent connections that they would open against the primary to hit it with refresh requests. Probably not that big of a problem since the writes on the primary will be serialized.

Comment by Misha Tyulenev [ 13/Jun/17 ]

1. OplogNoopWriter will go to the ServiceContext.
2. The second case, while it might happen, is limited to the number of secondaries.

Comment by Kaloian Manassiev [ 13/Jun/17 ]

I have two questions:

  • Where are you going to hook the OplogNoopWriter? I.e., would it be in the 'oplog note' command which is publicly visible or in something specific to causal consistency?
  • This solves the problem of a single node receiving multiple noop write requests, but not of the same node hitting the primary with multiple requests. Does the second case need to be handled as well, so the primary is not slammed with a burst of requests, or is that not considered a problem?

Apart from that the proposed OplogNoopWriter implementation looks good to me, except that one of the waitFor calls seems to be happening under a mutex.

Comment by Misha Tyulenev [ 13/Jun/17 ]

kaloian.manassiev, please let me know your feedback.
