Prevent multiple noopWrites to opLog


    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.5.10
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Sprint: Sharding 2017-06-19, Sharding 2017-07-10

      Filed as a follow-up of SERVER-27772.

      Multiple threads may concurrently process requests with readConcern: {afterClusterTime: <clusterTime>} while <clusterTime> > lastAppliedOpTime. In this case, only one noop write is needed to advance the oplog; the duplicate noop writes should be prevented.

      Implementation plan:

      Since makeNoopWriteIfNeeded can be invoked from multiple threads, it needs shared state that lets later threads wait on an operation that is already in flight. Hence makeNoopWriteIfNeeded becomes a method of a newly added CallbackSerializer class, which holds that runtime state.

      An instance of the class will be placed on the ServiceContext.

      class CallbackSerializer {
      public:
         Status runCallbackSerially(OperationContext* opCtx,
                                    stdx::function<Status(OperationContext*)> callback);

      private:
         stdx::mutex _mutex;
         std::shared_ptr<Notification<Status>> _writeRequest;
      };
      
      Status CallbackSerializer::runCallbackSerially(OperationContext* opCtx,
                                                     stdx::function<Status(OperationContext*)> callback) {
          stdx::unique_lock<stdx::mutex> lk(_mutex);
      
          auto writeRequest = _writeRequest;
          if (!writeRequest) {
              // No write is in flight: perform the callback and publish the result
              // to any threads that arrive in the meantime.
              writeRequest = (_writeRequest = std::make_shared<Notification<Status>>());
              lk.unlock();
              auto status = callback(opCtx);
              writeRequest->set(status);
              lk.lock();
              _writeRequest = nullptr;  // allow the next caller to schedule a new write
              return status;
          } else {
              // A write is already in flight: wait for its result instead of
              // scheduling another one.
              lk.unlock();
              try {
                  return writeRequest->get(opCtx);
              } catch (const DBException& ex) {
                  return ex.toStatus();
              }
          }
      }
      

      The scheduling will be done from the existing path in read_concern:

             const auto writeCallback = [this, myShard](OperationContext* opCtx) {
                 auto swRes = myShard.getValue()->runCommand(
                     opCtx,
                     ReadPreferenceSetting(ReadPreference::PrimaryOnly),
                     "admin",
                     BSON("appendOplogNote" << 1 << "maxClusterTime" << clusterTime.asTimestamp()
                                            << "data"
                                            << BSON("noop write for afterClusterTime read concern" << 1)),
                     Shard::RetryPolicy::kIdempotent);
                 return swRes.getStatus();
             };
             auto oplogNoopWriter = CallbackSerializer::get(opCtx);
             try {
                 return oplogNoopWriter->runCallbackSerially(opCtx, writeCallback);
             } catch (const DBException& ex) {
                 return ex.toStatus();
             }
      
      

            Assignee: Misha Tyulenev (Inactive)
            Reporter: Misha Tyulenev (Inactive)
            Votes: 0
            Watchers: 6
