Mongod can return operationTime greater than $clusterTime


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.6.7, 4.0.0-rc0
    • Affects Version/s: None
    • Component/s: Sharding
    • None
    • Fully Compatible
    • ALL
    • v3.6
    • Sharding 2018-05-21
    • 15
    • None
    • 3

      There is a race in the way mongod computes $clusterTime and operationTime. Before returning a response, mongod gets the latest cluster time from the LogicalClock and adds it to the response as $clusterTime (in appendReplyMetadata, which is called for successful commands). Then, if a non-null $clusterTime was computed, operationTime is computed from one of three sources: the latest opTime on the client for writes, the opTime of the last applied write for local reads, or the opTime of the last committed write for majority reads. Nothing synchronizes these two steps, so the last applied or committed opTime can advance beyond the previously computed $clusterTime, and the response can then carry an operationTime larger than its $clusterTime.
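      The interleaving can be reproduced deterministically in a toy model. The sketch below is not mongod code: ToyNode, apply_write, build_reply_racy and the between_steps hook are hypothetical names, times are plain integers rather than real timestamps, and it assumes that every write ticks the clock before becoming the last applied write.

```python
class ToyNode:
    """Toy stand-in for a mongod; times are plain integers (assumption)."""

    def __init__(self):
        self.cluster_time = 0   # latest time known to the logical clock
        self.last_applied = 0   # opTime of the last applied write

    def apply_write(self):
        # Assumption: a write ticks the clock, then becomes last applied.
        self.cluster_time += 1
        self.last_applied = self.cluster_time

    def build_reply_racy(self, between_steps=lambda: None):
        # Step 1: read $clusterTime first (the order described above).
        cluster_time = self.cluster_time
        # Hook that stands in for a concurrent write landing in the window.
        between_steps()
        # Step 2: read operationTime afterwards.
        operation_time = self.last_applied
        return {"$clusterTime": cluster_time, "operationTime": operation_time}


node = ToyNode()
node.apply_write()  # clock = 1, last applied = 1
reply = node.build_reply_racy(between_steps=node.apply_write)
print(reply)  # {'$clusterTime': 1, 'operationTime': 2}
```

      The write injected between the two reads advances the last applied opTime to 2 after $clusterTime was already captured as 1, which is the inversion reported here.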

       A straightforward fix would be to compute operationTime before $clusterTime, because $clusterTime is always allowed to be greater than operationTime.
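       A minimal sketch of the reordered computation, under the same toy assumptions (integer times, every write ticks the clock before becoming last applied). FixedNode, build_reply_fixed and the between_steps hook are hypothetical names for illustration, not mongod functions.

```python
class FixedNode:
    """Toy stand-in for a mongod; times are plain integers (assumption)."""

    def __init__(self):
        self.cluster_time = 0   # latest time known to the logical clock
        self.last_applied = 0   # opTime of the last applied write

    def apply_write(self):
        # Assumption: a write ticks the clock, then becomes last applied.
        self.cluster_time += 1
        self.last_applied = self.cluster_time

    def build_reply_fixed(self, between_steps=lambda: None):
        # Step 1: read operationTime first.
        operation_time = self.last_applied
        # Hook that stands in for a concurrent write landing in the window.
        between_steps()
        # Step 2: read $clusterTime last; the clock never moves backwards,
        # so this value cannot be smaller than the opTime read in step 1.
        cluster_time = self.cluster_time
        return {"$clusterTime": cluster_time, "operationTime": operation_time}


fixed = FixedNode()
fixed.apply_write()  # clock = 1, last applied = 1
fixed_reply = fixed.build_reply_fixed(between_steps=fixed.apply_write)
print(fixed_reply)  # {'$clusterTime': 2, 'operationTime': 1}
```

       With this order, a write that lands in the window can only make $clusterTime larger than operationTime, which is permitted.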

       

            Assignee:
            Jack Mulrow
            Reporter:
            Jack Mulrow
            Votes:
            0
            Watchers:
            6
