awaitReplication can hang when the optime to wait for does not match the minSnapshot.


    • Fully Compatible
    • ALL
    • Repl 2018-01-01, Repl 2018-01-15, Repl 2018-01-29

      ReplicationCoordinatorImpl::_awaitReplication_inlock accepts both an opTime and a minSnapshot to wait for. The method registers itself on a waiter list to receive a condition variable notification, and returns successfully once _doneWaitingForReplication_inlock returns true.

      In order for the predicate to return true, a valid snapshot must exist at the minSnapshot time.

      However, the condition variable is notified as soon as _doneWaitingForReplication_inlock succeeds when evaluated with a trivially true minSnapshot value, that is, without checking the snapshot requirement. Note also that notifying a waiter removes it from the list of waiters that are notified when optimes advance.

      In this case, the predicate that _awaitReplication_inlock waits on is stronger than the condition under which it is notified. Because notification happens at most once, a client that wakes up before the snapshot exists goes back to waiting and can hang, since the follow-up notification will never come.
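
      The sequence above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the server code: Coordinator, Waiter, wakeReadyWaiters, runScenario and the integer OpTime are hypothetical stand-ins, and "a valid snapshot exists at minSnapshot" is approximated as currentSnapshot >= minSnapshot. It shows a waiter being notified once, finding its full predicate false, and no longer being registered when the snapshot finally advances.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT the real MongoDB types.
using OpTime = std::uint64_t;

struct Waiter {
    OpTime opTime = 0;       // optime the client wants replicated
    OpTime minSnapshot = 0;  // snapshot that must also exist
    int notifications = 0;   // how many times this waiter was woken
};

struct Coordinator {
    OpTime replicatedOpTime = 0;
    OpTime currentSnapshot = 0;  // assumption: snapshots modeled as one counter
    std::vector<Waiter*> waiters;

    // Full predicate that the waiter re-checks after it wakes up.
    bool doneWaiting(const Waiter& w) const {
        return replicatedOpTime >= w.opTime && currentSnapshot >= w.minSnapshot;
    }

    // Weaker check used when deciding whom to notify: the snapshot
    // requirement is treated as trivially satisfied.
    bool doneWaitingIgnoringSnapshot(const Waiter& w) const {
        return replicatedOpTime >= w.opTime;
    }

    // Notify every waiter passing the weaker check and remove it from
    // the list, so each waiter is notified at most once.
    void wakeReadyWaiters() {
        for (auto it = waiters.begin(); it != waiters.end();) {
            if (doneWaitingIgnoringSnapshot(**it)) {
                ++(*it)->notifications;
                it = waiters.erase(it);
            } else {
                ++it;
            }
        }
    }

    void advanceOpTime(OpTime t) { replicatedOpTime = t; wakeReadyWaiters(); }
    void advanceSnapshot(OpTime t) { currentSnapshot = t; wakeReadyWaiters(); }
};

struct Outcome {
    int notifications;           // total wakeups the waiter received
    bool predicateTrueAtWakeup;  // full predicate when it was woken
    bool stillRegistered;        // on the waiter list after the wakeup?
    bool predicateTrueAtEnd;     // full predicate after the snapshot advanced
};

Outcome runScenario() {
    Coordinator c;
    Waiter w;
    w.opTime = 10;
    w.minSnapshot = 10;
    c.waiters.push_back(&w);
    assert(c.waiters.size() == 1);

    // The optime advances first: the waiter is woken and deregistered,
    // but the snapshot does not exist yet, so it must keep waiting.
    c.advanceOpTime(10);
    Outcome o;
    o.predicateTrueAtWakeup = c.doneWaiting(w);
    o.stillRegistered = !c.waiters.empty();

    // The snapshot arrives later, but nobody is left on the list to notify.
    c.advanceSnapshot(10);
    o.predicateTrueAtEnd = c.doneWaiting(w);
    o.notifications = w.notifications;
    return o;
}
```

      In this model the full predicate does eventually become true, yet the waiter receives only the single early notification. A real thread blocked on a condition variable in that state would not wake up again.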

              Assignee:
              Benety Goh
              Reporter:
              Daniel Gottlieb (Inactive)
              Votes:
              0
              Watchers:
              4
