Core Server / SERVER-78813

Commit point propagation fails indefinitely with exhaust cursors with null lastCommitted optime

    • Fully Compatible
    • ALL
    • v7.0, v6.0, v5.0, v4.4
    • Repl 2023-07-24
    • 70

      If a syncing node has a null lastCommitted optime when it issues its exhaust getMore request to its sync source, it will never receive the empty oplog batches used for commit point propagation. This is because we neither take into account nor update the last known lastCommitted optime of an exhaust cursor if it was originally null (see here and here).

      Normally, a syncing node gets its sync source's lastCommitted optime from the first oplog find request. But if the sync source also had a null lastCommitted optime when the syncing node started syncing from it, then I think we could end up in the situation described above. The elected primary normally relies on the JournalFlusher to trigger the first lastCommitted calculation/update, and that can be delayed because the JournalFlusher runs asynchronously. This could happen after replSetInitiate on 4.4 (see SERVER-58721) or after a node restart on all versions.
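
      The behavior described above can be illustrated with a toy model. This is only a sketch, not the server's code: the names `ExhaustCursor`, `wants_empty_batch`, `after_batch`, and `simulate` are hypothetical, and optimes are simplified to plain integers with `None` standing in for a null optime.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExhaustCursor:
    """Hypothetical stand-in for an exhaust oplog cursor on the sync source."""
    # Commit point the syncing node reported in its getMore.
    # None models a null lastCommitted optime (assumption: optimes are ints).
    last_known_committed: Optional[int]


def wants_empty_batch(cursor: ExhaustCursor, source_committed: Optional[int]) -> bool:
    """True if the source would send an empty batch to propagate its commit point."""
    if cursor.last_known_committed is None:
        # Behavior described in the ticket: a null value is never considered.
        return False
    return source_committed is not None and source_committed > cursor.last_known_committed


def after_batch(cursor: ExhaustCursor, source_committed: Optional[int]) -> None:
    """Refresh the cursor's last known commit point after a batch is sent."""
    # Behavior described in the ticket: a value that started as null is never updated.
    if cursor.last_known_committed is not None and source_committed is not None:
        cursor.last_known_committed = max(cursor.last_known_committed, source_committed)


def simulate(initial: Optional[int], commit_points: List[Optional[int]]) -> List[bool]:
    """Replay the sync source's commit points against one exhaust cursor."""
    cursor = ExhaustCursor(initial)
    sent = []
    for cp in commit_points:
        sent.append(wants_empty_batch(cursor, cp))
        after_batch(cursor, cp)
    return sent


# Cursor opened while both nodes had a null commit point: nothing is ever propagated.
stuck = simulate(None, [None, 5, 6, 7])
# Cursor opened with a known commit point: each advance triggers an empty batch.
healthy = simulate(3, [3, 5, 5, 7])
```

      In this model the cursor that starts with a null value stays stuck even after the sync source's commit point advances, which matches the "fails indefinitely" symptom in the title. A cursor that starts with any non-null value recovers as soon as the source moves ahead.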

            lingzhi.deng@mongodb.com Lingzhi Deng