Core Server / SERVER-33812

First initial sync oplog read batch fetched may be empty; do not treat as an error.


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6.6, 3.7.6
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.6, v3.4
    • Steps To Reproduce:

      diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_oplog_manager.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_oplog_manager.cpp
      index f8c6c70196..13330a526e 100644
      --- a/src/mongo/db/storage/wiredtiger/wiredtiger_oplog_manager.cpp
      +++ b/src/mongo/db/storage/wiredtiger/wiredtiger_oplog_manager.cpp
      @@ -198,6 +198,7 @@ void WiredTigerOplogManager::_oplogJournalThreadLoop(WiredTigerSessionCache* ses
               // Publish the new timestamp value.
               _setOplogReadTimestamp(lk, newTimestamp);
               lk.unlock();
      +        sleepmillis(1500);
       
               // Wake up any await_data cursors and tell them more data might be visible now.
               oplogRecordStore->notifyCappedWaitersIfNeeded();
      

      Cherry-picking f23bcbfa6d08c24b5570b3b29641f96babfc6a34 onto v3.6 also reproduces the bug on the RHEL-62 enterprise builder (the required one on Evergreen), though I haven't been able to reproduce it locally without inserting extra delays.
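
      To make the window concrete, here is a contrived, self-contained C++ sketch (not MongoDB code; names such as visibleTimestamp and journalThread are illustrative only) of what the injected sleep exaggerates: the new read timestamp is published, but waiters are not woken until after an awaitData-style reader's deadline has already expired, so the reader's first batch comes back empty.

      #include <chrono>
      #include <condition_variable>
      #include <iostream>
      #include <mutex>
      #include <thread>
      #include <vector>

      int main() {
          std::mutex m;
          std::condition_variable cv;
          long long visibleTimestamp = 0;         // stands in for the oplog read timestamp
          const std::vector<long long> oplog{1};  // one entry already written

          std::unique_lock<std::mutex> lk(m);  // reader takes the lock before the "journal thread" runs

          std::thread journalThread([&] {
              {
                  std::lock_guard<std::mutex> guard(m);  // runs once the reader is inside wait_for
                  visibleTimestamp = 1;                  // _setOplogReadTimestamp(...)
              }
              std::this_thread::sleep_for(std::chrono::milliseconds(1500));  // the injected sleepmillis(1500)
              cv.notify_all();                                               // notifyCappedWaitersIfNeeded()
          });

          // Reader: nothing is visible yet, so wait for a wakeup and return whatever is
          // in hand when the (shorter) deadline expires -- an empty first batch.
          std::vector<long long> batch;
          for (long long ts : oplog)
              if (ts <= visibleTimestamp)
                  batch.push_back(ts);
          if (batch.empty())
              cv.wait_for(lk, std::chrono::seconds(1));  // gives up before the delayed notify arrives

          std::cout << "first batch size: " << batch.size() << std::endl;
          lk.unlock();
          journalThread.join();
          return 0;
      }

      Without the 1500 ms sleep the notification usually beats the deadline, which is consistent with the bug being timing dependent and hard to hit locally.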

    • Sprint:
      Repl 2018-04-23
    • Linked BF Score:
      52

      Description

      Currently we depend on initial sync never receiving an empty first oplog batch. However, an empty batch is possible, depending on timing in _oplogJournalThreadLoop: the new oplog read timestamp is published before capped await_data waiters are notified, and a first fetch that lands in that window can come back empty (the repro diff above exaggerates the window with a sleep). The fix for SERVER-31679 exacerbates the problem, so this issue is currently blocking that ticket's backport to 3.6.
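
      As a rough illustration of the behavior change the summary asks for (a minimal sketch only; OplogEntry, classifyFirstBatch, and the validation details are hypothetical, not the server's actual OplogFetcher code), an empty first batch should trigger another fetch rather than failing the initial sync attempt:

      #include <vector>

      struct OplogEntry {
          long long ts;  // simplified stand-in for the entry's timestamp
      };

      enum class FirstBatchAction { kRetry, kProceed, kFailInitialSync };

      FirstBatchAction classifyFirstBatch(const std::vector<OplogEntry>& batch,
                                          long long requiredStartTs) {
          if (batch.empty()) {
              // Possible when the sync source has published a new oplog read timestamp
              // but not yet woken awaitData cursors: benign, so fetch again rather than
              // failing the whole initial sync attempt.
              return FirstBatchAction::kRetry;
          }
          // A non-empty first batch is still expected to begin at the timestamp the
          // fetcher asked to start from.
          return batch.front().ts == requiredStartTs ? FirstBatchAction::kProceed
                                                     : FirstBatchAction::kFailInitialSync;
      }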

    Attachments

    Issue Links

    Activity

    People

    Assignee:
    Benety Goh (benety.goh)
    Reporter:
    Geert Bosch (geert.bosch)
    Participants:
    Votes:
    0
    Watchers:
    9

    Dates

    Created:
    Updated:
    Resolved: