Uploaded image for project: 'Documentation'
  1. Documentation
  2. DOCS-15960

[Server] Investigate changes in SERVER-70155: Add duration of how long an oplog slot is kept open to mongod "Slow query" log lines

    • 2

      Original Downstream Change Summary

      There is a new field called totalOplogSlotDurationMicros logged in "Slow query" messages and written to the database profiler collection, when the value is non-zero.

      As a bit of background, mongod supports parallel writes, but there is a small serialization critical section where writes are handed commit timestamps with which to commit the storage engine writes. Once write operations have commit timestamps, they can commit in any order: out-of-order writes.

      For example,

      writeA receives Timestamp1,
      writeB receives Timestamp2,
      writeC receives Timestamp3,

      then say writeB commits first at Timestamp2. Replication will not proceed to replicate writeB until after writeA commits. writeA is expected to create an oplog entry with Timestamp1 before replication will copy the oplog to secondaries replica set members. This is the idea of an oplog slot. If writeA were to be delayed somehow in committing – say CPU thread management, or getting stuck in MDB someplace – then that would delay replication. The new totalOplogSlotDurationMicros value represents the time between a write operation getting a commit timestamp and actually committing, and can help surface issues where operations are committing slowly and slowing down replication.

      Description of Linked Ticket

      An oplog hole represents a Timestamp in the oplog an active storage transaction would commit at that is behind the Timestamp in the oplog an already-committed storage transaction wrote their oplog entry at. For example, it is possible for a storage transaction at Timestamp 20 to have committed and for a storage transaction at Timestamp 10 to commit later on in wall-clock time. Oplog readers are prevented from reading beyond the oplog hole to ensure they don't miss any oplog entries which might still commit. Keeping an oplog hole open for any extended period of wall-clock time can lead to stalls in replication.

      Vectored insert is an example of an operation which pre-allocates Timestamps to write user data and the corresponding oplog entries at. In MongoDB 4.4, SERVER-46161 caused the default internal batch size for vectored insert to increase from 64 to 500. This has been seen to lead to higher tail latencies for vectored inserts (SERVER-65054).

      Other operations cause oplog holes too. If expensive work is done (e.g. within an OpObserver) after the oplog slot is allocated and prior to the storage transaction committing, then those operations can stall replication too. Introducing some logging to track the time spent would give more insight into these areas and perhaps even be useful to signal on within our performance testing.

            Assignee:
            kanchana.sekhar@mongodb.com Kanchana Sekhar
            Reporter:
            backlog-server-pm Backlog - Core Eng Program Management Team
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved:
              39 weeks ago