[DOCS-15960] [Server] Investigate changes in SERVER-70155: Add duration of how long an oplog slot is kept open to mongod "Slow query" log lines Created: 13/Mar/23  Updated: 11/Jan/24  Resolved: 28/Jul/23

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 7.0.0-rc0, 6.0.13, 5.0.24, 4.4.29

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Kanchana Sekhar
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-70155 Add duration of how long an oplog slo... Closed
Participants:
Days since reply: 28 weeks ago
Story Points: 2

 Description   
Original Downstream Change Summary

There is a new field called totalOplogSlotDurationMicros logged in "Slow query" messages and written to the database profiler collection, when the value is non-zero.

As a bit of background, mongod supports parallel writes, but there is a small serialization critical section where writes are handed commit timestamps with which to commit the storage engine writes. Once write operations have commit timestamps, they can commit in any order: out-of-order writes.

For example,

writeA receives Timestamp1,
writeB receives Timestamp2,
writeC receives Timestamp3,

then say writeB commits first at Timestamp2. Replication will not proceed to replicate writeB until after writeA commits. writeA is expected to create an oplog entry with Timestamp1 before replication will copy the oplog to secondaries replica set members. This is the idea of an oplog slot. If writeA were to be delayed somehow in committing – say CPU thread management, or getting stuck in MDB someplace – then that would delay replication. The new totalOplogSlotDurationMicros value represents the time between a write operation getting a commit timestamp and actually committing, and can help surface issues where operations are committing slowly and slowing down replication.

Description of Linked Ticket

An oplog hole represents a Timestamp in the oplog an active storage transaction would commit at that is behind the Timestamp in the oplog an already-committed storage transaction wrote their oplog entry at. For example, it is possible for a storage transaction at Timestamp 20 to have committed and for a storage transaction at Timestamp 10 to commit later on in wall-clock time. Oplog readers are prevented from reading beyond the oplog hole to ensure they don't miss any oplog entries which might still commit. Keeping an oplog hole open for any extended period of wall-clock time can lead to stalls in replication.

Vectored insert is an example of an operation which pre-allocates Timestamps to write user data and the corresponding oplog entries at. In MongoDB 4.4, SERVER-46161 caused the default internal batch size for vectored insert to increase from 64 to 500. This has been seen to lead to higher tail latencies for vectored inserts (SERVER-65054).

Other operations cause oplog holes too. If expensive work is done (e.g. within an OpObserver) after the oplog slot is allocated and prior to the storage transaction committing, then those operations can stall replication too. Introducing some logging to track the time spent would give more insight into these areas and perhaps even be useful to signal on within our performance testing.



 Comments   
Comment by Kanchana Sekhar [ 26/Jul/23 ]

PR: https://github.com/10gen/docs-mongodb-internal/pull/4153

Generated at Thu Feb 08 08:14:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.