[SERVER-70155] Add duration of how long an oplog slot is kept open to mongod "Slow query" log lines Created: 02/Oct/22  Updated: 11/Jan/24  Resolved: 13/Mar/23

Status: Closed
Project: Core Server
Component/s: Replication, Write Ops
Affects Version/s: None
Fix Version/s: 7.0.0-rc0, 6.0.13, 5.0.24, 4.4.29

Type: Improvement Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2022-10-05 at 9.27.40 AM.png     PNG File oplog_visbility_lag_example.png    
Issue Links:
Backports
Documented
is documented by DOCS-15960 [Server] Investigate changes in SERVE... Closed
Related
is related to SERVER-84449 High load can cause replication lag l... Open
is related to SERVER-84467 Add duration of how long an oplog slo... Open
is related to SERVER-74604 Report the same metrics in both "Slow... Backlog
is related to SERVER-65054 Avoid slow insert batches blocking re... Open
Assigned Teams:
Storage Execution
Backwards Compatibility: Fully Compatible
Backport Requested:
v7.2, v7.0, v6.0, v5.0, v4.4
Sprint: Execution Team 2023-03-06, Execution Team 2023-03-20
Participants:

 Description   

An oplog hole is a Timestamp in the oplog at which an active storage transaction would commit, where that Timestamp is behind the Timestamp at which an already-committed storage transaction wrote its oplog entry. For example, it is possible for a storage transaction at Timestamp 20 to have committed while a storage transaction at Timestamp 10 commits later in wall-clock time. Oplog readers are prevented from reading beyond the oplog hole to ensure they don't miss any oplog entries that might still commit. Keeping an oplog hole open for an extended period of wall-clock time can therefore lead to stalls in replication.
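
As a rough illustration only (a toy model, not the server's implementation), the visibility rule can be thought of as "readers stop just before the oldest reserved-but-uncommitted timestamp":

{code:python}
# Toy model of the oplog visibility rule described above. Slots map a reserved
# Timestamp to whether its storage transaction has committed yet.

def visibility_point(slots):
    """Readers may scan the oplog only up to, but not including, the
    earliest reserved timestamp that has not yet committed (the hole)."""
    uncommitted = [ts for ts, committed in slots.items() if not committed]
    if uncommitted:
        return min(uncommitted) - 1
    return max(slots) if slots else 0

# Timestamp 20 committed first in wall-clock time; Timestamp 10 is still open.
slots = {10: False, 20: True}
print(visibility_point(slots))  # 9 -> the committed entry at 20 is not yet visible
{code}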

Vectored insert is an example of an operation that pre-allocates the Timestamps at which it writes user data and the corresponding oplog entries. In MongoDB 4.4, SERVER-46161 increased the default internal batch size for vectored inserts from 64 to 500, which has been observed to lead to higher tail latencies for vectored inserts (SERVER-65054).

Other operations cause oplog holes as well. If expensive work is done (e.g., within an OpObserver) after the oplog slot is allocated and before the storage transaction commits, those operations can also stall replication. Introducing logging to track the time spent would give more insight into these areas and could even serve as a signal in our performance testing.
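
A minimal sketch of the kind of timing this asks for, written as a hypothetical Python wrapper rather than the server's C++ code; the only name taken from the actual change is totalOplogSlotDurationMicros, and the threshold value is an assumption:

{code:python}
import time
from contextlib import contextmanager

SLOW_OP_THRESHOLD_MICROS = 100_000  # hypothetical 100 ms slow-operation threshold

@contextmanager
def timed_oplog_slot(op_desc):
    # Time the window between reserving the oplog slot and committing the
    # storage transaction; everything in between (e.g. OpObserver work)
    # keeps the hole open and blocks oplog readers behind it.
    start = time.monotonic()
    try:
        yield
    finally:
        held_micros = int((time.monotonic() - start) * 1_000_000)
        if held_micros >= SLOW_OP_THRESHOLD_MICROS:
            print(f"Slow query: {op_desc} totalOplogSlotDurationMicros:{held_micros}")

# Example: an insert batch whose OpObserver/commit work takes ~150 ms.
with timed_oplog_slot("insert batch"):
    time.sleep(0.15)
{code}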



 Comments   
Comment by Githook User [ 11/Jan/24 ]

Author: Dianna Hohensee <dianna.hohensee@mongodb.com>

Message: SERVER-70155 Add totalOplogSlotDurationMicros to 'Slow query' log msgs and the profiler data, to track how long a write operation holds an oplog slot open before commit

(cherry picked from commit a7ef4363090042302411a6a28e1fbf4bd18cd3ac)

GitOrigin-RevId: 517acbc392a3bd095a889c4ce672184eb43cb522
Branch: v4.4
https://github.com/mongodb/mongo/commit/bb8e21b5f7bb2649ed31d57b6dbc4468541ad80e

Comment by Githook User [ 29/Dec/23 ]

Author: Dianna Hohensee <dianna.hohensee@mongodb.com>

Message: SERVER-70155 Add totalOplogSlotDurationMicros to 'Slow query' log msgs and the profiler data, to track how long a write operation holds an oplog slot open before commit

(cherry picked from commit a7ef4363090042302411a6a28e1fbf4bd18cd3ac)

GitOrigin-RevId: afa8f5824455e023c966cee986a8591ab1eeb049
Branch: v5.0
https://github.com/mongodb/mongo/commit/9965e5c97cfaeb5d74dee9841eb2effc535d95b7

Comment by Githook User [ 28/Dec/23 ]

Author: Dianna Hohensee <dianna.hohensee@mongodb.com>

Message: SERVER-70155 Add totalOplogSlotDurationMicros to 'Slow query' log msgs and the profiler data, to track how long a write operation holds an oplog slot open before commit

(cherry picked from commit a7ef4363090042302411a6a28e1fbf4bd18cd3ac)

GitOrigin-RevId: d81c97f81999e99ba248ef096d25a74298df8022
Branch: v6.0
https://github.com/mongodb/mongo/commit/6077a9ddf0d8cac0a8e10503c8caf257a15efa9c

Comment by Githook User [ 13/Mar/23 ]

Author: Dianna Hohensee <dianna.hohensee@mongodb.com> (username: DiannaHohensee)

Message: SERVER-70155 Add totalOplogSlotDurationMicros to 'Slow query' log msgs and the profiler data, to track how long a write operation holds an oplog slot open before commit
Branch: master
https://github.com/mongodb/mongo/commit/a7ef4363090042302411a6a28e1fbf4bd18cd3ac

Comment by Louis Williams [ 20/Oct/22 ]

For both appliedOpTime and latestOpTime, the source of the value is the same. It's the lastApplied value from the node in question. Either of these values would be correct as far as I can tell.

On primaries, this represents the highest OpTime ever committed, so I think the lag calculation as lastApplied - allDurable is correct, regardless of how you obtain it.
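
As a rough sketch of measuring this kind of lag from replSetGetStatus with pymongo, using the wall-time fields (i.e. the "rs lastDurable wall time lag" discussed further down) rather than reading the allDurable timestamp directly; the connection string is a placeholder:

{code:python}
from pymongo import MongoClient

# Compute the wall-time gap between lastApplied and lastDurable on this node.
client = MongoClient("mongodb://localhost:27017")
optimes = client.admin.command("replSetGetStatus")["optimes"]

lag = optimes["lastAppliedWallTime"] - optimes["lastDurableWallTime"]
print(f"lastDurable lags lastApplied by {lag.total_seconds() * 1000:.1f} ms")
{code}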

Comment by Louis Williams [ 19/Oct/22 ]

yuan.fang@mongodb.com that looks correct. Can you confirm the full path in serverStatus/FTDC that you're using to get the "appliedOpTime"? I don't see that reported in any of the FTDC data that I have locally from a recent build. Are you by any chance using "oplog latestOptime t"? If so, that would be correct.

Comment by Bruce Lucas (Inactive) [ 05/Oct/22 ]

Thanks, that sounds like exactly what I was looking for. Will open a ticket to add that computed metric.

Comment by Louis Williams [ 05/Oct/22 ]

bruce.lucas@mongodb.com this is the metric that I've used in the past to diagnose issues with oplog hole-induced lag, but you make a good point that this metric could also be affected by journaling.

We do have an oplog visibility metric, but the way that it shows up in t2 is not useful (see the attached screenshot).

So I think it would make sense to add a metric that calculates the lag between the oplog visibility timestamp and the wall time.
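
A sketch of the proposed computation, assuming the oplog visibility timestamp can be obtained as a BSON Timestamp (how to read it from the server is intentionally left out here); the time portion of a BSON Timestamp is seconds since the Unix epoch:

{code:python}
import time
from bson import Timestamp

def oplog_visibility_lag_seconds(visibility_ts: Timestamp) -> float:
    # Lag is simply the current wall clock minus the seconds portion of the
    # visibility timestamp.
    return time.time() - visibility_ts.time

# Hypothetical example: a visibility timestamp from ~5 seconds ago.
print(oplog_visibility_lag_seconds(Timestamp(int(time.time()) - 5, 1)))
{code}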

Comment by Bruce Lucas (Inactive) [ 04/Oct/22 ]

louis.williams@mongodb.com, just to confirm, are oplog holes the only thing that can cause lastDurable to lag? For example, would a storage bottleneck (or stall) in the journal possibly cause that metric to lag? If so, is there a metric that relates only to oplog holes, or could one be added?

Comment by Louis Williams [ 04/Oct/22 ]

The "rs lastDurable wall time lag" should be the serverStatus metric we use that tells us whether there are operations holding oplog holes open for long periods of time.

Comment by Bruce Lucas (Inactive) [ 04/Oct/22 ]

I wonder if there's something that could be added to serverStatus (and therefore FTDC) also - or is there already a metric that clearly indicates the component of lag caused by oplog holes?
