[SERVER-70155] Add duration of how long an oplog slot is kept open to mongod "Slow query" log lines Created: 02/Oct/22 Updated: 11/Jan/24 Resolved: 13/Mar/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Write Ops |
| Affects Version/s: | None |
| Fix Version/s: | 7.0.0-rc0, 6.0.13, 5.0.24, 4.4.29 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Assigned Teams: |
Storage Execution
|
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: |
v7.2, v7.0, v6.0, v5.0, v4.4
|
| Sprint: | Execution Team 2023-03-06, Execution Team 2023-03-20 |
| Participants: | |
| Description |
|
An oplog hole represents a Timestamp in the oplog at which an active storage transaction would commit that is behind the Timestamp at which an already-committed storage transaction wrote its oplog entry. For example, it is possible for a storage transaction at Timestamp 20 to have committed and for a storage transaction at Timestamp 10 to commit later on in wall-clock time. Oplog readers are prevented from reading beyond the oplog hole to ensure they don't miss any oplog entries which might still commit. Keeping an oplog hole open for any extended period of wall-clock time can lead to stalls in replication.

Vectored insert is an example of an operation which pre-allocates Timestamps to write user data and the corresponding oplog entries at. Other operations cause oplog holes too: if expensive work is done (e.g. within an OpObserver) after the oplog slot is allocated and before the storage transaction commits, those operations can stall replication as well.

Introducing some logging to track the time spent would give more insight into these areas and perhaps even be useful to signal on within our performance testing. |
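The measurement this ticket asks for is conceptually a timer around the window in which an oplog slot stays reserved. Below is a minimal, purely illustrative Python sketch of that idea; the names (reserve_oplog_slot, do_expensive_work, commit_txn) and the threshold are hypothetical stand-ins, not mongod internals, and the real change would live in the server's C++ write path and its "Slow query" reporting.

```python
import time

SLOW_OP_THRESHOLD_MS = 100  # hypothetical threshold, analogous in spirit to slowms


def timed_write(reserve_oplog_slot, do_expensive_work, commit_txn):
    """Measure how long an oplog slot stays reserved before the storage
    transaction commits. All three callables are hypothetical stand-ins."""
    slot = reserve_oplog_slot()      # oplog Timestamp is now allocated; a hole exists
    start = time.monotonic()
    try:
        do_expensive_work(slot)      # e.g. OpObserver work done while the hole is open
    finally:
        commit_txn(slot)             # committing closes the hole for this slot
        held_ms = (time.monotonic() - start) * 1000
        if held_ms >= SLOW_OP_THRESHOLD_MS:
            # In mongod, a duration like this would be attached to the
            # "Slow query" log line for the operation.
            print(f"oplog slot held open for {held_ms:.1f} ms")


if __name__ == "__main__":
    # Dummy stand-ins just to exercise the sketch.
    timed_write(reserve_oplog_slot=lambda: 42,
                do_expensive_work=lambda slot: time.sleep(0.15),
                commit_txn=lambda slot: None)
```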
| Comments |
| Comment by Githook User [ 11/Jan/24 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': ''}
Message: (cherry picked from commit a7ef4363090042302411a6a28e1fbf4bd18cd3ac)
GitOrigin-RevId: 517acbc392a3bd095a889c4ce672184eb43cb522 |
| Comment by Githook User [ 29/Dec/23 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': ''}
Message: (cherry picked from commit a7ef4363090042302411a6a28e1fbf4bd18cd3ac)
GitOrigin-RevId: afa8f5824455e023c966cee986a8591ab1eeb049 |
| Comment by Githook User [ 28/Dec/23 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': ''}
Message: (cherry picked from commit a7ef4363090042302411a6a28e1fbf4bd18cd3ac)
GitOrigin-RevId: d81c97f81999e99ba248ef096d25a74298df8022 |
| Comment by Githook User [ 13/Mar/23 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}
Message: |
| Comment by Louis Williams [ 20/Oct/22 ] |
|
For both appliedOpTime and latestOpTime, the source of the value is the same. It's the lastApplied value from the node in question. Either of these values would be correct as far as I can tell. On primaries, this represents the highest OpTime ever committed, so I think the lag calculation as lastApplied - allDurable is correct, regardless of how you obtain it. |
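For reference, a minimal sketch (assuming pymongo and a locally reachable replica-set member) of the wall-time version of the lag calculation discussed in this thread. The replSetGetStatus field paths used here are an assumption about where the lastApplied/lastDurable wall times are reported and may differ by server version.

```python
from pymongo import MongoClient

# Connection string is illustrative; point it at the node you want to inspect.
client = MongoClient("mongodb://localhost:27017")
status = client.admin.command("replSetGetStatus")

# Assumed field paths for the node's own lastApplied/lastDurable wall times.
applied_wall = status["optimes"]["lastAppliedWallTime"]
durable_wall = status["optimes"]["lastDurableWallTime"]

# A persistently large gap suggests something (an oplog hole held open, or
# journaling) is keeping lastDurable behind lastApplied.
lag_seconds = (applied_wall - durable_wall).total_seconds()
print(f"lastApplied - lastDurable wall-time lag: {lag_seconds:.3f}s")
```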
| Comment by Louis Williams [ 19/Oct/22 ] |
|
yuan.fang@mongodb.com that looks correct. Can you confirm the full path in serverStatus/FTDC that you're using to get the "appliedOpTime"? I don't see that reported in any of the FTDC data that I have locally from a recent build. Are you by any chance using "oplog latestOptime t"? If so, that would be correct. |
| Comment by Bruce Lucas (Inactive) [ 05/Oct/22 ] |
|
Thanks, that sounds like exactly what I was looking for. Will open a ticket to add that computed metric. |
| Comment by Louis Williams [ 05/Oct/22 ] |
|
bruce.lucas@mongodb.com this is the metric that I've used in the past to diagnose issues with oplog hole-induced lag, but you make a good point that this metric could also be affected by journaling. We do have an oplog visibility metric, but the way that it shows up in t2 is not useful. So I think it would make sense to add a metric that calculates the lag between the oplog visibility timestamp and the wall time. |
| Comment by Bruce Lucas (Inactive) [ 04/Oct/22 ] |
|
louis.williams@mongodb.com, just to confirm, are oplog holes the only thing that can cause lastDurable to lag? For example, would a storage bottleneck (or stall) in the journal possibly cause that metric to lag? If so, is there a metric that relates only to oplog holes, or could one be added? |
| Comment by Louis Williams [ 04/Oct/22 ] |
|
The "rs lastDurable wall time lag" should be the serverStatus metric we use that tells us whether there are operations holding oplog holes open for long periods of time. |
| Comment by Bruce Lucas (Inactive) [ 04/Oct/22 ] |
|
I wonder if there's something that could be added to serverStatus (and therefore FTDC) also - or is there already a metric that clearly indicates the component of lag caused by oplog holes? |