Type: Improvement
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
Replication
Fully Compatible
v8.2, v8.0, v7.0
Repl 2026-02-16
200
We've had some cases where diagnosing the source of lag on the primary was challenging, because there can be many causes. One cause we have seen is that getMore cursors on the primary can't keep up with the rate of writes.
I think it would be tremendously valuable to expose a serverStatus metric for getMore lag: the time between lastApplied and the last OpTime returned by a getMore. This could be tracked per replica set node, or as just the maximum across all nodes.
I wrote a proof of concept (POC) that I'll attach, but it doesn't account for multiple nodes (i.e. each update of the metric from one node overwrites the previous value, which could have come from a different node).
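The following Python sketch makes the "per node vs. maximum" option concrete. It is not the server's implementation. It assumes OpTimes can be reduced to integer timestamps in seconds, and the names `GetMoreLagTracker`, `record_returned`, `lag_per_node` and `max_lag` are hypothetical. Keying the last returned OpTime by node avoids the overwrite problem in the POC. The reported value can then be either the per-node map or its maximum.

```python
import threading


class GetMoreLagTracker:
    """Hypothetical per-node tracker of oplog getMore lag on a primary.

    Lag for a node = primary's lastApplied time minus the time of the last
    OpTime returned to that node's getMore. OpTimes are simplified to
    integer seconds here (an assumption for illustration only).
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._last_returned = {}  # node name -> highest OpTime (secs) returned

    def record_returned(self, node, optime_secs):
        # One entry per node, so an update from one node never overwrites
        # another node's value; never move a node's OpTime backwards.
        with self._lock:
            prev = self._last_returned.get(node, optime_secs)
            self._last_returned[node] = max(prev, optime_secs)

    def lag_per_node(self, last_applied_secs):
        # Per-node lag, clamped at zero.
        with self._lock:
            return {
                node: max(0, last_applied_secs - t)
                for node, t in self._last_returned.items()
            }

    def max_lag(self, last_applied_secs):
        # Single summary value: the worst lag across all nodes (0 if none).
        return max(self.lag_per_node(last_applied_secs).values(), default=0)
```

A serverStatus section could report either form. The maximum is a single cheap number to alert on, but it hides which node is behind. The per-node map keeps that information at the cost of a larger output.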
- is fixed by
  SERVER-119766 Fix data race in oplogFetcherHighestFetchedOptime metric (Closed)
- related to
  SERVER-119880 Revert SERVER-116300 (Closed)
  SERVER-119766 Fix data race in oplogFetcherHighestFetchedOptime metric (Closed)
  SERVER-119647 Surface serverStatus metrics about how much time oplog getMores are spending blocked on the storage engine (Open)