[SERVER-44634] Account for election down time when calculating majority committed lag Created: 14/Nov/19  Updated: 29/Oct/23  Resolved: 12/Jun/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Task Priority: Major - P3
Reporter: Maria van Keulen Assignee: Daniel Gottlieb (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-41187 Majority committed replication lag sp... Closed
Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2019-12-30, Execution Team 2020-03-23, Execution Team 2020-04-06, Execution Team 2020-04-20, Execution Team 2020-06-01, Execution Team 2020-06-15
Participants:

 Description   

According to the findings in SERVER-41187, there is a period of time after an election where the lastCommitted OpTime is still from the previous term, and the lastApplied OpTime on the primary is from the new term. The majority committed "lag" is overstated as a result, because it includes time during which the replica set is not accepting writes.
Calculations of majority committed lag should ideally exclude this downtime.

These are the percentile breakdowns of the time to the first majority committed write after a new term, for 4.0 and 4.2. Units are seconds:

| Percentile |           v4.0 |           v4.2 |
|------------+----------------+----------------|
|         10 | 0.902999997139 | 0.925999879837 |
|         20 |  1.10899996758 | 0.996999979019 |
|         30 |   1.1819999218 |  1.04299998283 |
|         40 |  1.27499985695 |  1.14300012589 |
|         50 |  1.40999984741 |  1.38400006294 |
|         60 |  1.71499991417 |  1.89600014687 |
|         70 |  2.09000015259 |  1.97300004959 |
|         80 |   2.1930000782 |  2.03999996185 |
|         90 |  2.39699983597 |   2.2009999752 |
|         95 |   2.7619998455 |  2.77999997139 |
|         99 |  7.47925007343 |  3.97099995613 |
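To make the overstatement concrete, here is a minimal sketch (hypothetical names, not the server code) of how a naive wall-clock lag measurement right after an election counts time during which the set could not accept writes:

```python
# Hypothetical illustration of the problem described above: immediately
# after an election, lastCommitted is still from the previous term while
# lastApplied on the new primary is from the new term, so the naive
# wall-clock gap between them includes the election downtime.

def majority_commit_lag(last_applied_wall, last_committed_wall):
    """Naive lag: wall-clock gap between the newest applied op and the
    newest majority-committed op, in seconds."""
    return last_applied_wall - last_committed_wall

# Timeline (seconds): last majority commit at t=10.0 in term 1, the set
# holds an election (accepting no writes) until t=12.0, and the new
# primary applies its first term-2 op at t=12.5.
lag = majority_commit_lag(last_applied_wall=12.5, last_committed_wall=10.0)
# 2.5 s of apparent lag, but 2.0 s of it is election downtime during
# which no write could have been majority committed.
```

The 2.5 s figure is consistent with the table above: most post-election "lag" samples cluster between 1 and 3 seconds, dominated by the election itself rather than by replication.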



 Comments   
Comment by Githook User [ 12/Jun/20 ]

Author:

{'name': 'Daniel Gottlieb', 'email': 'daniel.gottlieb@mongodb.com', 'username': 'dgottlieb'}

Message: SERVER-44634: Account for election downtime when flow control calculates majority committed lag.
Branch: master
https://github.com/mongodb/mongo/commit/01f7c7a2e39c1c555347e23a28a7a6e8357ab5f2

Comment by Connie Chen [ 03/Mar/20 ]

This ticket needs further investigation; moving back to Open until the next sprint starts.

Comment by Lingzhi Deng [ 14/Nov/19 ]

judah.schvimer pointed out that this only works if the lastCommitted lag is exactly one term behind. If there are multiple elections in between, we don't know the aggregate "down time" across those elections, though lag spanning multiple term changes presumably happens much less often. The question is how precise we want to be vs. the benefit we would get for the extra complexity.

Comment by Maria van Keulen [ 14/Nov/19 ]

ldeng proposed a solution to this issue: approximate the "down time" of the replica set as the difference between the catch-up point of the new primary and the first OpTime of its new term. Majority committed lag calculations would then subtract this difference to avoid overstating the lag. Since we are interested in wall clock lag, we would need to store in memory the wall clock time corresponding to the first OpTime of the term, as well as the wall clock time of the catch-up point.
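The proposal above can be sketched as follows (a simplified illustration with hypothetical names, not the actual flow control code; in particular it assumes the lastCommitted OpTime is at most one term behind, per the caveat in the earlier comment):

```python
# Sketch of the proposed adjustment: approximate election downtime as the
# wall-clock gap between the new primary's catch-up point and the first
# OpTime of its new term, and subtract it from the lag while lastCommitted
# still belongs to the previous term.

def adjusted_majority_commit_lag(last_applied_wall,
                                 last_committed_wall,
                                 catchup_point_wall,
                                 first_optime_of_term_wall,
                                 committed_is_previous_term):
    """All arguments are wall-clock times in seconds; the last is a flag
    saying whether lastCommitted is still from the previous term."""
    raw_lag = last_applied_wall - last_committed_wall
    if not committed_is_previous_term:
        return raw_lag
    # Downtime: from the catch-up point to the first op of the new term.
    downtime = first_optime_of_term_wall - catchup_point_wall
    return max(raw_lag - downtime, 0.0)

# Same timeline as before: last majority commit and catch-up point at
# t=10.0, first term-2 op at t=12.0, lastApplied at t=12.5.
lag = adjusted_majority_commit_lag(12.5, 10.0, 10.0, 12.0, True)
# 0.5 s: the 2.0 s of election downtime no longer counts against the lag.
```

Both wall-clock values needed for the subtraction (first OpTime of the term and the catch-up point) would be recorded in memory at election time, as the comment notes.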

Generated at Thu Feb 08 05:06:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.