[SERVER-7418] getlasterror / writeconcern with a timeout should "fail fast" when that is possible Created: 18/Oct/12  Updated: 06/Dec/22  Resolved: 14/Jun/18

Status: Closed
Project: Core Server
Component/s: Performance, Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Dwight Merriman Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Participants:

 Description   

In theory the primary could know that the secondaries are too far behind to possibly finish the current write within the timeout window specified by the caller of the getLastError command, and return immediately instead of waiting for the timeout. This would likely improve the stability of client apps: if a cluster is normally very responsive and then suddenly starts hitting long timeout waits, the clients' entire connection pools are likely to saturate.
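
A minimal sketch of the check proposed above, in Python for illustration only. It is not server code: the `Secondary` type, the `lag_secs` field, and the `max_catchup_ratio` parameter are all hypothetical, and the assumption that a secondary's catch-up speed can be bounded is exactly the part that may not hold in practice.

```python
from dataclasses import dataclass


@dataclass
class Secondary:
    name: str
    lag_secs: float  # hypothetical: seconds of oplog this member is behind the primary


def can_meet_write_concern(secondaries, w, wtimeout_secs, max_catchup_ratio=1.0):
    """Return False when the write cannot plausibly be acknowledged in time.

    max_catchup_ratio is an assumed upper bound on how many seconds of oplog a
    secondary can apply per wall-clock second. It is a guess, not a measured value.
    """
    needed = w - 1  # the primary itself counts as one acknowledgement
    if needed <= 0:
        return True
    # Count secondaries that could catch up before the timeout under the assumed bound.
    reachable = sum(
        1 for s in secondaries
        if s.lag_secs / max_catchup_ratio <= wtimeout_secs
    )
    return reachable >= needed


secondaries = [Secondary("a", 0.5), Secondary("b", 5000.0)]
if not can_meet_write_concern(secondaries, w=3, wtimeout_secs=2.0):
    print("fail fast: w=3 is not reachable within wtimeout")
```

With these sample values the primary would return an error immediately for w=3 rather than hold the connection for the full timeout.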



 Comments   
Comment by Scott Hernandez (Inactive) [ 19/Oct/12 ]

Since it is possible to go from 5000s to 0s of lag in less than a second, this seems hard to support. It is very hard to go in the other direction, though. Unfortunately, that doesn't make the primary any better at predicting whether the goal will be reached within a given time period. In the case of an unreachable/unachievable w (nodes) value it makes sense, and I think we have already done this.
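
For illustration, the unachievable-w case can be sketched as below. This is a hypothetical Python helper, not the server's actual check; the member dictionaries and the `arbiter` key are assumptions made for the example, and only numeric w values are handled.

```python
def w_is_unsatisfiable(w, members):
    """Return True when a numeric w can never be satisfied by this replica set.

    members: list of dicts; a member with "arbiter": True holds no data and
    therefore cannot acknowledge a write (hypothetical structure).
    """
    data_bearing = sum(1 for m in members if not m.get("arbiter", False))
    return w > data_bearing


members = [{"host": "a"}, {"host": "b"}, {"host": "c", "arbiter": True}]
if w_is_unsatisfiable(3, members):
    print("fail fast: w=3 exceeds the number of data-bearing members")
```

Unlike a lag-based prediction, this check depends only on the replica set configuration, so it cannot be invalidated by a secondary suddenly catching up.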

We could monitor replication rates and guesstimate, but that seems more problematic, and more wrong from a correctness and usability point of view.

Generated at Thu Feb 08 03:14:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.