[SERVER-7418] getlasterror / writeconcern with a timeout should "fail fast" when that is possible Created: 18/Oct/12 Updated: 06/Dec/22 Resolved: 14/Jun/18 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Dwight Merriman | Assignee: | Backlog - Replication Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Replication |
| Participants: |
| Description |
In theory the primary could know that the secondaries are too far behind to possibly acknowledge the current write within the timeout window specified by the caller of the getLastError command, and could return immediately instead of waiting for the timeout to expire. This would likely improve the stability of client applications: if a cluster is normally very responsive and then writes suddenly start waiting out a long timeout, the clients' entire connection pools are likely to saturate. |
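For context, a minimal PyMongo sketch of the caller's side of this, assuming a replica set named rs0 and illustrative database/collection names and timeout values: the write concern names both a replication goal (w) and a timeout (wtimeout), and today the client blocks for the full wtimeout before the error surfaces, even when the goal is clearly out of reach.

```python
# Caller's view of w/wtimeout (names and values are illustrative).
from pymongo import MongoClient, WriteConcern
from pymongo.errors import WriteConcernError

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

# Require acknowledgement from 3 nodes, but give up after 2 seconds.
coll = client.get_database("test").get_collection(
    "events", write_concern=WriteConcern(w=3, wtimeout=2000)
)

try:
    coll.insert_one({"event": "signup"})
except WriteConcernError:
    # Today this is raised only after the full 2s wait; the proposal is for
    # the primary to return early when the goal clearly cannot be met in time.
    print("write not replicated to 3 nodes within 2s")
```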
| Comments |
| Comment by Scott Hernandez (Inactive) [ 19/Oct/12 ] |
Since it is possible for a secondary to go from 5000s of lag to 0s of lag in less than a second, this seems hard to support. It is very hard to go the other direction that quickly, but unfortunately that does not make the primary any better at predicting whether the write concern goal can be reached within a given time window. In the case of an unreachable/unachievable w (number of nodes) value, failing fast does make sense, and I think we have already done that. We could monitor replication rates and guesstimate, but that seems more problematic, and more wrong from a correctness and usability point of view. |
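To make the objection concrete, here is a hypothetical sketch (not anything in the server; all names and numbers are invented) of the kind of lag-based guess the primary would have to make. Because lag can collapse from thousands of seconds to zero almost instantly, any such estimate can be badly wrong in exactly the fail-fast direction.

```python
# Hypothetical heuristic (NOT the server's implementation): guess whether a
# write concern goal is reachable within wtimeout from recent replication rates.
from dataclasses import dataclass
from typing import List

@dataclass
class SecondaryState:
    lag_secs: float       # how far behind the primary this node currently is
    catchup_rate: float   # observed seconds of oplog applied per wall-clock second

def can_possibly_satisfy(w: int, wtimeout_secs: float,
                         secondaries: List[SecondaryState]) -> bool:
    """Return False only if we guess that fewer than w-1 secondaries can catch up
    in time (the primary itself counts toward w)."""
    reachable = sum(
        1 for s in secondaries
        if s.catchup_rate > 0 and s.lag_secs / s.catchup_rate <= wtimeout_secs
    )
    return reachable >= w - 1

# The problem with any estimate like this: a node reporting 5000s of lag and a
# low recent catchup_rate would be written off here, yet it may reach 0s of lag
# within a second (e.g. right after a network blip clears), so failing fast on
# this guess would reject writes that would actually have succeeded.
```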