[SERVER-34606] Test (and possibly fix) behavior around majority commit point and oplog truncation Created: 23/Apr/18 Updated: 29/Oct/23 Resolved: 22/Jun/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.3, 4.1.1 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Ian Whalen (Inactive) | Assignee: | Maria van Keulen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | SWNA, nyc |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.0 |
| Sprint: | Storage NYC 2018-06-04, Storage NYC 2018-06-18, Storage NYC 2018-07-02 |
| Participants: | |
| Description |
|
The first step is to add a JS test for the behavior when the replication majority commit point prevents the oplog from being truncated. Depending on what that test turns up, there may be follow-on work to improve that behavior. |
| Comments |
| Comment by Githook User [ 07/Sep/18 ] |
|
Author: {'name': 'Maria van Keulen', 'email': 'maria@mongodb.com', 'username': 'mvankeulen94'} Message: (cherry picked from commit c1803e01a3827072b7dcd962a864c62a426824b6)
(cherry picked from commit b7ff5816f4d9d468b1875013384e7e51184628a0) |
| Comment by Githook User [ 21/Jun/18 ] |
|
Author: {'username': 'mvankeulen94', 'name': 'Maria van Keulen', 'email': 'maria@mongodb.com'} Message: |
| Comment by Alexander Gorrod [ 14/May/18 ] |
I think it should count the number of busy spins.
I would hope so, i.e., we can update the algorithm so that it does not busy spin. |
| Comment by Bruce Lucas (Inactive) [ 11/May/18 ] |
|
alexander.gorrod, a counter for failed truncates makes sense. Does it also make sense to have a counter for "stopped oplog reclaim happening if it would remove content that is older than the majority commit point."? Does the counter for failed truncates count the number of busy spins, or just the fact that it failed and then we started spinning? This may be a moot point if we eliminate this behavior. |
| Comment by Ian Whalen (Inactive) [ 11/May/18 ] |
|
This line of code gets executed far more than once on every insert that attempts to truncate the oplog, and we expect this execution count to go down drastically if we fix this ticket. |
| Comment by Alexander Gorrod [ 09/May/18 ] |
|
bruce.lucas, I believe you can detect the scenario where 1 is happening from the oplog size growing in excess of the oplog maxSize while the range of transaction timestamps currently pinned is large and growing. Regarding 2: I don't believe there is any tracking. It would make sense to add a server status entry for failed oplog truncate attempts. |
| Comment by Bruce Lucas (Inactive) [ 09/May/18 ] |
|
alexander.gorrod, do we have ftdc metrics that tell us whether your items 1 and 2 are occurring? |
| Comment by Alexander Gorrod [ 09/May/18 ] |
|
For additional context, there was a change made as part of That is a change with user visible consequences - as some internal testing has uncovered. There are two potential behavior differences now: The goal of this ticket is to characterize the user-visible changes, and to add a test to automated testing which tests the new behavior and ensures it is reasonable (yet to be defined). |
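
The interaction discussed in this thread can be illustrated with a toy model. This is only a sketch under stated assumptions, not MongoDB or WiredTiger code: the class `ToyOplog`, its fields, and the `failed_truncates` counter are hypothetical names, the cap is counted in entries rather than bytes, and the model assumes that only entries at or before the commit point may be reclaimed. A blocked attempt is recorded once per insert, with no busy spinning.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyOplog:
    """Toy model only; all names here are hypothetical, not server internals."""
    max_entries: int                  # stand-in for the oplog maxSize
    commit_point: int = 0             # stand-in for the majority commit point
    entries: List[int] = field(default_factory=list)  # timestamps, ascending
    failed_truncates: int = 0         # hypothetical "failed truncate" counter

    def insert(self, ts: int) -> None:
        self.entries.append(ts)
        self._reclaim()

    def _reclaim(self) -> None:
        # Drop the oldest entries while over the cap, but (by assumption)
        # never drop an entry newer than the commit point.
        while len(self.entries) > self.max_entries:
            if self.entries[0] > self.commit_point:
                # Blocked: count the failure once and return, no spinning.
                self.failed_truncates += 1
                return
            self.entries.pop(0)


oplog = ToyOplog(max_entries=3)
for ts in range(1, 6):      # commit point stays at 0, so nothing can be reclaimed
    oplog.insert(ts)
print(len(oplog.entries), oplog.failed_truncates)

oplog.commit_point = 4      # commit point advances
oplog.insert(6)             # the next insert can now reclaim old entries
print(oplog.entries, oplog.failed_truncates)
```

In this model a lagging commit point shows up as an oplog that grows past its cap together with a rising failure count, which is the kind of signal the proposed server status entry would expose.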