[SERVER-60728] Improved MDB crash recovery testing Created: 15/Oct/21 Updated: 06/Nov/23 Resolved: 08/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.3.0 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Gregory Wlodarek |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | CA-PM, post-mortem |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Execution Team 2022-02-07, Execution Team 2022-02-21 |
| Participants: | |
| Description |
|
MDB currently has powercycle and process-termination tests, which have historically discovered durability bugs. It's not obvious that those tests are fundamentally insufficient, but we've found additional durability bugs using other techniques. Specifically, we can set up a MongoDB cluster and:
There's interest in permanently adding this to our testing suites. Attached is an unrefined (apologies) patch that can be used as a starting point for implementing the above. |
| Comments |
| Comment by Githook User [ 08/Feb/22 ] |
|
Author: Gregory Wlodarek (gregory.wlodarek@mongodb.com, username: GWlodarek) Message: |
| Comment by Daniel Gottlieb (Inactive) [ 30/Nov/21 ] |
|
I wouldn't say we understand the gap between the existing tests and what this proposes. The technique in the patch did find some data corruption bugs that the existing tests seemed unable to uncover. It was requested that we file a ticket to productionize the new testing, but it's not clear to me exactly what form this should take. |
| Comment by Connie Chen [ 29/Nov/21 ] |
|
daniel.gottlieb, would you be able to answer judah.schvimer's question above? |
| Comment by Judah Schvimer [ 08/Nov/21 ] |
|
Do we plan to add this testing to our existing terminate/kill passthroughs? Do we understand the gap between our existing tests and what this proposes? |