[SERVER-79245] Unclean shutdown while dropping collection and indexes to resync can make the catalog inconsistent Created: 24/Jul/23 Updated: 04/Oct/23 Resolved: 28/Sep/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gregory Wlodarek | Assignee: | Jordi Olivares Provencio |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Storage Execution EMEA |
| Operating System: | ALL |
| Sprint: | Execution EMEA Team 2023-09-18, Execution EMEA Team 2023-10-02 |
| Description |
|
Initial sync drops all tables in all replicated databases without a timestamp before resyncing. This means that the drop-pending ident reaper drops the table in WT (WiredTiger) as soon as it next runs. Table drops in WT are non-transactional and cannot be rolled back, so the table can be gone from WT before the corresponding catalog changes are stable/checkpointed. As a result, if the server shuts down uncleanly at that point, startup recovery finds that the table no longer exists in WT while the catalog still references it. The server tries to query the index table metadata from WT, WT likely returns ENOENT, and we crash. |
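The failure sequence above can be reproduced in a small toy model. This is a minimal sketch under stated assumptions, not MongoDB or WiredTiger code: `ToyStorage`, its methods, and the ident name `index-1` are hypothetical. A checkpoint is modeled as copying the in-memory catalog, and an unclean shutdown as reverting the catalog to that copy while table drops persist.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStorage:
    # Hypothetical model, not MongoDB code. `tables` stands in for WiredTiger
    # tables, `catalog` for the in-memory catalog, and `checkpointed_catalog`
    # for the catalog state captured by the last checkpoint.
    tables: set = field(default_factory=set)
    catalog: set = field(default_factory=set)
    checkpointed_catalog: set = field(default_factory=set)

    def create(self, ident: str) -> None:
        self.tables.add(ident)
        self.catalog.add(ident)

    def checkpoint(self) -> None:
        # Make the current catalog durable.
        self.checkpointed_catalog = set(self.catalog)

    def remove_from_catalog(self, ident: str) -> None:
        # Catalog change: lost on a crash unless a checkpoint follows.
        self.catalog.discard(ident)

    def drop_table(self, ident: str) -> None:
        # Non-transactional drop: takes effect immediately, never rolled back.
        self.tables.discard(ident)

    def crash_and_recover(self) -> list:
        # Unclean shutdown: the catalog reverts to the last checkpoint, but
        # table drops persist. Return idents the recovered catalog still
        # references that no longer exist in storage (ENOENT at startup).
        self.catalog = set(self.checkpointed_catalog)
        return sorted(self.catalog - self.tables)


def resync_with_immediate_drop() -> list:
    storage = ToyStorage()
    storage.create("index-1")
    storage.checkpoint()
    # Initial sync removes the ident without a timestamp, so the reaper drops
    # the table right away, before the catalog change is checkpointed.
    storage.remove_from_catalog("index-1")
    storage.drop_table("index-1")
    # Crash before the next checkpoint.
    return storage.crash_and_recover()


print(resync_with_immediate_drop())  # ['index-1']
```

The returned list holds idents that the recovered catalog still references but that no longer exist in storage; in the real server, querying such an ident's metadata is where ENOENT surfaces.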
| Comments |
| Comment by Jordi Olivares Provencio [ 28/Sep/23 ] |
|
The problem will be fixed in a more generic way by |
| Comment by Jordi Olivares Provencio [ 22/Sep/23 ] |
|
suganthi.mani@mongodb.com I was also thinking the same as I was working on
| Comment by Suganthi Mani [ 22/Sep/23 ] |
|
jordi.olivares-provencio@mongodb.com gregory.wlodarek@mongodb.com I filed 1) In the future, engineers might easily overlook the need to enable the flush option. Therefore, I believe we should consider a more general solution, such as having the KV ident dropper perform an unstable checkpoint before dropping the ident with a drop timestamp as 0 (i.e., an untimestamped drop). (???) |
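The proposal of having the ident dropper take an unstable checkpoint before an untimestamped drop can be sketched as follows. Everything here is hypothetical (`ToyStorage`, `ToyIdentReaper`, `add_drop_pending`, `reap`). The sketch assumes that an unstable checkpoint makes the latest catalog writes durable, which the model reduces to copying the current catalog. Timestamped drops are simply left pending.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStorage:
    # Hypothetical model: `tables` = storage-engine tables, `catalog` =
    # in-memory catalog, `checkpointed_catalog` = state that survives a crash.
    tables: set = field(default_factory=set)
    catalog: set = field(default_factory=set)
    checkpointed_catalog: set = field(default_factory=set)

    def checkpoint(self) -> None:
        self.checkpointed_catalog = set(self.catalog)

    def crash_and_recover(self) -> list:
        # Catalog reverts to the last checkpoint; table drops are never undone.
        self.catalog = set(self.checkpointed_catalog)
        return sorted(self.catalog - self.tables)


@dataclass
class ToyIdentReaper:
    # Hypothetical drop-pending reaper holding (ident, drop_timestamp) pairs.
    storage: ToyStorage
    pending: list = field(default_factory=list)

    def add_drop_pending(self, ident: str, drop_ts: int) -> None:
        self.pending.append((ident, drop_ts))

    def reap(self) -> None:
        # Only untimestamped drops (drop timestamp 0) are handled in this
        # sketch; timestamped drops stay pending.
        untimestamped = [ident for ident, ts in self.pending if ts == 0]
        if untimestamped:
            # Proposed fix: checkpoint first so the durable catalog no longer
            # references these idents before their tables disappear.
            self.storage.checkpoint()
        for ident in untimestamped:
            self.storage.tables.discard(ident)  # non-transactional drop
        self.pending = [(ident, ts) for ident, ts in self.pending if ts != 0]


def resync_with_checkpointing_reaper() -> list:
    storage = ToyStorage(tables={"index-1"}, catalog={"index-1"})
    storage.checkpoint()
    reaper = ToyIdentReaper(storage)
    # Initial sync: untimestamped catalog removal, then a drop-pending entry
    # with drop timestamp 0.
    storage.catalog.discard("index-1")
    reaper.add_drop_pending("index-1", 0)
    reaper.reap()
    # Unclean shutdown right after the reaper ran.
    return storage.crash_and_recover()


print(resync_with_checkpointing_reaper())  # []
```

The design point is that callers no longer need to remember a flush option: any drop with timestamp 0 forces a checkpoint inside the reaper, at the cost of one extra checkpoint per reap pass that finds untimestamped drops.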
| Comment by Jordi Olivares Provencio [ 05/Sep/23 ] |
|
One solution to this could be to reorder operations so that we first write the catalog changes, flush those changes to disk, and THEN perform the drop-pending steps. This way, a state in which an ident is present in the catalog but missing from WT becomes impossible.
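The reordering argument can be checked by simulating an unclean shutdown after every step of the sequence. In this minimal sketch all names are hypothetical, and `flush` stands in for the checkpoint that makes catalog changes durable. The buggy order drops the table before the flush; the proposed order flushes first.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStorage:
    # Hypothetical model: `tables` = storage-engine tables, `catalog` =
    # in-memory catalog, `checkpointed_catalog` = state that survives a crash.
    tables: set = field(default_factory=set)
    catalog: set = field(default_factory=set)
    checkpointed_catalog: set = field(default_factory=set)

    def checkpoint(self) -> None:
        self.checkpointed_catalog = set(self.catalog)

    def crash_and_recover(self) -> list:
        # Catalog reverts to the last checkpoint; table drops are never undone.
        self.catalog = set(self.checkpointed_catalog)
        return sorted(self.catalog - self.tables)


BUGGY_ORDER = ("write_catalog", "drop_table", "flush")
FIXED_ORDER = ("write_catalog", "flush", "drop_table")


def run_steps(storage: ToyStorage, ident: str, order: tuple, crash_after: int) -> None:
    steps = {
        "write_catalog": lambda: storage.catalog.discard(ident),
        "flush": storage.checkpoint,
        "drop_table": lambda: storage.tables.discard(ident),
    }
    # Execute only the first `crash_after` steps, then "crash".
    for name in order[:crash_after]:
        steps[name]()


def orphans_at_each_crash_point(order: tuple) -> list:
    results = []
    for crash_after in range(len(order) + 1):
        storage = ToyStorage(tables={"index-1"}, catalog={"index-1"})
        storage.checkpoint()
        run_steps(storage, "index-1", order, crash_after)
        results.append(storage.crash_and_recover())
    return results


print(orphans_at_each_crash_point(BUGGY_ORDER))  # [[], [], ['index-1'], []]
print(orphans_at_each_crash_point(FIXED_ORDER))  # [[], [], [], []]
```

Only the buggy order has a crash point, right after `drop_table`, where the recovered catalog references a missing table.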