Type: Bug
Resolution: Won't Fix
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Storage
Labels: None
Assigned Teams: Storage Execution
Operating System: ALL
Sprint: Storage NYC 2019-03-11, Storage NYC 2019-03-25
Linked BF Score: 33
Story Points: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Currently, the old two-phase drop (by rename) executes the second phase (the drop of the WT table file) when the majority commit point moves past the drop optime. If the majority commit point is ahead of the checkpoint and a crash happens after the second-phase drop, then on restart the server finds the collection's metadata still in the catalog (because it loads the last checkpoint), but the actual WT file has already been dropped.

On restart, the server can detect an unclean shutdown by examining the mongod.lock file. It can then safely remove the catalog metadata of any collection that has no WT table file.
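The reconciliation step described above can be sketched roughly as follows. This is only an illustrative model, not the server's actual code: the function name `reconcile_catalog`, the dict-based catalog, and the ident strings are all hypothetical, and the unclean-shutdown flag stands in for whatever the server derives from mongod.lock.

```python
def reconcile_catalog(catalog, idents_on_disk, unclean_shutdown):
    """Remove catalog entries whose backing table file is missing.

    catalog:          dict mapping namespace -> ident (hypothetical model
                      of the catalog loaded from the last checkpoint)
    idents_on_disk:   set of idents that still have a table file
    unclean_shutdown: True if the lock file indicates an unclean shutdown

    Returns the list of namespaces that were removed.
    """
    # Reconciliation only runs after an unclean shutdown; otherwise the
    # catalog is trusted as-is.
    if not unclean_shutdown:
        return []

    orphaned = [ns for ns, ident in catalog.items()
                if ident not in idents_on_disk]
    for ns in orphaned:
        del catalog[ns]
    return orphaned
```

Note that when `unclean_shutdown` is false the orphaned entry is left in place, which is the situation the backup case runs into.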

However, opening a backup cursor after the second-phase drop, rather than crashing, causes a similar issue that is harder to solve: the WT table files and the catalog are again inconsistent. But because we don't copy mongod.lock during backup, a server started from the backup does not trigger the code that reconciles the catalog. It then tries to open a WT file that does not exist and hits this fassert.

To fix this problem, we should delay the second phase until the drop optime has been checkpointed.
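A minimal sketch of the proposed gating, assuming optimes and checkpoint timestamps can be compared as plain integers. The class `DropPendingReaper` and its methods are hypothetical names used only for illustration; they are not the server's API. The second phase runs only for drops whose optime is covered by both the majority commit point and the last checkpoint.

```python
class DropPendingReaper:
    """Toy model of deferring the second-phase drop (hypothetical)."""

    def __init__(self):
        # List of (drop_optime, ident) pairs awaiting the second phase.
        self.pending = []

    def add(self, drop_optime, ident):
        """Record a first-phase drop (rename) at the given optime."""
        self.pending.append((drop_optime, ident))

    def reap(self, majority_commit_point, last_checkpoint):
        """Return idents whose table files may now be dropped.

        A drop is safe only once its optime is both majority committed
        and included in a checkpoint, so the gate is the smaller of the
        two timestamps.
        """
        safe_point = min(majority_commit_point, last_checkpoint)
        ready = [ident for ts, ident in self.pending if ts <= safe_point]
        self.pending = [(ts, ident) for ts, ident in self.pending
                        if ts > safe_point]
        return ready
```

With this gate, a drop at optime 10 stays pending while the checkpoint is at 5, even if the majority commit point has already reached 25, so a crash or a backup taken at that moment still sees both the catalog entry and the table file.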

Assignee: [DO NOT USE] Backlog - Storage Execution Team
Reporter: Xiangyu Yao (Inactive)
Participants: [DO NOT USE] Backlog - Storage Execution Team, Githook User, Xiangyu Yao
Votes: 0
Watchers: 8

Created: Mar 05 2019 12:41:47 AM UTC
Updated: Dec 06 2022 03:04:17 AM UTC
Resolved: Apr 02 2020 08:34:15 PM UTC
