[SERVER-38606] Stop allowing NamespaceNotFound errors during startup replication recovery. The oplog replay logic will abort on NamespaceNotFound errors while applying CRUD operations. Created: 13/Dec/18  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Benety Goh Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-39235 Support SE-based 2-phase drop for SE'... Closed
Related
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Sprint: Storage NYC 2019-01-28, Execution Team 2019-09-09, Execution Team 2019-09-23
Participants:

 Description   

With the old 4.0-style two-phase drop, if the server crashes after the actual WT table drop but before a newer checkpoint has been taken (see the timeline below), then after restart the collection is still listed in mdb_catalog but is not backed by any WT table.
-------- Checkpoint ---- Rename ------------- Actual Drop -------- Server Crash

Therefore we chose to allow NamespaceNotFound errors in replication recovery.

Once the new 4.2-style two-phase drop is in place, this error should never occur during replication recovery, because the actual WT table drop will always happen after a stable checkpoint that includes the mdb_catalog changes.
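The difference between the two drop styles can be illustrated with a small model. This is only a sketch of the scenario described above, not server code: `StorageState`, `DropStyle`, `stateAfterCrash`, and `catalogEntryHasTable` are hypothetical names, and the model assumes the crash happens before any newer checkpoint is taken.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified model of what is on disk after restart.
struct StorageState {
    std::set<std::string> catalog;  // namespaces recorded in mdb_catalog
    std::set<std::string> tables;   // namespaces that still have a WT table
};

enum class DropStyle { k40Rename, k42AfterStableCheckpoint };

// Models: checkpoint -> drop requested -> crash, with no newer checkpoint.
// The catalog reverts to the last checkpoint, so it still lists `ns`.
StorageState stateAfterCrash(DropStyle style, const std::string& ns) {
    StorageState state{{ns}, {ns}};  // contents as of the last checkpoint
    if (style == DropStyle::k40Rename) {
        // 4.0-style: the actual table drop already happened and is not
        // undone by recovering to the older checkpoint.
        state.tables.erase(ns);
    }
    // 4.2-style: the table drop waits for a stable checkpoint containing
    // the catalog change, so in this scenario the table is still present.
    return state;
}

// True when the catalog does not reference a missing table for `ns`.
bool catalogEntryHasTable(const StorageState& s, const std::string& ns) {
    return s.catalog.count(ns) == 0 || s.tables.count(ns) > 0;
}
```

In this model, only the 4.0-style path leaves a catalog entry without a backing table, which is the state that would surface as NamespaceNotFound during oplog replay.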



 Comments   
Comment by Xiangyu Yao (Inactive) [ 29/Jan/19 ]

Taking this out of the project and moving it to the backlog, because the condition for selectively relaxing the constraint will be different once the two-phase drop enhancement is done. We should revisit this ticket when we do the enhancement.

Comment by Xiangyu Yao (Inactive) [ 28/Jan/19 ]

Yes, you are right.

Comment by Benety Goh [ 28/Jan/19 ]

Is this also dependent on storage engine support for pending idents and checkpoints?

Comment by Xiangyu Yao (Inactive) [ 25/Jan/19 ]

We should check FCV to selectively relax this constraint, because FCV 4.0 may indicate that dropCollection was done 4.0-style (rename), so NamespaceNotFound may still be triggered during replication recovery.
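A minimal sketch of the FCV-gated check proposed in this comment might look like the following. The `Fcv` and `ErrorCode` enums and the `shouldTolerateDuringRecovery` function are hypothetical stand-ins for illustration, not the server's actual types, and the final condition may differ once the two-phase drop enhancement is done.

```cpp
// Hypothetical stand-ins for the server's FCV and error code types.
enum class Fcv { k40, k42 };
enum class ErrorCode { kOK, kNamespaceNotFound, kOther };

// Returns true when a failed CRUD operation should be skipped rather than
// aborting recovery. Only NamespaceNotFound under FCV 4.0 is tolerated,
// since the drop may have been done 4.0-style (rename).
bool shouldTolerateDuringRecovery(ErrorCode code, Fcv fcv) {
    return code == ErrorCode::kNamespaceNotFound && fcv == Fcv::k40;
}
```

Under this assumption, recovery in FCV 4.2 would abort on NamespaceNotFound, while FCV 4.0 would keep the current lenient behavior.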

Generated at Thu Feb 08 04:49:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.