[SERVER-18931] go into RECOVERING for ops that cannot be applied Created: 11/Jun/15  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Eric Milkie Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Participants:

Description

Today, we shut down a secondary if it encounters an op in the replication stream that it cannot apply. For example, an op generated by a PRIMARY running a newer version of MongoDB may indicate a metadata change with a property that the secondary's version does not support.
Instead of shutting down, the secondary could transition to RECOVERING and stay there until the offending op has been reversed by a subsequent op that deletes or drops the affected object.

For example, if a create-index op specifies a "v" (index version) value that the secondary does not support, the secondary would skip that op and transition to RECOVERING. It would continue to process subsequent ops. Eventually, it may encounter a drop-index op for the same index; after processing that op, the node would transition back into SECONDARY.
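
The create-index / drop-index scenario can be sketched as a small state machine. This is only an illustration of the proposed behavior, not server code: the `Node` class, the op dictionaries (`kind`, `ns`, `name`, `v`), and the `SUPPORTED_INDEX_VERSIONS` set are all hypothetical and do not reflect the real oplog format.

```python
from dataclasses import dataclass, field

# Hypothetical: index versions this (older) secondary understands.
SUPPORTED_INDEX_VERSIONS = {1, 2}


@dataclass
class Node:
    state: str = "SECONDARY"
    indexes: set = field(default_factory=set)   # (ns, index name) pairs that exist
    skipped: set = field(default_factory=set)   # (ns, index name) pairs we could not create

    def apply(self, op):
        key = (op["ns"], op["name"])
        if op["kind"] == "createIndex":
            if op["v"] not in SUPPORTED_INDEX_VERSIONS:
                # Cannot apply: remember the op and stop serving as a SECONDARY,
                # but keep processing the rest of the stream.
                self.skipped.add(key)
                self.state = "RECOVERING"
                return
            self.indexes.add(key)
        elif op["kind"] == "dropIndex":
            self.indexes.discard(key)
            if key in self.skipped:
                # The offending op has been reversed.
                self.skipped.discard(key)
                if not self.skipped:
                    self.state = "SECONDARY"


node = Node()
node.apply({"kind": "createIndex", "ns": "db.c", "name": "a_1", "v": 99})
print(node.state)  # RECOVERING
node.apply({"kind": "dropIndex", "ns": "db.c", "name": "a_1"})
print(node.state)  # SECONDARY
```

The sketch keeps the skipped set in memory only and assumes the node is in RECOVERING solely because of skipped ops; a real node can be in RECOVERING for other reasons, which this ignores.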

This work requires that the list of "bad operations" that need to be undone be stored durably, so that it survives a restart of the node. Once the list becomes empty, it is safe to transition out of RECOVERING.
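
A minimal sketch of what "stored durably" could look like, assuming a JSON file as a stand-in for whatever storage the server would actually use. The `BadOpList` class, its method names, and the entry shape are hypothetical; the point is only that every change is written to stable storage before it takes effect, and that the list is reloaded on startup.

```python
import json
import os
import tempfile


class BadOpList:
    """Hypothetical durable list of skipped ops, persisted as a JSON file."""

    def __init__(self, path):
        self.path = path
        self.entries = []
        # Reload any entries left over from before a restart.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.entries = json.load(f)

    def _persist(self):
        # Write to a temp file, fsync, then atomically swap it into place,
        # so a crash never leaves a half-written list behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.entries, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def add(self, entry):
        if entry not in self.entries:
            self.entries.append(entry)
            self._persist()

    def remove(self, entry):
        if entry in self.entries:
            self.entries.remove(entry)
            self._persist()

    def safe_to_leave_recovering(self):
        # An empty list means every skipped op has been undone.
        return not self.entries
```

On startup, a node following this sketch would construct `BadOpList(path)` and stay in RECOVERING for as long as `safe_to_leave_recovering()` returns `False`.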

