Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.4.11, 5.1.0-rc0, 5.0.5
Affects Version/s: 4.4.7, 5.0.0-rc8
Component/s: None
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v5.0, v4.4, v4.2, v4.0
Sprint:
Repl 2021-08-09, Repl 2021-08-23, Repl 2021-09-06, Repl 2021-09-20, Repl 2021-10-04, Repl 2021-10-18
Linked BF Score:
115
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

(10/4/21 discoverability update: We found that this happens because the relevant read on the secondary gets a readSource of lastApplied, as most other reads there do. Making that read untimestamped solves the problem.)

The calls to applyCommand_inlock and scheduleOplogWrites in secondary oplog application are not atomic. As a result, when an initial syncing node chooses a secondary as its sync source, it can see that a command such as drop has already been applied, yet miss that command's oplog entry when calculating the stopTimestamp.
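A toy Python model of that non-atomic window, assuming a simplified node whose `apply_command` and `write_oplog` methods are hypothetical stand-ins for applyCommand_inlock and scheduleOplogWrites (this is not the server's real code). An observer that looks between the two calls sees the collection gone from the catalog while the oplog and lastApplied still end at the previous timestamp:

```python
from dataclasses import dataclass, field


@dataclass
class ToySecondary:
    """Hypothetical model: a catalog of collection names plus an oplog of (ts, op) pairs."""
    collections: set = field(default_factory=lambda: {"foo"})
    oplog: list = field(default_factory=list)
    last_applied: int = 0

    def apply_command(self, op):
        # Stand-in for applyCommand_inlock: the catalog change is visible immediately.
        if op["op"] == "drop":
            self.collections.discard(op["ns"])

    def write_oplog(self, ts, op):
        # Stand-in for scheduleOplogWrites: the oplog entry is written in a separate step.
        self.oplog.append((ts, op))
        self.last_applied = ts


node = ToySecondary(last_applied=10, oplog=[(10, {"op": "noop"})])
drop = {"op": "drop", "ns": "foo"}

node.apply_command(drop)
# An observer here sees the drop in the catalog, but the oplog still ends at ts=10.
snapshot = ("foo" in node.collections, node.oplog[-1][0], node.last_applied)
node.write_oplog(11, drop)

print(snapshot)  # → (False, 10, 10)
```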

The following can happen:

1. The initial syncing node sees that the drop of collection foo has been applied on a secondary sync source (but the oplog entry has not been written yet). The collectionCloner stops with a NamespaceNotFound error, expecting the drop to be applied during the initial sync oplog application phase.
2. The initial syncing node fetches the lastApplied of the sync source and sets the stopTimestamp to T.
3. The sync source writes the oplog entry for the drop from step 1 at timestamp T + 1.
4. The initial syncing node reaches stopTimestamp T and transitions to SECONDARY. It then applies the drop at T + 1 and crashes because the collection does not exist.
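The four steps can be replayed in a small, single-threaded Python sketch. Everything here is hypothetical: `run_race`, `apply_drop` and `NamespaceNotFound` are illustrative stand-ins, timestamps are plain integers, and the rule that dropping a missing collection is fatal is taken from the ticket's description rather than from server code.

```python
class NamespaceNotFound(Exception):
    """Hypothetical stand-in for the server's NamespaceNotFound error."""


def apply_drop(collections, ns):
    # Toy rule from the ticket: applying a drop for a missing collection is fatal.
    if ns not in collections:
        raise NamespaceNotFound(ns)
    collections.discard(ns)


def run_race(T=100):
    source_collections = {"foo"}
    source_oplog = [(T, {"op": "noop"})]  # sync source's lastApplied is T

    # (1) Sync source applies the drop to its catalog; no oplog entry exists yet.
    source_collections.discard("foo")
    local_collections = set(source_collections)  # cloner skips "foo"

    # (2) Initial syncing node reads the source's lastApplied as its stopTimestamp.
    stop_timestamp = source_oplog[-1][0]

    # (3) Sync source writes the drop's oplog entry at T + 1.
    source_oplog.append((T + 1, {"op": "drop", "ns": "foo"}))

    # Initial sync oplog application replays only entries with ts <= stopTimestamp,
    # so the drop at T + 1 is not applied here.
    for ts, op in source_oplog:
        if ts <= stop_timestamp and op["op"] == "drop":
            apply_drop(local_collections, op["ns"])

    # (4) Node transitions to SECONDARY and applies the later drop.
    try:
        for ts, op in source_oplog:
            if ts > stop_timestamp and op["op"] == "drop":
                apply_drop(local_collections, op["ns"])
    except NamespaceNotFound as err:
        return stop_timestamp, f"crash: drop of {err} at ts > {stop_timestamp}"
    return stop_timestamp, "ok"


print(run_race())  # → (100, 'crash: drop of foo at ts > 100')
```

In this model the crash follows directly from step 2 reading a lastApplied value that does not yet cover the catalog change already observed in step 1.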

Assignee: Vesselina Ratcheva (Inactive)
Reporter: Jason Chan
Participants: Githook User, Jason Chan, Vesselina Ratcheva, Vivian Ge
Votes: 0
Watchers: 12

Created: Jul 16 2021 07:43:17 PM UTC
Updated: Oct 29 2023 09:50:44 PM UTC
Resolved: Oct 05 2021 01:41:51 AM UTC
Confidence Status Last Update: 19/Aug/21 5:15 PM
