[SERVER-32187] Metadata changes on secondaries don't advance the min majority read timestamp enough Created: 06/Dec/17  Updated: 30/Oct/23  Resolved: 29/Mar/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.6.0
Fix Version/s: 3.7.4

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: rollback-optional
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-30809 Investigating remaining writes to the... Closed
is depended on by SERVER-32188 Have secondaries apply timestamps to ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Sprint: Repl 2018-01-01
Participants:

 Description   

The in-memory collection/index catalog is not currently versioned; it only knows the current state. When reads come in for a collection, the in-memory catalog issues requests to the storage engine's on-disk catalog, which may be versioned (for KV engines, it must be versioned if the storage engine supports majority reads and the like). Requests that find an in-memory catalog entry for a collection/index also expect the storage engine to find the corresponding entries.
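
To make the mismatch concrete, here is a minimal sketch of the two catalogs. The class and method names (`InMemoryCatalog`, `VersionedDiskCatalog`, `exists_at`) are hypothetical and are not the server's actual types; timestamps are modeled as plain integers for illustration.

```python
class InMemoryCatalog:
    """Unversioned: only knows which collections exist right now."""

    def __init__(self):
        self._collections = set()

    def create(self, name):
        self._collections.add(name)

    def has(self, name):
        return name in self._collections


class VersionedDiskCatalog:
    """Hypothetical stand-in for a storage engine catalog that keeps history."""

    def __init__(self):
        # name -> list of (timestamp, exists), appended in timestamp order
        self._history = {}

    def record(self, name, timestamp, exists):
        self._history.setdefault(name, []).append((timestamp, exists))

    def exists_at(self, name, timestamp):
        # Replay the history up to and including the requested timestamp.
        state = False
        for ts, exists in self._history.get(name, []):
            if ts <= timestamp:
                state = exists
        return state


mem = InMemoryCatalog()
disk = VersionedDiskCatalog()

# Collection C is created at timestamp 10.
mem.create("C")
disk.record("C", 10, True)

found_in_memory = mem.has("C")           # the "now" view
found_on_disk_at_5 = disk.exists_at("C", 5)    # a read in the past
found_on_disk_at_10 = disk.exists_at("C", 10)
```

A read at timestamp 5 finds C in memory but not on disk, which is the disagreement the rest of this description is concerned with.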

Majority or at-a-timestamp reads that find an in-memory catalog entry for "now" still expect the storage engine to find the corresponding on-disk entry in the past (the time the read is issued for). To satisfy this requirement, metadata changes are tracked on a per-collection basis. If a majority read comes in for a collection C when the majority commit timestamp is 5, but there was a metadata change to C at 10, the node will block the read until 10 becomes majority committed. For completeness, if more metadata changes come in during this wait, the reader will observe those and continue to block.
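
The blocking rule can be sketched as follows. This is an illustration under assumptions, not the server's code: `CollectionMetadata`, `note_metadata_change`, and `majority_read_may_proceed` are hypothetical names, and timestamps are plain integers.

```python
class CollectionMetadata:
    """Tracks the latest metadata change for one collection (hypothetical)."""

    def __init__(self):
        self.min_visible_ts = 0

    def note_metadata_change(self, ts):
        # The minimum visible timestamp only ever moves forward.
        self.min_visible_ts = max(self.min_visible_ts, ts)


def majority_read_may_proceed(coll, majority_commit_ts):
    # A majority read must wait until the most recent metadata change
    # to the collection has itself become majority committed.
    return majority_commit_ts >= coll.min_visible_ts


c = CollectionMetadata()
c.note_metadata_change(10)

blocked_at_5 = not majority_read_may_proceed(c, 5)     # commit point 5 < 10
allowed_at_10 = majority_read_may_proceed(c, 10)       # 10 is now committed

# A further metadata change arrives while a reader is waiting.
c.note_metadata_change(12)
blocked_again_at_10 = not majority_read_may_proceed(c, 10)
```

In this model a waiting reader re-evaluates the check each time the commit point advances, so a later metadata change simply extends the wait.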

When a secondary applies a create[Collection] operation, that operation is applied in its own batch (all commands are done this way). Suppose lastApplied is timestamp T when the operation is applied. Because commands are batched alone, the optime of the create[Collection] is necessarily T + 1. However, the minimum snapshot value is updated to T, whereas the correct value is T + 1.

If the majority commit point is on this boundary when a read (for the affected collection) comes in, the in-memory catalog and the on-disk catalog will be in disagreement, resulting in a crash.
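
The boundary case can be worked through with a small model. Everything here is a simplification for illustration: timestamps are integers, the create is assumed to be the very next oplog entry after lastApplied, and the function names and the `use_correct_value` flag are hypothetical.

```python
def apply_create_on_secondary(last_applied, use_correct_value):
    # The command is applied alone in its batch, so in this model its
    # optime is the next timestamp after lastApplied.
    command_optime = last_applied + 1
    # Described bug: the minimum snapshot is set from lastApplied (T)
    # instead of from the command's own optime (T + 1).
    min_snapshot = command_optime if use_correct_value else last_applied
    return command_optime, min_snapshot


def read_outcome(majority_commit_ts, command_optime, min_snapshot):
    if majority_commit_ts < min_snapshot:
        return "blocks"
    # The read is admitted; the on-disk catalog is consulted at the
    # majority commit timestamp and only sees the collection if the
    # create is at or before that timestamp.
    if majority_commit_ts >= command_optime:
        return "ok"
    return "catalog mismatch"


T = 7

optime, min_snapshot = apply_create_on_secondary(T, use_correct_value=False)
outcome_with_t = read_outcome(T, optime, min_snapshot)

optime, min_snapshot = apply_create_on_secondary(T, use_correct_value=True)
outcome_with_t_plus_1 = read_outcome(T, optime, min_snapshot)
outcome_after_commit = read_outcome(T + 1, optime, min_snapshot)
```

With the minimum snapshot at T, a read arriving while the majority commit point is exactly T is admitted and hits the mismatch; with T + 1 the same read blocks until the create is majority committed.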



 Comments   
Comment by Judah Schvimer [ 29/Mar/18 ]

We're confident that the existing tests cover this. Closing as fixed.

Comment by Judah Schvimer [ 03/Jan/18 ]

This requires additional unit testing, so we are not resolving this ticket yet. The above commit should account for all functional changes needed and was required for SERVER-32188 to pass in Evergreen.

Comment by Githook User [ 03/Jan/18 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-32188 SERVER-32187 Have secondaries apply timestamps to commands
Branch: master
https://github.com/mongodb/mongo/commit/8c0485475323b1e0582f56a071878339c78ee01d
