[SERVER-32187] Metadata changes on secondaries don't advance the min majority read timestamp enough Created: 06/Dec/17 Updated: 30/Oct/23 Resolved: 29/Mar/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.6.0 |
| Fix Version/s: | 3.7.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Judah Schvimer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | rollback-optional |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.6 |
| Sprint: | Repl 2018-01-01 |
| Participants: | |
| Description |
|
The in-memory collection/index catalog is not currently versioned; it only knows the current state. When a read comes in for a collection, the in-memory catalog issues requests to the storage engine's on-disk catalog, which may be versioned (for KV engines, it must be versioned if the storage engine supports majority reads and the like). A request that finds an in-memory catalog entry for a collection/index also expects the storage engine to find the corresponding entry. A majority or at-a-timestamp read that finds an in-memory catalog entry for "now" still expects the storage engine to find the corresponding on-disk entry in the past (at the time the read is issued for).

To satisfy this requirement, metadata changes are tracked on a per-collection basis. If a majority read comes in for a collection C when the majority commit timestamp is 5, but there was a metadata change to C at 10, the node will block until 10 becomes majority committed. For completeness, if more metadata changes come in during this wait, the reader will observe those and continue to block.

When a secondary applies a create[Collection] operation, that operation is applied in its own batch (all commands are applied this way). Suppose lastApplied is timestamp T when this operation is applied. Because each command gets its own batch, the optime of the create[Collection] is necessarily T + 1. However, the minimum snapshot value is updated to T, whereas the correct value is T + 1. If the majority commit point is on this boundary when a read for the affected collection comes in, the in-memory catalog and the on-disk catalog will disagree, resulting in a crash. |
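The boundary condition described above can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the server's code: timestamps are plain integers, "T + 1" stands for the optime of the create, and every name here (`CollectionEntry`, `min_visible_ts`, `apply_create_collection`, `on_disk_entry_exists`, the `fixed` flag) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CollectionEntry:
    """Hypothetical in-memory catalog entry; it only knows the current state."""
    name: str
    min_visible_ts: int = 0  # majority reads block until this is majority committed

    def can_serve_majority_read(self, majority_commit_ts: int) -> bool:
        # The reader is let through once the last metadata change is majority committed.
        return majority_commit_ts >= self.min_visible_ts


def apply_create_collection(name: str, last_applied: int, fixed: bool):
    """Model a secondary applying a create in its own batch.

    last_applied is T; the create's optime is modeled as T + 1.
    With fixed=False the minimum snapshot value is set to T (the reported bug);
    with fixed=True it is set to T + 1 (the value the ticket says is correct).
    """
    create_ts = last_applied + 1
    min_visible = create_ts if fixed else last_applied
    return CollectionEntry(name, min_visible), create_ts


def on_disk_entry_exists(create_ts: int, read_ts: int) -> bool:
    """Model a versioned on-disk catalog: the entry exists only at or after create_ts."""
    return read_ts >= create_ts


T = 9
buggy, create_ts = apply_create_collection("C", T, fixed=False)
# Majority commit point sits exactly on the boundary T: the in-memory catalog
# lets the read through, but the on-disk catalog has no entry at T.
disagreement = buggy.can_serve_majority_read(T) and not on_disk_entry_exists(create_ts, T)

fixed_entry, _ = apply_create_collection("C", T, fixed=True)
blocks_at_boundary = not fixed_entry.can_serve_majority_read(T)
```

In this model the buggy update admits a read at T that the on-disk catalog cannot satisfy, while the corrected update makes the reader block until T + 1 is majority committed.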
| Comments |
| Comment by Judah Schvimer [ 29/Mar/18 ] |
|
We're confident that the existing tests cover this. Closing as fixed. |
| Comment by Judah Schvimer [ 03/Jan/18 ] |
|
This requires additional unit testing, so we are not resolving this ticket yet. The above commit should account for all functional changes needed and was required for |
| Comment by Githook User [ 03/Jan/18 ] |
|
Author: Judah Schvimer (username: judahschvimer, email: judah@mongodb.com) Message: |