[SERVER-39356] Change stream updateLookup queries with speculative majority may return uncommitted data Created: 01/Feb/19 Updated: 29/Oct/23 Resolved: 15/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 4.1.7 |
| Fix Version/s: | 4.1.10 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | William Schultz (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Repl 2019-02-25, Repl 2019-03-11, Repl 2019-03-25 |
| Participants: | |
| Linked BF Score: | 57 |
| Description |
|
Speculative majority change streams that do an update lookup query wait for the most recent lastApplied optime of a replica set node to majority commit before returning results to the client. This is intended to guarantee to the client that the data it received is majority committed. This contract may be violated, however, when a node's lastApplied optime lags behind the optime of the newest "storage committed" oplog entry. That is, there may be an oplog entry (and corresponding data operation) written to storage that is visible to readers, but that the lastApplied optime of the node does not yet reflect. This is possible because a primary node advances its lastApplied optime inside the onCommit handler of an operation's transaction, so there is a nonzero length of time between the commit of the WriteUnitOfWork at the storage layer and the point at which the optime is advanced for that operation. If a concurrent reader reads the effects of such a transaction and reads lastApplied before the onCommit handler has fired, it may wait for a stale optime to commit and return data that is not, in fact, majority committed. This is an issue for primaries. On secondaries, lastApplied is only updated at the end of batch application, so the same problem does not manifest. |
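The window described above can be replayed deterministically in a toy model. The sketch below is only an illustration: `ToyPrimary`, `storage_commit`, `on_commit`, and `speculative_read` are hypothetical names invented for this example, and timestamps are plain integers rather than real optimes.

```python
from dataclasses import dataclass


@dataclass
class ToyPrimary:
    """Toy model of a primary; all names are illustrative, not MongoDB's API."""
    visible_ts: int = 0          # timestamp of the newest write visible to readers
    last_applied: int = 0        # in-memory lastApplied timestamp
    majority_committed: int = 0  # this node's view of the majority commit point

    def storage_commit(self, ts):
        # Step 1: the WriteUnitOfWork commits, so the write becomes visible.
        self.visible_ts = ts

    def on_commit(self, ts):
        # Step 2: the onCommit handler advances lastApplied.
        self.last_applied = max(self.last_applied, ts)


def speculative_read(node):
    """Read the newest visible data, then pick lastApplied as the wait target."""
    data_ts = node.visible_ts
    wait_ts = node.last_applied
    return data_ts, wait_ts


node = ToyPrimary(visible_ts=1, last_applied=1, majority_committed=1)
node.storage_commit(2)                     # write at ts=2 becomes visible
data_ts, wait_ts = speculative_read(node)  # reader runs inside the window
node.on_commit(2)                          # lastApplied catches up too late

# The wait on the stale timestamp is already satisfied...
wait_satisfied = wait_ts <= node.majority_committed
# ...but the data actually read is newer than the majority commit point.
data_is_committed = data_ts <= node.majority_committed
print(wait_satisfied, data_is_committed)
```

In this model the reader's wait completes immediately even though the write at timestamp 2 has not been majority committed, which is the contract violation the description reports.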
| Comments |
| Comment by Githook User [ 18/Mar/19 ] |
|
Author: {'email': 'william.schultz@mongodb.com', 'name': 'William Schultz', 'username': 'will62794'} Message: This commit adds an integration test to verify that speculative majority change stream reads do not return incorrect results when reading concurrently with secondary batch application. The goal is to ensure that, due to the changes from |
| Comment by Githook User [ 15/Mar/19 ] |
|
Author: {'email': 'william.schultz@mongodb.com', 'name': 'William Schultz', 'username': 'will62794'} Message: Speculative majority change streams provide "majority" read guarantees by reading from a local snapshot of data and then waiting for that data to become majority committed, instead of reading directly from a majority committed snapshot. To satisfy this guarantee, a speculative majority read must wait for the proper timestamp to become majority committed after reading data. If the newest data it read reflects a timestamp T, then it must wait for a timestamp >= T to become majority committed. In general, waiting on replication's lastApplied timestamp is not safe, since it is possible for writes to be visible to readers even if those writes have not yet advanced the in-memory value of lastApplied. To work around this issue for speculative majority reads, we instead read from an explicitly chosen timestamp in the storage engine, and then wait on that timestamp to majority commit. This gives us a more direct way to know what timestamp the data we read reflects. We utilize the `kNoOverlap` read source, which allows us to read from min(lastApplied, all_committed), a convenient way to make these reads work correctly on both primaries and secondaries. |
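A minimal sketch of the approach in this commit message, under simplifying assumptions: timestamps are integers, and a sorted list of `(ts, value)` pairs stands in for a storage engine that can serve reads at an explicit timestamp. `choose_read_timestamp` and `read_at` are hypothetical helpers written for this illustration, not functions from the server code.

```python
from bisect import bisect_right


def choose_read_timestamp(last_applied, all_committed):
    """Pick the timestamp to read at and later wait on: min of the two inputs."""
    return min(last_applied, all_committed)


def read_at(history, read_ts):
    """Return the newest (ts, value) pair in `history` with ts <= read_ts.

    `history` must be sorted by ts; it is a stand-in for timestamped storage.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, read_ts)
    if idx == 0:
        raise LookupError("no data at or before the read timestamp")
    return history[idx - 1]


history = [(1, "v1"), (2, "v2")]  # the write at ts=2 is already in storage
last_applied = 1                  # but lastApplied has not caught up yet
all_committed = 2

read_ts = choose_read_timestamp(last_applied, all_committed)
data_ts, value = read_at(history, read_ts)

# Whatever was read is no newer than read_ts, so waiting for read_ts to
# majority commit covers all of the data returned.
print(read_ts, data_ts, value)
```

Because the read is pinned to `read_ts`, the reader never observes the write at timestamp 2 until the chosen timestamp has advanced past it, so the timestamp being waited on always bounds the data that was read.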
| Comment by Githook User [ 06/Mar/19 ] |
|
Author: {'name': 'William Schultz', 'email': 'william.schultz@mongodb.com', 'username': 'will62794'} Message: This patch refactors the SpeculativeMajorityReadInfo class and the awaitOpTimeCommitted method to accept timestamps as input instead of optimes. When waiting for an operation to majority commit, term information, which is included in optimes, isn't necessary, since timestamps are totally ordered within a local oplog, and so are safely comparable. It is, for example, safe to determine if a local oplog entry is majority committed by checking if its timestamp is less than that node's local view of the majority commit point. This patch should not introduce any observable functional changes. |
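The comparison described here can be sketched as follows. This is an illustration only: timestamps are modeled as `(seconds, increment)` tuples, `is_majority_committed` is a hypothetical helper, and treating an entry equal to the commit point as committed is an assumption made for this example.

```python
def is_majority_committed(entry_ts, commit_point_ts):
    """Compare a local oplog entry's timestamp with the majority commit point.

    Both arguments are (seconds, increment) tuples. No term is needed because
    timestamps within a single node's oplog are totally ordered.
    """
    # Assumption: an entry at exactly the commit point counts as committed.
    return entry_ts <= commit_point_ts


commit_point = (1552600000, 3)  # this node's view of the majority commit point
older_entry = (1552600000, 2)   # same second, smaller increment
newer_entry = (1552600001, 1)   # later second

print(is_majority_committed(older_entry, commit_point))
print(is_majority_committed(newer_entry, commit_point))
```

Python compares tuples lexicographically, which matches ordering by seconds first and increment second.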
| Comment by William Schultz (Inactive) [ 05/Feb/19 ] |
|
This problem and intended fix should be specific to primaries. The proposed solution is to have update lookup queries read from all_committed on primary and wait on that timestamp to commit so we can be guaranteed the data read became committed. On secondaries, the issue is not quite the same, since lastApplied is only updated at the end of each batch. On secondaries, update lookup queries can read during the middle of batch application, which causes a different problem, referenced in |