[SERVER-34440] Secondary reads from internal (non-network) clients should read at the last applied timestamp Created: 12/Apr/18 Updated: 29/Oct/23 Resolved: 14/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | nyc | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Storage NYC 2018-05-21 | ||||||||
| Participants: | |||||||||
| Description |
|
When checking whether to read from the local snapshot (last applied timestamp) on a secondary, we restrict this to clients where isFromUserConnection is true. This prevents a bug where rolled-back index builds are still visible (similar to the invariant in BF-8258). See this patch build failure where the check for isFromUserConnection is removed: The most likely cause is that rollback does not correctly rewind the last applied snapshot timestamp. We should evaluate how replication rolls back its timestamps and ensure that the local snapshot timestamp is correctly updated. The solution should be that users of ShouldNotConflictWithSecondaryBatchApplication who opt out of taking the PBWM lock are able to read without a timestamp. This applies to the threads listed in the comments, with the addition of rsBackgroundSync. |
| Comments |
| Comment by Githook User [ 14/May/18 ] |
|
Author: {'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'} Message: |
| Comment by Louis Williams [ 08/May/18 ] |
|
Discussed with spencer, daniel.gottlieb, and judah@mongodb.com. We believe there is no reason that any of the internal readers listed above, or any readers that opt out of taking the PBWM lock using ShouldNotConflictWithSecondaryBatchApplicationBlock, should read at the last applied timestamp. These readers would already have been exposed to reading inconsistent data, if any exists. For oplog reads by FTDC or repl, we do not advance the oplog "all-committed" time until the end of batches, so there is no additional risk posed to readers. Rollback needs to read without a timestamp when reloading the catalog, or the bug described above can occur. In all known cases, reading without a timestamp from internal readers is acceptable behavior.
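
For illustration only, the rule agreed on above can be modeled as a small decision function. This is a simplified sketch in Python, not the server's C++ code: `Reader`, `ReadSource`, and `choose_read_source` are hypothetical names, the two boolean fields stand in for isFromUserConnection and for whether the reader opted out of the PBWM lock, and any other conditions the server may check on a secondary read are ignored.

```python
from dataclasses import dataclass
from enum import Enum


class ReadSource(Enum):
    """Hypothetical labels for where a secondary read takes its snapshot."""
    NO_TIMESTAMP = "no_timestamp"
    LAST_APPLIED = "last_applied"


@dataclass(frozen=True)
class Reader:
    # Stand-in for isFromUserConnection: True for network clients,
    # False for internal threads.
    is_from_user_connection: bool
    # Stand-in for a reader that opted out of the PBWM lock via
    # ShouldNotConflictWithSecondaryBatchApplicationBlock.
    opted_out_of_pbwm_lock: bool


def choose_read_source(reader: Reader) -> ReadSource:
    """Pick the read source for a read on a secondary (simplified model)."""
    # Internal (non-network) readers read without a timestamp.
    if not reader.is_from_user_connection:
        return ReadSource.NO_TIMESTAMP
    # Readers that skip the PBWM lock also read without a timestamp.
    if reader.opted_out_of_pbwm_lock:
        return ReadSource.NO_TIMESTAMP
    # Remaining user reads use the last applied timestamp.
    return ReadSource.LAST_APPLIED


if __name__ == "__main__":
    user_read = Reader(is_from_user_connection=True, opted_out_of_pbwm_lock=False)
    internal_read = Reader(is_from_user_connection=False, opted_out_of_pbwm_lock=True)
    print(choose_read_source(user_read).value)
    print(choose_read_source(internal_read).value)
```

Under this model, only a user connection that still takes the PBWM lock reads at the last applied timestamp; rollback's catalog reload and the other internal readers fall into the untimestamped branch.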
|
| Comment by Louis Williams [ 30/Apr/18 ] |
|
I have identified at least one failure with rollback related to this change. Consider the following scenario:
Possible solutions:
|
| Comment by Louis Williams [ 30/Apr/18 ] |
|
The following internal threads use AutoGetCollectionForRead, and would be affected by reading from the last applied timestamp:
|